C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

For binary files use only read() and write()??

 
chhenning@gmail.com
Guest





Posted: Thu Nov 17, 2005 11:59 pm    Post subject: For binary files use only read() and write()??



Hi there, I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode std::ios::binary,
only read() and write() should be used? I tried this:

std::ifstream oFile( "test.dat"
, std::ios::binary );

unsigned short nValue = 0;

oFile >> nValue;

if( !oFile )
return 1;


and it fails. Instead of operator>> I tried this:

oFile.read( reinterpret_cast< char* >( &nValue )
, sizeof( unsigned short ) );


and now it's working.

Thanks,
Christian


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Ulrich Eckhardt
Guest





Posted: Fri Nov 18, 2005 10:25 am    Post subject: Re: For binary files use only read() and write()??



chhenning (AT) gmail (DOT) com wrote:
Quote:
Hi there, I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode std::ios:binary,
only the read() and write() should be used?

I wouldn't even use a stream but the underlying stream buffer. This plugin is
responsible for character-to-byte conversion, buffering and doing the actual
IO. The point is that a stream bundles a stream buffer and formatting
information (locale) plus a few flags of its own. However, it still is
completely text-oriented, i.e. it offers nothing for unformatted IO.
Anyhow, you need to suppress character-to-byte conversion in both cases by
calling either iostream::imbue() or streambuf::pubimbue() with
std::locale::classic().
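
A minimal sketch of the stream-buffer approach described above, assuming a plain std::filebuf imbued with the classic locale before the file is opened. The function name read_raw is hypothetical and only serves the illustration; it is not code from the thread.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <locale>
#include <string>

// Reads up to max_bytes raw bytes from path through a std::filebuf,
// without going through the formatting layer of std::ifstream.
// Returns the bytes actually read (empty if the file cannot be opened).
std::string read_raw(const char* path, std::size_t max_bytes) {
    std::filebuf buf;
    // Imbue before open(); codecvt<char, char, mbstate_t> in the classic
    // locale performs no conversion.
    buf.pubimbue(std::locale::classic());
    if (!buf.open(path, std::ios::in | std::ios::binary))
        return std::string();
    std::string bytes(max_bytes, '\0');
    std::streamsize got = 0;
    if (max_bytes != 0)
        got = buf.sgetn(&bytes[0], static_cast<std::streamsize>(max_bytes));
    bytes.resize(static_cast<std::size_t>(got));
    return bytes;
}
```

Calling read_raw("test.dat", 2) would return the first two bytes of the file exactly as stored; interpreting them as a number is a separate step.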

Quote:
I tried the this:

std::ifstream oFile( "test.dat"
, std::ios::binary );
[...]
oFile.read( reinterpret_cast< char* >( &nValue )
, sizeof( unsigned short ) );

and now its working.

Well, it's working until you try to read the file on a machine with a
different short representation (size, endianness, etc.). Stay with text-based
IO or use a real serialization library (Boost comes to mind...).

Uli
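
To illustrate the portability point, here is a hedged sketch that decodes a 16-bit value octet by octet instead of copying bytes into an unsigned short. The little-endian byte order and the function names read_u16_le and decode_u16_le are assumptions made for the example; the thread does not say what format test.dat uses.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads a 16-bit unsigned value stored as two octets, least significant
// first. The byte order is a property of the (assumed) file format, so the
// result does not depend on the size or byte order of short on this machine.
bool read_u16_le(std::istream& in, unsigned int& out) {
    char raw[2];
    if (!in.read(raw, 2))
        return false;                       // fewer than two bytes available
    const unsigned int lo = static_cast<unsigned char>(raw[0]);
    const unsigned int hi = static_cast<unsigned char>(raw[1]);
    out = lo | (hi << 8);
    return true;
}

// Convenience wrapper for bytes that are already in memory.
bool decode_u16_le(const std::string& bytes, unsigned int& out) {
    std::istringstream in(bytes, std::ios::in | std::ios::binary);
    return read_u16_le(in, out);
}
```

The same function works on a std::ifstream opened with std::ios::binary, since it only relies on std::istream::read.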


kanze
Guest





Posted: Fri Nov 18, 2005 1:25 pm    Post subject: Re: For binary files use only read() and write()??



chhenning (AT) gmail (DOT) com wrote:

Quote:
I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode
std::ios:binary, only the read() and write() should be used?

It depends on the format of the binary file. Normally, the
functions istream::read and ostream::write should never be used;
they're only useful if you have bits of preformatted data here
and there.

The flag ios::binary is something of a misnomer. It has nothing to
do with the format of data in the file, but simply chooses one
of two different modes to access the file on disk. The only
real requirements are that you can read a non-binary file as
lines of text, with each line terminated by a single '\n'
character, and that in a non-binary file "transparency" (getting
exactly the same bytes back as what you wrote) is only guaranteed
for byte values that actually do represent text characters in the
execution character set.
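
The transparency property of binary mode can be sketched as a round trip: write arbitrary bytes with std::ios::binary, read them back with std::ios::binary, and compare. The helper name round_trip_binary and the file name passed to it are hypothetical.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Writes bytes to path in binary mode, then reads the whole file back in
// binary mode and returns what was read.
std::string round_trip_binary(const char* path, const std::string& bytes) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // leaving the scope closes (and flushes) the output file
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With both streams opened in binary mode the returned string should equal the input for any byte values, including '\r', '\n' and 0x1A. Without the flag, some platforms translate line endings, so the comparison is only expected to hold for ordinary text.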

In practice, you'll use binary anytime the data may be read or
written from another machine.

Quote:
I tried the this:

std::ifstream oFile( "test.dat"
, std::ios::binary );

unsigned short nValue = 0;

oFile >> nValue;

if( !oFile )
return 1;

and it fails.

It fails if the file doesn't contain what is expected, i.e. an
integral numerical value formatted as text in the execution
character set.
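
The difference between the two calls can be sketched with in-memory streams. operator>> parses decimal digits written as text, while read() copies raw bytes into the object. The function names parse_as_text and copy_raw_bytes are hypothetical.

```cpp
#include <sstream>
#include <string>

// Formatted extraction: skips whitespace, then parses decimal digits.
// Succeeds only if the bytes spell out a number as text.
bool parse_as_text(const std::string& bytes, unsigned short& out) {
    std::istringstream in(bytes);
    return static_cast<bool>(in >> out);
}

// Unformatted read: copies sizeof(unsigned short) raw bytes into the object.
// It succeeds whenever enough bytes are available; the resulting value
// depends on this machine's size and byte order for unsigned short.
bool copy_raw_bytes(const std::string& bytes, unsigned short& out) {
    std::istringstream in(bytes, std::ios::in | std::ios::binary);
    return static_cast<bool>(
        in.read(reinterpret_cast<char*>(&out), sizeof out));
}
```

Given the three characters "513", parse_as_text yields 513. Given bytes that are not digits it fails, which matches the behaviour reported in the original post.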

Quote:
Instead of operator>> I tried, this:

oFile.read( reinterpret_cast< char* >( &nValue )
, sizeof( unsigned short ) );

and now its working.

By chance. It's undefined behavior, unless the data was written
in exactly the same fashion, by the same program. In practice:

-- if the data was written by another program compiled with the
same version of the same compiler, using the same compiler
options, it will also work,

-- if the data was written by another program on the same
machine, there's a good chance (but not certainty) that it
will work, and

-- on a modern machine, you will get something. Not
necessarily the value that was written, but at least a
value that you can safely access without a core dump.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


kanze
Guest





Posted: Mon Nov 21, 2005 10:39 am    Post subject: Re: For binary files use only read() and write()??

Ulrich Eckhardt wrote:
Quote:
chhenning (AT) gmail (DOT) com wrote:

I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode
std::ios:binary, only the read() and write() should be used?

I wouldn't even use a stream but the underlying streambuffer.

Sort of. I would still use basic_ios, I think, for the error
handling (although the locale handling might become a bother).

Stream buffers are only good for transporting the bytes, not for
formatting. So while I wouldn't use istream and ostream if the
format is a binary one, and possibly not even for certain
specialized text formats, I would write my own [io]berstream or
[io]xdrstream or whatever, to handle the formatting.
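
A rough sketch of what such a pair of classes might look like, assuming an XDR-like convention of 32-bit integers written most significant octet first (C++11 for std::uint32_t). Only the names oxdrstream and ixdrstream echo the post; put_u32, get_u32 and encode_u32 are hypothetical, and nothing here reproduces an existing library. std::ios is used as a base purely for the error state, and the stream buffer only moves bytes.

```cpp
#include <cstdint>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>

// Output side: formats integers as four big-endian octets.
class oxdrstream : public std::ios {
public:
    explicit oxdrstream(std::streambuf* sb) : std::ios(sb) {}

    oxdrstream& put_u32(std::uint32_t v) {
        if (!good()) return *this;          // also covers a null buffer
        const unsigned char raw[4] = {
            static_cast<unsigned char>((v >> 24) & 0xFFu),
            static_cast<unsigned char>((v >> 16) & 0xFFu),
            static_cast<unsigned char>((v >> 8) & 0xFFu),
            static_cast<unsigned char>(v & 0xFFu)
        };
        if (rdbuf()->sputn(reinterpret_cast<const char*>(raw), 4) != 4)
            setstate(std::ios::badbit);
        return *this;
    }
};

// Input side: reassembles the value from four octets.
class ixdrstream : public std::ios {
public:
    explicit ixdrstream(std::streambuf* sb) : std::ios(sb) {}

    ixdrstream& get_u32(std::uint32_t& v) {
        if (!good()) { setstate(std::ios::failbit); return *this; }
        unsigned char raw[4];
        if (rdbuf()->sgetn(reinterpret_cast<char*>(raw), 4) != 4) {
            setstate(std::ios::failbit | std::ios::eofbit);
            return *this;                   // v is left untouched
        }
        v = (static_cast<std::uint32_t>(raw[0]) << 24) |
            (static_cast<std::uint32_t>(raw[1]) << 16) |
            (static_cast<std::uint32_t>(raw[2]) << 8) |
             static_cast<std::uint32_t>(raw[3]);
        return *this;
    }
};

// Small helper showing the writer on an in-memory buffer.
std::string encode_u32(std::uint32_t v) {
    std::stringbuf sb;
    oxdrstream out(&sb);
    out.put_u32(v);
    return sb.str();
}
```

For file I/O the same classes could be handed the address of a std::filebuf opened with std::ios::binary.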

Quote:
This plugin is responsible in character to byte conversion,
buffering and doing the actual IO.

The problem, obviously, is that except in some cases of text
formats, you don't want the byte conversions.

Quote:
The point is that a stream bundles a streambuffer and
formatting information (locale) plus a few flags on its own.
However, it still is completely text-oriented i.e. nothing for
unformatted IO.

Strictly speaking, there is no such thing as "unformatted" IO.
Whatever you write has a format. Some format. If you don't
know the format, all that means is that at some point in the
future, you won't be able to read the data.

Quote:
Anyhow, you need to suppress character to byte conversion in
both cases by calling either iostream::imbue() or
streambuf::pubimbue() with std::locale::classic.

I tried the this:

std::ifstream oFile( "test.dat"
, std::ios::binary );
[...]
oFile.read( reinterpret_cast< char* >( &nValue )
, sizeof( unsigned short ) );

and now its working.

Well, it's working until you try to read the file on a machine
with a different short representation (size, endianness, etc.).

Or simply a compiler update which changes alignment requirements
(or even byte order -- it's happened before).

Or someone compiles with different compiler flags which modify
the packing.
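
The packing problem can be sketched with a small record: writing the struct as one block would also write whatever padding the compiler inserted, while writing field by field produces a fixed layout. The Record type, its fields and the big-endian order are assumptions made for the illustration (C++11 for std::uint32_t).

```cpp
#include <cstdint>
#include <string>

// Hypothetical record. sizeof(Record) is usually larger than 5 because the
// compiler may insert padding after tag; the exact amount depends on the
// compiler and its packing options.
struct Record {
    char          tag;
    std::uint32_t value;
};

// Serialises field by field into exactly 5 octets: tag first, then value
// with the most significant octet first. Padding never reaches the output.
std::string serialize(const Record& r) {
    std::string out;
    out.push_back(r.tag);
    for (int shift = 24; shift >= 0; shift -= 8) {
        const unsigned char octet =
            static_cast<unsigned char>((r.value >> shift) & 0xFFu);
        out.push_back(static_cast<char>(octet));
    }
    return out;
}
```

The output length stays at 5 whatever alignment or packing flags are in effect, which is the property a block write of the whole struct cannot promise.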

Quote:
Stay with text based IO or use a real serialization library
(Boost comes to mind...).

I've not had the chance to use the Boost library, but something
is necessary. (At one point in the past, I think that Dietmar
Kuehl had some xdr streams available on the network.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Christopher Yeleighton
Guest





Posted: Fri Nov 25, 2005 6:21 pm    Post subject: Re: For binary files use only read() and write()??


"Ulrich Eckhardt" <eckhardt (AT) satorlaser (DOT) com> wrote

Quote:
chhenning (AT) gmail (DOT) com wrote:
Hi there, I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode std::ios:binary,
only the read() and write() should be used?

I wouldn't even use a stream but the underlying streambuffer. This plugin is
responsible in character to byte conversion, buffering and doing the actual
IO. The point is that a stream bundles a streambuffer and formatting
information (locale) plus a few flags on its own. However, it still is
completely text-oriented i.e. nothing for unformatted IO.
Anyhow, you need to suppress character to byte conversion in both cases by
calling either iostream::imbue() or streambuf::pubimbue() with
std::locale::classic.


I feel that I should use basic_streambuf<unsigned char> for binary I/O. Raw
bytes are not characters.

Chris



Carl Barron
Guest





Posted: Sat Nov 26, 2005 3:00 pm    Post subject: Re: For binary files use only read() and write()??

Christopher Yeleighton <krixel (AT) qed (DOT) pl> wrote:

Quote:
"Ulrich Eckhardt" <eckhardt (AT) satorlaser (DOT) com> wrote in message
news:s6j053-jfq.ln1 (AT) satorlaser (DOT) homedns.org...
chhenning (AT) gmail (DOT) com wrote:
Hi there, I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode std::ios:binary,
only the read() and write() should be used?

I wouldn't even use a stream but the underlying streambuffer. This plugin is
responsible in character to byte conversion, buffering and doing the actual
IO. The point is that a stream bundles a streambuffer and formatting
information (locale) plus a few flags on its own. However, it still is
completely text-oriented i.e. nothing for unformatted IO.
Anyhow, you need to suppress character to byte conversion in both cases by
calling either iostream::imbue() or streambuf::pubimbue() with
std::locale::classic.


I feel that I should use basic_streambuf<unsigned char> for binary I/O. Raw
bytes are not characters.

Chris


Well, the standard does not require an implementation of
char_traits<unsigned char>. Also, std::basic_streambuf<T,A> does no
transport! The default virtual functions all report failure.
uflow defaults to using underflow, but that reports failure. :)

The best approach is to define a class derived from std::streambuf that
provides the transport via some low-level routines like the Unix low-level
read()/write(), lseek(), etc.

It is possible to write a specialization of char_traits<unsigned char>;
it is not required by the standard.
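
A minimal sketch of such a derived stream buffer. To stay portable it uses std::fread from the C library as the low-level transport rather than the Unix read() call mentioned above; that substitution, the class name stdio_inbuf and the buffer size are all assumptions made for the example (C++11 for override). Only input is handled.

```cpp
#include <cstddef>
#include <cstdio>
#include <streambuf>

// Input-only stream buffer over a std::FILE*. The caller keeps ownership of
// the FILE and must open it in binary mode ("rb") for byte transparency.
class stdio_inbuf : public std::streambuf {
public:
    explicit stdio_inbuf(std::FILE* f) : file_(f) {
        setg(buf_, buf_, buf_);             // empty get area: first read refills
    }

protected:
    // Called when the get area is exhausted. Refills it from the file and
    // returns the next byte without consuming it, or eof() if none is left.
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::size_t n = std::fread(buf_, 1, sizeof buf_, file_);
        if (n == 0)
            return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::FILE* file_;
    char buf_[256];
};
```

An object of this class can be passed to the std::istream constructor, or used directly through sbumpc() and sgetn(). The default uflow() calls underflow() and then advances the get pointer, so overriding underflow() alone is enough for a buffered source.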



James Kanze
Guest





Posted: Sat Nov 26, 2005 3:07 pm    Post subject: Re: For binary files use only read() and write()??

Christopher Yeleighton wrote:
Quote:
"Ulrich Eckhardt" <eckhardt (AT) satorlaser (DOT) com> wrote in message
news:s6j053-jfq.ln1 (AT) satorlaser (DOT) homedns.org...

chhenning (AT) gmail (DOT) com wrote:

Hi there, I have a question regarding accessing binary files. Is it
right to say that for binary files, using the mode std::ios:binary,
only the read() and write() should be used?

I wouldn't even use a stream but the underlying streambuffer. This
plugin is responsible in character to byte conversion, buffering and
doing the actual IO. The point is that a stream bundles a streambuffer
and formatting information (locale) plus a few flags on its own.
However, it still is completely text-oriented i.e. nothing for
unformatted IO. Anyhow, you need to suppress character to byte
conversion in both cases by calling either iostream::imbue() or
streambuf::pubimbue() with std::locale::classic.

I feel that I should use basic_streambuf<unsigned char> for binary
I/O. Raw bytes are not characters.

Well, it might work on your machine. Or it might not. The template
basic_streambuf takes two parameters. The second defaults to
char_traits<CharT>. There's no guarantee that that exists in your
implementation, there's no guarantee concerning the semantics if it
does, and there's no way you can implement one yourself. Basically, to
be portable, you have to implement a character traits class which isn't
an instantiation of std::char_traits, and specify it explicitly as
second parameter.

You'll also have to provide any classes from the locale that the
streambuf uses -- for a filebuf, I think this would mean a
codecvt<unsigned char>.
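
For a sense of what the traits part involves, here is a hedged sketch of a user-defined traits class for unsigned char that is not an instantiation of std::char_traits. The name byte_traits and the choice of int with -1 as the end-of-file value are assumptions. The sketch is exercised only with std::basic_string; it does not address the locale facets a filebuf would need.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>

// Hypothetical traits class for unsigned char.
struct byte_traits {
    typedef unsigned char  char_type;
    typedef int            int_type;    // holds 0..UCHAR_MAX plus eof()
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n ? std::memcmp(a, b, n) : 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        return n ? static_cast<const char_type*>(std::memchr(s, c, n))
                 : nullptr;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memmove(d, s, n);
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n) std::memcpy(d, s, n);
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        if (n) std::memset(d, c, n);
        return d;
    }

    static int_type to_int_type(char_type c) { return c; }
    static char_type to_char_type(int_type i) {
        return static_cast<char_type>(i);
    }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type eof() { return -1; }
    static int_type not_eof(int_type i) { return i == eof() ? 0 : i; }
};

typedef std::basic_string<unsigned char, byte_traits> byte_string;
```

Even at this size the class is mostly boilerplate, which is roughly the "lot of typing" cost weighed against using plain char.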

--
James Kanze mailto:james.kanze (AT) free (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

Krzysztof Żelechowski
Guest





Posted: Sun Nov 27, 2005 5:50 pm    Post subject: Re: For binary files use only read() and write()??


"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote in message
news:4387737f$0$5991$636a15ce (AT) news (DOT) free.fr...
Quote:
Christopher Yeleighton wrote:
I feel that I should use basic_streambuf<unsigned char> for binary
I/O. Raw bytes are not characters.

Well, it might work on your machine. Or it might not. The template
basic_streambuf takes two parameters. The second defaults to
char_traits<CharT>. There's no guarantee that that exists in your
implementation, there's no guarantee concerning the semantics if it
does, and there's no way you can implement one yourself. Basically, to
be portable, you have to implement a character traits class which isn't
an instantiation of std::char_traits, and specify it explicitly as
second parameter.

You'll also have to provide any classes from the locale that the
streambuf uses -- for a filebuf, I think this would mean a
codecvt<unsigned char>.

I use Microsoft Visual C++ 8. It uses the Dinkumware STL. It provides
generic implementations for char_traits et al., and these classes behave
reasonably for unsigned char, so I did not have to implement it myself. But
I think I could easily do it if I needed to, so I do not quite understand your
concern. Only codecvt<unsigned char> is crippled and I have to use
codecvt<char> instead.

I thought character type-related classes should have generic implementations
but perhaps I am wrong and Mr Plauger is just generous. Not all of those
generic implementations perform correctly (codecvt and time_get are
exceptions here), but I found out that most of them do unless an enumerated
type argument is used. But enumerated types are unpredictable so it is
better not to use them to store persistent information anyway. Microsoft VC++ 8
allows the programmer to specify the underlying type of an enumerated
type, an interesting extension that eliminates this drawback; I would
welcome this concept in the Standard.

Chris




James Kanze
Guest





Posted: Mon Nov 28, 2005 8:16 am    Post subject: Re: For binary files use only read() and write()??

Krzysztof Żelechowski wrote:
Quote:
"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote in
message news:4387737f$0$5991$636a15ce (AT) news (DOT) free.fr...

Christopher Yeleighton wrote:

I feel that I should use basic_streambuf<unsigned char> for
binary I/O. Raw bytes are not characters.

Well, it might work on your machine. Or it might not. The
template basic_streambuf takes two parameters. The second
defaults to char_traits<CharT>. There's no guarantee that
that exists in your implementation, there's no guarantee
concerning the semantics if it does, and there's no way you
can implement one yourself. Basically, to be portable, you
have to implement a character traits class which isn't an
instantiation of std::char_traits, and specify it explicitly
as second parameter.

You'll also have to provide any classes from the locale that the
streambuf uses -- for a filebuf, I think this would mean a
codecvt<unsigned char>.

I use Microsoft Visual C++ 8. It uses the Dinkumware STL. It
provides generic implementations for char_traits et al.,
and these classes behave reasonably for unsigned char so I did
not have to implement it myself.

The standard says nothing about the behavior of any char_traits
except the instantiations for char and wchar_t. I know that
both Dinkumware and g++ do provide generic versions. I also
know, from various messages in different newsgroups, that the
two versions are not identical; that code which works with the
g++ version may not work with the Dinkumware version.

In theory, it shouldn't be that difficult to implement a
char_traits class yourself, but it does require a lot of typing,
and there are a lot of little details which you can screw up on.
And what does it buy you in the end? What can a
basic_streambuf instantiated over unsigned char and your private
traits do that a basic_streambuf<char> doesn't?

Quote:
But I think I could easily do it if I needed so I do not
quite understand your concern. Only codecvt<unsigned char> is
crippled and I have to use codecvt<char> instead.

I'm not sure I understand this. The basic_filebuf<unsigned
char> should request a codecvt<unsigned char>. And that's
library code, which you cannot modify.

Quote:
I thought character type-related classes should have generic
implementations but perhaps I am wrong and Mr Plauger is just
generous.

What should the generic implementation do? How, for example, do
you define int_type generically for a char_traits<T>? The
standard doesn't say, and it doesn't require a generic
implementation. I think most implementations do provide one,
but it's almost certain that the implementation will fail for
certain user defined types. For that matter, it's probably
non-trivial to define a generic implementation which works for
both signed and unsigned types.

Quote:
Not all of those generic implementations perform correctly -

In general, how can you say whether a generic implementation
performs correctly or not, since there is no definition of what
it should do? I imagine that both Plauger's and the g++
implementation "perform correctly" in the sense that they do
what their authors thought would be most appropriate. That
doesn't mean that they do the same thing in every case, however.

--
James Kanze mailto:james.kanze (AT) free (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

Christopher Yeleighton
Guest





Posted: Mon Nov 28, 2005 9:08 pm    Post subject: Re: For binary files use only read() and write()??


"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote

Quote:
Krzysztof Żelechowski wrote:
"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote in
message news:4387737f$0$5991$636a15ce (AT) news (DOT) free.fr...

Christopher Yeleighton wrote:

I feel that I should use basic_streambuf<unsigned char> for
binary I/O. Raw bytes are not characters.

Well, it might work on your machine. Or it might not. The
template basic_streambuf takes two parameters. The second
defaults to char_traits<CharT>. There's no guarantee that
that exists in your implementation, there's no guarantee
concerning the semantics if it does, and there's no way you
can implement one yourself. Basically, to be portable, you
have to implement a character traits class which isn't an
instantiation of std::char_traits, and specify it explicitly
as second parameter.

You'll also have to provide any classes from the locale that the
streambuf uses -- for a filebuf, I think this would mean a
codecvt<unsigned char>.

I use Microsoft Visual C++ 8. It uses the Dinkumware STL. It
provides generic implementations for char_traits et al.,
and these classes behave reasonably for unsigned char so I did
not have to implement it myself.

The standard says nothing about the behavior of any char_traits
except the instantiations for char and wchar_t. I know that
both Dinkumware and g++ do provide generic versions. I also
know, from various messages in different newsgroups, that the
two versions are not identical; that code which works with the
g++ version may not work with the Dinkumware version.

In theory, it shouldn't be that difficult to implement a
char_traits class yourself, but it does require a lot of typing,
and there are a lot of little details which you can screw up on.
And what does it buy you in the end? What can a
basic_streambuf instantiated over unsigned char and your private
traits do that a basic_streambuf<char> doesn't?

unsigned char is a number rather than a character. I am using unsigned char to
stress that the underlying data is an octet stream. Compare multibyte
character string library functions like mbscpy: they take unsigned char as
well. And I do not use my private traits, I use the generic one.

Quote:

But I think I could easily do it if I needed so I do not
quite understand your concern. Only codecvt<unsigned char> is
crippled and I have to use codecvt<char> instead.

I'm not sure I understand this. The basic_filebuf<unsigned
char> should request a codecvt<unsigned char>. And that's
library code, which you cannot modify.


I derive a buffer from basic_streambuf<wchar_t>. It is a parasite over
basic_filebuf<unsigned char>. codecvt<unsigned char, unsigned char>
converts nothing and neither does codecvt<char, char> so there is no problem
about that. My buffer, apart from doing other things, converts bytes that
are reinterpreted as characters to wide characters. It does it in a similar
way as basic_filebuf<wchar_t> does. My first intuition was to use the facet
codecvt<unsigned char, wchar_t> for that purpose; but my intuition was wrong
because the octets should have been reinterpreted as multibyte characters
first and the standard type for a multibyte character is char, contrary to
the C convention regarding mbscpy &al, which is sounder wrt DBCS IMHO. I
got the result that codecvt<unsigned char, wchar_t> does not work as a side
effect.

Quote:
I thought character type-related classes should have generic
implementations but perhaps I am wrong and Mr Plauger is just
generous.

What should the generic implementation do? How, for example, do
you define int_type generically for a char_traits<T>? The

typedef int int_type;

Quote:
standard doesn't say, and it doesn't require a generic
implementation. I think most implementations do provide one,
but it's almost certain that the implementation will fail for
certain user defined types. For that matter, it's probably
non-trivial to define a generic implementation which works for
both signed and unsigned types.

Why? I thought int_type = char_type + EOF. If EOF is in the character set,
int_type can be equal to char_type; otherwise it must be a signed type
embracing the range of char_type. NB char_traits<char>::to_int_type(c)
returns (unsigned char) c. In general, it can return to_unsigned(c), where
to_unsigned(T t) returns unsigned_variant<T>(t) and unsigned_variant is a
template wrapper class instantiated for all basic integral types. NB
Dinkumware fails to do it correctly: signed char -1 in the underlying stream
is returned as -1 and char_traits<signed char>::eof() is also -1. But the
problem does not appear with unsigned types so we are on the safe side here.
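
The claim about char_traits<char> can be checked with a short sketch: converting through unsigned char keeps every byte value distinct from eof(), even where plain char is signed. The helper name distinct_from_eof is hypothetical.

```cpp
#include <string>

// True when the byte value b, stored in a plain char, is still
// distinguishable from end-of-file after std::char_traits<char>::to_int_type.
bool distinct_from_eof(unsigned char b) {
    typedef std::char_traits<char> traits;
    const char c = traits::to_char_type(b);          // may be negative
    return !traits::eq_int_type(traits::to_int_type(c), traits::eof());
}
```

On an implementation with 8-bit bytes, to_int_type maps a char holding 0xFF to 255 rather than -1, so the function should return true for every value of b.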

General note to implementors of generic types: if you are not sure you do it
right, do not do it at all. Your explanations that the standard does not
require this or that are unconvincing; common sense should be obeyed where
the standard is silent. It is better to signal your inability to do that at
compile time than at run time because the error may remain there hidden for
a long time before it is triggered and nobody knows what has failed. And it
opens the way for your end users to provide their own implementation if they
feel able to do it.

Quote:

Not all of those generic implementations perform correctly -

In general, how can you say whether a generic implementation
performs correctly or not, since there is no definition of what
it should do? I imagine that both Plauger's and the g++
implementation "perform correctly" in the sense that they do
what their authors thought would be most appropriate. That
doesn't mean that they do the same thing in every case, however.

As I have said, common sense should be used. Common sense involves the
usual mental activities of induction, abstraction and extrapolation.

The following example can be used to demonstrate my point: time_put<wchar_t,
ostreambuf_iterator<wchar_t> > outputs "2005-11-28", whereas
time_put<wchar_t, wchar_t *> extracted FROM THE SAME LOCALE outputs
"11/28/2005". This is most unexpected and is a BUG even if the standard
does not say so. On the other hand, it would not be a bug if class
time_put<wchar_t, wchar_t *> were UNDEFINED or an instance thereof NOT
PRESENT in the locale.

Chris



kanze
Guest





Posted: Tue Nov 29, 2005 2:00 pm    Post subject: Re: For binary files use only read() and write()??

Christopher Yeleighton wrote:
Quote:
"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote in message
news:438a2ca5$0$27892$636a15ce (AT) news (DOT) free.fr...
Krzysztof Żelechowski wrote:
"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote
in message news:4387737f$0$5991$636a15ce (AT) news (DOT) free.fr...

Christopher Yeleighton wrote:

I feel that I should use basic_streambuf<unsigned char>
for binary I/O. Raw bytes are not characters.

Well, it might work on your machine. Or it might not. The
template basic_streambuf takes two parameters. The second
defaults to char_traits<CharT>. There's no guarantee that
that exists in your implementation, there's no guarantee
concerning the semantics if it does, and there's no way you
can implement one yourself. Basically, to be portable, you
have to implement a character traits class which isn't an
instantiation of std::char_traits, and specify it
explicitly as second parameter.

You'll also have to provide any classes from the locale
that the streambuf uses -- for a filebuf, I think this
would mean a codecvt<unsigned char>.

I use Microsoft Visual C++ 8. It uses the Dinkumware STL. It
provides generic implementations for char_traits
&al. and these classes behave reasonably for unsigned char
so I did not have to implement it myself.

The standard says nothing about the behavior of any
char_traits except the instantiations for char and wchar_t.
I know that both Dinkumware and g++ do provide generic
versions. I also know, from various messages in different
newsgroups, that the two versions are not identical; that
code which works with the g++ version may not work with the
Dinkumware version.

In theory, it shouldn't be that difficult to implement a
char_traits class yourself, but it does require a lot of
typing, and there are a lot of little details which you can
screw up on. And what does it buy you in the end? What can
a basic_streambuf instantiated over unsigned char and your
private traits do that a basic_streambuf<char> doesn't?

unsigned char is a number rather than a character.

I didn't ask about conceptual issues. I too prefer unsigned
char when dealing with raw memory. But pragmatically, what do
you expect basic_streambuf<unsigned char> to do that
basic_streambuf<char> doesn't. It expresses your intention
better, perhaps, but how much work is that worth; a simple
comment can also be used to express intention.

Quote:
I am using unsigned char to stress that the underlying data is
an octet stream.

Just a nit, but unsigned char doesn't express that at all. It
says "byte stream", and bytes aren't octets. For octets, use
uint8_t (which will almost certainly be a typedef to unsigned
char -- but it expresses the intent more clearly).

Quote:
Compare multibyte character string library functions like
mbscpy: they take unsigned char as well. And I do not use my
private traits, I use the generic one.

Which means that your code isn't conforming. It may not compile
with another compiler, and if it compiles, it may behave
differently (which is IMHO worse).

Depending on circumstances, this may be acceptable. It's up to
you to judge. But you should at least be aware of the
compromise you are making.

Quote:
But I think I could easily do it if I needed so I do not
quite understand your concern. Only codecvt<unsigned char>
is crippled and I have to use codecvt<char> instead.

I'm not sure I understand this. The basic_filebuf<unsigned
char> should request a codecvt<unsigned char>. And that's
library code, which you cannot modify.

I derive a buffer from basic_streambuf<wchar_t>. It is a
parasite over basic_filebuf<unsigned char>.

Do you mean that it uses basic_filebuf<unsigned char>?

Quote:
codecvt<unsigned char, unsigned char>

Doesn't exist. See table 51 in §22.1.1.1.1. (An
implementation is, of course, free to provide it. Note that in
this case, I don't think that providing a generic codecvt would
work, because you not only need the class, it must also be
registered or known to locale in some way.)

Quote:
converts nothing and neither does codecvt<char, char> so there
is no problem about that.

The behavior of codecvt<char,char,mbstate_t> is defined in the
standard. If codecvt<unsigned char, unsigned char, mbstate_t>
is present, its behavior is implementation defined (although I
can hardly imagine it doing anything but a degenerate conversion
either). Generally speaking, it is *not* guaranteed to be
present, and it is not guaranteed to be part of the locale even
if it is present. (I can easily imagine an implementation
providing a generic codecvt which implements a generic
conversion. That won't make it part of the locale.)

Quote:
My buffer, apart from doing other things, converts bytes that
are reinterpreted as characters to wide characters.

Isn't that exactly what wfilebuf does?

Quote:
It does it in a similar way as basic_filebuf<wchar_t> does.
My first intuition was to use the facet codecvt<unsigned char,
wchar_t> for that purpose;

Which, of course, doesn't exist either. The standard requires
no facets for unsigned char whatsoever.

Quote:
but my intuition was wrong because the octets should have been
reinterpreted as multibyte characters first and the standard
type for a multibyte character is char, contrary to the C
convention regarding mbscpy &al, which is sounder wrt DBCS
IMHO. I got the result that codecvt<unsigned char, wchar_t>
does not work as a side effect.

The C convention and the C++ convention apparently differ. In
C++, all input and output is through char's. Unsigned char's
don't enter into the equation. Conceptually, I'm not really
happy with this situation either, but that's the way it is.

Note that C++ also gives some guarantees concerning char that
aren't present in C, and which make it usable in this case.
Still conceptually wrong, perhaps, but usable in any case.

The situation is simple: either you do a lot of extra work, or
you depend on non-portable features of your implementation, in
order to be conceptually correct, but for no functional
advantage. It's your decision (if for other reasons your code
will never be compiled with another compiler, why not?), but do
be aware of the consequences when making it.

Quote:
I thought character type-related classes should have
generic implementations but perhaps I am wrong and Mr
Plauger is just generous.

What should the generic implementation do? How, for
example, do you define int_type generically for a
char_traits<T>? The
typedef int int_type;

Not very generic, IMHO. What about char_traits<long>?

Plauger said it in another posting. His implementation provides
a generic implementation which is useful in some cases. It's
not truly "generic", in the sense that it will work with any
legal character type; such a thing would be impossible, I think.

As it happens, I'm a little suspicious about this type even for
unsigned char -- mixing signed and unsigned arithmetic often
leads to unexpected results. Historically, EOF has usually been
-1 (and must be negative), so practically, I think an unsigned
type would not work well here. But I wouldn't expect all of the
functions to work correctly for unsigned types unless there was
a specialization.

Quote:
standard doesn't say, and it doesn't require a generic
implementation. I think most implementations do provide
one, but it's almost certain that the implementation will
fail for certain user defined types. For that matter, it's
probably non-trivial to define a generic implementation
which works for both signed and unsigned types.

Why? I thought int_type = char_type + EOF.

There's actually some ambiguity here. int_type must be able to
hold all of the character values, plus a sentinel value for end
of file. Do we interpret "all of the character values" as all
of the values of char_type, or something else?

Pragmatically, we interpret it as "all of the legal character
values", simply because on a lot of 32 bit systems, wchar_t is a
32 bit type, and there are no built-in integral types wider than
32 bits.

Quote:
If EOF is in the character set,

The implementation isn't legal. EOF must be distinguishable
from a character.

Quote:
int_type can be equal to char_type;

Technically, I think that int_type can be even smaller than
char_type, although it would be ridiculous to do so.

Quote:
otherwise it must be a signed type embracing the range of
char_type.

There's no requirement that it be signed, and there's no
requirement that it embrace the range of char_type. The
requirement (from §21.1.2/2) is "For a certain container type
char_type, a related container type INT_T shall be a type or
class which can represent all of the valid characters converted
from the corresponding char_type values, as well as an
end-of-file value, eof()." Note "all of the *valid*
characters", and that the end of file value is "as well as" --
it may not be confounded with a valid character.

My guess is that the rationale here is that wchar_t can be the
largest integral type, and that an implementation can still use
a basic integral type for int_type, rather than a class type.

Regretfully, there is a slight problem: the meaning of "all of
the valid characters" depends on the character set encoding, and
not just the types. For pure US ASCII, for example, the valid
encodings are in the range 0...127, and char is a legal type for
int_type. In practice, it is mainly a theoretical problem;
using a 32 bit signed integral type is good for all known
character sets, and doubtlessly all yet to come.
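
To make the requirement concrete for the one case the standard does pin down: for char_traits<char>, every char value has to convert to an int_type that compares unequal to eof(), and has to survive the round trip back. A minimal sketch, assuming an 8-bit char; the function name is made up for illustration:

```cpp
#include <cassert>
#include <string>

// Returns true if every char value maps to an int_type distinct from eof()
// and survives the round trip through to_int_type()/to_char_type().
// Assumes char has 8 bits, hence the 256 iterations.
inline bool char_values_distinct_from_eof() {
    typedef std::char_traits<char> Traits;
    for (int i = 0; i < 256; ++i) {
        const char c = static_cast<char>(static_cast<unsigned char>(i));
        const Traits::int_type wide = Traits::to_int_type(c);
        if (Traits::eq_int_type(wide, Traits::eof())) return false;
        if (!Traits::eq(Traits::to_char_type(wide), c)) return false;
    }
    return true;
}
```

The check says nothing about other character types; for those, what counts as a "valid character" is exactly the open question here.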

Quote:
NB char_traits<char>::to_int_type(c) returns (unsigned char)
c. In general, it can return to_unsigned(c), where
to_unsigned(T t) returns unsigned_variant<T>(t) and
unsigned_variant is a template wrapper class instantiated for
all basic integral types.

That's one possible solution. It's certainly not required by
the standard, and it is easy to construct cases where it doesn't
work. (I once instantiated basic_string over double. Just out
of curiosity, of course, but since it *is* a template... :-)
And when I was experimenting with Unicode, I used a user defined
type UnicodeCharacter. What should unsigned_variant<T> return
here?)

Quote:
NB Dinkumware fails to do it correctly:

Unless the Dinkumware documentation makes some explicit
guarantees concerning this, there is no issue of "correctly"
here. They provide something in addition to the standard.
Either they document it, and state for which types it is valid,
and what the behavior is in such cases, or they don't document
it, and it is an implementation detail of their implementation,
and not to be used by you.

Quote:
signed char -1 in the underlying stream is returned as -1 and
char_traits<signed char>::eof() is also -1. But the problem
does not appear with unsigned types so we are on the safe side
here.

General note to implementors of generic types: if you are not
sure you do it right, do not do it at all.

For what definition of right? There isn't one, as far as I can
see. And there is no possible generic implementation which will
"work" for all types. Remember, the only requirement on the
character type is that it be a POD.

What an implementation should probably do is treat the issue as
implementation defined behavior, rather than undefined behavior.
That is: carefully document whatever they do do. And in the
absence of documentation, the only safe assumption on the part
of the programmer is that the type doesn't exist.

In the case of unsigned char, this means defining yourself an
appropriate char_traits and codecvt, instantiated explicitly
over your char_traits class (which cannot be a specialization of
std::char_traits), and inserting your codecvt explicitly into
the locale (and I'll admit that I haven't the foggiest idea how
to do this so that it is guaranteed by the standard to work).

Quote:
Your explanations that the standard does not require this or
that are unconvincing;

There's no issue of "convincing". The standard says explicitly
that it isn't required, and it is fairly simple to show that a
truly generic implementation, which works for all legal types of
char_type, isn't possible.

Quote:
common sense should be obeyed where the standard is silent.

But the standard isn't silent here -- and common sense says that
a generic implementation valid for all types isn't possible.

Quote:
It is better to signal your inability to do that at compile
time than at run time because the error may remain there
hidden for a long time before it is triggered and nobody knows
what has failed.

I sort of agree. The fact remains that most implementations do
provide a generic implementation, which works in a few special
cases. The fact also remains that most implementations don't
document this, so you can only guess what those special cases
are.

Quote:
And it opens the way for your end user to provide his own
implementation if she feels apt to do it.

All of the implementations do this; the standard requires it.
You can't call it std::char_traits<unsigned char>, because the
standard forbids it. But you can provide a
MyNamespace::char_traits<unsigned char>, or simply a
UCharTraits. And you can instantiate string, basic_filebuf, or
whatever, with it.
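
A minimal sketch of what such a user-supplied traits class could look like. UCharTraits, ByteString and sample_bytes are illustrative names, not anything from the standard or from a particular library, and the choice of int for int_type assumes that int is wider than unsigned char:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>

// Hypothetical user-defined traits class for unsigned char. It is not a
// specialization of std::char_traits, so nothing in namespace std is touched.
struct UCharTraits {
    typedef unsigned char  char_type;
    typedef int            int_type;   // holds 0..UCHAR_MAX plus a negative eof
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        return n == 0 ? 0
                      : static_cast<const char_type*>(std::memchr(s, c, n));
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n);
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n);
        return dst;
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        if (n != 0) std::memset(dst, c, n);
        return dst;
    }

    static int_type eof() { return -1; }
    static int_type not_eof(int_type i) { return i == eof() ? 0 : i; }
    static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
    static int_type to_int_type(char_type c) { return c; }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
};

typedef std::basic_string<unsigned char, UCharTraits> ByteString;

// Build a string holding the three bytes 0x00, 0x7F and 0xFF.
inline ByteString sample_bytes() {
    const unsigned char raw[] = { 0x00, 0x7F, 0xFF };
    return ByteString(raw, sizeof raw);
}
```

This covers basic_string only. Instantiating basic_filebuf over the same traits additionally needs a matching codecvt facet in the locale, which is the harder part of this discussion.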

What the standard doesn't provide is a means of supplying
std::codecvt<unsigned char, char, mbstate_t>. But what
basic_filebuf uses is std::codecvt<charT, char,
traits::state_type>. If traits::state_type (or charT, for that
matter) is a user defined type, you do have a right to provide a
specialization, even if the type is in std::. And since you
provide the traits, you decide what traits::state_type is.

It can be done. IMHO, it is a lot of work, just for a
principle, and in the end, basic_filebuf<char> will work just as
well.
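
For comparison, a sketch of the plain basic_filebuf<char> route: the bytes go through std::filebuf opened with ios_base::binary and are reinterpreted as unsigned char on the way in and out. write_bytes and read_bytes are hypothetical helper names, and the sketch assumes the default "C" locale, whose codecvt<char, char, mbstate_t> performs no conversion:

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <vector>

// Write the given bytes to a file through a plain std::filebuf opened in
// binary mode. Returns false if the file could not be opened or written.
inline bool write_bytes(const char* path,
                        const std::vector<unsigned char>& bytes) {
    std::filebuf out;
    if (!out.open(path, std::ios_base::out | std::ios_base::binary |
                            std::ios_base::trunc))
        return false;
    const std::streamsize n = static_cast<std::streamsize>(bytes.size());
    const char* data =
        bytes.empty() ? "" : reinterpret_cast<const char*>(&bytes[0]);
    const bool ok = out.sputn(data, n) == n;
    // close() flushes the buffer and returns a null pointer on failure.
    return out.close() != 0 && ok;
}

// Read the whole file back as raw bytes, again through std::filebuf.
// Returns an empty vector if the file could not be opened.
inline std::vector<unsigned char> read_bytes(const char* path) {
    std::vector<unsigned char> bytes;
    std::filebuf in;
    if (!in.open(path, std::ios_base::in | std::ios_base::binary))
        return bytes;
    char chunk[256];
    std::streamsize got;
    while ((got = in.sgetn(chunk, sizeof chunk)) > 0) {
        // Viewing the char buffer as unsigned char preserves the byte values.
        const unsigned char* first =
            reinterpret_cast<const unsigned char*>(chunk);
        bytes.insert(bytes.end(), first, first + got);
    }
    return bytes;
}
```

Nothing here depends on a generic char_traits or on a facet the standard does not require.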

Quote:
Not all of those generic implementations perform correctly
-

In general, how can you say whether a generic implementation
performs correctly or not, since there is no definition of
what it should do? I imagine that both Plauger's and the
g++ implementation "perform correctly" in the sense that
they do what their authors thought would be most
appropriate. That doesn't mean that they do the same thing
in every case, however.

As I have said, common sense should be used. Common sense
involves the usual mental activities of induction, abstraction
and extrapolation

What does common sense say about
char_traits<UnicodeCharacter>::int_type? Where UnicodeCharacter
is a user defined type? And whose common sense should be used:
Plauger has a lot of common sense. So do the authors of the g++
library. Their "common sense" led to different solutions.

Quote:
The following example can be used to demonstrate my point:
time_put<wchar_t, ostreambuf_iterator<wchar_t> > outputs
"2005-11-28", whereas time_put<wchar_t, wchar_t *> extracted
FROM THE SAME LOCALE outputs "11/28/2005". This is most
unexpected and is a BUG even if the standard does not say so.
On the other hand, it would not be a bug if class
time_put<wchar_t, wchar_t *> were UNDEFINED or an instance
thereof NOT PRESENT in the locale.

I'm not sure I understand your point at all. Obviously,
time_put<wchar_t, wchar_t*> isn't required to be present. What
does the implementation specific documentation say that it
does?

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Christopher Yeleighton
Guest





Posted: Wed Nov 30, 2005 1:51 am    Post subject: Re: For binary files use only read() and write()??


"kanze" <kanze (AT) gabi-soft (DOT) fr> wrote

Quote:
Christopher Yeleighton wrote:
"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote in message
news:438a2ca5$0$27892$636a15ce (AT) news (DOT) free.fr...
Krzysztof Żelechowski wrote:
Użytkownik "James Kanze" <kanze (AT) none (DOT) news.free.fr> napisał
w wiadomości news:4387737f$0$5991$636a15ce (AT) news (DOT) free.fr...

Christopher Yeleighton wrote:

I feel that I should use basic_streambuf<unsigned char>
for binary I/O. Raw bytes are not characters.

Well, it might work on your machine. Or it might not. The
template basic_streambuf takes two parameters. The second
defaults to char_traits<charT>. There's no guarantee that
char_traits<unsigned char> exists in your implementation, there's no guarantee
concerning the semantics if it does, and there's no way you
can implement one yourself. Basically, to be portable, you
have to implement a character traits class which isn't an
instantiation of std::char_traits, and specify it
explicitly as second parameter.

You'll also have to provide any classes from the locale
that the streambuf uses -- for a filebuf, I think this
would mean a codecvt<unsigned char>.

I use Microsoft Visual C++ 8. It uses the Dinkumware STL. It
provides generic implementations for char_traits
&al. and these classes behave reasonably for unsigned char
so I did not have to implement it myself.

The standard says nothing about the behavior of any
char_traits except the instantiations for char and wchar_t.
I know that both Dinkumware and g++ do provide generic
versions. I also know, from various messages in different
newsgroups, that the two versions are not identical; that
code which works with the g++ version may not work with the
Dinkumware version.

This would have been the right place to point me and the discussion to the
"stringbuf question" thread, especially since it is a recent thread, rather
than making me google for the content you mention.

Quote:

[cut]


Quote:
But I think I could easily do it if I needed so I do not
quite understand your concern. Only codecvt<unsigned char
is crippled and I have to use codecvt
I'm not sure I understand this. The basic_filebuf<unsigned
char> should request a codecvt<unsigned char>. And that's
library code, which you cannot modify.

I derive a buffer from basic_streambuf<wchar_t>. It is a
parasite over basic_filebuf<unsigned char>.

Do you mean that it uses basic_filebuf<unsigned char>?

Yes, as a member variable.

Quote:
codecvt<unsigned char, unsigned char

Doesn't exist. See table 51 in §22.1.1.1.1. (An
implementation is, of course, free to provide it. Note that in
this case, I don't think that providing a generic codecvt would
work, because you not only need the class, it must also be
registered or known to locale in some way.)

It seems that Mr. Plauger's implementation does not need it because the
buffer works and it does not throw.

Quote:
converts nothing and neither does codecvt is no problem about that.

The behavior of codecvt<char,char,mbstate_t> is defined in the
standard. If codecvt<unsigned char, unsigned char, mbstate_t>
is present, its behavior is implementation defined (although I
can hardly imagine it doing anything but a degenerate conversion
either). Generally speaking, it is *not* guaranteed to be

FYI: This is an example of common-sense reasoning. There is nothing wrong
with that.

Quote:
present, and it is not guaranteed to be part of the locale even
if it is present. (I can easily imagine an implementation
providing a generic codecvt which implements a generic
conversion. That won't make it part of the locale.)

My buffer, apart from doing other things, converts bytes that
are reinterpreted as characters to wide characters.

Isn't that exactly what wfilebuf does?

I said: apart from doing other things.

[cut]

Quote:
The situation is simple: either you do a lot of extra work, or
you depend on non-portable features of your implementation, in
order to be conceptually correct, but for no functional
advantage. It's your decision (if for other reasons your code
will never be compiled with another compiler, why not?), but do
be aware of the consequences when making it.

Thanks, so I shall.

Quote:
I thought character type-related classes should have
generic implementations but perhaps I am wrong and Mr
Plauger is just generous.

What should the generic implementation do? How, for
example, do you define int_type generically for a
char_traits<T>? The
typedef int int_type;

Not very generic, IMHO. What about char_traits<long>?

Explicit specialization is needed, or change the generic type to long, or
assume that long(int(x)) == x and -01L is an invalid character.

Quote:
Plauger said it in another posting. His implementation provides
a generic implementation which is useful in some cases. It's
not truly "generic", in the sense that it will work with any
legal character type; such a thing would be impossible, I think.

Please be gentle and say: 'in another posting with the subject "stringbuf
question"'. It would help a lot. My Google search failed here.

Mr. Plauger's position is very impertinent: "I am free to do anything beyond
the realm of what the standard prescribes". The problem is that I am not
free to respond to such an attitude by showing Mr. Plauger's library its
place in my trash can, which would be the obvious thing to do, so I have to
argue with him. I have started addressing Microsoft directly about the
problems in question; I hope they will manage to moderate his liberties.

[cut]
Quote:
standard doesn't say, and it doesn't require a generic
implementation. I think most implementations do provide
one, but it's almost certain that the implementation will
fail for certain user defined types. For that matter, it's
probably non-trivial to define a generic implementation
which works for both signed and unsigned types.

Why? I thought int_type = char_type + EOF.

[cut]

Quote:
If EOF is in the character set,

The implementation isn't legal. EOF must be distinguishable
from a character.

I meant "A value of character type equal to EOF exists and does not
constitute a valid character". It was a mental shortcut.

Quote:

int_type can be equal to char_type;

Technically, I think that int_type can be even smaller than
char_type, although it would be ridiculous to do so.

otherwise it must be a signed type embracing the range of
char_type.

There's no requirement that it be signed, and there's no
requirement that it embrace the range of char_type. The

But it is a reasonable idea. Not that you must; it is a subjunctive
necessity, not a deontic one.

Quote:
Regretfully, there is a slight problem: the meaning of "all of
the valid characters" depends on the character set encoding, and
not just the types. For pure US ASCII, for example, the valid
encodings are in the range 0...127, and char is a legal type for
int_type. In practice, it is mainly a theoretical problem;
using a 32 bit signed integral type is good for all known
character sets, and doubtlessly all yet to come.

NB char_traits<char>::to_int_type(c) returns (unsigned char)
c. In general, it can return to_unsigned(c), where
to_unsigned(T t) returns unsigned_variant<T>(t) and
unsigned_variant is a template wrapper class instantiated for
all basic integral types.

That's one possible solution. It's certainly not required by
the standard, and it is easy to construct cases where it doesn't
work. (I once instantiated basic_string over double. Just out
of curiosity, of course, but since it *is* a template... :-)
And when I was experimenting with Unicode, I used a user defined
type UnicodeCharacter. What should unsigned_variant<T> return
here?)

A user-defined type UnsignedUnicodeCharacter, of course :-)
Talking more seriously, the only reasonable choice for generic
unsigned_variant<T>::unsigned_type is T. If your UnicodeCharacter cannot be
equal to -1, it will not do any harm. If it can, it will not work. Mr.
Plauger's implementation does not support class types as character types
anyhow, I have tried it already for my unicode_t; there are two
contradictory requirements: the type must be a member of a union and must be
constructible from an int_type. I do not complain about this as long as
my code does not compile.

Quote:

NB Dinkumware fails to do it correctly:

Unless the Dinkumware documentation makes some explicit
guarantees concerning this, there is no issue of "correctly"
here. They provide something in addition to the standard.
Either they document it, and state for which types it is valid,
and what the behavior is in such cases, or they don't document
it, and it is an implementation detail of their implementation,
and not to be used by you.

The standard does give some guarantees. It does not guarantee that an
implementation is provided, but if it is, it must conform. It is Mr.
Barbati's position - look for arguments there.

Quote:

signed char -1 in the underlying stream is returned as -1 and
char_traits<signed char>::eof() is also -1. But the problem
does not appear with unsigned types so we are on the safe side
here.

General note to implementors of generic types: if you are not
sure you do it right, do not do it at all.

For what definition of right? There isn't one, as far as I can
see. And there is no possible generic implementation which will
"work" for all types. Remember, the only requirement on the
character type is that it be a POD.

Mr. Plauger's implementation does not work for PODs in that the compilation
fails; it is perfectly acceptable. But I cannot accept the situation when
the compilation succeeds and the execution fails.

Quote:
What an implementation should probably do is treat the issue as
implementation defined behavior, rather than undefined behavior.
That is: carefully document whatever they do do. And in the
absence of documentation, the only safe assumption on the part
of the programmer is that the type doesn't exist.

The headers should be written in such a way that when the compiler
encounters a construct that causes undefined behaviour it should fail.
Whereas it certainly cannot detect dereferencing a null pointer, it surely
can detect using an unsupported type as a template parameter. The compiler
should help the programmer, it should not conspire against him.
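
One way the compile-time rejection asked for here could look, using static_assert from C++11 (not available in 2005, where a primary template left undefined would play the same role). is_supported_char and checked_traits are hypothetical names for illustration; no existing library is claimed to work this way:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical marker: which character types this sketch claims to support.
template <class CharT> struct is_supported_char : std::false_type {};
template <> struct is_supported_char<char>    : std::true_type {};
template <> struct is_supported_char<wchar_t> : std::true_type {};

// Hypothetical traits-like template that refuses unsupported types at
// compile time instead of misbehaving at run time.
template <class CharT>
struct checked_traits {
    static_assert(is_supported_char<CharT>::value,
                  "checked_traits: unsupported character type; "
                  "provide your own traits class");
    typedef CharT char_type;
    static bool eq(CharT a, CharT b) { return a == b; }
};

// checked_traits<char>::eq('a', 'a') compiles.
// checked_traits<signed char>::eq(0, 0) would be rejected by the compiler.
```

The catch is that somebody still has to decide which types go on the supported list, and a user-defined type with unusual valid values can pass the type check and still fail at run time.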

Quote:
In the case of unsigned char, this means defining yourself an
appropriate char_traits and codecvt, instantiated explicitly
over your char_traits class (which cannot be a specialization of
std::char_traits), and inserting your codecvt explicitly into
the locale (and I'll admit that I haven't the foggiest idea how
to do this so that it is guaranteed by the standard to work).

The compiler should notify that such a definition is necessary. An
unintelligible error message from inside the bowels of template
implementation would be quite acceptable.

How about std::locale(std::locale(), new std::codecvt<anything>)?
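
For reference, the two-argument locale constructor used in that one-liner does install a facet into a copy of a locale. A minimal sketch with a hypothetical user-defined facet, ByteFacet; whether basic_filebuf would ever look for such a facet is a separate question, since it asks specifically for std::codecvt<charT, char, traits::state_type>:

```cpp
#include <cassert>
#include <cstddef>
#include <locale>

// Hypothetical user-defined facet. A class becomes a facet by deriving from
// std::locale::facet and declaring a static std::locale::id member.
class ByteFacet : public std::locale::facet {
public:
    static std::locale::id id;
    explicit ByteFacet(std::size_t refs = 0) : std::locale::facet(refs) {}
    // Degenerate "conversion": bytes pass through unchanged.
    unsigned char convert(unsigned char b) const { return b; }
};
std::locale::id ByteFacet::id;

// Returns a copy of the global locale with ByteFacet installed. The locale
// takes ownership of the facet because it was created with refs == 0.
inline std::locale with_byte_facet() {
    return std::locale(std::locale(), new ByteFacet);
}
```

std::has_facet<ByteFacet> is false for std::locale::classic() and true for the locale returned by with_byte_facet(), which is the "registered or known to locale" step that merely providing a class does not cover.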

Quote:
Your explanations that the standard does not require this or
that are unconvincing;

There's no issue of "convincing". The standard says explicitly
that it isn't required, and it is fairly simple to show that a
truly generic implementation, which works for all legal types of
char_type, isn't possible.

common sense should be obeyed where the standard is silent.

But the standard isn't silent here -- and common sense says that
a generic implementation valid for all types isn't possible.

The standard does not say anything about the availability of a generic
implementation. It does not say it is unavailable. It says something
about the explicit instantiations provided.
A generic implementation is valid when it causes a compile-time error. It
is invalid when it causes a run-time error. Such a generic implementation
is possible. Not providing any generic implementation would be a better
solution than providing an invalid one. Providing an invalid implementation
is a breach of contract.

Quote:

It is better to signal your inability to do that at compile
time than at run time because the error may remain there
hidden for a long time before it is triggered and nobody knows
what has failed.

I sort of agree. The fact remains that most implementations do
provide a generic implementation, which works in a few special
cases. The fact also remains that most implementations don't
document this, so you can only guess what those special cases
are.

And it opens the way for your end user to provide his own
implementation if she feels apt to do it.

All of the implementations do this; the standard requires it.
You can't call it std::char_traits<unsigned char>, because the
standard forbids it. But you can provide a

I can call it std::char_traits<BYTE> if that makes you happier.

Quote:
MyNamespace::char_traits<unsigned char>, or simply a
UCharTraits. And you can instantiate string, basic_filebuf, or
whatever, with it.

What the standard doesn't provide is a means of supplying
std::codecvt<unsigned char, char, mbstate_t>. But what

I could supply std::codecvt<BYTE, char, mbstate_t> as above, except that I
do not need it and I cannot fancy what it would be good for so I shall not.

Quote:
basic_filebuf uses is std::codecvt<charT, char,
traits::state_type>. If traits::state_type (or charT, for that
matter) is a user defined type, you do have a right to provide a
specialization, even if the type is in std::. And since you
provide the traits, you decide what traits::state_type is.

It can be done. IMHO, it is a lot of work, just for a
principle, and in the end, basic_filebuf<char> will work just as
well.

I did not have much work with basic_filebuf<unsigned char>; it just works as
it is. On the other hand, basic_filebuf<signed char> fails when it
encounters -1, but it hardly is my problem this time except that I would
prefer this class to be explicitly ill-formed or behave reasonably (it would
require a minor correction to the library header source code).

Quote:
Not all of those generic implementations perform correctly
-

In general, how can you say whether a generic implementation
performs correctly or not, since there is no definition of
what it should do? I imagine that both Plauger's and the
g++ implementation "perform correctly" in the sense that
they do what their authors thought would be most
appropriate. That doesn't mean that they do the same thing
in every case, however.

As I have said, common sense should be used. Common sense
involves the usual mental activities of induction, abstraction
and extrapolation

What does common sense say about
char_traits<UnicodeCharacter>::int_type? Where UnicodeCharacter
is a user defined type? And whose common sense should be used:

It is described in 21.1.2 so there is no need to resort to common sense here.

Quote:
Plauger has a lot of common sense. So do the authors of the g++
library. Their "common sense" led to different solutions.

I cannot tell about g++, but the following assertion fails under Dinkumware:

!char_traits<signed char>::eq_int_type(char_traits<signed
char>::to_int_type(-1), char_traits<signed char>::eof())

Do you call it a "solution"? It is an abuse, an unjustified misconception.
It can be seen from the other thread that g++ has a similar problem with
unsigned char. This is even more weird and unexpected.
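
The collision can be reproduced without any particular library. NaiveTraits below is a hypothetical generic traits class (not Dinkumware's or g++'s source) that uses int as int_type and a plain cast in to_int_type:

```cpp
#include <cassert>

// Hypothetical naive generic traits: int_type is int and the conversion
// from char_type is a plain cast.
template <class CharT>
struct NaiveTraits {
    typedef CharT char_type;
    typedef int   int_type;
    static int_type eof() { return -1; }
    static int_type to_int_type(char_type c) { return static_cast<int_type>(c); }
    static bool eq_int_type(int_type a, int_type b) { return a == b; }
};

// True when the CharT value obtained from -1 is indistinguishable from eof().
template <class CharT>
bool minus_one_collides_with_eof() {
    typedef NaiveTraits<CharT> T;
    return T::eq_int_type(T::to_int_type(static_cast<CharT>(-1)), T::eof());
}
```

With it, minus_one_collides_with_eof<signed char>() is true while the unsigned char instantiation is false, which is the asymmetry reported for the signed case.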

Quote:

The following example can be used to demonstrate my point:
time_put<wchar_t, ostreambuf_iterator<wchar_t> > outputs
"2005-11-28", whereas time_put<wchar_t, wchar_t *> extracted
FROM THE SAME LOCALE outputs "11/28/2005". This is most
unexpected and is a BUG even if the standard does not say so.
On the other hand, it would not be a bug if class
time_put<wchar_t, wchar_t *> were UNDEFINED or an instance
thereof NOT PRESENT in the locale.

I'm not sure I understand your point at all. Obviously,
time_put<wchar_t, wchar_t*> isn't required to be present. What
does the implementation specific documentation say that it
does?

The template class describes an object that can serve as a locale facet, to
control conversions of time values to sequences of type Elem.
You could have looked it up yourself:
http://www.dinkumware.com/manuals/reader.aspx?b=p/&h=locale2.html#time_put

Thanks for your time,
Chris



kanze
Guest





Posted: Wed Nov 30, 2005 6:16 pm    Post subject: Re: For binary files use only read() and write()??

Christopher Yeleighton wrote:
Quote:
"kanze" <kanze (AT) gabi-soft (DOT) fr> wrote in message
news:1133260316.118637.322400 (AT) g49g2000cwa (DOT) googlegroups.com...
Christopher Yeleighton wrote:
"James Kanze" <kanze (AT) none (DOT) news.free.fr> wrote in message
news:438a2ca5$0$27892$636a15ce (AT) news (DOT) free.fr...

[...]
Quote:
The standard says nothing about the behavior of any
char_traits except the instantiations for char and
wchar_t. I know that both Dinkumware and g++ do provide
generic versions. I also know, from various messages in
different newsgroups, that the two versions are not
identical; that code which works with the g++ version may
not work with the Dinkumware version.

This would have been the right place to point me and the
discussion to the "stringbuf question" thread, especially
since it is a recent thread, rather than making me google
for the content you mention.

If I had it handy, I would, but I don't keep bookmarks or
references to every thread I read. In this case, the thread I
was thinking about wasn't recent, so it would take me a
considerable amount of time to find it, and it was in
fr.comp.lang.c++, in French, so I'm not sure whether it would
help you even if I did. The gist of it, however, was simple:
someone had written code which used
basic_ios_somethingorother< unsigned char > which worked with one
implementation, and it didn't work with the other. The two
implementations were VC++ (Dinkumware) and g++, but I don't
remember what the exact problem was, nor which direction he was
porting -- from g++ to VC++ or vice versa. All I retained from
the discussion was that both have a generic implementation, but
that they aren't compatible.

Quote:
[cut]
codecvt<unsigned char, unsigned char

Doesn't exist. See table 51 in §22.1.1.1.1. (An
implementation is, of course, free to provide it. Note that
in this case, I don't think that providing a generic codecvt
would work, because you not only need the class, it must
also be registered or known to locale in some way.)

It seems that Mr. Plauger's implementation does not need it
because the buffer works and it does not throw.

So he's providing it. As a plus; you can't count on it being
there. (In this case, providing a generic implementation of the
class is pretty straightforward. Ensuring that use_facet finds
it for a given locale is less so.)

Quote:
converts nothing and neither does codecvt there is no problem about that.

The behavior of codecvt<char,char,mbstate_t> is defined in
the standard. If codecvt<unsigned char, unsigned char,
mbstate_t> is present, its behavior is implementation
defined (although I can hardly imagine it doing anything but
a degenerate conversion either). Generally speaking, it is
*not* guaranteed to be

FYI: This is an example of common-sense reasoning. There is
nothing wrong with that.

Yes and no. From experience, different people have different
ideas of what makes the most common sense.

Quote:
[cut]
I thought character type-related classes should have
generic implementations but perhaps I am wrong and Mr
Plauger is just generous.

What should the generic implementation do? How, for
example, do you define int_type generically for a
char_traits<T>? The

typedef int int_type;

Not very generic, IMHO. What about char_traits<long>?

Explicit specialization is needed, or change the generic type
to long, or assume that long(int(x)) == x and -01L is an
invalid character.

In sum, the generic implementation doesn't work.

If I understand you correctly, you have a special case, and
you're complaining that an implementation doesn't cater for it,
but it doesn't seem to bother you that the implementation
doesn't cater for my special case. I've never instantiated an
iostream class on unsigned char, but I have instantiated
basic_string on UnicodeCharacter. Which in my first
experiments, was a typedef uint32_t. Why is it an error that
the generic instantiation doesn't support your case, but not
that it doesn't support mine?

Quote:
Plauger said it in another posting. His implementation
provides a generic implementation which is useful in some
cases. It's not truly "generic", in the sense that it will
work with any legal character type; such a thing would be
impossible, I think.

Please be gentle and say: 'in another posting with the subject
"stringbuf question"'. It would help a lot. My Google search
failed here.

Because I don't remember the subject. I just remember Plauger's
comment. Vaguely at that -- I use Sun CC and g++
professionally, and neither uses the Dinkumware implementation.

Quote:
Mr. Plauger's position is very impertinent: "I am free to do
anything beyond the realm of what the standard prescribes".

Plauger's position seems fairly reasonable to me: he provided
something in addition to the minimum required, with the hope
(but obviously no guarantee) that it might prove useful to
someone. I've not seriously looked, so I may have missed
something, but I think it is missing some documentation as to
what is and what isn't supported, which you could consider an
error. But it certainly meets all the requirements that it
documents meeting:-).

Quote:
The implementation isn't legal. EOF must be distinguishable
from a character.

I meant "A value of character type equal to EOF exists and
does not constitute a valid character". It was a mental
shortcut.

So. What does Plauger document as the legal character values
for the generic version of char_traits? Or is the presence of
the generic version simply an internal implementation detail,
and not supposed to be used by the user?

[...]
Quote:
For what definition of right? There isn't one, as far as I
can see. And there is no possible generic implementation
which will "work" for all types. Remember, the only
requirement on the character type is that it be a POD.

Mr. Plauger's implementation does not work for PODs in that
the compilation fails; it is perfectly acceptable. But I
cannot accept the situation when the compilation succeeds and
the execution fails.

That's what undefined behavior means, however. I would
certainly agree that the standard leaves far too much behavior
undefined, and that there are a lot of improvements to be done
in this regard. But it is a problem with the standard, and not
a particular implementation.

Quote:
What an implementation should probably do is treat the issue
as implementation defined behavior, rather than undefined
behavior. That is: carefully document whatever they do do.
And in the absence of documentation, the only safe
assumption on the part of the programmer is that the type
doesn't exist.

The headers should be written in such a way that when the
compiler encounters a construct that causes undefined
behaviour it should fail.

Do you realize all of the things that can cause undefined
behavior? What you are requesting just isn't reasonably
possible.

Quote:
Whereas it certainly cannot detect dereferencing a null
pointer, it surely can detect using an unsupported type as a
template parameter.

Not really. At least not reasonably.

Quote:
The standard does not say anything about the availability of a
generic implementation. It does not say it is unavailable.
It says something about the explicit instantiations provided.
A generic implementation is valid when it causes a
compile-time error.

That's your opinion. It's a good ideal. In practice, it's a
lot harder to meet.
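
As a rough illustration of what "fail at compile time" could look like, here is a sketch in C++11 or later (static_assert postdates this thread). The namespace sketch and the helper always_false are hypothetical names, not part of any library. Note what it does not do: it only rejects types that have no specialization; it cannot tell whether the values of a supported type are "valid characters".

```cpp
#include <string>       // std::char_traits
#include <type_traits>  // std::false_type

namespace sketch {

// Dependent false value, so the assertion fires only on instantiation.
template <typename CharT>
struct always_false : std::false_type {};

// Primary template: deliberately unusable. Instantiating it for a
// type without a specialization stops the build with a message.
template <typename CharT>
struct char_traits {
    static_assert(always_false<CharT>::value,
                  "no char_traits for this character type; "
                  "provide a specialization");
};

// Supported types simply forward to the standard traits.
template <> struct char_traits<char>    : std::char_traits<char> {};
template <> struct char_traits<wchar_t> : std::char_traits<wchar_t> {};

} // namespace sketch

// sketch::char_traits<unsigned char>::eof();  // would not compile
```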

Quote:
It is invalid when it causes a run-time error.

Undefined behavior allows for run-time errors. And what is a
"run-time" error? Was the data being processed "valid
characters"? For what definition of "valid character"? (If you
give an invalid value to a function, you have undefined
behavior.)

Quote:
Such a generic implementation is possible. Not providing any
generic implementation would be a better solution than
providing an invalid one. Providing an invalid implementation
is a breach of contract.

The only contract you really have for such things is the
documentation from the implementation.

Quote:
It is better to signal your inability to do that at compile
time than at run time because the error may remain there
hidden for a long time before it is triggered and nobody
knows what has failed.

I sort of agree. The fact remains that most implementations
do provide a generic implementation, which works in a few
special cases. The fact also remains that most
implementations don't document this, so you can only guess
what those special cases are.

And it opens the way for your end user to provide his own
implementation if he feels able to do it.

All of the implementations do this; the standard requires
it. You can't call it std::char_traits<unsigned char>,
because the standard forbids it. But you can provide a

I can call it std::char_traits<BYTE> if that makes you happier.

If BYTE is a user defined type, you can provide it. If BYTE is
a typedef for a built-in type, you're not allowed to. You can't
do just anything you want in std::.

Quote:
MyNamespace::char_traits<unsigned char>, or simply a
UCharTraits. And you can instantiate string, basic_filebuf,
or whatever, with it.
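
A minimal sketch of such a user-supplied traits class, used here only with std::basic_string (which needs no codecvt facet). UCharTraits follows the name used above; ByteString and self_check are invented for illustration, and int_type is assumed to be int with -1 as the end-of-file value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>

struct UCharTraits {
    typedef unsigned char  char_type;
    typedef int            int_type;   // wider than char_type, so EOF fits
    typedef std::streamoff off_type;
    typedef std::streampos pos_type;
    typedef std::mbstate_t state_type;

    static void assign(char_type& d, const char_type& s) { d = s; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        if (n == 0) return 0;
        return static_cast<const char_type*>(std::memchr(s, c, n));
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memmove(d, s, n);
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memcpy(d, s, n);
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        if (n != 0) std::memset(d, c, n);
        return d;
    }

    static int_type  eof() { return -1; }
    static int_type  to_int_type(char_type c) { return c; }
    static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
    static bool      eq_int_type(int_type a, int_type b) { return a == b; }
    static int_type  not_eof(int_type i) { return i == eof() ? 0 : i; }
};

typedef std::basic_string<unsigned char, UCharTraits> ByteString;

// Self-check: a 0xFF byte is stored and found like any other value.
void self_check() {
    const unsigned char raw[] = { 0x00, 0xFF, 0x41 };
    ByteString s(raw, sizeof raw);
    assert(s.size() == 3);
    assert(s.find(static_cast<unsigned char>(0xFF)) == 1);
    assert(UCharTraits::to_int_type(0xFF) != UCharTraits::eof());
}
```

Because the class lives outside std::, nothing here specializes a standard template for a built-in type.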

What the standard doesn't provide is a means of supplying
std::codecvt<unsigned char, char, mbstate_t>. But what

I could su