C++Talk.NET C++ language newsgroups
Jean-Marc Bourguet
Posted: Mon Nov 13, 2006 10:44 pm
Post subject: Wide characters and narrow streams
My understanding was that char was the type to be used for storing
characters when their codes were small enough, and that wchar_t was to be
used in other cases.
I was, somewhat naively apparently, expecting that the conversion done from
the external representation to the internal one depended only on the locale
imbued on the stream and *not* on the width of the stream. I.e., the numeric
values returned from calls to getc() would be the same, just that a wide
stream would be able to return all values for larger character sets, while a
narrow stream would return an error (badbit set) when the code was outside the
range representable in a char.
That is not what happens in the two implementations I've tried, which are
quite independent. When I read from a narrow stream after having imbued a
UTF-8 locale, I just get the encoded representation. When I read from a
wide stream in the same conditions, I get the decoded values.
Reading what I think are the relevant parts of the C++ standard, I see
nothing which mandates either behavior. Did I miss something?
--
Jean-Marc
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
Alberto Ganesh Barbati
Posted: Tue Nov 14, 2006 12:14 am
Post subject: Re: Wide characters and narrow streams
Jean-Marc Bourguet wrote:
| Quote: | My understanding was that char was the type to be used when storing
characters when their code was small enough, and that wchar_t was to be
used in other cases.
I was, somewhat naively apparently, expecting that the conversion done from
the external representation to the internal one depended only on the locale
imbued on the stream and *not* on the width of the stream. IE, the numeric
values returned from call to getc() would be the same, just that wide
stream would be able to return all values for larger character sets, narrow
stream would return an error (badbit set) when the code was outside the
range representable in a char.
That is not what happen in the two implementations I've tried which are
quite independant. When I read from a narrow stream after having imbued an
UTF-8 locale, I just get the encoded representation. When I read from a
wide stream in the same conditions, I get the decoded values.
Reading what I think are the relevant part of the C++ standard, I see
nothing which which mandate either behavior. Did I miss something?
|
First of all, only file streams do conversions. The conversion is
performed by a codecvt facet which matches the stream character type.
More precisely, the fstream/codecvt conspiracy provides conversion from
an external sequence (always represented as a sequence of chars) to an
internal sequence of either chars or wide chars. The C++ Standard does
not specify the behavior of any such conversion, except for the trivial
one, where exactly one external char is converted to one internal char
with the same value and vice versa.
So, about your UTF-8 locale, you're deep inside the
implementation-defined realm. Whatever the conversion is doing is ok
from the C++ Standard point of view.
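
A minimal sketch of how one might ask a locale's codecvt facets whether they convert at all. The helper names `narrow_is_noconv` and `wide_is_noconv` are made up for illustration. The only behavior relied on from the standard is that the `char`-to-`char` facet reports no conversion; what the wide facet reports is left to the implementation.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// True when the narrow (char -> char) codecvt facet of `loc` says it never
// converts. The standard requires this for codecvt<char, char, mbstate_t>.
bool narrow_is_noconv(const std::locale& loc) {
    typedef std::codecvt<char, char, std::mbstate_t> facet;
    return std::use_facet<facet>(loc).always_noconv();
}

// Same question for the wide (char -> wchar_t) facet. The answer is
// implementation-defined, so callers should not assume either value.
bool wide_is_noconv(const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet;
    return std::use_facet<facet>(loc).always_noconv();
}
```

Calling `narrow_is_noconv(std::locale::classic())` should yield `true`; the wide variant is worth printing on each platform rather than assuming.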
Ganesh
---
James Kanze
Posted: Tue Nov 14, 2006 5:16 pm
Post subject: Re: Wide characters and narrow streams
Jean-Marc Bourguet wrote:
| Quote: | My understanding was that char was the type to be used when storing
characters when their code was small enough, and that wchar_t was to be
used in other cases.
|
Not really. As far as the standard is concerned, about all you
are guaranteed is that:
-- both the narrow character set and the wide character set
will contain all of the characters in the basic character
set (regardless of locale, I think), and
-- all external IO takes place over narrow characters.
There's no guarantee that wchar_t is larger than a char, for
example. And even less that it is Unicode, or anything else one
might expect.
In practice, most of the time I would expect that in most
locales (and in "C" locale), all of the characters in the basic
character set have the same numeric encoding in the two types,
but this is not guaranteed (and on an IBM mainframe, it might
even make sense for narrow characters to be EBCDIC, and wide
characters Unicode).
| Quote: | I was, somewhat naively apparently, expecting that the
conversion done from the external representation to the
internal one depended only on the locale imbued on the stream
and *not* on the width of the stream.
|
It depends on the facet, and different width streams use
different facets.
In practice, I don't quite see how it could be otherwise. The
conversion cannot be the same, since in one case, I either get
multibyte characters, or some characters are not representable,
and in the other, I get wide characters (which are in theory
never multibyte, even if in practice, surrogate characters may
appear).
I wonder if your idea isn't conditioned by the fact that you live
in an area where ISO 8859-1 is widespread, and the fact that all
of the characters in ISO 8859-1 have the same numeric encoding
as in Unicode. Imagine, however, that you lived in eastern
Europe, and imbued an ISO 8859-2 locale. What would you expect
if the file contained the character 0xC8 (a C with caron---the
first letter of Czech in Czech) when read as a wide character?
Surely not 0x00C8 (a 'È' in Unicode).
Actually, you don't even have to go as far afield as eastern
Europe. How do you expect to handle the transition to
ISO 8859-15 (necessary for the Euro, and also, in France, for
the OE and oe ligatures)? The Unicode representation for Euro
is 0x20AC, which isn't representable in a char on the machines I
generally work on. Do you really expect some sort of error on
encountering a Euro character when reading a file encoded in
8859-15 with a narrow character stream? The whole point of
ISO 8859-15 is that I don't need to use wide characters when
working in a western European environment. (Or at least some
western European environments---I don't think it covers Catalan,
which is western European to me.)
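
To make the point concrete, here is a small sketch showing that the same byte decodes to different Unicode code points depending on the external charset. The function names and the `kNotTabulated` marker are hypothetical, and only the byte positions mentioned in this thread are tabulated; a real converter would need the full tables.

```cpp
#include <cstdint>

// Marker returned for bytes this sketch does not tabulate (U+FFFD).
const std::uint32_t kNotTabulated = 0xFFFD;

// ISO 8859-1 coincides with the first 256 Unicode code points.
std::uint32_t from_latin1(unsigned char b) {
    return b;
}

// ISO 8859-2: only the byte discussed above is tabulated.
std::uint32_t from_latin2(unsigned char b) {
    if (b < 0xA0) return b;        // ASCII and C1 controls are shared
    if (b == 0xC8) return 0x010C;  // C with caron
    return kNotTabulated;
}

// ISO 8859-15: only the euro sign and the OE/oe ligatures are tabulated.
std::uint32_t from_latin9(unsigned char b) {
    if (b < 0xA0) return b;
    if (b == 0xA4) return 0x20AC;  // euro sign
    if (b == 0xBC) return 0x0152;  // OE ligature
    if (b == 0xBD) return 0x0153;  // oe ligature
    return kNotTabulated;
}
```

With these, `from_latin1(0xC8)` gives 0x00C8 while `from_latin2(0xC8)` gives 0x010C, and the euro sign at byte 0xA4 in ISO 8859-15 becomes 0x20AC, which no 8-bit char can hold as a decoded value.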
| Quote: | IE, the numeric
values returned from call to getc() would be the same, just that wide
stream would be able to return all values for larger character sets, narrow
stream would return an error (badbit set) when the code was outside the
range representable in a char.
|
I'm not quite sure how that could be. The locale determines the
encoding, and is mainly relevant for wide characters (here). I
would expect that most locales (with the exception of exotic
locales like EBCDIC) would suppose that the internal encoding of
char corresponds to that of the locale. This is supported by
the fact that changing the locale also changes the behavior of
functions like isalpha. (isalpha( 0xBD ) should be false in an
ISO 8859-1 locale, but true in ISO 8859-15.)
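
A hedged sketch of how one could check that claim on a given system. The helper `alpha_in_locale` is invented for this example, and locale names other than "C" are platform-specific assumptions (something like "fr_FR.ISO-8859-15" on many Unix systems), so the function reports when a locale is not installed rather than guessing.

```cpp
#include <locale>
#include <stdexcept>

// Returns 1 if `c` is alphabetic in the named locale, 0 if it is not,
// and -1 if the locale name is not available on this system.
int alpha_in_locale(const char* name, char c) {
    try {
        std::locale loc(name);  // throws std::runtime_error if unknown
        return std::isalpha(c, loc) ? 1 : 0;
    } catch (const std::runtime_error&) {
        return -1;
    }
}
```

Comparing `alpha_in_locale(name, static_cast<char>(0xBD))` for an ISO 8859-1 and an ISO 8859-15 locale name should show the difference described above, provided both locales are installed.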
In theory, the same thing may apply to wide characters, of
course, but the intent, I think, is that the wide character
encoding be pretty much locale independent. Unless historical
considerations argue against it, I would expect that wchar_t be
a 32 bit type, using Unicode, regardless of the locale. So that
isalpha(wchar_t) would in fact be locale independent. The
result is, of course, that the character code translation must
then be locale dependent. (In practice, historical
considerations intervene more often than not, and of the three
systems to which I have ready access---Solaris, Linux and
Windows---, only Linux does it this way.)
This philosophy is reflected, at least in Posix, in the naming
conventions of the locale, which reflect the name of the narrow
character encoding, and do not contain any component specifying
wide character encoding (which thus must be assumed to be locale
independent).
| Quote: | That is not what happen in the two implementations I've tried which are
quite independant. When I read from a narrow stream after having imbued an
UTF-8 locale, I just get the encoded representation. When I read from a
wide stream in the same conditions, I get the decoded values.
|
Which is what I would more or less expect. The internal
encoding of char varies according to the locale---otherwise,
there would be no point in making isalpha(char) locale
dependent, and people living in places where ISO 8859-1 is not
appropriate (e.g. anywhere in the Euro zone, today) would be
pretty much screwed.
| Quote: | Reading what I think are the relevant part of the C++ standard, I see
nothing which which mandate either behavior. Did I miss something?
|
Well, there's very little concerning locales (other than "C")
and wide characters that isn't implementation dependent, so
you're probably reading in the wrong place. I think that the
intent, however, is what I just explained. But you'll have to
check the implementation documentation each time to see what
they think.
--
James Kanze (GABI Software) email:james.kanze (AT) gmail (DOT) com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
---
Jean-Marc Bourguet
Posted: Fri Nov 17, 2006 9:08 am
Post subject: Re: Wide characters and narrow streams
"James Kanze" <james.kanze (AT) gmail (DOT) com> writes:
| Quote: | Jean-Marc Bourguet wrote:
My understanding was that char was the type to be used when storing
characters when their code was small enough, and that wchar_t was to be
used in other cases.
Not really. As far as the standard is concerned, about all you
are guaranteed is that:
-- both the narrow character set and the wide character set
will contain all of the characters in the basic character
set (regardless of locale, I think), and
|
With positive codes. And I think that the codes of the characters in the basic
sets can't be locale specific. (The standard says in 2.2/3 "The values of
the members of the execution character sets are implementation-defined,
and any additional members are locale-specific.").
| Quote: | -- all external IO takes place over narrow characters.
There's no guarantee that wchar_t is larger than a char, for
example. And even less that it is Unicode, or anything else one
might expect.
In practice, most of the time I would expect that in most
locales (and in "C" locale),
|
My reading of 2.2 is that the codes of the characters in the basic
character set may not depend on the locale.
| Quote: | all of the characters in the basic character set have the same numeric
encoding in the two types, but this is not guaranteed (and on an IBM
mainframe, it might even make sense for narrow characters to be EBCDIC,
and wide characters Unicode).
|
I agree. And I don't even have a clue about non-Unicode wide
representations (I know a little about some multibyte representations, but I
don't know how they are transformed into a wide representation).
| Quote: | I was, somewhat naively apparently, expecting that the conversion done
from the external representation to the internal one depended only on
the locale imbued on the stream and *not* on the width of the stream.
It depends on the facet, and different width streams use different
facets.
|
Facets are part of the locale... I was assuming that facets for handling a
given encoding would behave essentially the same.
| Quote: | In practice, I don't quite see how it could be otherwise. The
conversion cannot be the same, since in one case, I either get
multibyte characters, or some characters are not representable,
and in the other, I get wide characters (which are in theory
never multibyte, even if in practice, surrogate characters may
appear).
|
If you consider combining characters, you still have multi-word characters.
And combining characters are quite old (in some national variants of
ISO-646, an accent followed by a backspace officially acts as a combining
character). And you have to handle combining characters to get sensible
user-level behaviour.
| Quote: | I wonder if you idea isn't conditioned by the fact that you live
in an area where ISO 8859-1 is widespread, and the fact that all
of the characters in ISO 8859-1 have the same numeric encoding
as in Unicode.
|
It's true that I may be influenced by my context. I have files in ISO
8859-1, and I have programs handling them using char. New Unix versions tend to
assume UTF-8. I wanted to change my default locale so that I don't have to
search for how to change the default (GUIs make this more complicated than
"LC_ALL=fr_FR.ISO-8859-1; export LC_ALL" in your .profile).
As Unicode kept the encoding of ISO-8859-1, I assumed that my programs
would work without recompiling with a UTF-8 locale if their data files were
just converted from ISO 8859-1 to UTF-8 (they already set the global
locale). It didn't work. I started to read more about locales and found
not only that the little I thought I knew was probably false, but that
the situation seemed confused.
In C, narrow streams have to return the encoded form. I haven't seen a way
to know whether the charset in use is narrow. So you can't do much with char:
you can't give an error message if the encoding is not narrow. You can do
binary IO (if the stream has been opened with "b") and use the multibyte
functions.
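
The "read the bytes, then use the multibyte functions" route can be sketched roughly as follows. The helper name `count_multibyte_chars` is invented for this example; it relies on `std::mbrtowc`, which decodes according to the current C locale's LC_CTYPE category, so the result depends on whatever `setlocale` call the program made beforehand.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Counts the multibyte characters in `bytes` according to the current
// C locale. Returns -1 if the byte sequence is invalid or incomplete.
long count_multibyte_chars(const std::string& bytes) {
    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* p = bytes.data();
    std::size_t left = bytes.size();
    long count = 0;
    while (left > 0) {
        wchar_t wc;
        std::size_t n = std::mbrtowc(&wc, p, left, &state);
        if (n == static_cast<std::size_t>(-1) ||
            n == static_cast<std::size_t>(-2)) {
            return -1;  // invalid or truncated sequence
        }
        if (n == 0) n = 1;  // a null character was decoded; step over it
        p += n;
        left -= n;
        ++count;
    }
    return count;
}
```

As a possibly relevant aside, `MB_CUR_MAX` from `<cstdlib>` gives the maximum number of bytes in a multibyte character for the current locale, which is one way a C program can tell whether the encoding in use is single-byte.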
In C++, I saw nothing either mandating or preventing a read from a narrow
stream returning the encoded form. I saw nothing either mandating or
preventing a read from a narrow stream returning decoded characters, with
a run-time error if the character was not representable. I saw no way to
know whether the charset in use was narrow, and so whether it was possible to
safely use a narrow stream. Again, chars are not really usable for textual IO
in robust programs. We aren't even in the C situation, where you can use a
narrow stream for binary reading with a global locale which uses a multibyte
encoding, or use the multibyte functions. I hope I'm wrong.
| Quote: | Imagine, however, that you lived in eastern Europe, and imbued an ISO
8859-2 locale. What would you expect if the file contained the character
0xC8 (a C with caron---the first letter of Czeck in Czeck) when read as a
wide character. Surely not 0x00C8 (a 'È' in Unicode).
|
No. I still expect either the precomposed character for C caron, or a C
and a combining caron (if the internal encoding for wide characters is
Unicode). If I'm wrong and you can't safely read a narrow encoding in a
wide stream, then you can't do robust text IO at all except by doing
binary IO in the classic locale and reinterpreting the encoding yourself?
| Quote: | Actually, you don't even have to go as far afar as eastern Europe. How
do you expect to handle the transition to ISO 8859-15 (necessary for the
Euro, and also, in France, for the OE and oe ligatures)? The Unicode
representation for Euro is 0x20AC, which isn't representable in a char on
the machines I generally work on. Do you really expect some sort of
error on encountering a Euro character when reading a file encoded in
8859-15 with a narrow character stream.
|
No. I expected an error when reading a euro character in a narrow stream
with a UTF-8 locale.
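
The expectation described here can be illustrated with a hand-written decoder. This is only a sketch of the hoped-for semantics, not what any library facet does: `decode_utf8` and `decode_to_narrow` are hypothetical helpers, the decoder handles sequences of up to four bytes without checking for overlong forms, and an 8-bit char is assumed.

```cpp
#include <cstddef>
#include <string>

// Decodes one UTF-8 sequence starting at s[pos]. On success returns the
// code point and advances pos past the sequence; returns -1 on bad input.
long decode_utf8(const std::string& s, std::size_t& pos) {
    if (pos >= s.size()) return -1;
    unsigned char lead = static_cast<unsigned char>(s[pos]);
    int extra;
    long cp;
    if (lead < 0x80)                { extra = 0; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
    else return -1;                       // stray continuation or bad lead
    if (pos + extra >= s.size()) return -1;  // sequence is truncated
    for (int i = 1; i <= extra; ++i) {
        unsigned char c = static_cast<unsigned char>(s[pos + i]);
        if ((c & 0xC0) != 0x80) return -1;   // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    pos += extra + 1;
    return cp;
}

// The hoped-for narrow-stream behavior: hand back the decoded value when it
// fits in an 8-bit char, report an error otherwise.
bool decode_to_narrow(const std::string& s, std::size_t& pos,
                      unsigned char& out) {
    long cp = decode_utf8(s, pos);
    if (cp < 0 || cp > 0xFF) return false;
    out = static_cast<unsigned char>(cp);
    return true;
}
```

Under this scheme the bytes C3 A9 decode to 0xE9 and fit in a char, whereas E2 82 AC decode to 0x20AC (the euro sign) and would be reported as an error.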
| Quote: | The whole point of ISO 8859-15 is that I don't need to use wide
characters when working in a western European environment. (Or at least
some western European environments---I don't think it covers Catalan,
which is western European to me.)
IE, the numeric values returned from call to getc() would be the same,
just that wide stream would be able to return all values for larger
character sets, narrow stream would return an error (badbit set) when
the code was outside the range representable in a char.
I'm not quite sure how that could be. The locale determines the
encoding, and is mainly relevant for wide characters (here).
|
Is it relevant at all for narrow characters? I'd prefer the C situation,
where we can do at least some things with a narrow stream (binary IO, or IO
followed by converting ourselves with the multibyte functions), to the apparent
situation in C++, where we can't count on anything with a narrow stream without
making assumptions about the locale and without being able to check that these
assumptions hold.
| Quote: | I would expect that most locales (with the exception of exotic locales
like EBCDIC) would suppose that the internal encoding of char corresponds
to that of the locale.
|
Well, my reading is that an EBCDIC locale in an otherwise ASCII environment
must do remapping in IO to work. And that this is possible in C++ with
narrow stream but not in C.
| Quote: | This is supported by the fact that changing the
locale also changes the behavior of functions like isalpha. (isalpha(
0xBD ) should be false in an ISO 8859-1 locale, but true in ISO 8859-15..)
In theory, the same thing may apply to wide characters, of
course, but the intent, I think, is that the wide character
encoding be pretty much locale independant.
|
I can see a Unix supporting a Japanese locale with an EUC external
encoding put in a traditional internal encoding (I don't have any idea of
what they are) as well as with a UTF-16 external encoding put in a UTF-16
or UTF-32 internal form.
| Quote: | Unless historical considerations argue against it, I would expect that
wchar_t be a 32 bit type, using Unicode, regardless of the locale. So
that isalpha(wchar_t) would in fact be locale independant. The result
is, of course, that the character code translation must then be locale
dependant. (In practice, historical considerations intervene more often
than not, and of the three systems to which I have ready
access---Solaris, Linux and Windows---, only Linux does it this way.)
This philosophy is reflected, at least in Posix, in the naming
conventions of the locale, which reflect the name of the narrow
character encoding, and do not contain any component specifying
wide character encoding (which thus must be assumed to be locale
independant).
|
I wouldn't call UTF-8 a narrow character encoding.
| Quote: | That is not what happen in the two implementations I've tried which are
quite independant. When I read from a narrow stream after having imbued an
UTF-8 locale, I just get the encoded representation. When I read from a
wide stream in the same conditions, I get the decoded values.
Which is what I would more or less expect. The internal
encoding of char varies according to the locale---otherwise,
there would be no point in making isalpha(char) locale
dependant, and people living in places where ISO 8859-1 is not
appropriate (e.g. anywhere in the Euro zone, today) would be
pretty much screwed.
|
You seem to assume even more constraints here than I did.
What I assumed (see below for my understanding of the situation) was:
- binary IO: do as few transformations on the transmitted bytes as possible
  in the given context; those transformations are locale independent.
- text IO:
  - '\n' instead of the platform conventions (CR, CR-LF, LF, record length
    -- sadly we don't know if '\n' is a line separator or a line terminator)
  - check end of file with platform conventions for text (^Z for instance)
  - decode the external encoding (which is the same for both narrow and
    wide streams) to an internal representation (which may be locale
    dependent -- hence the localization of character classification --
    and which may be different for narrow and wide char), with an error if
    the encoding is meaningless or the decoded result is not available in
    the chosen representation (this last case is not possible for wide
    characters)
| Quote: | Reading what I think are the relevant part of the C++ standard, I see
nothing which which mandate either behavior. Did I miss something?
Well, there's very little concerning locales (other than "C")
and wide characters that isn't implementation dependant, so
you're probably reading in the wrong place. I think that the
intent, however, is what I just explained. But you'll have to
check the implementation documentation each time to see what
they think.
|
There are so few constraints put on locales that I wonder what it is possible
to do with narrow IOStreams without assuming things that you can't check
about the imbued locale. My current understanding:
- IOStream text IO:
  - '\n' instead of the platform conventions (CR, CR-LF, LF, record length
    -- sadly we don't know if '\n' is a line separator or a line terminator)
  - check end of file with platform conventions for text (^Z for instance)
  - perhaps other transformations?
  - wide stream: encode/decode the external encoding to an internal
    representation (which may be locale dependent -- hence the localization
    of character classification) with an error if the encoding is
    meaningless.
  - narrow stream: don't know if there is an encoding/decoding or not.
- IOStream binary IO:
  - we don't have the '\n', end of file and other(?) transformations.
  - we have the encoding/decoding... if there is one
- C stream text IO:
  - '\n' instead of the platform conventions (CR, CR-LF, LF, record length
    -- sadly we don't know if '\n' is a line separator or a line terminator)
  - check end of file with platform conventions for text (^Z for instance)
  - perhaps other transformations?
  - narrow stream: no transformation whatever the locale is
  - wide stream: encode/decode the external encoding (which is the same
    for both narrow and wide streams) to an internal representation (which
    may be locale dependent).
- C stream binary IO:
  - we don't have the '\n', end of file and other(?) transformations.
  - narrow stream: no transformation whatever the locale is
  - wide stream: encode/decode the external encoding (which is the same
    for both narrow and wide streams) to an internal representation (which
    may be locale dependent).
Yours,
--
Jean-Marc
---
Krzysztof Zelechowski
Posted: Tue Nov 21, 2006 10:51 pm
Post subject: Re: Wide characters and narrow streams
Narrow character streams do not decode characters.
And you cannot expect them to do it because it does not work that way.
You cannot decode narrow character sequences.
You can recode them, but it is a two-step process:
decode to wide characters and encode to narrow characters, perhaps using a
different locale.
In theory you can tell what kind of encoding the locale uses by examining
the character length.
If the character length is fixed, you can seek offsets in file buffers;
if it is not, you can only seek positions.
Note that not all implementations of the standard library are reliable:
Dinkumware, for example, tells me that the "C" locale has a variable-length
encoding.
Mr. Plauger is ready to say that he has the right to be fully agnostic;
perhaps he has, but such a doctrine-driven stance does not make much sense to
me as the end user.
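
A sketch of how one might query the "character length" that a locale's codecvt facet reports. The `EncodingInfo` struct and `describe_codecvt` helper are names invented for this example; `encoding()` and `max_length()` are the standard `std::codecvt` members, and as noted above their answers for anything but the `char`-to-`char` facet vary between implementations.

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// What a codecvt facet reports about its external encoding.
struct EncodingInfo {
    int encoding;    // -1: state-dependent, 0: variable width,
                     // otherwise the fixed number of external chars
    int max_length;  // most external chars consumed per internal char
};

// Queries the codecvt<InternT, char, mbstate_t> facet installed in `loc`.
template <class InternT>
EncodingInfo describe_codecvt(const std::locale& loc) {
    typedef std::codecvt<InternT, char, std::mbstate_t> facet;
    const facet& f = std::use_facet<facet>(loc);
    EncodingInfo info = { f.encoding(), f.max_length() };
    return info;
}
```

For example, `describe_codecvt<char>(std::locale::classic())` should report 1 for both fields, while `describe_codecvt<wchar_t>(loc)` is the one whose result is worth checking per implementation.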
If your code contains narrow character literals or narrow string literals,
it must be recompiled because you have to recode the source files.
If you open the source files under a locale that uses an encoding different
from the original,
your literals will probably be unreadable.
No surprise that they do not make sense to your programme either.
Therefore it is always safer to use wide character literals and wide
character streams for all processing purposes.
And if you recode single byte narrow characters, the source code will be
ill-formed
because it will contain multiple bytes between single quotes;
your compiler may accept them silently (especially at Apple),
but then you will get runtime misbehaviour
because you cannot get such characters from the input stream.
It is the core cause of your failure: narrow I/O streams operate on bytes,
not on characters.
And for a good reason: the UTF-8 encoding of a character can take up to 6
bytes; that will not fit into an integer;
you would have to resort to using UTF-16 surrogates encoded into UTF-8 as
separate characters;
while such a double encoding is technically possible, it is cumbersome and
weird.
Chris
"Jean-Marc Bourguet" <jm (AT) bourguet (DOT) org> wrote in message
news:pxbzmaskcke.fsf (AT) news (DOT) bourguet.org...
It's true that I may be influenced by my context. I've files in ISO
8859-1, I've programs handling them using char. New Unix versions tend to
assume UTF-8. I wanted to change my default locale so that I don't have to
search on how to change the default (GUI make this more complicated than
"LC_ALL=fr_FR.ISO-8859-1; export LC_ALL" in your .profile)
As Unicode kept the encoding of ISO-8859-1, I was assumed that my programs
would work without recompiling with a UTF-8 locale if their data files were
just converted from ISO 8859-1 to UTF-8 (they already set the globale
locale). It didn't work. I started to read more about locale and found
how not only that the little I though I knew was probably false, but that
the situation seemed confused.
---
Jean-Marc Bourguet
Posted: Thu Nov 23, 2006 4:40 am
Post subject: Re: Wide characters and narrow streams
krixel (AT) qed (DOT) pl ("Krzysztof Zelechowski") writes:
| Quote: | "James Kanze" <james.kanze (AT) gmail (DOT) com> wrote in message
news:1164201538.184899.241920 (AT) f16g2000cwb (DOT) googlegroups.com...
"Krzysztof Zelechowski" wrote:
Narrow character streams do not decode characters. And you
cannot expect them to do it because it does not work that way.
You cannot decode narrow character sequences.
I've wondered about this. The standard requires all instances
of basic_filebuf, including basic_filebuf<char>, to use the
codecvt for code translation, at least in theory. The standard
also has a requirement that
std::codecvt<char,char,mbstate_t>::do_always_noconv() return
true. Does this mean that it is impossible to create a locale
with a facet std::codecvt<char,char,mbstate_t> where this isn't
true? I don't think so; I can certainly create instances of
other standard facets to do what I want, regardless of what the
default version does. But to tell the truth, I'm not sure. I
find it very difficult to say what is and what is not allowed
when it comes to locale.
You definitely do not have it out of the box,
|
My expectation was that we did have it. I still see nothing which
prevents a system-provided locale from doing a decoding for narrow streams,
but I see nothing which mandates it -- an unfortunate situation. Perhaps the
fact that C IO seems unable to do such a decoding (there is an mbstate_t
explicitly associated with a wide stream, nothing is mentioned for a
narrow stream) is the explanation of the behavior I observed.
| Quote: | which is the situation of the OP.
|
My situation is under control. Personal programs on personal data.
I started the thread to check if my understanding of the matter -- see the
end of <pxbzmaskcke.fsf (AT) news (DOT) bourguet.org> for the rest of it -- is
correct. The most shocking part is that in locales other than "C", narrow
streams are useless: you may know that a stream doesn't do a conversion (with
codecvt<>::always_noconv()) but you don't know if it is because there is no
need for one or because the underlying charset is wide.
Yours,
--
Jean-Marc
---
Kristof Zelechovski
Posted: Fri Nov 24, 2006 6:55 pm
Post subject: Re: Wide characters and narrow streams
"Jean-Marc Bourguet" <jm (AT) bourguet (DOT) org> wrote in message
| Quote: | I started the thread to check if my understanding of the matter -- see the
end of <pxbzmaskcke.fsf (AT) news (DOT) bourguet.org> for the rest of it -- is
correct. The most shocking part is that in locales other than "C", narrow
streams are useless: you may know that a stream doesn't do a conversion (with
codecvt<>::always_noconv()) but you don't know if it is because there is no
need for one or because the underlying charset is wide.
|
Usually because the recoding process
can be conceptually split into two parts: decoding and reencoding.
I admit it could be more efficient if it were done as if by Unix tr tool
but not all recodings can be implemented in such a way.
And the underlying character set is never wide.
The reason is you cannot limit reading from text files to blocks;
you can always read just one byte.
What you may perceive as wide characters stored directly in the file
is wide characters converted to the UTF encoding
of appropriate bit length and direction.
(A single byte does not have any perceivable "direction";
however, a sequence of bytes does.)
This encoding may be simpler to decode,
but it is a narrow character encoding nevertheless,
and, surprise, need not be supported by the locale system.
(This is the present deplorable condition of Microsoft C++).
Chris
---