 |
C++Talk.NET C++ language newsgroups
Frank Wiener Guest
|
Posted: Tue Sep 09, 2003 12:36 pm Post subject: codecvt for converting between character encodings |
|
|
Hi,
I'm reading about the standard library and am finding little
documentation about using codecvt. I think I can accomplish converting
between encodings in other ways, but I'd like to understand how
codecvt is supposed to work, because I'm expecting it to be something
useful to know in working with the standard library.
In the documentation for libstdc++5 for use with g++, it describes
using their extension __enc_traits as a state_type. So for example,
they have:
|
|
|
 |
Dietmar Kuehl Guest
|
Posted: Thu Sep 11, 2003 10:14 am Post subject: Re: codecvt for converting between character encodings |
|
|
Frank Wiener wrote:
| Quote: | I would like to know how to use codecvt to convert between character
encodings without relying on something proprietary.
|
The first thing to notice is that code conversion facets using a
different state type than 'std::mbstate_t' are pretty close to
useless: the whole code conversion stuff is there for the purpose
of 'std::basic_filebuf' and this class is not guaranteed to work
correctly unless the state type is 'std::mbstate_t'. Of course,
code conversion can be used in other contexts, too...
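As an illustration of a facet that keeps 'std::mbstate_t' as its state type, here is a minimal sketch of a user-written conversion facet for a stateless encoding. The class name latin1_codecvt and the helper widen_latin1 are hypothetical, and ISO-8859-1 is chosen only because its mapping is trivial; this is not taken from any particular library.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: ISO-8859-1 bytes <-> wchar_t code points.
// The encoding is stateless, so the std::mbstate_t argument is ignored.
class latin1_codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
protected:
    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        for (; from != from_end && to != to_end; ++from, ++to)
            *to = static_cast<wchar_t>(static_cast<unsigned char>(*from));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    result do_out(state_type&, const wchar_t* from, const wchar_t* from_end,
                  const wchar_t*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        result r = ok;
        for (; from != from_end && to != to_end; ++from, ++to) {
            // Anything above 0xFF has no Latin-1 representation.
            if (static_cast<unsigned long>(*from) > 0xFFul) { r = error; break; }
            *to = static_cast<char>(static_cast<unsigned char>(*from));
        }
        from_next = from;
        to_next = to;
        if (r == ok && from != from_end) r = partial;  // output buffer full
        return r;
    }

    result do_unshift(state_type&, char* to, char*, char*& to_next) const override {
        to_next = to;
        return noconv;  // no shift sequence to emit
    }
    int do_encoding() const noexcept override { return 1; }  // one byte per character
    bool do_always_noconv() const noexcept override { return false; }
    int do_max_length() const noexcept override { return 1; }
    int do_length(state_type&, const char* from, const char* from_end,
                  std::size_t max) const override {
        std::size_t avail = static_cast<std::size_t>(from_end - from);
        return static_cast<int>(avail < max ? avail : max);
    }
};

// Usage sketch: install the facet in a locale and convert a byte string.
std::wstring widen_latin1(const std::string& bytes) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
    std::locale loc(std::locale::classic(), new latin1_codecvt);  // locale owns the facet
    const Cvt& cvt = std::use_facet<Cvt>(loc);
    std::wstring out(bytes.size(), L'\0');
    if (bytes.empty()) return out;
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
           &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

The same locale object could presumably be imbued into a std::wifstream before opening the file, which is the use the facet is designed for.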
| Quote: | If using codecvt for character encoding conversion means I must create
my own stateT type, and assuming that I can do that using iconv or
something else, what protocol is implied by a stateT type?
|
The idea behind the state type is that it captures the state of the
conversion between calls that each convert one chunk of memory:
obviously, it is impossible in general to use a source and a
destination buffer which can hold the whole sequence (proof:
streams are capable of dealing with sequences of arbitrary size,
even sizes beyond the maximum file size, although in this case
probably restricted to one-pass reading). As a consequence, it is
necessary to carry over the current state of the conversion from
one call to the next. Since the code conversion facet is essentially
stateless (well, not strictly, but when used it only has an
immutable state), the conversion state is passed into the function.
For many encodings, and with most implementations using the code
conversion facets, you can probably get away without using the
state, although I think the standard more or less mandates its
use: even if the conversion function gets just one byte
and one character at a time to convert, it is supposed to make
progress. I am, however, not aware of any implementation really
taking advantage of this. The worst that happens is that the
function is asked for one character at a time but is handed a
successively increasing number of bytes to operate on. For stateless
encodings (i.e. ones not using shift states) this should allow
correct processing. I
don't think that it is possible to avoid usage of the state
argument when the encoding uses shift states: just imagine a
huge file which won't fit into the [virtual] memory and where
the state changes are close to the beginning and close to the
end of the file.
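A minimal sketch of what carrying a conversion across chunks can look like in practice. It assumes the standard facet std::codecvt<char32_t, char, std::mbstate_t> (UTF-8 to UTF-32, added in C++11 and deprecated in C++20, so it postdates this thread); the function name decode_chunks is hypothetical. The state object is passed to every call, and bytes the facet has not consumed are handed back together with the next chunk, which covers implementations that keep nothing in the state.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 that arrives in arbitrary chunks into UTF-32.
std::u32string decode_chunks(const std::vector<std::string>& chunks) {
    typedef std::codecvt<char32_t, char, std::mbstate_t> Cvt;
    const Cvt& cvt = std::use_facet<Cvt>(std::locale::classic());

    std::mbstate_t state = std::mbstate_t();  // one state for the whole sequence
    std::string pending;                      // bytes not yet consumed by the facet
    std::u32string out;

    for (std::size_t i = 0; i < chunks.size(); ++i) {
        pending += chunks[i];
        const char* from = pending.data();
        const char* from_end = from + pending.size();
        while (from != from_end) {
            char32_t buf[16];
            const char* from_next = from;
            char32_t* to_next = buf;
            std::codecvt_base::result r =
                cvt.in(state, from, from_end, from_next, buf, buf + 16, to_next);
            if (r == std::codecvt_base::error)
                throw std::runtime_error("invalid UTF-8 input");
            out.append(buf, to_next);
            bool stalled = (from_next == from && to_next == buf);
            from = from_next;
            if (stalled) break;  // incomplete trailing character: wait for more bytes
        }
        pending.erase(0, static_cast<std::size_t>(from - pending.data()));
    }
    return out;
}
```

Feeding the two bytes of a character in separate chunks should give the same result as feeding them together, because the first byte simply stays in pending until the second one arrives.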
Anyway, for the Unicode encodings I'm aware of (UTF-8, UTF-16
and UCS-4; UCS-2 is, of course, only an incomplete encoding of
Unicode) it should be possible to get away without using the
state argument. BTW, neither 'unsigned short' nor 'wchar_t' is
required to be capable of holding a Unicode character while
'unsigned long' is.
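The remark about integer types can be checked at compile time. A small sketch, assuming the Unicode code space ends at 0x10FFFF; the helper name can_hold_unicode is hypothetical:

```cpp
#include <limits>

// True if every Unicode code point (0 .. 0x10FFFF) fits into T.
template <class T>
constexpr bool can_hold_unicode() {
    return static_cast<unsigned long long>(std::numeric_limits<T>::max()) >= 0x10FFFFull;
}

// 'unsigned long' has at least 32 bits, so this holds on every implementation.
static_assert(can_hold_unicode<unsigned long>(), "unsigned long is wide enough");
```

The result for 'wchar_t' and 'unsigned short' depends on the platform, which is exactly why they cannot be relied on.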
--
<mailto:dietmar_kuehl (AT) yahoo (DOT) com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Thu Sep 11, 2003 3:16 pm Post subject: Re: codecvt for converting between character encodings |
|
|
Ulrich Eckhardt <doomster (AT) knuut (DOT) de> wrote
| Quote: | Hmmm, there is a use though: changing the encoding is done via imbuing
with a different locale. However, according to a discussion in the
STLport forum, this must not be done after the first IO took
place. Switching between UTF-8 and CP1252 or UTF-16 and UCS2 after
writing an XML-header is thus not possible.
|
Do they give specific justification from the standard? I've seen
several suggestions of this, but I am unaware of any text to back it up.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
|
 |
Dietmar Kuehl Guest
|
Posted: Sat Sep 13, 2003 8:32 am Post subject: Re: codecvt for converting between character encodings |
|
|
kanze (AT) gabi-soft (DOT) fr wrote:
| Quote: | Ulrich Eckhardt <doomster (AT) knuut (DOT) de> wrote in message
news:<bjmhss$kcdkp$1 (AT) ID-178288 (DOT) news.uni-berlin.de>...
Hmmm, there is a use though: changing the encoding is done via imbuing
with a different locale. However, according to a discussion in the
STLport forum, this must not be done after the first IO took
place. Switching between UTF-8 and CP1252 or UTF-16 and UCS2 after
writing an XML-header is thus not possible.
Do they give specific justification from the standard? I've seen
several suggestions of this, but I am unaware of any text to back it up.
|
There is some stuff there: see 27.8.1.4 paragraph 17:
-17- Precondition: If the file is not positioned at its beginning
and the encoding of the current locale as determined by
a_codecvt.encoding() is state-dependent
(lib.locale.codecvt.virtuals) then that facet is the same as
the corresponding facet of loc.
Actually, I think that the liberty taken by some implementations in
interpreting this paragraph to mean that you cannot even replace a
facet after merely calling open (because this might already process
some characters...) is taking it a little bit far. There was at
least one discussion of this issue in this forum (basically involving
P.J.Plauger and me). However, the XML question is pretty clear:
you are not guaranteed that you can switch the facets and it would
not work in many situations.
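A minimal sketch of how a program could test the condition in that precondition before switching locales. codecvt::encoding() returns -1 for a state-dependent encoding; the helper names is_state_dependent and try_switch_locale are hypothetical, and the check is deliberately conservative (it refuses any switch away from a state-dependent encoding once reading has started, rather than comparing facets).

```cpp
#include <cwchar>
#include <fstream>
#include <locale>

// True if the locale's code conversion facet for CharT is state-dependent.
template <class CharT>
bool is_state_dependent(const std::locale& loc) {
    typedef std::codecvt<CharT, char, std::mbstate_t> Cvt;
    return std::use_facet<Cvt>(loc).encoding() == -1;
}

// Imbue 'next' only when the quoted precondition cannot be violated.
// 'at_beginning' is supplied by the caller, who knows whether any I/O took place.
template <class CharT>
bool try_switch_locale(std::basic_filebuf<CharT>& buf, const std::locale& next,
                       bool at_beginning) {
    if (!at_beginning && is_state_dependent<CharT>(buf.getloc()))
        return false;
    buf.pubimbue(next);
    return true;
}
```

Whether a given implementation accepts the switch in practice is a separate question, as discussed above.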
--
<mailto:dietmar_kuehl (AT) yahoo (DOT) com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Mon Sep 15, 2003 10:03 pm Post subject: Re: codecvt for converting between character encodings |
|
|
Dietmar Kuehl <dietmar_kuehl (AT) yahoo (DOT) com> wrote
| Quote: | kanze (AT) gabi-soft (DOT) fr wrote:
Ulrich Eckhardt <doomster (AT) knuut (DOT) de> wrote in message
news:<bjmhss$kcdkp$1 (AT) ID-178288 (DOT) news.uni-berlin.de>...
Hmmm, there is a use though: changing the encoding is done via
imbuing with a different locale. However, according to a
discussion in the STLport forum, this must not be done after the
first IO took place. Switching between UTF-8 and CP1252 or UTF-16
and UCS2 after writing an XML-header is thus not possible.
Do they give specific justification from the standard? I've seen
several suggestions of this, but I am unaware of any text to back
it up.
There is some stuff there: see 27.8.1.4 paragraph 17:
-17- Precondition: If the file is not positioned at its beginning
and the encoding of the current locale as determined by
a_codecvt.encoding() is state-dependent
(lib.locale.codecvt.virtuals) then that facet is the same as
the corresponding facet of loc.
|
That sounds a bit logical. I can imagine that the logic necessary to
reposition the buffer correctly in the middle of a multibyte character
could be rather difficult.
| Quote: | Actually, I think that the liberty taken by some implementations in
interpreting this paragraph to mean that you cannot even replace a
facet after merely calling open (because this might already process
some characters...) is taking it a little bit far.
|
The wording you cite certainly doesn't support this.
| Quote: | There was at least one discussion of this issue in this forum
(basically involving P.J.Plauger and me). However, the XML question is
pretty clear: you are not guaranteed that you can switch the facets
and it would not work in many situations.
|
Well, XML does make it particularly difficult, since they bury the
encoding fairly deep in the file, and require parsing to get at it.
Roughly speaking, in order to obtain the encoding, you have to be able
to read the file correctly, and in order to read the file correctly, you
have to know the encoding. In practice, however, I would be very
surprised if the following didn't work: read the file as straight
ASCII, treating all bytes not defined in ASCII as unknown characters,
parse until you find the charset attribute, rewind to the beginning,
and then parse everything using the target codeset.
And on almost all modern machines, locale "C" will be
capable of reading straight ASCII. Which means that the operation IS
supported according to what you have quoted. Of course, if you are
reading the XML from a socket, the rewind might cause a bit of a
problem:-).
If the <meta> with the charset attribute is the first in the <head>, you
can probably just read in locale "C" until you've parsed it. According
to the above you should be able to change the codeset at that location
as well, since locale "C" is guaranteed not to be state-dependent.
Of course, regardless of what the standard says, if your implementation
doesn't support it...
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
|
 |
Aaron Bentley Guest
|
Posted: Tue Sep 16, 2003 11:58 am Post subject: Re: codecvt for converting between character encodings |
|
|
kanze (AT) gabi-soft (DOT) fr wrote:
| Quote: |
Well, XML does make it particularly difficult, since they bury the
encoding fairly deep in the file, and require parsing to get at it.
Roughly speaking, in order to obtain the encoding, you have to be able
to read the file correctly, and in order to read the file correctly, you
have to know the encoding.
|
Actually, this is covered in the spec:
http://www.w3.org/TR/REC-xml#sec-guessing
Docs not in UTF-8 must begin with "<?", and these values are enough data
to determine whether they're in UTF-16, UCS-4, EBCDIC, or an ASCII-based
character set (e.g. UTF-8). UTF-7 and punycode are probably not detectable.
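A rough sketch of that detection step, based on the byte patterns listed in the appendix of the XML spec linked above (a byte order mark, or the first bytes of "<?xml" in each encoding family). The enum and function names are hypothetical, and less common byte orders are left out:

```cpp
#include <cstddef>

enum encoding_family {
    ucs4_be, ucs4_le, utf16_be, utf16_le, utf8_bom,
    ascii_compatible, ebcdic, unknown_family
};

// Compare the first four bytes against a fixed pattern.
static bool first4(const unsigned char* b, unsigned char b0, unsigned char b1,
                   unsigned char b2, unsigned char b3) {
    return b[0] == b0 && b[1] == b1 && b[2] == b2 && b[3] == b3;
}

// Guess the encoding family from the first bytes of an XML document.
encoding_family guess_family(const unsigned char* b, std::size_t n) {
    if (n >= 4) {
        // Four-byte patterns first: the UCS-4 LE mark starts like the UTF-16 LE one.
        if (first4(b, 0x00, 0x00, 0xFE, 0xFF)) return ucs4_be;   // byte order mark
        if (first4(b, 0xFF, 0xFE, 0x00, 0x00)) return ucs4_le;   // byte order mark
        if (first4(b, 0x00, 0x00, 0x00, 0x3C)) return ucs4_be;   // '<'
        if (first4(b, 0x3C, 0x00, 0x00, 0x00)) return ucs4_le;   // '<'
        if (first4(b, 0x00, 0x3C, 0x00, 0x3F)) return utf16_be;  // "<?"
        if (first4(b, 0x3C, 0x00, 0x3F, 0x00)) return utf16_le;  // "<?"
        if (first4(b, 0x3C, 0x3F, 0x78, 0x6D)) return ascii_compatible;  // "<?xm"
        if (first4(b, 0x4C, 0x6F, 0xA7, 0x94)) return ebcdic;    // "<?xm" in EBCDIC
    }
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return utf8_bom;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return utf16_be;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return utf16_le;
    return unknown_family;  // no declaration and no mark: the spec's default is UTF-8
}
```

The result only identifies a family; within the ASCII-compatible and EBCDIC families the encoding declaration still has to be read to find the exact character set.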
I see it as a hack in both senses-- clever and a kludge.
Aaron
(BTW, just wrote my first Unicode string as a UTF-8 file today)
--
Aaron Bentley
www.aaronbentley.com
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Wed Sep 17, 2003 8:14 pm Post subject: Re: codecvt for converting between character encodings |
|
|
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote
| Quote: | kanze (AT) gabi-soft (DOT) fr wrote:
Well, XML does make it particularly difficult, since they bury the
encoding fairly deep in the file, and require parsing to get at it.
Roughly speaking, in order to obtain the encoding, you have to be
able to read the file correctly, and in order to read the file
correctly, you have to know the encoding.
Actually, this is covered in the spec:
http://www.w3.org/TR/REC-xml#sec-guessing
Docs not in UTF-8 must begin with "<?", and these values are enough
data to determine whether they're in UTF-16, UCS-4, EBCDIC, or an
ASCII-based character set (e.g. UTF-8). UTF-7 and punycode are
probably not detectable.
|
And we can suppose that no ASCII-based character sets involve state
encoding. I can still see some problems -- some of my meta tags in the
header contain accented characters which might differ according to the
ASCII based character set used. But at least the C++ side is clear.
| Quote: | I see it as a hack in both senses-- clever and a kludge.
|
It doesn't really solve the basic problem: the definition of the
character set is in a meta tag embedded somewhere in the header.
Possibly after fields like title, or other meta tags, which cannot be
fully interpreted without knowing the character set. IMHO, the
character set should be specified in the doctype -- failing that, in
something that is required to precede the head.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
|
 |
Aaron Bentley Guest
|
Posted: Thu Sep 18, 2003 9:59 am Post subject: Re: codecvt for converting between character encodings |
|
|
kanze (AT) gabi-soft (DOT) fr wrote:
| Quote: | Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
news:<%3y9b.1615$mv6.235241 (AT) news20 (DOT) bellglobal.com>...
Docs not in UTF-8 must begin with "<?", and these values are enough
data to determine whether they're in UTF-16, UCS-4, EBCDIC, or an
ASCII-based character set (e.g. UTF-8). UTF-7 and punycode are
probably not detectable.
|
[snip -- see below]
| Quote: | I see it as a hack in both senses-- clever and a kludge.
It doesn't really solve the basic problem: the definition of the
character set is in a meta tag embedded somewhere in the header.
|
Not in XML. Without external encoding information, the encoding must be
specified in the first tag, unless the document is in UTF-8 or UTF-16.
<?xml version="1.0" encoding="us-ascii"?>
| Quote: | Possibly after fields like title, or other meta tags, which cannot be
fully interpreted without knowing the character set.
|
Until the document encoding has been specified, XML does not permit
characters outside of the us-ascii repertoire to be used.
| Quote: | IMHO, the character set should be specified in the doctype -- failing that, in
something that is required to precede the head.
|
That is the requirement--it must be an attribute of the first element.
You must be thinking of HTML. XML may not be perfect, but I think they
did a pretty good job here.
Aaron
--
Aaron Bentley
www.aaronbentley.com
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Fri Sep 19, 2003 10:21 am Post subject: Re: codecvt for converting between character encodings |
|
|
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote
| Quote: | kanze (AT) gabi-soft (DOT) fr wrote:
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
news:<%3y9b.1615$mv6.235241 (AT) news20 (DOT) bellglobal.com>...
Docs not in UTF-8 must begin with "<?", and these values are enough
data to determine whether they're in UTF-16, UCS-4, EBCDIC, or an
ASCII-based character set (e.g. UTF-8). UTF-7 and punycode are
probably not detectable.
[snip -- see below]
I see it as a hack in both senses-- clever and a kludge.
It doesn't really solve the basic problem: the definition of the
character set is in a meta tag embedded somewhere in the header.
Not in XML. Without external encoding information, the encoding must
be specified in the first tag, unless the document is in UTF-8 or
UTF-16.
<?xml version="1.0" encoding="us-ascii"?>
Possibly after fields like title, or other meta tags, which cannot
be fully interpreted without knowing the character set.
Until the document encoding has been specified, XML does not permit
characters outside of the us-ascii repertoire to be used.
IMHO, the character set should be specified in the doctype --
failing that, in something that is required to precede the head.
That is the requirement--it must be an attribute of the first element.
You must be thinking of HTML. XML may not be perfect, but I think
they did a pretty good job here.
|
Correct. In fact, I was thinking of HTML.
In theory, correct HTML is a DTD of XML. In practice, this is one place
where it diverges.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
|
 |
Ben Hutchings Guest
|
Posted: Fri Sep 19, 2003 10:23 am Post subject: Re: codecvt for converting between character encodings |
|
|
In article <d6652001.0309170457.5bfaaed1 (AT) posting (DOT) google.com>,
kanze (AT) gabi-soft (DOT) fr wrote:
| Quote: | Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
news:<%3y9b.1615$mv6.235241 (AT) news20 (DOT) bellglobal.com>...
kanze (AT) gabi-soft (DOT) fr wrote:
Well, XML does make it particularly difficult, since they bury the
encoding fairly deep in the file, and require parsing to get at it.
Roughly speaking, in order to obtain the encoding, you have to be
able to read the file correctly, and in order to read the file
correctly, you have to know the encoding.
Actually, this is covered in the spec:
http://www.w3.org/TR/REC-xml#sec-guessing
Docs not in UTF-8 must begin with "<?", and these values are enough
data to determine whether they're in UTF-16, UCS-4, EBCDIC, or an
ASCII-based character set (e.g. UTF-8). UTF-7 and punycode are
probably not detectable.
And we can suppose that no ASCII-based character sets involve state
encoding. I can still see some problems -- some of my meta tags in the
header contain accented characters which might differ according to the
ASCII based character set used. But at least the C++ side is clear.
|
You're failing to distinguish between HTML and XML.
An XML document using a single-byte encoding or a multi-byte encoding
other than UTF-8 must begin with an XML declaration specifying the
encoding. This declaration can always be interpreted as ASCII or
EBCDIC (which can easily be distinguished). There is no possibility
of there being characters before that declaration that cannot be
interpreted correctly without reading the declaration.
An HTML document is a bit harder, but the only tags that might be
needed before the META element specifying encoding are
<HTML> and <HEAD>, neither of which could legally use characters outside
ASCII (or EBCDIC, I suspect).
Anyone using an encoding that isn't based on ASCII, EBCDIC or a UTF
can use real SGML tools. Web software doesn't need to deal with
those encodings.
| Quote: | I see it as a hack in both senses-- clever and a kludge.
It doesn't really solve the basic problem: the definition of the
character set is in a meta tag embedded somewhere in the header.
Possibly after fields like title, or other meta tags, which cannot be
fully interpreted without knowing the character set. IMHO, the
character set should be specified in the doctype -- failing that, in
something that is required to precede the head.
|
When fetching a document by HTTP, the encoding should be specified
in the HTTP header. Local files should normally be presumed to use
the local conventional encoding (except that XML specifies
different rules). The META element allows authors to specify
encoding when they don't have control over the web server
configuration - or in some cases to inform the web server which
encoding to declare.
So, to bring this back on topic, it seems that the correct way to
start reading an XML document using the C++ library is:
1. Open the stream in binary mode.
2. If no encoding is specified externally, use the heuristics
suggested in the XML spec to decide the initial encoding,
then rewind the stream.
3. Select the initial encoding.
4. Parse the XML directive, if present. If it specifies
another encoding, rewind and select that other encoding.
For an HTML document, it's a little different:
1. Open the stream in binary mode.
2. If no encoding is specified externally, use heuristics to
decide the initial encoding, then rewind the stream if it
was necessary to read from it.
3. Select the initial encoding.
4. Begin parsing the document. If a META element specifies
another encoding, discard the parsed data, rewind, select
the other encoding, and start again. After this is done
or once the BODY element is reached, it is safe to process
the parsed data since no further restart will be needed.
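The XML steps above can be sketched roughly as follows, for the ASCII-compatible case only. The function name declared_encoding is hypothetical; the sketch reads the head of the stream as plain bytes, looks for the encoding in a leading XML declaration, and rewinds so the caller can select that encoding and start over. It is not a conforming XML parser, and it assumes a seekable stream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Returns the encoding named in a leading XML declaration, or "" if there
// is none. The stream is rewound to its beginning before returning.
std::string declared_encoding(std::istream& in) {
    char buf[256];
    in.read(buf, sizeof buf);
    std::string head(buf, static_cast<std::size_t>(in.gcount()));
    in.clear();                  // a short read sets eofbit and failbit
    in.seekg(0, std::ios::beg);  // rewind so parsing can restart from the top

    const std::size_t npos = std::string::npos;
    if (head.compare(0, 5, "<?xml") != 0) return std::string();
    std::size_t end = head.find("?>");
    if (end == npos) return std::string();
    std::size_t key = head.find("encoding", 5);
    if (key == npos || key > end) return std::string();
    std::size_t open = head.find_first_of("\"'", key);
    if (open == npos || open > end) return std::string();
    std::size_t close = head.find(head[open], open + 1);
    if (close == npos || close > end) return std::string();
    return head.substr(open + 1, close - open - 1);
}

// Convenience overload for documents that are already in memory.
std::string declared_encoding(const std::string& document) {
    std::istringstream in(document);
    return declared_encoding(in);
}
```

With a file, the stream would be a std::ifstream opened with std::ios::binary, and the returned name would then be used to choose the locale to imbue before re-reading.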
|
 |
James Kanze Guest
|
Posted: Sun Sep 21, 2003 6:56 pm Post subject: Re: codecvt for converting between character encodings |
|
|
Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> writes:
| Quote: | In article <d6652001.0309170457.5bfaaed1 (AT) posting (DOT) google.com>,
kanze (AT) gabi-soft (DOT) fr wrote:
Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
news:<%3y9b.1615$mv6.235241 (AT) news20 (DOT) bellglobal.com>...
kanze (AT) gabi-soft (DOT) fr wrote:
Well, XML does make it particularly difficult, since they bury
the encoding fairly deep in the file, and require parsing to
get at it. Roughly speaking, in order to obtain the encoding,
you have to be able to read the file correctly, and in order
to read the file correctly, you have to know the encoding.
Actually, this is covered in the spec:
http://www.w3.org/TR/REC-xml#sec-guessing
Docs not in UTF-8 must begin with "<?", and these values are
enough data to determine whether they're in UTF-16, UCS-4,
EBCDIC, or an ASCII-based character set (e.g. UTF-8). UTF-7 and
punycode are probably not detectable.
And we can suppose that no ASCII-based character sets involve
state encoding. I can still see some problems -- some of my meta
tags in the header contain accented characters which might differ
according to the ASCII based character set used. But at least the
C++ side is clear.
You're failing to distinguish between HTML and XML.
|
I've realized that. While formally, HTML is supposed to be a document
type of XML, most HTML isn't.
| Quote: | An XML document using a single-byte encoding or a multi-byte
encoding other than UTF-8 must begin with an XML declaration
specifying the encoding. This declaration can always be interpreted
as ASCII or EBCDIC (which can easily be distinguished). There is no
possibility of there being characters before that declaration that
cannot be interpreted correctly without reading the declaration.
An HTML document is a bit harder, but the only tags that might be
needed before the META element specifying encoding are
<HTML> and <HEAD>, neither of which could legally use characters
outside ASCII (or EBCDIC, I suspect).
|
The only things that are needed. The problem is that the HTML standard
doesn't forbid anything else, so you cannot count on a foreign site not
inserting anything else.
| Quote: | Anyone using an encoding that isn't based on ASCII, EBCDIC or a UTF
can use real SGML tools. Web software doesn't need to deal with
those encodings.
I see it as a hack in both senses-- clever and a kludge.
It doesn't really solve the basic problem: the definition of the
character set is in a meta tag embedded somewhere in the header.
Possibly after fields like title, or other meta tags, which
cannot be fully interpreted without knowing the character set.
IMHO, the character set should be specified in the doctype --
failing that, in something that is required to precede the head.
When fetching a document by HTTP, the encoding should be specified
in the HTTP header. Local files should normally be presumed to use
the local conventional encoding (except that XML specifies different
rules). The META element allows authors to specify encoding when
they don't have control over the web server configuration - or in
some cases to inform the web server which encoding to declare.
So, to bring this back on topic, it seems that the correct way to
start reading an XML document using the C++ library is:
1. Open the stream in binary mode.
2. If no encoding is specified externally, use the heuristics
suggested in the XML spec to decide the initial encoding,
then rewind the stream.
3. Select the initial encoding.
4. Parse the XML directive, if present. If it specifies
another encoding, rewind and select that other encoding.
For an HTML document, it's a little different:
1. Open the stream in binary mode.
2. If no encoding is specified externally, use heuristics to
decide the initial encoding, then rewind the stream if it
was necessary to read from it.
3. Select the initial encoding.
4. Begin parsing the document. If a META element specifies
another encoding, discard the parsed data, rewind, select
the other encoding, and start again. After this is done
or once the BODY element is reached, it is safe to process
the parsed data since no further restart will be needed.
|
As a heuristic, this would probably work. Simply because none of the
multibyte encodings you are likely to encounter can result in falsely
parsing a META element specifying the encoding.
Of course, if the source happens to be a socketstream, that bit about
rewinding might be a bit difficult:-). In practice, you probably need a
special streambuf, which encapsulates the above steps, saving what it
has read in a buffer, and after a rewind, uses the buffer until it has
finished. Or something along those lines.
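A minimal sketch of such a streambuf, under the assumption that byte-at-a-time buffering is acceptable for illustration. The class name replay_streambuf, its members, and the demo function are all hypothetical: it records everything read from the underlying buffer, and after rewind() it serves the recorded bytes again before going back to the source.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

class replay_streambuf : public std::streambuf {
public:
    explicit replay_streambuf(std::streambuf* source)
        : source_(source), pos_(0), ch_(0) {}

    // Restart from the first byte ever read; works without seeking the source.
    void rewind() {
        pos_ = 0;
        setg(nullptr, nullptr, nullptr);  // drop the one-character get area
    }

protected:
    int_type underflow() override {
        if (pos_ < saved_.size()) {
            ch_ = saved_[pos_++];          // replay a recorded byte
        } else {
            int_type c = source_->sbumpc();
            if (traits_type::eq_int_type(c, traits_type::eof())) return c;
            ch_ = traits_type::to_char_type(c);
            saved_.push_back(ch_);         // record it for a later rewind
            ++pos_;
        }
        setg(&ch_, &ch_, &ch_ + 1);
        return traits_type::to_int_type(ch_);
    }

private:
    std::streambuf* source_;
    std::string saved_;   // everything read from the source so far
    std::size_t pos_;     // index of the next byte to hand out
    char ch_;             // one-character get area
};

// Usage sketch: sniff two bytes, rewind, then read everything from the start.
std::string demo_replay() {
    std::istringstream source("abcdef");  // stand-in for a non-seekable source
    replay_streambuf rb(source.rdbuf());
    std::istream in(&rb);
    char a = 0, b = 0;
    in.get(a).get(b);
    rb.rewind();
    std::string all;
    std::getline(in, all);
    return all;
}
```

A real version would probably stop recording once the encoding is settled, so that the saved buffer does not grow with the whole document.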
--
James Kanze mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France +33 1 41 89 80 93
|
 |
John G Harris Guest
|
Posted: Mon Sep 22, 2003 10:32 pm Post subject: Re: codecvt for converting between character encodings |
|
|
In article <868yoiax8n.fsf (AT) alex (DOT) gabi-soft.fr>, James Kanze
<kanze (AT) alex (DOT) gabi-soft.fr> writes
| Quote: | Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> writes:
|> In article <d6652001.0309170457.5bfaaed1 (AT) posting (DOT) google.com>,
|> kanze (AT) gabi-soft (DOT) fr wrote:
|> > Aaron Bentley <aaron.bentley (AT) utoronto (DOT) ca> wrote in message
|> > news:<%3y9b.1615$mv6.235241 (AT) news20 (DOT) bellglobal.com>...
snip
|> You're failing to distinguish between HTML and XML.
I've realized that. While formally, HTML is supposed to be a document
type of XML, most HTML isn't.
snip |
Actually, HTML is a document type of SGML. It's XHTML that's a document
type of XML. The codeset rules could well be different.
John
--
John Harris