C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups

Universal character names -- I'm still confused
Stefan Heinzmann
Posted: Thu Oct 16, 2003 2:21 am    Post subject: Universal character names -- I'm still confused



Hi all,

I'm just reading through the C standard book from Wiley in anticipation
of the C++ standard book. I understand that C and C++ are supposed to
handle universal character names in the same way. And I would appreciate
if someone could explain to me how that is intended to be used in practice.

For example, if I want to include German umlauts (such as ä or ü) in
identifiers, I actually want them to appear in print and on screen as
the proper glyph and not as \u1234. This uglification may be appropriate
for the compiler for parsing, but certainly not for human reading. Is my
editor supposed to do the conversion? Or is the language implementation
supposed to provide a preprocessor (pre-preprocessor) to convert
non-ASCII characters to UCNs?

The standard may not mandate any particular way, but surely people must
have an idea of what kind of support should be provided, or else what is
the point of allowing UCNs in identifiers?

Cheers
Stefan

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

Ross Ridge
Posted: Thu Oct 16, 2003 8:07 pm    Post subject: Re: Universal character names -- I'm still confused



Stefan Heinzmann <stefan_heinzmann (AT) yahoo (DOT) com> wrote:
Quote:
For example, if I want to include German umlauts (such as ä or ü) in
identifiers, I actually want them to appear in print and on screen as
the proper glyph and not as \u1234. This uglification may be appropriate
for the compiler for parsing, but certainly not for human reading. Is my
editor supposed to do the conversion?

Yes. Ask the people who made your editor to add this feature; they're
undoubtedly under the impression that no one actually wants it.

Quote:
Or is the language implementation supposed to provide a preprocessor
(pre-preprocessor) to convert non-ASCII characters to UCNs?

If this were required, then there really wouldn't be any point to UCNs
in identifiers, would there? It would be simpler to just require the
implementation to accept German umlauts in identifiers as is.

Quote:
The standard may not mandate any particular way, but surely people must
have an idea of what kind of support should be provided, or else what is
the point of allowing UCNs in identifiers?

Java had it, so C++ had to have it too.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rridge (AT) csclub (DOT) uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/u/rridge/
db //

Hyman Rosen
Posted: Thu Oct 16, 2003 8:07 pm    Post subject: Re: Universal character names -- I'm still confused



Stefan Heinzmann wrote:
Quote:
The standard may not mandate any particular way, but surely people must
have an idea of what kind of support should be provided, or else what is
the point of allowing UCNs in identifiers?

Read 2.1/1. The translation of physical source file characters
to the source character set is implementation-defined. Source
file characters outside of the basic set are translated to the
universal character name equivalent. (Logically, of course. The
implementation is free to represent things any way it wants.)

So it's up to your compiler vendor to decide what kind of source
file encodings it understands, and up to your editor to decide
how these encodings are displayed. The standard doesn't care, and
chooses to explain how things work in terms of UCNs.
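
As a small illustration of what this means for identifiers, here is a minimal sketch. The identifier (German "zähler", meaning "counter") and the function name are hypothetical, and the sketch assumes a compiler that accepts UCNs in identifiers, which to my knowledge recent GCC and Clang releases do. Whether the same compiler also accepts a literal ä in the file depends on the source encodings it understands.

```cpp
// z\u00E4hler spells "zaehler" with U+00E4 (a with diaeresis) written as a
// universal character name. The name itself is only an illustration.
int z\u00E4hler = 0;

// Every use of the identifier spells the same UCN; the compiler treats
// it as one ordinary name.
int increment() { return ++z\u00E4hler; }
```

An implementation that also accepts the file encoding in question has to treat a literal ä in the source and \u00E4 as the same identifier character.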

Ben Hutchings
Posted: Thu Oct 16, 2003 8:07 pm    Post subject: Re: Universal character names -- I'm still confused

In article <bme4m3$dvi$05$1 (AT) news (DOT) t-online.com>, Stefan Heinzmann wrote:
<snip>
Quote:
For example, if I want to include German umlauts (such as ä or ü) in
identifiers, I actually want them to appear in print and on screen as
the proper glyph and not as \u1234.

When are you expecting identifiers to appear on print or screen? Are
you using an extension like __FUNCTION__ or an assert() implementation
that shows function names?

Quote:
This uglification may be appropriate
for the compiler for parsing, but certainly not for human reading. Is my
editor supposed to do the conversion? Or is the language implementation
supposed to provide a preprocessor (pre-preprocessor) to convert
non-ASCII characters to UCNs?
<snip>


The language implementation should convert UCNs in *literals* into the
corresponding characters in the execution character set at translation
time. If there is no corresponding character to a UCN, the conversion
is implementation defined. (References: 2.13.2/5, 2.13.4/5.) It may
be necessary to tell your implementation that the execution character
set is what you intend it to be and not, say, ASCII, which has no
accented letters.
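
A minimal sketch of the literal case follows. It assumes the narrow execution character set is UTF-8, which is the usual default for GCC and Clang but not for every compiler; with a different execution character set the narrow string would hold different bytes. The function names are hypothetical.

```cpp
#include <string>

// The UCN is converted at translation time. Assuming a UTF-8 execution
// character set, the resulting string holds the two bytes 0xC3 0xA4.
std::string umlaut_a_narrow() { return "\u00E4"; }

// In a wide character literal the same UCN becomes a single wchar_t
// whose value is the code point U+00E4.
wchar_t umlaut_a_wide() { return L'\u00E4'; }
```

If the narrow string does not come out as expected, the first thing to check is how the implementation is told which execution character set to use.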

James Kuyper
Posted: Fri Oct 17, 2003 4:12 am    Post subject: Re: Universal character names -- I'm still confused

stefan_heinzmann (AT) yahoo (DOT) com (Stefan Heinzmann) wrote in message news:<bme4m3$dvi$05$1 (AT) news (DOT) t-online.com>...
....
Quote:
The standard may not mandate any particular way, but surely people must
have an idea of what kind of support should be provided, or else what is
the point of allowing UCNs in identifiers?

It was anticipated that some editors would be written to automatically
handle UCN's internally, so the user wouldn't have to think about
them. I've no idea whether any such editors have actually been
written.
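
To make the idea concrete, here is a rough sketch of the "read" half of such an editor filter: it turns \uXXXX and \UXXXXXXXX sequences into UTF-8 for display. The function names are hypothetical, and the sketch makes simplifying assumptions: it does not distinguish UCNs from look-alike text after an escaped backslash, and it does not reject surrogate code points.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of code point cp to out.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical display filter: replace each well-formed UCN with UTF-8
// and copy everything else unchanged.
std::string ucn_to_utf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t digits = 0;
        if (in[i] == '\\' && i + 1 < in.size()) {
            if (in[i + 1] == 'u') digits = 4;
            else if (in[i + 1] == 'U') digits = 8;
        }
        bool ok = digits != 0 && i + 2 + digits <= in.size();
        std::uint32_t cp = 0;
        for (std::size_t k = 0; ok && k < digits; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + 2 + k]);
            if (!std::isxdigit(c)) { ok = false; break; }
            cp = cp * 16 + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
        }
        if (ok && cp > 0x10FFFF) ok = false;  // outside the Unicode range
        if (ok) { append_utf8(out, cp); i += 2 + digits; }
        else { out += in[i++]; }
    }
    return out;
}
```

For example, the text `int z\u00E4hler;` would be shown as `int zähler;` in an editor that renders UTF-8.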

kanze@gabi-soft.fr
Posted: Sat Oct 18, 2003 5:29 am    Post subject: Re: Universal character names -- I'm still confused

hyrosen (AT) mail (DOT) com (Hyman Rosen) wrote in message
news:<1066312453.893441 (AT) master (DOT) nyc.kbcfp.com>...
Quote:
Stefan Heinzmann wrote:
The standard may not mandate any particular way, but surely people
must have an idea of what kind of support should be provided, or
else what is the point of allowing UCNs in identifiers?

Read 2.1/1. The translation of physical source file characters to the
source character set is implementation-defined. Source file characters
outside of the basic set are translated to the universal character
name equivalent. (Logically, of course. The implementation is free to
represent things any way it wants.)

So it's up to your compiler vendor to decide what kind of source file
encodings it understands, and up to your editor to decide how these
encodings are displayed. The standard doesn't care, and chooses to
explain how things work in terms of UCNs.

Except that if you want to be sure someone else can compile your files,
they will physically contain \u00E4 rather than an ä. Which isn't
really very practical unless your other tools display this as an ä.

In practice, UCN's are about as useful as trigraphs for writing
readable, portable programs. Which is a shame, because they could be
really useful -- C++ has done its part, and both C and Java have
followed, which sounds pretty much like a de facto standard to me.

--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

Ross Ridge
Posted: Sat Oct 18, 2003 4:51 pm    Post subject: Re: Universal character names -- I'm still confused

<kanze (AT) gabi-soft (DOT) fr> wrote:
Quote:
In practice, UCN's are about as useful as trigraphs for writing
readable, portable programs. Which is a shame, because they could be
really useful -- C++ has done its part, and both C and Java have
followed, which sounds pretty much like a de facto standard to me.

That makes it anything but a "de facto" standard. "In fact" there is
no standard, because "in fact" no one uses it, and "in fact" no one
who doesn't have to, like C/C++/Java third party tools, supports it.
It's a standard that only exists as words in a document.

And C++ took the idea from Java, ignoring the fact existing practice in
Java had already shown UCNs in identifiers to be as useful as trigraphs
for writing readable, portable programs.

Ross Ridge

--
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rridge (AT) csclub (DOT) uwaterloo.ca
-()-/()/ http://www.csclub.uwaterloo.ca/u/rridge/
db //

kanze@gabi-soft.fr
Posted: Sat Oct 18, 2003 4:51 pm    Post subject: Re: Universal character names -- I'm still confused

rridge (AT) csclub (DOT) uwaterloo.ca (Ross Ridge) wrote in message
news:<bmlrk4$2b6$1 (AT) rumours (DOT) uwaterloo.ca>...

Quote:
The standard may not mandate any particular way, but surely people
must have an idea of what kind of support should be provided, or else
what is the point of allowing UCNs in identifiers?

Java had it, so C++ had to have it too.

Except that C++ had it (at least in a draft standard) before Java came
along. In this case, I think that it is more C++ had it, so Java had to
have it as well.

--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

kanze@gabi-soft.fr
Posted: Sun Oct 19, 2003 7:24 pm    Post subject: Re: Universal character names -- I'm still confused

stefan_heinzmann (AT) yahoo (DOT) com (Stefan Heinzmann) wrote in message
news:<bme4m3$dvi$05$1 (AT) news (DOT) t-online.com>...

Quote:
I'm just reading through the C standard book from Wiley in
anticipation of the C++ standard book. I understand that C and C++ are
supposed to handle universal character names in the same way. And I
would appreciate if someone could explain to me how that is intended
to be used in practice.

How they were intended, or how things are actually working out? For all
intents and purposes, I think that they are basically in the same
situation as trigraphs.

Quote:
For example, if I want to include German umlauts (such as ä or ü) in
identifiers, I actually want them to appear in print and on screen as
the proper glyph and not as \u1234.

Of course you do. And I believe that this was also the intent of the
original proposal. That the development environment would understand
them, and treat them correctly.

That was also the original intent with regards to trigraphs. For the
moment, the actual support for this seems to be at about the same level
as it is for trigraphs as well.

The standard also says that an implementation may accept additional,
implementation defined characters, which it maps to the correct
universal character name. If you are working in an ISO 8859-1
environment, there is a good chance that if the compiler accepts
universal character names, it will actually handle ISO 8859-1
correctly. This isn't an optimal situation, since if you transfer your
programs to a non-ISO 8859-1 area (say the Czech Republic or Poland),
then 1) they will look decidedly funny when displayed there, and
2) they may not even compile. (Actually, if all you use are the
umlauts, I don't think that compilation will be a problem, regardless of
which 8859-n codeset is installed locally.)

Quote:
This uglification may be appropriate for the compiler for parsing, but
certainly not for human reading. Is my editor supposed to do the
conversion?

I think that this was the intent.

Quote:
Or is the language implementation supposed to provide a preprocessor
(pre-preprocessor) to convert non-ASCII characters to UCNs?

A compiler is supposed to do so, see §2.1/1. However, the standard says
nothing about how the compiler decides what codeset the source code
actually uses -- and none of the compilers I know actually document
anything about this either. So you're sort of stuck; g++ (3.3.1) seems
to refuse anything but straight ASCII, regardless of the externally set
locale.

Quote:
The standard may not mandate any particular way, but surely people
must have an idea of what kind of support should be provided, or else
what is the point of allowing UCNs in identifiers?

I'm not sure myself. I would expect compilers to accept files in many
different codesets, but I'm not too sure as to how this should be
handled; the codeset must depend on the file, and may vary between
include files in a single translation unit, which means that most
classical means of specifying this sort of stuff need extending. (A
global command line option is probably not too useful.)

For the moment, support seems to be about nil, and as you seem to have
noticed, universal character names are pretty much unusable today.

--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

Stefan Heinzmann
Posted: Sun Oct 19, 2003 7:28 pm    Post subject: Re: Universal character names -- I'm still confused

Hyman Rosen wrote:
Quote:
Stefan Heinzmann wrote:

The standard may not mandate any particular way, but surely people
must have an idea of what kind of support should be provided, or else
what is the point of allowing UCNs in identifiers?


Read 2.1/1. The translation of physical source file characters
to the source character set is implementation-defined. Source
file characters outside of the basic set are translated to the
universal character name equivalent. (Logically, of course. The
implementation is free to represent things any way it wants.)

Let me quote the mentioned paragraph from my pdf copy of the C++ standard:
"Physical source file characters are mapped, in an
implementation-defined manner, to the basic source
character set (introducing newline characters for end-of-line
indicators) if necessary. Trigraph sequences (2.3) are replaced by
corresponding single-character internal representations. Any source file
character not in the basic source character set (2.2) is replaced by the
universal-character-name that designates that character. (An
implementation may use any internal encoding, so long as an actual
extended character encountered in the source file, and the same extended
character expressed in the source file as a universal-character-name
(i.e. using the \uXXXX notation), are handled equivalently.)"

I'm not yet sure I understand this right. Does this mean that when I've
got an identifier with a German umlaut in it:

o The compiler has to map the umlaut to an internal representation that
is the same as the representation it would use if I had written the
umlaut in \uXXXX notation.

o The compiler is free to choose how to do the conversion.

o The compiler is free to choose an internal representation

o The compiler is not allowed to ignore or otherwise choke on the
umlaut, as it has to do the mapping (or does the latitude go as far as
allowing the compiler to behave in any silly way it likes when
encountering a character that isn't in the base character set?)

Quote:
So it's up to your compiler vendor to decide what kind of source
file encodings it understands, and up to your editor to decide
how these encodings are displayed. The standard doesn't care, and
chooses to explain how things work in terms of UCNs.

Well, MS Visual Studio 7.1 does display an umlaut in identifiers without
problems, but the compiler chokes on it. If I use the \uXXXX notation,
identifiers are accepted by the compiler, but when they're occurring in
a warning message, the \uXXXX is replaced by a question mark.

Is there some consensus what a "reasonable" level of support for UCNs
would be for a compiler? Is there an example of a compiler (and/or IDE)
that can serve as a role model?

Cheers
Stefan

James Kanze
Posted: Mon Oct 20, 2003 4:46 am    Post subject: Re: Universal character names -- I'm still confused

rridge (AT) csclub (DOT) uwaterloo.ca (Ross Ridge) writes:

Quote:
<kanze (AT) gabi-soft (DOT) fr> wrote:
In practice, UCN's are about as useful as trigraphs for writing
readable, portable programs. Which is a shame, because they could
be really useful -- C++ has done its part, and both C and Java have
followed, which sounds pretty much like a de facto standard to me.

That makes it anything but a "de facto" standard. "In fact" there
is no standard, because "in fact" no one uses it, and "in fact" no
one who doesn't have to, like C/C++/Java third party tools, supports
it. It's a standard that only exists as words in a document.

I guess it depends on what you are talking about. It is a de facto
standard (among those specifying languages) to standardize international
characters in character names by means of UCN's. It is also a de facto
standard among those using the languages to ignore such, since there is
an apparent de facto standard among tool providers to do as little as
possible to support them.

This is a shame, because all things considered, UCN's aren't a bad
idea. Or wouldn't be, with correct support. (Of course, the same thing
could have been said about trigraphs in their time. Good ideas without
any good support don't get very far.)

Quote:
And C++ took the idea from Java, ignoring the fact existing practice
in Java had already shown UCNs in identifiers to be as useful as
trigraphs for writing readable, portable programs.

I'm not sure. Trigraphs were present, as far as I can remember, in the
very first C++ drafts that I saw (around 1993), before Java appeared.

--
James Kanze mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France +33 1 41 89 80 93

James Kanze
Posted: Mon Oct 20, 2003 4:47 am    Post subject: Re: Universal character names -- I'm still confused

stefan_heinzmann (AT) yahoo (DOT) com (Stefan Heinzmann) writes:

Quote:
Hyman Rosen wrote:
Stefan Heinzmann wrote:
The standard may not mandate any particular way, but surely
people must have an idea of what kind of support should be
provided, or else what is the point of allowing UCNs in
identifiers?

Read 2.1/1. The translation of physical source file characters to
the source character set is implementation-defined. Source file
characters outside of the basic set are translated to the
universal character name equivalent. (Logically, of course. The
implementation is free to represent things any way it wants.)

Let me quote the mentioned paragraph from my pdf copy of the C++
standard: "Physical source file characters are mapped, in an
implementation-defined manner, to the basic source character set
(introducing newline characters for end-of-line indicators) if
necessary. Trigraph sequences (2.3) are replaced by corresponding
single-character internal representations. Any source file character
not in the basic source character set (2.2) is replaced by the
universal-character-name that designates that character. (An
implementation may use any internal encoding, so long as an actual
extended character encountered in the source file, and the same
extended character expressed in the source file as a
universal-character-name (i.e. using the \uXXXX notation), are
handled equivalently.)"

I'm not yet sure I understand this right. Does this mean that when
I've got an identifier with a German umlaut in it:

o The compiler has to map the umlaut to an internal representation
that is the same as the representation it would use if I had written
the umlaut in \uXXXX notation.

Yes. Except that it is implementation specific what the input encoding
is. All of the compilers I know use US ASCII -- the US variant of ISO
646. So there is no such thing as a German Umlaut in their input, no
matter what you see on the screen when viewing the file with other
tools.

Quote:
o The compiler is free to choose how to do the conversion.

o The compiler is free to choose an internal representation

o The compiler is not allowed to ignore or otherwise choke on the
umlaut,

If the compiler "sees" an Umlaut in the input, it must treat it as
specified. Neither Sun CC nor g++ is capable of seeing Umlauts in the
input.

Just out of curiosity, how do you expect the compiler to choose the
correct encoding for the file? From the local environment, so that it
interprets them as you see them on the screen? If so, how do you handle
the case where the headers for one library are in UTF-8, and those for
another library are in ISO 8859-1? (This is not to support the laziness
on the part of the compiler writers. It is just to point out that the
problem is perhaps not as simple as you think.)

Quote:
as it has to do the mapping (or does the latitude go as far as
allowing the compiler to behave in any silly way it likes when
encountering a character that isn't in the base character set?)

So it's up to your compiler vendor to decide what kind of source
file encodings it understands, and up to your editor to decide how
these encodings are displayed. The standard doesn't care, and
chooses to explain how things work in terms of UCNs.

Well, MS Visual Studio 7.1 does display an umlaut in identifiers
without problems, but the compiler chokes on it. If I use the \uXXXX
notation, identifiers are accepted by the compiler, but when they're
occurring in a warning message, the \uXXXX is replaced by a question
mark.

Sounds like they're right up there with the Unix compilers:-).

Quote:
Is there some consensus what a "reasonable" level of support for
UCNs would be for a compiler?

Not that I know of, and this might be the reason why compiler authors
are so hesitant to tackle the problem.

Quote:
Is there an example of a compiler (and/or IDE) that can serve as a
role model?

--
James Kanze mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France +33 1 41 89 80 93

Niklas Matthies
Posted: Mon Oct 20, 2003 4:47 am    Post subject: Re: Universal character names -- I'm still confused

On 2003-10-19 19:24, kanze (AT) gabi-soft (DOT) fr wrote:
Quote:
stefan_heinzmann (AT) yahoo (DOT) com (Stefan Heinzmann) wrote in message
news:<bme4m3$dvi$05$1 (AT) news (DOT) t-online.com>...
[...]
The standard may not mandate any particular way, but surely people
must have an idea of what kind of support should be provided, or else
what is the point of allowing UCNs in identifiers?

I'm not sure myself. I would expect compilers to accept files in many
different codesets, but I'm not too sure as to how this should be
handled; the codeset must depend on the file, and may vary between
include files in a single translation unit, which means that most
classical means of specifying this sort of stuff need extending. (A
global command line option is probably not too useful.)

For the moment, support seems to be about nil, and as you seem to have
noticed, universal character names are pretty much unusable today.

It should be rather trivial to have editors like Emacs or Vim apply
appropriate conversion filters upon reading and writing of source
files. I have seen this being done for HTML character entities,
which are not very different from UCNs, apart from the concrete
syntax.
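
A rough sketch of the "write" half of such a filter is shown below: it rewrites every non-ASCII character of a UTF-8 buffer as a UCN before the file is saved. The function name is hypothetical, and the sketch makes simplifying assumptions: it converts everywhere (including comments and string literals), and it checks only the basic byte structure of UTF-8, not overlong forms or surrogates.

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical save filter: ASCII bytes are copied unchanged, every other
// character becomes \uXXXX (or \UXXXXXXXX above U+FFFF).
std::string utf8_to_ucn(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b < 0x80) { out += in[i++]; continue; }

        // Decode the lead byte: number of continuation bytes and payload bits.
        std::size_t extra;
        std::uint32_t cp;
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        char buf[11];  // backslash, 'U', eight hex digits, terminating NUL
        if (cp <= 0xFFFF)
            std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
        else
            std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(cp));
        out += buf;
        i += extra + 1;
    }
    return out;
}
```

With this, a buffer containing `int zähler;` in UTF-8 would be written to disk as `int z\u00E4hler;`, which any conforming compiler that supports UCNs in identifiers should accept regardless of its source encoding.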

Java IDEs also routinely support UTF-8 source files, so it shouldn't
be too difficult for C++ IDEs to do likewise.

And with regard to the source file inclusion problem, one solution
would be to use Apache-like filename patterns (e.g. "MyHeader.utf8.h").
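
As a sketch of how a tool might read such a pattern, the following helper extracts the tag between the last two dots of the file name. The function name and the "name.encoding.ext" convention are assumptions for illustration only; no compiler is known to implement this.

```cpp
#include <string>

// Hypothetical convention: "name.<encoding>.<ext>", e.g. "MyHeader.utf8.h".
// Returns the encoding tag, or an empty string when the name has none.
std::string encoding_from_filename(const std::string& path) {
    // Look only at the file name, not at dots in directory names.
    std::size_t slash = path.find_last_of("/\\");
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);

    std::size_t ext = base.rfind('.');
    if (ext == std::string::npos || ext == 0) return "";
    std::size_t tag = base.rfind('.', ext - 1);
    if (tag == std::string::npos) return "";
    return base.substr(tag + 1, ext - tag - 1);
}
```

A compiler or editor using this scheme could choose the decoder per included file, which addresses the case of headers in different encodings within one translation unit.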

My impression is that the lack of support in C++ tools today simply
comes from lack of demand from developers.

-- Niklas Matthies

kanze@gabi-soft.fr
Posted: Mon Oct 20, 2003 10:40 pm    Post subject: Re: Universal character names -- I'm still confused

usenet (AT) nmhq (DOT) net (Niklas Matthies) wrote in message
news:<slrnbp6jra.1kp3.usenet (AT) nmhq (DOT) net>...
Quote:
On 2003-10-19 19:24, kanze (AT) gabi-soft (DOT) fr wrote:
stefan_heinzmann (AT) yahoo (DOT) com (Stefan Heinzmann) wrote in message
news:<bme4m3$dvi$05$1 (AT) news (DOT) t-online.com>...
[...]
The standard may not mandate any particular way, but surely people
must have an idea of what kind of support should be provided, or
else what is the point of allowing UCNs in identifiers?

I'm not sure myself. I would expect compilers to accept files in
many different codesets, but I'm not too sure as to how this should
be handled; the codeset must depend on the file, and may vary
between include files in a single translation unit, which means that
most classical means of specifying this sort of stuff need
extending. (A global command line option is probably not too
useful.)

For the moment, support seems to be about nil, and as you seem to
have noticed, universal character names are pretty much unusable
today.

It should be rather trivial to have editors like Emacs or Vim apply
appropriate conversion filters upon reading and writing of source
files. I have seen this being done for HTML character entities, which
are not very different from UCNs, apart from the concrete syntax.

I don't know about trivial, but it certainly should be possible. At
least, supposing that these editors either work internally with 32 bit
(or at least 21 bit) characters, or support full UTF-8. (From the
little I've scanned over in the emacs manual, it is limited to 19 bit
characters (and it seems to be using some special encoding of its own
internally).)

Quote:
Java IDEs also routinely support UTF-8 source files, so it shouldn't
be too difficult for C++ IDEs to do likewise.

Yes, but who has a UTF-8 editor?

Quote:
And with regard to the source file inclusion problem, one solution
would be to use Apache-like filename patterns
(e.g. "MyHeader.utf8.h").

One solution, perhaps, but certainly not the only.

Quote:
My impression is that the lack of support in C++ tools today simply
comes from lack of demand from developers.

Well, the start of this thread was precisely a poster who wanted it.

Demand is relative. When most of us are still having problems getting
templates to work correctly, such issues may seem secondary.

--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

Stefan Heinzmann
Posted: Tue Oct 21, 2003 6:16 pm    Post subject: Re: Universal character names -- I'm still confused

kanze (AT) gabi-soft (DOT) fr wrote:
[...]
Quote:
My impression is that the lack of support in C++ tools today simply
comes from lack of demand from developers.


Well, the start of this thread was precisely a poster who wanted it.

Demand is relative. When most of us are still having problems getting
templates to work correctly, such issues may seem secondary.

I didn't say I wanted it. In fact, I'm still unsure whether I want it or
not; that certainly would depend on the amount of support I can expect
not just from a single compiler vendor, but across the industry. I
wanted to find out what the intentions of the authors of the holy
standard were and whether it was likely that this would come about. From
your answers I'm pessimistic.

Most projects I work with actually keep the source code in English, as a
lingua franca for computer science, because who knows who will read the
code...

But it always seemed like cultural imperialism to me to actually
*require* the use of English through choice (read: restriction) of
character set (and, by the way, also through reserved words). Why should
someone have to learn English before learning to program? (It sure is a
good idea to learn it, but should it be a requirement?)

And, programming in foreign languages aside, why should I not use the
uppercase Greek Gamma symbol for the Gamma function? That's what all the
mathematicians do, anyway. We've got Unicode now, and virtually all
reasonable text processing systems support it. Yet in programming we're
still in ASCII times. Odd, isn't it?
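
For what it is worth, the Gamma example can already be spelled portably with a UCN, at the cost of readability in tools that do not render it. The sketch below is only an illustration; it assumes a compiler that accepts UCNs in identifiers and a standard library that provides std::tgamma (C++11 and later).

```cpp
#include <cmath>

// \u0393 is GREEK CAPITAL LETTER GAMMA. In an editor that renders UCNs
// this declaration would read as a function named with the Gamma glyph.
double \u0393(double x) { return std::tgamma(x); }
```

A call then looks like `\u0393(5.0)`, which evaluates the Gamma function at 5.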

There is a definite trend in C++ to use the language features to
implement domain specific languages. See the Spirit library in boost for
an example. Someone said once that library design is language design. Am
I the only one who thinks that restricting source code to the characters
available in plain ASCII is a hindrance for domain specific languages?

Mathematics seems to cope much better with this problem. Its notation
is far less language specific, because more operators and special
symbols are used. If only programming could be like that! Think about
all those extra operator symbols in Unicode...

Cheers
Stefan

All times are GMT.