Universal character name question

 
Alberto Ganesh Barbati
Posted: Tue Oct 03, 2006 10:09 pm



Hi Everybody,

in §2.3/2 it is said:

"if the universal character name designates a character in the basic
source character set, then the program is ill-formed."

I wonder what is the rationale for this restriction. I mean, what's
wrong in writing "\u0055niversal" instead of "Universal" (except
obfuscation, of course)?

Doesn't this restriction make the use of universal character names
inherently non-portable? As I am not aware of the basic source character
set of every possible platform, whenever I use a universal character I
run the risk of designating some forbidden character on some platform
and so my code would be ill-formed there.

In Annex E there's a list of universal character names allowed in
identifiers. According to §2.3/2 I can have an identifier named \u00c0
but not \u0041, although the character U+0041 is listed as valid in the
annex. It seems a strange and gratuitous asymmetry to me.
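
A minimal sketch of the asymmetry described above, assuming a compiler that accepts universal-character-names in identifiers. The function name is made up for illustration, and the rejected spelling is left in a comment so that the snippet still compiles.

```cpp
#include <cassert>

// Hypothetical demo function; only the spelling of the identifier matters.
inline int ucn_identifier_demo() {
    int \u00c0 = 1;       // U+00C0 is outside the basic source character set: accepted
    // int \u0041 = 2;    // U+0041 is 'A', a basic source character: ill-formed under the quoted rule
    assert(\u00c0 == 1);
    return \u00c0;
}
```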

Not that I'm going to use universal characters all over the place... ;-)
It's just a curiosity.

Ganesh

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]
kuyper (AT) wizard (DOT) net
Posted: Wed Oct 04, 2006 4:45 am



Alberto Ganesh Barbati wrote:
Quote:
Hi Everybody,

in §2.3/2 it is said:

"if the universal character name designates a character in the basic
source character set, then the program is ill-formed."

I wonder what is the rationale for this restriction. I mean, what's
wrong in writing "\u0055niversal" instead of "Universal" (except
obfuscation, of course)?

Doesn't this restriction make the use of universal character names
inherently non-portable? As I am not aware of the basic source character
set of every possible platform,

You should be. It's a list of exactly 96 characters, which is by
definition exactly the same on every conforming implementation of C++.
See 2.2p1 for the list. As a result, there's no danger of the
following:

Quote:
... whenever I use a universal character I
run the risk of designating some forbidden character on some platform
and so my code would be ill-formed there.


Greg Herlihy
Posted: Wed Oct 04, 2006 4:09 pm



Alberto Ganesh Barbati wrote:
Quote:
Hi Everybody,

in §2.3/2 it is said:

"if the universal character name designates a character in the basic
source character set, then the program is ill-formed."

I wonder what is the rationale for this restriction. I mean, what's
wrong in writing "\u0055niversal" instead of "Universal" (except
obfuscation, of course)?

Probably to keep identifier names canonical. After all, 512 (2^9)
different ways to write "universal" as an identifier seems a needless
complication.
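
The figure of 512 follows if each of the nine letters may be written either literally or as one \uXXXX name. A small sketch of that arithmetic (the function name is hypothetical, and the count deliberately ignores \UXXXXXXXX forms and hex-digit case, which would raise it further):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Each character has two spellings (literal, or one 4-digit universal
// character name), so an n-character identifier has 2^n spellings.
inline std::uint64_t spelling_count(const std::string& identifier) {
    assert(identifier.size() < 64);  // keep the shift within 64 bits
    return std::uint64_t{1} << identifier.size();
}
```

For "universal" this gives 2^9 = 512.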

Quote:
Doesn't this restriction make the use of universal character names
inherently non-portable? As I am not aware of the basic source character
set of every possible platform, whenever I use a universal character I
run the risk of designating some forbidden character on some platform
and so my code would be ill-formed there.

No, there is no portability issue because the source character set (and
the set of allowed universal names in identifiers) is the same across
all implementations. So you'll never have a problem porting a source
file due to its use of universal character names.

The question is moot anyway because the contents of a C++ source file
are completely non-portable to start with. The mapping of a source
file's contents to the source character set is
"implementation-defined." So the Standard doesn't offer much assistance
to get a set of ASCII source files to compile with an implementation
that expects characters in EBCDIC.

Quote:
In Annex E there's a list of universal character names allowed in
identifiers. According to §2.3/2 I can have an identifier named \u00c0
but not \u0041, although the character U+0041 is listed as valid in the
annex. It seems a strange and gratuitous asymmetry to me.

Annex E explicitly excludes the ranges U+0041 - U+005A and U+0061 -
U+007A from the set of universal names that may be used in an
identifier.

Greg


kanze
Posted: Wed Oct 04, 2006 5:21 pm

Alberto Ganesh Barbati wrote:

Quote:
in §2.3/2 it is said:

"if the universal character name designates a character in the
basic source character set, then the program is ill-formed."

I wonder what is the rationale for this restriction. I mean, what's
wrong in writing "\u0055niversal" instead of "Universal" (except
obfuscation, of course)?

I think that the intent is to allow several different internal
encodings. In particular, to allow the implementation to either
keep the extended characters in the \u0055 format (in the
string), or to translate it into whatever character corresponds
in the native environment. In the first case, the string
"\u0055" doesn't compare equal to "U".

For typical systems, which actually support an extended native
character set, this freedom doesn't buy much, since they already
have to handle the case that "é" and "\u00e9" must compare
equal. The obvious solution is to maintain symbol names as 32
bit integers, with the universal character names replaced
internally by a single Unicode character. But the standard
doesn't mandate this, and on an implementation with very limited
resources, it would be quite reasonable (and legal) to only
accept universal character names for extended characters, and
store the symbol names exactly as they were read. Allowing code
to use "\u0055niversal" instead of "Universal" would require
special handling in such implementations.
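
A rough sketch of the "obvious solution" mentioned above: store each symbol name as a sequence of 32-bit code points, with every universal character name replaced by the single character it designates. Everything here is hypothetical (the function names, and the assumption that the basic characters in the input are ASCII-encoded), and it uses C++11 char32_t strings, which postdate this thread. It is not how any particular compiler does it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: read `count` hex digits of `s` starting at `pos`.
inline char32_t parse_hex(const std::string& s, std::size_t pos, std::size_t count) {
    if (pos + count > s.size())
        throw std::invalid_argument("truncated universal-character-name");
    char32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const char c = s[pos + i];
        unsigned digit = 0;
        if (c >= '0' && c <= '9')      digit = static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') digit = static_cast<unsigned>(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = static_cast<unsigned>(c - 'A') + 10;
        else throw std::invalid_argument("bad hex digit in universal-character-name");
        value = value * 16 + digit;
    }
    return value;
}

// Hypothetical normalizer: one code point per character, whether the
// character was spelled literally or as a 4- or 8-digit universal name.
// Assumes the basic characters in `spelling` are ASCII-encoded.
inline std::u32string normalize_identifier(const std::string& spelling) {
    std::u32string out;
    std::size_t i = 0;
    while (i < spelling.size()) {
        const bool is_ucn = spelling[i] == '\\' && i + 1 < spelling.size() &&
                            (spelling[i + 1] == 'u' || spelling[i + 1] == 'U');
        if (is_ucn) {
            const std::size_t digits = (spelling[i + 1] == 'u') ? 4 : 8;
            out.push_back(parse_hex(spelling, i + 2, digits));
            i += 2 + digits;
        } else {
            out.push_back(static_cast<char32_t>(static_cast<unsigned char>(spelling[i])));
            ++i;
        }
    }
    return out;
}

// Usage sketch: after normalization both spellings are the same stored name.
inline void normalize_demo() {
    assert(normalize_identifier("\\u0055niversal") == normalize_identifier("Universal"));
}
```

An implementation that stores names exactly as read skips this step entirely, which is why it would need special handling if both spellings were legal.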

Quote:
Doesn't this restriction make the use of universal character
names inherently non-portable?

In some contexts, they are. If you write
std::cout << "\u00E5t\u00E5\n" ;
and output it to a device which only supports the 128 basic
ASCII characters, you're going to lose something.
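
The loss happens when the text is converted for the device, not in the universal character names themselves. A small sketch, assuming C++11 char32_t literals (which postdate this thread) and a made-up function name: in such a literal each name becomes exactly one element holding its code point, whatever the narrow execution character set is.

```cpp
#include <cassert>
#include <string>

// The same text as in the output statement above, without the newline,
// kept as code points instead of narrow characters.
inline std::u32string sample_text() {
    return U"\u00E5t\u00E5";
}

// Usage sketch: three characters, the first being U+00E5.
inline void sample_text_demo() {
    assert(sample_text().size() == 3);
    assert(sample_text()[0] == 0x00E5);
}
```

What a plain "..." literal contains for the same text depends on the implementation's execution character set.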

Quote:
As I am not aware of the basic source character set of every
possible platform,

It's defined in the standard. See §2.2.

Quote:
whenever I use a universal character I run the risk of
designating some forbidden character on some platform and so
my code would be ill-formed there.

Less so than if you write a file using CRLF as line separators,
on Windows, and try to compile it with some implementations
under Unix.

Quote:
In Annex E there's a list of universal character names allowed in
identifiers. According to §2.3/2 I can have an identifier named \u00c0
but not \u0041, although the character U+0041 is listed as valid in the
annex. It seems a strange and gratuitous asymmetry to me.

U+0041 is not listed as a valid universal character name in
Annex E of the current draft.

Any of the 96 characters in the basic source character set
defined in §2.2 must appear literally. In portable code, any
other character (including @ or $) must appear as a universal
character name. Implementations are allowed (and even
encouraged, if I'm reading between the lines in §2.1/1
correctly) to support any other characters in the native
character set, but they must do so in a manner which makes them
indistinguishable from their universal character name
equivalent. And if you copy the file to a system where the
character in question isn't in the native character set, then
there's no guarantee that your code will compile.
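
For reference, a sketch of the 96-character list from §2.2 as a lookup, which tells whether a given code point may be written as a universal character name under the rule being discussed. The function names are hypothetical, and the char32_t literal is a C++11 feature used here so that the comparison is made on code points rather than on the native encoding.

```cpp
#include <cassert>
#include <string>

// The basic source character set: space, horizontal tab, vertical tab,
// form feed and new-line, plus the 91 graphical characters.
inline const std::u32string& basic_source_characters() {
    static const std::u32string chars =
        U" \t\v\f\n"
        U"abcdefghijklmnopqrstuvwxyz"
        U"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        U"0123456789"
        U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";
    return chars;
}

// True if a universal character name with this code point would designate
// a basic source character, i.e. the spelling the quoted rule forbids.
inline bool names_basic_source_character(char32_t code_point) {
    return basic_source_characters().find(code_point) != std::u32string::npos;
}

// Usage sketch: 'A' must appear literally, U+00C0 may be a universal name.
inline void basic_set_demo() {
    assert(basic_source_characters().size() == 96);
    assert(names_basic_source_character(0x0041));
    assert(!names_basic_source_character(0x00C0));
}
```

Note that the code points for @ (0x40) and $ (0x24) are not in the list, which matches the remark above that portable code has to spell them as universal character names.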

The ideal solution, of course, is that the set of tools
associated with the compiler (e.g. editor, etc.) permit entry of
and display the characters as best they can, given the resources
available to them, but that they actually store them in the files as
universal character names. But I don't know of any system which
does this, and it sort of goes against the basic philosophy of
Unix, in which text is text, and any program which handles text
can handle just about any text file.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Alberto Ganesh Barbati
Posted: Wed Oct 04, 2006 9:22 pm

kuyper (AT) wizard (DOT) net wrote:
Quote:
Alberto Ganesh Barbati wrote:
Hi Everybody,

in §2.3/2 it is said:

"if the universal character name designates a character in the basic
source character set, then the program is ill-formed."

I wonder what is the rationale for this restriction. I mean, what's
wrong in writing "\u0055niversal" instead of "Universal" (except
obfuscation, of course)?

Doesn't this restriction make the use of universal character names
inherently non-portable? As I am not aware of the basic source character
set of every possible platform,

You should be. It's a list of exactly 96 characters, which is by
definition exactly the same on every conforming implementation of C++.
See 2.2p1 for the list. As a result, there's no danger of the
following:

Ah, yes, sure. I had read 2.2p1 but I missed footnote 15 and probably
also got confused by the many character sets... Thanks.

However, this doesn't answer my original question: why is the
restriction there in the first place? Why can't I write \u0041 in my
source code?

Ganesh

Michiel Salters
Posted: Mon Oct 09, 2006 3:55 pm

Alberto Ganesh Barbati wrote:
Quote:
"if the universal character name designates a character in the basic
source character set, then the program is ill-formed."

I wonder what is the rationale for this restriction. I mean, what's
wrong in writing "\u0055niversal" instead of "Universal" (except
obfuscation, of course)?

What about "\\u00750055niversal" ? "\u005c0055niversal"?
Under the current rules we don't have to consider escaped escape
characters.

HTH,
Michiel Salters

Alberto Ganesh Barbati
Posted: Mon Oct 09, 2006 11:01 pm

Michiel.Salters (AT) tomtom (DOT) com wrote:
Quote:

What about "\\u00750055niversal" ? "\u005c0055niversal"?
Under the current rules we don't have to consider escaped escape
characters.

The second string is surely ill-formed because \u005c is in the basic
source character set. About the first one, I'm not sure... if I read the
standard correctly, conversion to the execution character set of escape
sequences and universal-character-names in string literals happens
simultaneously in translation phase 5. So the resulting literal should
be a backslash + "u00750055niversal".

Ganesh

kanze
Posted: Tue Oct 10, 2006 3:31 pm

Alberto Ganesh Barbati wrote:
Quote:
Michiel.Salters (AT) tomtom (DOT) com wrote:

What about "\\u00750055niversal" ? "\u005c0055niversal"?
Under the current rules we don't have to consider escaped escape
characters.

The second string is surely ill-formed because \u005c is in
the basic source character set.

But he gave the example as to why this should be the case.
"\u005c" is a backslash. If you allowed universal character
names for characters in the basic character set, you'd have to
consider this string as equal to "Universal".

Quote:
About the first one, I'm not sure... if I read the standard
correctly, conversion to the execution character set of escape
sequences and universal-character-names in string literals
happens simultaneously in translation phase 5. So the
resulting literal should be a backslash + "u00750055niversal".

I think you're right there.
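
The first literal can be checked directly. A minimal sketch with a made-up function name, assuming a compiler that lexes the doubled backslash as one escape sequence before looking for anything else, so that the characters after it are ordinary text and not a universal character name:

```cpp
#include <cassert>
#include <cstring>

// The first literal from the example, unchanged.
inline const char* first_literal() {
    return "\\u00750055niversal";
}

// Usage sketch: one backslash, then the remaining 17 characters as written.
inline void first_literal_demo() {
    assert(std::strlen(first_literal()) == 18);
    assert(first_literal()[0] == '\\');
    assert(std::strcmp(first_literal() + 1, "u00750055niversal") == 0);
}
```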

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


All times are GMT.