C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

Splicing/Concatenation and Undefined Behavior

 
Greg Hickman
Posted: Fri Jul 02, 2004 5:40 am    Post subject: Splicing/Concatenation and Undefined Behavior



Why do 2.1(2) and 2.1(4) say that undefined behavior occurs if a character
sequence that matches the syntax of a universal character name results from
splicing physical source lines or token concatenation? Does this mean it's
possible to construct a well-formed program that unwittingly contains such
undefined behavior? If so, what might it look like and what can we do to
prevent it?

Thanks,
Greg



---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

James Kuyper
Posted: Fri Jul 02, 2004 7:47 pm    Post subject: Re: Splicing/Concatenation and Undefined Behavior



Greg Hickman <greg.hickman (AT) lmco (DOT) com> wrote

Quote:
Why do 2.1(2) and 2.1(4) say that undefined behavior occurs if a character
sequence that matches the syntax of a universal character name results from
splicing physical source lines or token concatenation? Does this mean it's
possible to construct a well-formed program that unwittingly contains such
undefined behavior? If so, what might it look like and what can we do to
prevent it?

// Splice occurs as described in 2.1p2:
int hello\u03\
88 = 0;

// Splice occurs as described in 2.1p4:
#define STR(a,b) a##b
#define STRING(a,b) STR(a,b)
int STRING(world\u03,89) = 1;

Note: in contrast, the following code would be no problem:

int hello\u0388 = 0;
int world\u0399 = 1;

To avoid the problem, just be careful about using token concatenation
and escaped new-lines.
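
To make the 2.1p2 case concrete, here is a minimal sketch of what phase 2 line splicing does to the physical source text. The function name splice_lines is hypothetical, and the sketch ignores trigraphs and every other translation phase; it only deletes each backslash that is immediately followed by a new-line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: mimics translation phase 2 by deleting every
// backslash that is immediately followed by a new-line character,
// together with that new-line.
std::string splice_lines(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == '\n') {
            ++i;  // skip the backslash here and the new-line via the loop step
            continue;
        }
        out.push_back(src[i]);
    }
    return out;
}
```

Feeding it the two physical lines of a split declaration, where the first line ends in a backslash after \u03 and the second starts with 88, yields a single logical line containing \u0388. That sequence matches the syntax of a universal character name only after the splice, which is exactly the situation 2.1p2 leaves undefined.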

Greg Hickman
Posted: Fri Jul 02, 2004 8:46 pm    Post subject: Re: Splicing/Concatenation and Undefined Behavior




"James Kuyper" <kuyper (AT) wizard (DOT) net> wrote

Quote:

// Splice occurs as described in 2.1p2:
int hello\u03\
88 = 0;

// Splice occurs as described in 2.1p4:
#define STR(a,b) a##b
#define STRING(a,b) STR(a,b)
int STRING(world\u03,89) = 1;

Note: in contrast, the following code would be no problem:

int hello\u0388 = 0;
int world\u0399 = 1;

To avoid the problem, just be careful about using token concatenation
and escaped new-lines.

I thought these might be the kinds of scenarios being described in the
standard, but wanted to be sure. It isn't readily apparent to me why they
can lead to undefined behavior.

Thanks,
Greg



James Kuyper
Posted: Sun Jul 04, 2004 7:22 am    Post subject: Re: Splicing/Concatenation and Undefined Behavior

greg.hickman (AT) lmco (DOT) com (Greg Hickman) wrote in message news:<cc4enu$lcq3 (AT) cui1 (DOT) lmms.lmco.com>...
Quote:
"James Kuyper" <kuyper (AT) wizard (DOT) net> wrote in message
news:8b42afac.0407020400.300d52d8 (AT) posting (DOT) google.com...

// Splice occurs as described in 2.1p2:
int hello\u03\
88 = 0;

// Splice occurs as described in 2.1p4:
#define STR(a,b) a##b
#define STRING(a,b) STR(a,b)
int STRING(world\u03,89) = 1;

Note: in contrast, the following code would be no problem:

int hello\u0388 = 0;
int world\u0399 = 1;

To avoid the problem, just be careful about using token concatenation
and escaped new-lines.



I thought these might be the kinds of scenarios being described in the
standard, but wanted to be sure. It isn't readily apparent to me why they
can lead to undefined behavior.


Giving such splices undefined behavior relieves the implementation of
any responsibility for checking the results of splices for whether or
not they contain UCNs. On an implementation which takes advantage of
that fact, the most likely form of undefined behavior would be a
failure to notice that the spliced-together identifier should have
been identified as a match with one that never needed splicing. Thus,
it wouldn't recognize the two following lines as containing the same
identifier:

int hello\u0388;
hello\u03\
88 = 1;
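
A rough sketch of why the order of operations matters, assuming a hypothetical implementation that looks for universal character names in the raw physical text before any lines are spliced. The helper count_ucns is made up for illustration; it only counts well-formed \uXXXX and \UXXXXXXXX sequences and knows nothing else about C++ syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts sequences that match the syntax of a
// universal character name (backslash-u plus 4 hex digits, or
// backslash-U plus 8 hex digits) in the text exactly as given.
std::size_t count_ucns(const std::string& text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (text[i] != '\\') continue;
        char kind = text[i + 1];
        if (kind == '\\') { ++i; continue; }  // escaped backslash: skip the pair
        std::size_t need = (kind == 'u') ? 4 : (kind == 'U') ? 8 : 0;
        if (need == 0) continue;
        std::size_t have = 0;
        while (have < need && i + 2 + have < text.size() &&
               std::isxdigit(static_cast<unsigned char>(text[i + 2 + have]))) {
            ++have;
        }
        if (have == need) {
            ++count;
            i += 1 + need;  // continue after the last hex digit
        }
    }
    return count;
}
```

Run on the unspliced text of the second statement, with \u03, a backslash, a new-line, and then 88, the scan finds no universal character name, while the single-line spelling of the declaration contains one. An implementation that only scans at that early stage would therefore treat the two spellings differently, and the standard permits that by making the spliced case undefined.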

Ben Hutchings
Posted: Mon Jul 05, 2004 10:03 pm    Post subject: Re: Splicing/Concatenation and Undefined Behavior

Greg Hickman wrote:
Quote:
Why do 2.1(2) and 2.1(4) say that undefined behavior occurs if a character
sequence that matches the syntax of a universal character name results from
splicing physical source lines or token concatenation?

The justification I see is that this allows implementations to
interpret universal-character-names at whatever stage is most
convenient. This may not be the actual reason for the decision.

Quote:
Does this mean it's possible to construct a well-formed program that
unwittingly contains such undefined behavior? If so, what might it
look like and what can we do to prevent it?

It is possible to contrive a program with undefined behaviour due to
2.1(2):

int \u\
0100 = 0;

As for 2.1(4), I must admit I have found it useful to construct UCNs
by token concatenation in a macro in order to work with an
implementation that does not support UCNs (VC++ 6), for which the
macro was defined differently. If you only need to support more
standard compilers then I do not see that this would be necessary.

I think it should only take a little self-discipline to ensure that
whenever you type \u you also type a full 4 hex digits after it. A
search through your source files with the (Perl-style) regexp:

(^|[^\\])(\\\\)*\\(\\\nu|u.{0,3}[^0-9A-Fa-f])

should reveal any deviation from this rule, unless you use trigraphs,
in which case you would need a slightly more complex but rather longer
regex.
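
For anyone who would rather not depend on a regex engine, the same check can be approximated with a small hand-written scanner. This is only a sketch under stated assumptions: the name find_suspect_ucns is hypothetical, trigraphs are ignored just as in the regexp, and unlike the regexp it also checks \U for a full 8 hex digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the 1-based numbers of physical lines on
// which an unescaped backslash starts a \u or \U that is either completed
// across a line splice or not followed by the full 4 (or 8) hex digits.
std::vector<std::size_t> find_suspect_ucns(const std::string& src) {
    std::vector<std::size_t> hits;
    std::size_t line = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\n') { ++line; ++i; continue; }
        if (src[i] != '\\') { ++i; continue; }

        // Measure the run of consecutive backslashes starting at i.
        std::size_t run = 0;
        while (i + run < src.size() && src[i + run] == '\\') ++run;
        std::size_t next = i + run;  // first character after the run
        i = next;                    // resume scanning there

        bool spliced = false;
        if (next < src.size() && src[next] == '\n') {
            --run;           // the last backslash is a line-splice marker
            spliced = true;
            ++next;          // look at the first character of the next line
        }
        // An even run means every backslash is paired (escaped).
        if (run % 2 == 0 || next >= src.size()) continue;

        char c = src[next];
        if (c != 'u' && c != 'U') continue;
        if (spliced) { hits.push_back(line); continue; }

        std::size_t need = (c == 'u') ? 4 : 8;
        std::size_t have = 0;
        while (have < need && next + 1 + have < src.size() &&
               std::isxdigit(static_cast<unsigned char>(src[next + 1 + have]))) {
            ++have;
        }
        if (have < need) hits.push_back(line);
    }
    return hits;
}
```

A complete \u followed by 4 hex digits on one physical line produces no report, while a \u whose digits are cut short by a backslash and new-line is reported on the line where the \u starts. The scanner does not know about comments or raw text, so it may flag sequences that are harmless in context.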
