 |
C++Talk.NET C++ language newsgroups
Matthias Hofmann Guest
|
Posted: Sun Apr 17, 2005 6:01 pm Post subject: Confusion about toupper() |
|
|
Hello everybody!
I am a little confused about toupper(). The way I interpret the standard,
including the header <cctype> should make toupper() available within
namespace std, and only within namespace std. As an example, take a look at
the following code:
#include <vector>
#include <algorithm>
#include <functional>
#include <iterator>
#include <cctype>
#include <iostream>
int main()
{
    std::vector<char> v;
    for ( char c = 'a'; c <= 'z'; ++c )
        v.push_back( c );
    std::transform( v.begin(), v.end(),
                    std::ostream_iterator<char>( std::cout, " " ),
                    std::ptr_fun( std::toupper ) ); // Using std::toupper()
    std::cout << std::endl;
    return 0;
}
The use of toupper() as a member of namespace std works fine. What surprises
me, however, is the fact that the following is also accepted by my compiler
(MS Visual Toolkit 2003):
std::transform( v.begin(), v.end(),
                std::ostream_iterator<char>( std::cout, " " ),
                std::ptr_fun( toupper ) ); // Note: using toupper() from global scope
std::transform( v.begin(), v.end(),
                std::ostream_iterator<char>( std::cout, " " ),
                std::ptr_fun( ::toupper ) ); // Note: also using global scope
This way of using toupper() as a member of the global scope shouldn't work,
should it? I ran into this phenomenon while studying a textbook on the STL.
The author uses the "::toupper" form in a similar example. I thought the
reason might be that in C++, there is a version of toupper() that takes a
locale as a second argument. Maybe the creators of the standard made the
original (single argument) version available at global scope for that case.
But if that were true, the use of std::toupper in the above example should
create ambiguities - I am surprised it doesn't! Can anyone please explain
that to me? How come that toupper() is available at global scope? Is there
any reason to explicitly use the scope resolution operator (::), as in my
textbook? The standard does not seem to say anything about that, neither
does it give any reason for making toupper() available at global scope!
--
Matthias Hofmann
Anvil-Soft, CEO
http://www.anvil-soft.com - The Creators of Klomanager
http://www.anvil-soft.de - Die Macher des Klomanagers
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Pete Becker Guest
|
Posted: Sun Apr 17, 2005 9:19 pm Post subject: Re: Confusion about toupper() |
|
|
Matthias Hofmann wrote:
| Quote: | I am a little confused about toupper(). The way I interpret the standard,
including the header <cctype> should make toupper() available within
namespace std, and only within namespace std.
|
That's what the standard says. It's often not reasonable to implement.
There's an active defect report that's designed to relax this
requirement to match common behavior.
--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
Ray Lischner Guest
|
Posted: Sun Apr 17, 2005 9:21 pm Post subject: Re: Confusion about toupper() |
|
|
On Sunday 17 April 2005 02:01 pm, Matthias Hofmann wrote:
| Quote: | ...How come that toupper() is available at
global scope?
|
You are correct that <cctype> declares std::toupper. However, all the C
external names are also reserved to the implementation in the global
scope. Thus, ::toupper is reserved. Whether a library implementation
takes advantage of that reservation to make std::toupper the same
as ::toupper is strictly an implementation decision.
| Quote: | Is there any reason to explicitly use the scope
resolution operator (::), as in my textbook?
|
No. Use std::toupper.
| Quote: | The standard does not
seem to say anything about that, neither does it give any reason for
making toupper() available at global scope!
|
See section 17.4.3.1.3 External linkage, paragraph 4.
By the way, C++ does have a version of toupper that takes a locale as a
second argument. It is declared in <locale>.
--
Ray Lischner, author of C++ in a Nutshell
http://www.tempest-sw.com/cpp
kanze@gabi-soft.fr Guest
|
Posted: Mon Apr 18, 2005 9:06 am Post subject: Re: Confusion about toupper() |
|
|
Matthias Hofmann wrote:
| Quote: | I am a little confused about toupper(). The way I interpret
the standard, including the header <cctype> should make
toupper() available within namespace std, and only within
namespace std.
|
That's what the standard says. I don't know of any
implementation which conforms to the standard on this point.
The standard also says that the toupper in <cctype> may be
extern "C". I'm not sure how this interacts with the fact that
it may only be visible in std::.
| Quote: | As an example, take a look at the following code:
#include <vector>
#include <algorithm>
#include <functional>
#include <cctype>
#include <iostream>
int main()
{
    std::vector<char> v;
    for ( char c = 'a'; c <= 'z'; ++c )
        v.push_back( c );
    std::transform( v.begin(), v.end(),
                    std::ostream_iterator<char>( std::cout, " " ),
                    std::ptr_fun( std::toupper ) ); // Using std::toupper()
|
Which contains undefined behavior, or at least depends on
unspecified behavior, see below.
| Quote: | std::cout << std::endl;
return 0;
}
The use of toupper() as a member of namespace works fine. What
surprises me, however, is the fact that the following is also
accepted by my compiler (MS Visual Toolkit 2003):
std::transform( v.begin(), v.end(),
                std::ostream_iterator<char>( std::cout, " " ),
                std::ptr_fun( toupper ) ); // Note: using toupper() from global scope
std::transform( v.begin(), v.end(),
                std::ostream_iterator<char>( std::cout, " " ),
                std::ptr_fun( ::toupper ) ); // Note: also using global scope
This way of using toupper() as a member of the global scope
shouldn't work, should it?
|
Not according to the standard. I don't know of an
implementation where it fails, however.
| Quote: | I ran into this phenomenon while studying a textbook on the
STL. The author uses the "::toupper" form in a similar
example. I thought the reason might be that in C++, there is a
version of toupper() that takes a locale as a second argument.
Maybe the creators of the standard made the original (single
argument) version available at global scope for that case.
But if that were true, the use of std::toupper in the above
example should create ambiguities - I am surprised it doesn't!
|
The two-parameter version of toupper is declared in <locale>.
You don't include <locale> explicitly. The standard gives the
library implementation explicit permission to include it in any
other non-C header, however. So instead of a guaranteed
diagnostic when you use the function without including <locale>,
the declaration might or might not be present. The result is that
your code above might compile with one implementation, and fail
to compile with another. I think, formally, it is undefined
behavior, although about the only two possible behaviors in
practice are that it compiles and works, or that it fails to
compile.
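A sketch of two ways to pin down the one-argument overload whether or not <locale> has been pulled in. The names upper_char, upper_via_wrapper and upper_via_cast are hypothetical (they are not from the thread), C++03 or later is assumed, and the cast form relies on taking the address of a standard library function, which later revisions of the standard discourage:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <locale>   // included on purpose: both toupper overloads are now visible
#include <string>

// Hypothetical wrapper: the call syntax selects the one-argument overload,
// and the unsigned char cast keeps the argument in toupper's valid range.
inline char upper_char( char c )
{
    return static_cast<char>( std::toupper( static_cast<unsigned char>( c ) ) );
}

// Upper-cases a string through the wrapper; no overload ambiguity can arise.
inline std::string upper_via_wrapper( const std::string& in )
{
    std::string out;
    std::transform( in.begin(), in.end(), std::back_inserter( out ), upper_char );
    return out;
}

// Upper-cases a string by naming the int(int) overload with a cast instead.
// Only safe for characters with non-negative values.
inline std::string upper_via_cast( const std::string& in )
{
    std::string out;
    std::transform( in.begin(), in.end(), std::back_inserter( out ),
                    static_cast<int (*)( int )>( std::toupper ) );
    return out;
}
```

The wrapper is probably the more robust choice, since it never forms a pointer to a library function and also handles plain char being signed.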
| Quote: | Can anyone please explain that to me? How come that toupper()
is available at global scope?
|
Because the standard specified something that is extremely
difficult to implement, with little real benefit, and the
implementors have been busier implementing more useful things.
| Quote: | Is there any reason to explicitly use the scope resolution
operator (::), as in my textbook?
|
To exclude any chance of picking up one of the two operand
forms. It actually violates the standard (unless you include
<ctype.h>, rather than <cctype>), but it works in practice.
| Quote: | The standard does not seem to say anything about that, neither
|
There's what the standard says, and there's what implementations
actually do. Which are, regrettably, two very different things.
On another tack: you do know that toupper only works for
standard US ASCII. There is not necessarily a one to one
mapping in other locales. And if you are only concerned with US
ASCII, it would probably be better to use the two parameter
version, with std::locale( "C" ) as the second parameter. Just
to be sure you don't get anything weird. (The single parameter
version uses the global C locale. Which could be anything.)
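A minimal sketch of the two-parameter form, assuming the intent is plain US ASCII behaviour that does not depend on the global locale. The function name upper_in_c_locale is hypothetical; std::locale::classic() is the standard library's accessor for the "C" locale object:

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: upper-cases each char using the classic "C" locale,
// so the result does not depend on setlocale() or std::locale::global().
inline std::string upper_in_c_locale( const std::string& in )
{
    const std::locale& c_locale = std::locale::classic();
    std::string out;
    out.reserve( in.size() );
    for ( std::string::size_type i = 0; i != in.size(); ++i )
        out.push_back( std::toupper( in[i], c_locale ) );  // two-argument overload from <locale>
    return out;
}
```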
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Thomas Maeder Guest
|
Posted: Tue Apr 19, 2005 10:03 pm Post subject: Re: Confusion about toupper() |
|
|
kanze (AT) gabi-soft (DOT) fr writes:
| Quote: | std::vector<char> v;
for ( char c = 'a'; c <= 'z'; ++c )
v.push_back( c );
|
[snip]
| Quote: | On another tack: you do know that toupper only works for
standard US ASCII. There is not necessarily a one to one
mapping in other locales. And if you are only concerned with US
ASCII, it would probably be better to use the two parameter
version, with std::locale( "C" ) as the second parameter. Just
to be sure you don't get anything weird. (The single parameter
version uses the global C locale. Which could be anything.)
|
The intention very probably is to initialize v with the 26 letters of
the Latin alphabet (as used in English). For these characters, the
one-argument version of toupper() works nicely, doesn't it?
OTOH, using a for loop like the one above to initialize v this way
relies on these characters being contiguously encoded, which isn't
guaranteed.
Matthias Hofmann Guest
|
Posted: Tue Apr 19, 2005 10:05 pm Post subject: Re: Confusion about toupper() |
|
|
<kanze (AT) gabi-soft (DOT) fr> wrote in message
news:1113810242.300502.5310 (AT) g14g2000cwa (DOT) googlegroups.com...
| Quote: | The standard also says that the toupper in <cctype> may be
extern "C". I'm not sure how this interacts with the fact that
it may only be visible in std::.
|
Maybe this is for the case when the user includes <ctype.h> instead of
<cctype>.
| Quote: | The result is that your code above might compile with one
implementation, and fail to compile with another. I think,
formally, it is undefined behavior, although about the only two
possible behaviors in practice are that it compiles and works,
or that it fails to compile.
|
The reason it is undefined (or at least unspecified) behaviour is that the
compiler has to choose between the single-parameter version and the one that
takes a locale, isn't it?
| Quote: | Because the standard specified something that is extremely
difficult to implement, with little real benefit, and the
implementors have been busier implementing more useful things.
|
I never implemented a compiler in my life - what is so difficult about
making toupper available in namespace std only?
| Quote: | To exclude any chance of picking up one of the two operand
forms. It actually violates the standard (unless you include
<ctype.h>, rather than <cctype>), but it works in practice.
|
I thought there was only a single two operand form?
--
Matthias Hofmann
Anvil-Soft, CEO
http://www.anvil-soft.com - The Creators of Klomanager
http://www.anvil-soft.de - Die Macher des Klomanagers
P.J. Plauger Guest
|
Posted: Wed Apr 20, 2005 12:13 pm Post subject: Re: Confusion about toupper() |
|
|
"Matthias Hofmann" <hofmann (AT) anvil-soft (DOT) com> wrote
| Quote: | <kanze (AT) gabi-soft (DOT) fr> wrote in message
news:1113810242.300502.5310 (AT) g14g2000cwa (DOT) googlegroups.com...
The standard also says that the toupper in <cctype> may be
extern "C". I'm not sure how this interacts with the fact that
it may only be visible in std::.
Maybe this is for the case when the user includes <ctype.h> instead of
<cctype>.
The result is that your code above might compile with one
implementation, and fail to compile with another. I think,
formally, it is undefined behavior, although about the only two
possible behaviors in practice are that it compiles and works,
or that it fails to compile.
The reason it is undefined (or at least unspecified) behaviour is that the
compiler has to choose between the single-parameter version and the one that
takes a locale, isn't it?
Because the standard specified something that is extremely
difficult to implement, with little real benefit, and the
implementors have been busier implementing more useful things.
I never implemented a compiler in my life - what is so difficult about
making toupper available in namespace std only?
|
Interestingly enough, this extremely difficult requirement
was demanded by people who had never implemented a compiler
in their lives. They refused to listen to those of us who
have done so for a living.
P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
Allan W Guest
|
Posted: Wed Apr 20, 2005 3:52 pm Post subject: Re: Confusion about toupper() |
|
|
Matthias Hofmann wrote:
| Quote: | I never implemented a compiler in my life - what is so difficult
about making toupper available in namespace std only?
|
I never implemented a compiler either. But let's pretend that we are
writing -- no, not the main compiler. Let's say that someone else has
created Whiz-C 2005 and has hired us to write the standard libraries.
To keep the discussion simple, I'll use simple one-letter variable
names rather than names reserved to the library -- in practice we would
never name a parameter "t" because even though it would be an unusual
name for a macro, it IS legal.
Remember that the library to go with a compiler is the one place where
it's perfectly okay to write nonportable code. The INTERFACE is standard,
not the implementation. So if Whiz-C 2005 runs only on ASCII machines,
then we can rely on the fact that letters are contiguous, and write:
namespace std {
inline char toupper(char t) { return char(t-'a'+'A'); }
}
It's easy to see what's wrong with this -- if t is already uppercase,
we're going to get something that isn't a letter to begin with, and
if it's not a letter then we're going to get complete gibberish. So
let's refine it to version 2:
namespace std {
inline char toupper(char t)
{ return (t>='a' && t<='z') ? char(t-'a'+'A') : t; }
}
This is pretty close. One problem is that the first expression in the
conditional operator is more complicated (slow) than it needs to be. If
we really were writing the entire C++ library, rather than just
toupper(), we would have gone out of our way to make isupper() work
more quickly than that -- we'd probably have a block of 256 bitflags,
and isupper() would simply check for the correct bit. Let's rely on
that for version 3:
namespace std {
inline char toupper(char t)
{ return islower(t) ? char(t-'a'+'A') : t; }
}
Now we're there, right? No. The Whiz-C people aren't happy with this
version because it doesn't work correctly in C mode, just C++ mode. The
biggest problem, of course, is that "inline" is a C++ keyword.
We can't get around this by making toupper() a macro -- that isn't legal
for C libraries (unless this changed recently, I'm not sure). It's
tempting to use conditional compilation -- use inline for C++, and
out-of-line for C:
// cctype
namespace std {
inline char toupper(char t) { return islower(t) ? char(t-'a'+'A') : t; }
}
// ctype.h
char toupper(char t);
// ctype.c
// Note: Used only with C. (The C++ version is inline)
char toupper(char t) { return islower(t) ? char(t-'a'+'A') : t; }
But this is no good because some programs combine C modules with C++
modules. Surely &toupper should return the same address in either case!
So we bite the bullet and make a single out-of-line implementation of
toupper.
Here's where the real problems start. Clearly we use extern "C" linkage,
so that we can call it from C modules. (Which raises the question: How
do we call the overloaded version of toupper?)
But now we finally get to the question that started this all -- what is
so difficult about making toupper available in namespace std only?
* If a C module uses #include <ctype.h>, it should make toupper()
available in the global namespace (remember that C doesn't have
namespaces).
* If a C++ module uses #include <ctype.h>, it should make toupper
available in the global namespace, and possibly in namespace std
as well.
* If a C++ module uses #include <cctype>, it should make toupper()
available in namespace std only.
To make the same function available in two different namespaces, all
we need to do is define it in one, and then use a using-declaration
to bring the name into the other. But we can only define it once.
Which namespace do we define it in? No matter which one you pick,
you're going to get criticism.
For C programs to link correctly, you're going to have to put the
function in the global namespace... Ultimately this is what the
Whiz-C people are going to insist on anyway...
THAT's what is so difficult about making toupper available in
namespace std only.
Peter C. Chapin Guest
|
Posted: Wed Apr 20, 2005 4:47 pm Post subject: Re: Confusion about toupper() |
|
|
kanze (AT) gabi-soft (DOT) fr wrote in
news:1113810242.300502.5310@g14g2000cwa.googlegroups.com:
| Quote: | That's what the standard says. I don't know of any
implementation which conforms to the standard on this point.
|
Open Watcom v1.3 does. When the cname style headers are included, the
facilities of the C library are *only* introduced into namespace std.
For example:
#include <cctype>
int main()
{
int x = toupper('a');
return 0;
}
Causes the compiler to say:
check.cpp(6): Error! E029: col(11) symbol 'toupper' has not been
declared
Using std::toupper('a') fixes the error.
Open Watcom declares the C library names in namespace std in the cname
style headers and then, when a C++ program includes the name.h style
header, those names are hoisted into the global namespace with using
declarations. I believe this follows the standard strictly.
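The arrangement described above can be sketched in a single file. The namespace demo_std and the function demo_toupper are hypothetical stand-ins for std and toupper; this is an illustration of the technique, not Open Watcom's actual header text:

```cpp
#include <cassert>

// Plays the role of the <cname> header: the function lives only in the namespace.
namespace demo_std {
    inline int demo_toupper( int ch )
    {
        // ASCII-only stand-in body, just enough to have something to call.
        return ( ch >= 'a' && ch <= 'z' ) ? ch - 'a' + 'A' : ch;
    }
}

// Plays the role of the <name.h> header: a using-declaration hoists the
// same function into the global namespace.
using demo_std::demo_toupper;
```

A translation unit that sees only the first half can call demo_std::demo_toupper but not ::demo_toupper, which mirrors the E029 error shown above.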
| Quote: | The standard also says that the toupper in <cctype> may be
extern "C". I'm not sure how this interacts with the fact that
it may only be visible in std::
|
You can have extern "C" names in a namespace. There is text about this
in the standard someplace... I believe in the section where extern "C"
is discussed.
Peter
Matthias Hofmann Guest
|
Posted: Wed Apr 20, 2005 4:55 pm Post subject: Re: Confusion about toupper() |
|
|
"Thomas Maeder" <maeder (AT) glue (DOT) ch> wrote in message
news:m2mzrw83au.fsf (AT) madbox2 (DOT) local...
| Quote: | kanze (AT) gabi-soft (DOT) fr writes:
std::vector<char> v;
for ( char c = 'a'; c <= 'z'; ++c )
v.push_back( c );
|
[snip]
| Quote: | OTOH, using a for loop like the one above to initialize v this way
relies on these characters being contiguously encoded, which isn't
guaranteed.
|
This means that one has to code as follows:
v.push_back( 'a' );
v.push_back( 'b' );
v.push_back( 'c' );
v.push_back( 'z' );
Then you can iterate over v in order to get all the letters of the Latin
alphabet in the usual order. Another solution would be to have an array and
insert it into a vector:
const char letters[26] = { 'a', 'b', ..., 'z' };
for ( const char* p = letters; p != letters + sizeof (letters) / sizeof letters[0]; ++p )
    v.push_back( *p );
However, I don't know if that is any less cumbersome...
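One possibly less cumbersome variant, sketched here with a hypothetical helper name: a string literal spells the letters out, so nothing depends on their character codes being contiguous.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: fills a vector with 'a'..'z' without assuming
// that the letters have consecutive character codes.
inline std::vector<char> lowercase_letters()
{
    static const char letters[] = "abcdefghijklmnopqrstuvwxyz";
    // sizeof counts the terminating '\0', so stop one short of the end.
    return std::vector<char>( letters, letters + sizeof letters - 1 );
}
```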
--
Matthias Hofmann
Anvil-Soft, CEO
http://www.anvil-soft.com - The Creators of Klomanager
http://www.anvil-soft.de - Die Macher des Klomanagers
kanze@gabi-soft.fr Guest
|
Posted: Wed Apr 20, 2005 4:56 pm Post subject: Re: Confusion about toupper() |
|
|
Thomas Maeder wrote:
| Quote: | kanze (AT) gabi-soft (DOT) fr writes:
std::vector<char> v;
for ( char c = 'a'; c <= 'z'; ++c )
v.push_back( c );
[snip]
On another tack: you do know that toupper only works for
standard US ASCII. There is not necessarily a one to one
mapping in other locales. And if you are only concerned
with US ASCII, it would probably be better to use the two
parameter version, with std::locale( "C" ) as the second
parameter. Just to be sure you don't get anything weird.
(The single parameter version uses the global C locale.
Which could be anything.)
The intention very probably is to initialize v with the 26
letters of the Latin alphabet (as used in English). For these
characters, the one-argument version of toupper() works
nicely, doesn't it?
|
Do you think so? What about toupper( 'i' ) if the global locale
is set to tr_TR.iso_8859_9? (Supposing that 'i' is encoded
0x69 in the basic execution character set, I would expect
toupper to return 0xDD in this locale.)
The C standard (included by reference in the C++ standard, and
referred to for the definition of the contents of
<ctype.h>) states explicitly:
If the argument is a character for which islower is true and
there are one or more corresponding characters, as specified
by the current locale, for which isupper is true, the
toupper function returns one of the corresponding characters
(always the same one for any given locale); otherwise, the
argument is returned unchanged.
I think there is some ambiguity as to whether islower( 'i' ) is
guaranteed to return non-zero, regardless of the locale (e.g.
whether an EBCDIC locale would be legal on a machine where the
"C" locale used ASCII), but otherwise, all of the functions in
<cctype> except isdigit and isxdigit are explicitly defined as
locale dependent.
In quick programs which you write for your own use, it may be
acceptable to ignore this problem -- I know that I don't even
have the tr_TR.iso_8859_9 locale, nor the necessary fonts to
support it, installed on my machine. But if the code is to be
used in larger applications, or even more so if it is part of a
library, you must wrap the code with something like:
std::string save = setlocale( LC_CTYPE, NULL ) ; // copy: a later setlocale() may overwrite the returned string
setlocale( LC_CTYPE, "C" ) ;
// code using <cctype>
setlocale( LC_CTYPE, save.c_str() ) ;
In a multithreaded environment, of course, you also need a lock
around the entire block. In a multithreaded environment, I
would definitely prefer the two parameter versions in <locale>.
(Somewhere in one of my libraries, I have a predicate class
CTypeIs, whose constructor takes an std::ctype_base::mask and an
std::locale const&, with the second parameter defaulting to
std::locale(). IMHO, that is probably the only viable
solution if you're stuck with older compilers which don't
support Boost.)
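The save/switch/restore pattern around setlocale can be sketched as a small scope guard. The names ScopedCLocale and ascii_upper are hypothetical, C++11 is assumed, and the sketch deliberately omits the lock that a multithreaded program would still need:

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <string>

// Hypothetical guard: switches one locale category on construction and
// restores the previous setting on destruction. Not thread-safe by itself.
class ScopedCLocale {
public:
    ScopedCLocale( int category, const char* name )
        : category_( category )
    {
        // Copy the current name: a later setlocale() call may overwrite
        // the buffer that the returned pointer refers to.
        const char* current = std::setlocale( category, nullptr );
        if ( current )
            saved_ = current;
        std::setlocale( category, name );
    }
    ~ScopedCLocale()
    {
        if ( !saved_.empty() )
            std::setlocale( category_, saved_.c_str() );
    }
    ScopedCLocale( const ScopedCLocale& ) = delete;
    ScopedCLocale& operator=( const ScopedCLocale& ) = delete;

private:
    int category_;
    std::string saved_;
};

// Hypothetical helper: one-argument toupper evaluated under the "C" locale.
inline char ascii_upper( char c )
{
    ScopedCLocale guard( LC_CTYPE, "C" );
    return static_cast<char>( std::toupper( static_cast<unsigned char>( c ) ) );
}
```

Switching the locale for every character is wasteful; in real code the guard would more plausibly wrap a whole block of <cctype> calls.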
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Llewelly Guest
|
Posted: Wed Apr 20, 2005 5:19 pm Post subject: Re: Confusion about toupper() |
|
|
Pete Becker <petebecker (AT) acm (DOT) org> writes:
| Quote: | Matthias Hofmann wrote:
I am a little confused about toupper(). The way I interpret the standard,
including the header <cctype> should make toupper() available within
namespace std, and only within namespace std.
That's what the standard says. It's often not reasonable to implement.
There's an active defect report that's designed to relax this
requirement to match common behavior.
[snip] |
What about 17.4.3.1.3/5:
# Each function signature from the Standard C library declared
# with external linkage is reserved to the implementation for use
# as a function signature with both extern "C" and extern "C++"
# linkage,168) or as a name of namespace scope in the global
# namespace.
In the past I've interpreted this as granting library implementors
permission to provide a function definition for names such as
::toupper(), regardless of whether <cxxx> or <xxx.h> was included
at all. Yet I can't imagine someone opening issue 456 if that were
the case. So I don't understand it at all, it seems.
What does 17.4.3.1.3/5 mean?
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Peter C. Chapin Guest
|
Posted: Thu Apr 21, 2005 8:47 am Post subject: Re: Confusion about toupper() |
|
|
"Allan W" <allan_w (AT) my-dejanews (DOT) com> wrote in
news:1113956438.656074.11940 (AT) g14g2000cwa (DOT) googlegroups.com:
| Quote: | It's
tempting to use conditional compilation -- use inline for C++, and
out-of-line for C:
|
[snip]
| Quote: | But this is no good because some programs combine C modules with C++
modules. Surely &toupper should return the same address in either
case!
|
Does that imply that the C++ library is not allowed to provide any
functions as inline functions? Or does the standard require that
separate occurrences of an inline function appearing in different
translation units be merged into a single occurrence in the executable
file?
| Quote: | Here's where the real problems start. Clearly we use extern "C"
linkage,
so that we can call it from C modules. (Which raises the question: How
do we call the overloaded version of toupper?)
|
It is permissible for one function in an overloaded set to be extern
"C". For example:
extern "C" void f(int);
void f(char *);
void f(float);
| Quote: | For C programs to link correctly, you're going to have to put the
function in the global namespace.
|
Why?
namespace std {
extern "C" void f();
}
In C++ function f() is callable as std::f, but in C it can be called as
simply f (of course it needs to be declared differently in C).
See section 7.3.4/4 and especially 7.5/6.
kanze@gabi-soft.fr Guest
|
Posted: Thu Apr 21, 2005 12:10 pm Post subject: Re: Confusion about toupper() |
|
|
Allan W wrote:
| Quote: | Matthias Hofmann wrote:
I never implemented a compiler in my life - what is so
difficult about making toupper available in namespace std
only?
|
The problem is, obviously, one for the library authors, not for
the compiler authors:-). (Unless the library authors start
insisting on some special compiler magic.)
Secondly, if you're writing everything from scratch, don't care
about existing C compilers, existing C implementations, sharing
the implementation with any C compiler, etc., it's not
difficult. Practically, of course, those conditions never hold.
| Quote: | I never implemented a compiler either. But let's pretend that
we are writing -- no, not the main compiler. Let's say that
someone else has created Whiz-C 2005 and has hired us to write
the standard libraries.
To keep the discussion simple, I'll use simple one-letter
variable names rather than names reserved to the library -- in
practice we would never name a parameter "t" because even
though it would be an unusual name for a macro, it IS legal.
|
I'm not too sure how serious your discussion is, or rather, what
parts are just there to form the basis of a discussion, and what
parts you think are seriously relevant to how one might actually
write the library, so some of my comments probably address
issues you're already familiar with.
I think the most important issue, that you seem to have
neglected, is that usually, the C compiler was there first. And
that you have to interact with its implementation, if only
because the exact behavior of these functions depends on the
current setting of the C locale. You can't simply implement
them from scratch, ignoring the existing C.
| Quote: | Remember that the library to go with a compiler is the one
place where it's perfectly okay to write nonportable code. The
INTERFACE is standard, not the implementation. So if Whiz-C
2005 runs only on ASCII machines, then we can rely on the fact
that letters are contiguous, and write:
namespace std {
inline char toupper(char t) { return char(t-'a'+'A'); }
}
It's easy to see what's wrong with this -- if t is already
uppercase, we're going to get something that isn't a letter to
begin with, and if it's not a letter then we're going to get
complete gibberish. So let's refine it to version 2:
namespace std {
inline char toupper(char t)
{ return (t>='a' && t<='z') ? char(t-'a'+'A') : t; }
}
This is pretty close.
|
Not really. The standard is very clear: the results depend on
the current C locale.
The usual solution (I think -- it's been ages since I
implemented a standard C library) is to use an array; changing
the locale loads a new array from the disk. So the actual code
would look like:
int toupper( int t )
{
return currentToUpper[ t + 1 ] ;
}
where currentToUpper is a pointer to the array.
Note that I've corrected one or two other errors: the correct
parameter type is int, not char, and the valid range is
0...UCHAR_MAX or EOF. The "classic" solution for handling this
is to define EOF as -1, to add one to the parameter, and to use
an array of UCHAR_MAX + 2 entries.
(But as I said, it's been ages since I last implemented this
sort of stuff. We didn't have wchar_t back then. I'm not sure
that the array solution is viable for a 32 bit wchar_t, and I
can imagine that you might want to use similar code to what you
use in towlower.)
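A sketch of that table lookup, with assumptions stated up front: the names in namespace sketch are hypothetical, the table is filled with an ASCII-only mapping standing in for whatever the current locale would load, EOF is assumed to be -1, and C++11 is assumed for static_assert.

```cpp
#include <cassert>
#include <climits>   // UCHAR_MAX
#include <cstdio>    // EOF

namespace sketch {

static_assert( EOF == -1, "this sketch assumes the classic EOF == -1" );

// One slot for EOF followed by one slot per unsigned char value.
struct UpperTable {
    int entries[UCHAR_MAX + 2];
    UpperTable()
    {
        entries[0] = EOF;                            // EOF maps to itself
        for ( int ch = 0; ch <= UCHAR_MAX; ++ch )    // ASCII-only stand-in mapping
            entries[ch + 1] = ( ch >= 'a' && ch <= 'z' ) ? ch - 'a' + 'A' : ch;
    }
};

// Stands in for the pointer that setlocale() would redirect to another table.
inline const int* current_to_upper()
{
    static const UpperTable table;
    return table.entries;
}

// The lookup itself: shift by one so that EOF (-1) lands on index 0.
inline int to_upper( int ch )
{
    assert( ch == EOF || ( ch >= 0 && ch <= UCHAR_MAX ) );  // valid argument range
    return current_to_upper()[ch + 1];
}

} // namespace sketch
```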
| Quote: | One problem is that the first expression in the conditional
operator is more complicated (slow) than it needs to be. If we
really were writing the entire C++ library, rather than just
toupper(), we would have gone out of our way to make isupper()
work more quickly than that -- we'd probably have a block of
256 bitflags, and isupper() would simply check for the correct
bit.
|
If you look carefully at the set of isxxx functions, you'll find
that they can be implemented with 8 flags. Rather convenient,
that -- one per bit. So a typical implementation of isalpha
becomes :
int isalpha( int ch )
{
return currentCType[ ch + 1 ] & (maskLower | maskUpper) ;
}
(Note that this implementation is *required* for
std::ctype<char>.)
| Quote: | Let's rely on that for version 3:
namespace std {
inline char toupper(char t)
{ return islower(t) ? char(t-'a'+'A') : t; }
}
Now we're there, right? No. The Whiz-C people aren't happy
with this version because it doesn't work correctly in C mode,
just C++ mode. The biggest problem, of course, is that
"inline" is a C++ keyword.
|
It's also a C keyword now. The biggest problem is that the code
is in a namespace, which doesn't work in C, and that the code
doesn't take locales into account.
Without the locales, the code is almost simple enough that you
could tolerate separate implementations for C and for C++
(although I imagine that a vendor who had to actually maintain
two different versions of exactly the same code might not agree
here). But in fact, the behavior of the code must depend on the
last call(s) to setlocale.
| Quote: | We can't get around this by making toupper() a macro -- that
isn't legal for C libraries (unless this changed recently, I'm
not sure).
|
Both C and C++ require that the actual function exist and be
declared in <ctype.h>. C allows the header to then define a
function-style macro which hides the function; you can still
access the function if the name isn't immediately followed by a
'(' (say when taking its address, or by writing (isupper)(ch)),
or by using #undef. This is not allowed in a C++ header.
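The preprocessor rule behind this can be tried out without touching <ctype.h>. The sketch below uses a made-up function triple and a made-up macro of the same name; the macro deliberately computes something different so that you can tell which one was used.

```cpp
// Hypothetical function and macro, invented for this sketch only.
int triple(int x) { return 3 * x; }        // the real function

#define triple(x) ((x) + 1000)             // function-style macro hiding it

// Name immediately followed by '(': the macro is expanded.
int via_macro() { return triple(2); }

// Parenthesised name: the next token is ')', so the macro is not expanded
// and the real function is called.
int via_function() { return (triple)(2); }

// Taking the address: again no '(' follows the name, so this is the function.
int via_pointer() {
    int (*p)(int) = triple;
    return p(2);
}
```

Here via_macro() yields 1002, while via_function() and via_pointer() both yield 6.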
| Quote: | It's tempting to use conditional compilation -- use inline for
C++, and out-of-line for C:
// cctype
namespace std {
inline char toupper(char t)
{ return islower(t) ? char(t-'a'+'A') : t; }
}
// ctype.h
char toupper(char t);
// ctype.c
// Note: Used only with C. (The C++ version is inline)
char toupper(char t) { return islower(t) ? char(t-'a'+'A') : t; }
But this is no good because some programs combine C modules
with C++ modules. Surely &toupper should return the same
address in either case!
|
Why? That's not required.
| Quote: | So we bite the bullet and make a single out-of-line
implementation of toupper.
Here's where the real problems start. Clearly we use extern
"C" linkage, so that we can call it from C modules. (Which
raises the question: How do we call the overloaded version of
toupper?)
|
The overloaded versions are only present in C++, only in
namespace std::, and only if we include <locale>. The library
implementation may include <locale> in any C++ header; it may
not include it in <ctype.h>. (And of course, if you included it
from <ctype.h>, the C compiler is likely to be very unhappy.)
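For what it's worth, here is a small sketch of calling the two forms side by side from C++. The helper names upper_c and upper_locale are invented for the example, and it uses C++11 lambdas instead of std::ptr_fun, which also sidesteps the question of which overload a bare std::toupper names.

```cpp
#include <algorithm>
#include <cctype>
#include <locale>
#include <string>

// Hypothetical helper names, for illustration only.

// One-argument function inherited from C; its behaviour follows the
// global C locale set by setlocale.
inline std::string upper_c(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   // Go through unsigned char so a negative char is never
                   // passed to toupper.
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return s;
}

// Two-argument template from <locale>; the locale is passed explicitly.
inline std::string upper_locale(std::string s, const std::locale& loc) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::toupper(c, loc); });
    return s;
}
```

With the default "C" locale, upper_c("abc") and upper_locale("abc", std::locale::classic()) should both give "ABC".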
| Quote: | But now we finally get to the question that started this all
-- what is so difficult about making toupper available in
namespace std only?
* If a C module uses #include <ctype.h>, it should make toupper()
available in the global namespace (remember that C doesn't have
namespaces).
* If a C++ module uses #include <ctype.h>, it should make toupper
available in the global namespace, and possibly in namespace std
as well.
* If a C++ module uses #include <cctype>, it should make toupper()
available in namespace std only.
To make the same function available in two different
namespaces, all we need to do is define it in one, and then
use a using-declaration to bring the name into the other. But
we can only define it once. Which namespace do we define it
in? No matter which one you pick, you're going to get
criticism.
|
The problem is mainly the fact that in order to make the
function available in std:: only, you cannot use the C
implementation. If you're writing everything from scratch, you
could probably define the real functions with names like
__isupper(), etc., then in <cctype>, provide inline functions
which call them, and make all of <ctype.h> conditional, with
#include <cctype> and using std::toupper, etc. in the C++ part,
and inline forwarding functions in the C part. That would
almost work; I think it would still cause problems if a C++
program included a C header which tried to use the functions in
a macro. (Which probably means that your implementation
couldn't be used under Windows or Posix based systems. Which
eliminates a rather important customer base.)
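Roughly, the from-scratch layout described above could look like the following, squeezed into one file. Everything in it is hypothetical: impl_toupper stands in for a reserved name such as __toupper, sketch_std stands in for namespace std, and to_upper is used so as not to collide with the real toupper. The implementation assumes a contiguous ASCII-like encoding and ignores locales, which a real one could not do.

```cpp
// Single out-of-line implementation with C linkage, so C modules could
// call it as well. Hypothetical stand-in for something like __toupper.
extern "C" int impl_toupper(int c) {
    // Assumes 'a'..'z' are contiguous; no locale support in this sketch.
    return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

// What a <cctype>-like header would provide: the name exists in the
// namespace only, as an inline forwarding function.
namespace sketch_std {
    inline int to_upper(int c) { return impl_toupper(c); }
}

// What the C++ branch of a <ctype.h>-like header would add on top:
// the same function, made visible in the global namespace.
using sketch_std::to_upper;
```

After the using-declaration, ::to_upper and sketch_std::to_upper name the same function, so taking the address through either name gives the same pointer.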
In sum:
-- it's a lot of work for very little benefit,
-- it probably still doesn't work reliably with the system
headers of some of the major operating systems, and
-- it requires that you have total control over both the C and
the C++ headers -- I don't know about Windows, but the C
headers are normally bundled with Unix machines (and include
Posix specific additions).
Saying it's impossible may be a slight exaggeration, but it
certainly isn't reasonable, at least not in general.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
kanze@gabi-soft.fr Guest
|
Posted: Thu Apr 21, 2005 12:12 pm Post subject: Re: Confusion about toupper() |
|
|
Matthias Hofmann wrote:
| Quote: | "Thomas Maeder" <maeder (AT) glue (DOT) ch> wrote in message
news:m2mzrw83au.fsf (AT) madbox2 (DOT) local...
kanze (AT) gabi-soft (DOT) fr writes:
std::vector<char> v;
for ( char c = 'a'; c <= 'z'; ++c )
v.push_back( c );
[snip]
OTOH, using a for loop like the one above to initialize v
this way relies on these characters being contiguously
encoded, which isn't guaranteed.
This means that one has to code as follows:
v.push_back( 'a' );
v.push_back( 'b' );
v.push_back( 'c' );
v.push_back( 'z' );
Then you can iterate over v in order to get all the letters
from the latin alphabet on the usual order. Another solution
would be to have an array and insert it into a vector:
|
That's doing it the hard way.
| Quote: | const char letters[26] = { 'a', 'b', ..., 'z' };
for ( const char* p = letters;
      p != letters + sizeof (letters) / sizeof letters[0]; ++p )
v.push_back( *p );
However, I don't know if that is any less cumbersome...
|
There are two important simplifications. First, if you want an
array with the 26 small letters in sequence:
char const letters[] = "abcdefghijklmnopqrstuvwxyz" ;
does the trick. (You get an extra '\0' at the end, but that's
something I can live with.) Second, if you have a C style
array, it's trivial to construct the vector directly:
std::vector< char > v( letters, letters + 26 ) ;
If I already have the vector, and just want to append, of
course :
v.insert( v.end(), letters, letters + 26 ) ;
None of these alternatives seem particularly more complicated
than the original code to me.
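Put together, the two simplifications might look like this. The function names make_letters and append_letters are made up for the example; using sizeof instead of the literal 26 is a small variation that drops the trailing '\0' explicitly.

```cpp
#include <vector>

// The array from the post; sizeof letters is 27 because of the '\0'.
static char const letters[] = "abcdefghijklmnopqrstuvwxyz";

// Hypothetical helper: construct the vector directly from the array.
inline std::vector<char> make_letters() {
    // Subtract one so the terminating '\0' is not copied.
    return std::vector<char>(letters, letters + sizeof letters - 1);
}

// Hypothetical helper: append the letters to an existing vector.
inline void append_letters(std::vector<char>& v) {
    v.insert(v.end(), letters, letters + sizeof letters - 1);
}
```

make_letters() returns 26 elements running from 'a' to 'z', with no dependence on how the characters are encoded.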
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34