 |
C++Talk.NET C++ language newsgroups
|
| View previous topic :: View next topic |
| Author |
Message |
John Potter Guest
|
Posted: Fri Sep 23, 2005 1:18 pm Post subject: strlen via algorithm |
|
|
Thinking about the failure of the STL model to implement this simple
function, why not try it?
int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}
Gives a rather strange result. Let's fool it with a simple input
iterator.
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
Siter& operator++ () {
++ p;
return *this;
}
char operator* () const {
return *p;
}
char const* base () const {
return p;
}
};
bool operator== (Siter lhs, Siter rhs) {
return lhs.p == rhs.p;
}
bool operator!= (Siter lhs, Siter rhs) {
return lhs.p != rhs.p;
}
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
Now it gives the expected result. An optimization for random access
iterators prevented the original from working. It was not the
optimization itself that caused the failure, but the coding style used
in the algorithm. A count down loop was implemented with x > 0 rather
than x != 0. That simple change allowed the random access case to give
the expected result. Of course, the following terminates in finite time
given s, 0.
int cdist (char const* f, char const* p) {
int n(0);
for (; f != p; ++ f)
++ n;
return n;
}
There is no undefined behavior in the Siter case. Of course, there is
in the random case where 0 - s is computed. The undefined behavior, as
usual, does something reasonable and only the coding style caused the
algorithm to fail. We could also write it to work in most cases by
simply using a large number.
return std::find(s, s + 10000, 0) - s;
This is again undefined behavior.
Should the random access case work? Amusing at least.
John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
msalters Guest
|
Posted: Sat Sep 24, 2005 2:18 am Post subject: Re: strlen via algorithm |
|
|
John Potter schreef:
| Quote: | Thinking about the failure of the STL model to implement this simple
function, why not try it?
int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}
Gives a rather strange result.
|
Doesn't surprise me. The range [s,0) is not a proper range. Garbage
In Garbage Out.
| Quote: | Let's fool it with a simple input iterator.
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
....
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
Now it gives the expected result.
|
Too bad, it's still undefined behavior. But I don't have to tell you
that UB can include expected behavior, do I?
| Quote: | An optimization for random access iterators prevented the original
from working.
|
In your implementation, at least. Other implementations may fail for
different reasons, but with the same root cause. I.e. I can imagine
that the fastest way to find an 8-bit char on a 32-bit system with
SIMD instructions is to check 4 chars at a time, except at the
boundaries of the range (due to alignment problems). This of course
is perfectly legal, but your call can fail.
HTH,
Michiel Salters
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Howard Hinnant Guest
|
Posted: Sat Sep 24, 2005 2:21 am Post subject: Re: strlen via algorithm |
|
|
In article <LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net>,
John Potter <jpotter (AT) lhup (DOT) edu> wrote:
| Quote: | Thinking about the failure of the STL model to implement this simple
function, why not try it?
int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}
Gives a rather strange result.
snip
Should the random access case work? Amusing at least.
|
Imho, no. 24.1 p6-p7 defines a valid range denoted by two iterators and
states:
| Quote: | The result of the application of functions in the library to invalid ranges
is undefined.
|
Except for the case that s == 0, [s, 0) is not a valid range. Therefore
your use of find is undefined.
-Howard
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Andrew Koenig Guest
|
Posted: Sat Sep 24, 2005 2:32 am Post subject: Re: strlen via algorithm |
|
|
"John Potter" <jpotter (AT) lhup (DOT) edu> wrote
| Quote: | Now it gives the expected result. An optimization for random access
iterators prevented the original from working. It was not the
optimization itself that caused the failure, but the coding style used
in the algorithm. A count down loop was implemented with x > 0 rather
than x != 0.
|
Very entertaining.
I would say that it is not required to work because if p is a pointer to
const char, [p, 0) is not a well-formed range, in the sense that you are not
assured of being able to increment p a finite number of times to reach 0.
So here's my question for you: Suppose I decide to implement find for
random-access iterators by precomputing the number of elements in the range
to be searched and unrolling the search loop. The precomputation is
presumably what causes the failure you noticed, because it involves
evaluating end-begin and end is a 0 pointer in this case. And yet the loop
unrolling might make a significant difference in execution time on some
processors.
Is that an acceptable optimization?
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Carl Barron Guest
|
Posted: Sun Sep 25, 2005 1:43 am Post subject: Re: strlen via algorithm |
|
|
In article <LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:
| Quote: | Thinking about the failure of the STL model to implement this simple
function, why not try it?
int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}
Well this is ub as (const char *)0 is not reachable from s, without |
overflow... [reachable a finite # of operator ++()'s applied to s
yields 0]
| Quote: | Gives a rather strange result. Let's fool it with a simple input
iterator.
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
Siter& operator++ () {
++ p;
return *this;
}
char operator* () const {
return *p;
}
char const* base () const {
return p;
}
};
bool operator== (Siter lhs, Siter rhs) {
return lhs.p == rhs.p;
}
bool operator!= (Siter lhs, Siter rhs) {
return lhs.p != rhs.p;
}
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
same problem same ub. [same result which happens to yield correct |
answer with cw9.6 but that is a matter of luck]
if you want an input iterator for a ' ' terminated string. here is
one using boost::iterator_facade:
#include <boost/iterator/iterator_facade.hpp>
class null_terminated:public boost::iterator_facade
<
null_terminated,
const char,
std::input_iterator_tag
friend class boost::iterator_core_access;
bool done;
const char *p;
const char &dereference() const
{
return *p;
}
void increment()
{
if(!done)
done = (*++p == ' ');
}
bool equal(const null_terminated &r)const
{
return done == r.done;
}
public:
null_terminated():p(0),done(true){}
null_terminated(const char *a):p(a),done(false){}
};
using this iterator with std::distance()
std::distance(null_terminated(s),null_terminated()); gives the result
with no ub.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Potter Guest
|
Posted: Sun Sep 25, 2005 6:58 pm Post subject: Re: strlen via algorithm |
|
|
On 23 Sep 2005 22:32:37 -0400, "Andrew Koenig" <ark (AT) acm (DOT) org> wrote:
| Quote: | "John Potter" <jpotter (AT) lhup (DOT) edu> wrote in message
news:LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net...
Now it gives the expected result. An optimization for random access
iterators prevented the original from working. It was not the
optimization itself that caused the failure, but the coding style used
in the algorithm. A count down loop was implemented with x > 0 rather
than x != 0.
Very entertaining.
I would say that it is not required to work because if p is a pointer to
const char, [p, 0) is not a well-formed range, in the sense that you are
not
assured of being able to increment p a finite number of times to reach 0.
|
Agreed in general. The question here is when an implementation provides
the ability to increment any p a finite number of times and reach 0, is
it also required to accept the range defined as valid by ++ and produce
reasonable results?
| Quote: | So here's my question for you: Suppose I decide to implement find for
random-access iterators by precomputing the number of elements in the
range
to be searched and unrolling the search loop. The precomputation is
presumably what causes the failure you noticed, because it involves
evaluating end-begin and end is a 0 pointer in this case. And yet the
loop
unrolling might make a significant difference in execution time on some
processors.
Is that an acceptable optimization?
|
I'm not sure. The problem is size_t_MAX being larger than ptrdiff_t_MAX
for most implementations. Is it possible to have an array of char
larger than ptrdiff_t_MAX? If so, the above problem can be translated
to valid begin and past_end pointers. The standard requires that
addition and subtraction of integrals to pointers within the range must
not overflow. It also states that subtraction of pointers is undefined
behavior if the result will not fit in ptrdiff_t. Since the result will
not fit, is the library allowed to perform the subtraction without
taking precautions to define the behavior?
The amusing coding problem was attempting to apply relational operators
to the number circle which has no order. The case of [p,0) is
entertaining; however, the [p,p+big) case is a serious question.
John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Potter Guest
|
Posted: Sun Sep 25, 2005 6:59 pm Post subject: Re: strlen via algorithm |
|
|
On 23 Sep 2005 22:18:41 -0400, "msalters"
<Michiel.Salters (AT) logicacmg (DOT) com> wrote:
| Quote: | John Potter schreef:
Let's fool it with a simple input iterator.
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
...
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
Now it gives the expected result.
Too bad, it's still undefined behavior. But I don't have to tell you
that UB can include expected behavior, do I?
|
Nope. No UB here. Siter is an input iterator and it must work just
like it must for istream_iterator.
std::find(istream_iterator<char>(cin >> noskipws),
istream_iterator<char>, 'n');
The only UB for Siter would be the same as for strlen, no nul character
found prior to a system violation. The random case may be interesting,
but the Siter case is simply correct and must work everywhere.
John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Carl Barron Guest
|
Posted: Mon Sep 26, 2005 1:37 am Post subject: Re: strlen via algorithm |
|
|
In article <pIjZe.3547$QE1.3479 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:
| Quote: | On 23 Sep 2005 22:18:41 -0400, "msalters"
[email]Michiel.Salters (AT) logicacmg (DOT) com[/email]> wrote:
John Potter schreef:
Let's fool it with a simple input iterator.
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
...
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
Now it gives the expected result.
Too bad, it's still undefined behavior. But I don't have to tell you
that UB can include expected behavior, do I?
Nope. No UB here. Siter is an input iterator and it must work just
like it must for istream_iterator.
std::find(istream_iterator<char>(cin >> noskipws),
istream_iterator<char>, 'n');
The only UB for Siter would be the same as for strlen, no nul character
found prior to a system violation. The random case may be interesting,
but the Siter case is simply correct and must work everywhere.
Siter() is not reachable from Siter(p) with operator ++() and thus |
it still is not a range and the results are undefined. Garbage in
garbage out, whether Siter is an input iterator [after modifications,
it could be].
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze Guest
|
Posted: Tue Sep 27, 2005 8:27 am Post subject: Re: strlen via algorithm |
|
|
Carl Barron wrote:
| Quote: | In article
pIjZe.3547$QE1.3479 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:
On 23 Sep 2005 22:18:41 -0400, "msalters"
[email]Michiel.Salters (AT) logicacmg (DOT) com[/email]> wrote:
John Potter schreef:
Let's fool it with a simple input iterator.
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
...
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
Now it gives the expected result.
Too bad, it's still undefined behavior. But I don't have to
tell you that UB can include expected behavior, do I?
Nope. No UB here. Siter is an input iterator and it must
work just like it must for istream_iterator.
std::find(istream_iterator<char>(cin >> noskipws),
istream_iterator<char>, 'n');
The only UB for Siter would be the same as for strlen, no
nul character found prior to a system violation. The random
case may be interesting, but the Siter case is simply
correct and must work everywhere.
Siter() is not reachable from Siter(p) with operator ++()
and thus it still is not a range and the results are
undefined. Garbage in garbage out, whether Siter is an input
iterator [after modifications, it could be].
|
That's an interesting point.
The problem, of course, is that John is trying to emulate poor
programming practice, and the STL just doesn't cope with things
as stupid or as dangerous as doing a look-up on a buffer without
knowing how big the buffer is.
IMHO, the fact that the STL does not allow us to write such code
is a feature, not a flaw.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Potter Guest
|
Posted: Tue Sep 27, 2005 2:33 pm Post subject: Re: strlen via algorithm |
|
|
On 24 Sep 2005 21:43:06 -0400, Carl Barron <cbarron413 (AT) adelphia (DOT) net>
wrote:
| Quote: | In article <LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:
struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
Siter& operator++ () {
++ p;
return *this;
}
char operator* () const {
return *p;
}
char const* base () const {
return p;
}
};
bool operator== (Siter lhs, Siter rhs) {
return lhs.p == rhs.p;
}
bool operator!= (Siter lhs, Siter rhs) {
return lhs.p != rhs.p;
}
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}
same problem same ub. [same result which happens to yield correct
answer with cw9.6 but that is a matter of luck]
|
No luck involved. The question of reachable can not be answered for an
input iterator (Yes I know the postfix++ is missing). If there is a nul
character in the array starting at s, the find algorithm will find it
with no ub. If not, nothing can work.
| Quote: | if you want an input iterator for a ' ' terminated string. here is
one using boost::iterator_facade:
|
Nice alternative.
| Quote: | #include
class null_terminated:public boost::iterator_facade
null_terminated,
const char,
std::input_iterator_tag
{
friend class boost::iterator_core_access;
bool done;
const char *p;
const char &dereference() const
{
return *p;
}
|
Why did you write this? What is wrong with the default?
| Quote: | void increment()
{
if(!done)
done = (*++p == ' ');
}
|
Why do you prefer an infinite loop when incrementing end? I think that
either not checking to allow a crash, or asserting/throwing would be
prefered.
| Quote: | bool equal(const null_terminated &r)const
{
return done == r.done;
}
public:
null_terminated():p(0),done(true){}
null_terminated(const char *a):p(a),done(false){}
|
done(*a == ' ')
To handle the empty string.
| Quote: | };
using this iterator with std::distance()
std::distance(null_terminated(s),null_terminated()); gives the result
with no ub.
|
Same ub as above. It is a race for a nul character or buffer overrun.
The theoritical ub has no play in code which is not allowed to test for
it or malfunction.
John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Carl Barron Guest
|
Posted: Wed Sep 28, 2005 9:51 am Post subject: Re: strlen via algorithm |
|
|
In article <088_e.5742$q1.1596 (AT) newsread3 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:
| Quote: | On 24 Sep 2005 21:43:06 -0400, Carl Barron
wrote:
No luck involved. The question of reachable can not be answered for an
input iterator (Yes I know the postfix++ is missing). If there is a nul
character in the array starting at s, the find algorithm will find it
with no ub. If not, nothing can work.
if you want an input iterator for a ' ' terminated string. here is
one using boost::iterator_facade:
Nice alternative.
#include
class null_terminated:public boost::iterator_facade
null_terminated,
const char,
std::input_iterator_tag
{
friend class boost::iterator_core_access;
bool done;
const char *p;
const char &dereference() const
{
return *p;
}
Why did you write this? What is wrong with the default?
what default?
void increment()
{
if(!done)
done = (*++p == ' ');
}
Why do you prefer an infinite loop when incrementing end? I think that
either not checking to allow a crash, or asserting/throwing would be
prefered.
if done increment does nothing!! |
| Quote: | bool equal(const null_terminated &r)const
{
return done == r.done;
}
public:
null_terminated():p(0),done(true){}
null_terminated(const char *a):p(a),done(false){}
done(*a == ' ')
To handle the empty string.
ok quick programming error... thanks,
};
using this iterator with std::distance()
std::distance(null_terminated(s),null_terminated()); gives the result
with no ub.
Same ub as above. It is a race for a nul character or buffer overrun.
The theoritical ub has no play in code which is not allowed to test for
it or malfunction.
Gigo will kill either approach but Siter as written requires the |
pointer to overflow to 0 to terminate the loop. On some systems
this never happens.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Potter Guest
|
Posted: Sat Oct 01, 2005 1:28 am Post subject: Re: strlen via algorithm |
|
|
On 28 Sep 2005 05:51:13 -0400, Carl Barron <cbarron413 (AT) adelphia (DOT) net>
wrote:
| Quote: | In article <088_e.5742$q1.1596 (AT) newsread3 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:
On 24 Sep 2005 21:43:06 -0400, Carl Barron
wrote:
const char &dereference() const
{
return *p;
}
Why did you write this? What is wrong with the default?
what default?
|
Sorry, I confused it with iterator adaptor which provides defaults. In
this example, there is no dereference use and it could be commented out
with no effect.
| Quote: | void increment()
{
if(!done)
done = (*++p == ' ');
}
Why do you prefer an infinite loop when incrementing end? I think that
either not checking to allow a crash, or asserting/throwing would be
prefered.
if done increment does nothing!!
|
Done indicates an end iterator. It is undefined to increment an end
iterator. The user tried to do it anyway. Since they tried to
increment it and the code did nothing, it is likely that they will be in
an infinite loop. As a design question, why did you choose that form of
undefined behavior rather than not testing which will likely result in a
protection violation or testing and making noise? Maybe I can learn
something from you reasons.
John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Carl Barron Guest
|
Posted: Sat Oct 01, 2005 4:07 pm Post subject: Re: strlen via algorithm |
|
|
John Potter <jpotter (AT) lhup (DOT) edu> wrote:
| Quote: | Done indicates an end iterator. It is undefined to increment an end
iterator. The user tried to do it anyway. Since they tried to
increment it and the code did nothing, it is likely that they will be in
an infinite loop. As a design question, why did you choose that form of
undefined behavior rather than not testing which will likely result in a
protection violation or testing and making noise? Maybe I can learn
something from you reasons.
|
class null_terminated:public boost::iterator_facade
<
null_terminated,
char,
std::forward_iterator_tag
friend class boost::iterator_core_access;
char *p;
char & dereference( const {return *p;}
void increment() { if(*++p==' ') p=0;}
bool equals(const null_terminated &r)const {return p==r.p;}
public:
null_terminated(char *a=0):p(*a?a:0){}
};
this is even shorter and will fault if user increments an 'end'
iterator.
Thanks this is cleaner and a forward iterator as well.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
John Potter Guest
|
Posted: Sun Oct 02, 2005 3:34 pm Post subject: Re: strlen via algorithm |
|
|
On 1 Oct 2005 12:07:59 -0400, [email]cbarron3 (AT) ix (DOT) netcom.com[/email] (Carl Barron)
wrote:
Nicer.
| Quote: | null_terminated(char *a=0):p(*a?a:0){}
|
Just in case it proves useful.
null_terminated (char* a = 0) : p(a && *a ? a : 0) { }
John
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|