C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

strlen via algorithm

 
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated)
View previous topic :: View next topic  
Author Message
John Potter
Guest





PostPosted: Fri Sep 23, 2005 1:18 pm    Post subject: strlen via algorithm Reply with quote



Thinking about the failure of the STL model to implement this simple
function, why not try it?

int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}

Gives a rather strange result. Let's fool it with a simple input
iterator.

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
Siter& operator++ () {
++ p;
return *this;
}
char operator* () const {
return *p;
}
char const* base () const {
return p;
}
};
bool operator== (Siter lhs, Siter rhs) {
return lhs.p == rhs.p;
}
bool operator!= (Siter lhs, Siter rhs) {
return lhs.p != rhs.p;
}

int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

Now it gives the expected result. An optimization for random access
iterators prevented the original from working. It was not the
optimization itself that caused the failure, but the coding style used
in the algorithm. A count down loop was implemented with x > 0 rather
than x != 0. That simple change allowed the random access case to give
the expected result. Of course, the following terminates in finite time
given s, 0.

int cdist (char const* f, char const* p) {
int n(0);
for (; f != p; ++ f)
++ n;
return n;
}

There is no undefined behavior in the Siter case. Of course, there is
in the random case where 0 - s is computed. The undefined behavior, as
usual, does something reasonable and only the coding style caused the
algorithm to fail. We could also write it to work in most cases by
simply using a large number.

return std::find(s, s + 10000, 0) - s;

This is again undefined behavior.

Should the random access case work? Amusing at least.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
msalters
Guest





PostPosted: Sat Sep 24, 2005 2:18 am    Post subject: Re: strlen via algorithm Reply with quote




John Potter schreef:

Quote:
Thinking about the failure of the STL model to implement this simple
function, why not try it?

int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}

Gives a rather strange result.

Doesn't surprise me. The range [s,0) is not a proper range. Garbage
In Garbage Out.

Quote:
Let's fool it with a simple input iterator.

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
....
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

Now it gives the expected result.

Too bad, it's still undefined behavior. But I don't have to tell you
that UB can include expected behavior, do I?

Quote:
An optimization for random access iterators prevented the original
from working.

In your implementation, at least. Other implementations may fail for
different reasons, but with the same root cause. I.e. I can imagine
that the fastest way to find an 8-bit char on a 32-bit system with
SIMD instructions is to check 4 chars at a time, except at the
boundaries of the range (due to alignment problems). This of course
is perfectly legal, but your call can fail.

HTH,
Michiel Salters


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Howard Hinnant
Guest





PostPosted: Sat Sep 24, 2005 2:21 am    Post subject: Re: strlen via algorithm Reply with quote



In article <LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net>,
John Potter <jpotter (AT) lhup (DOT) edu> wrote:

Quote:
Thinking about the failure of the STL model to implement this simple
function, why not try it?

int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}

Gives a rather strange result.
snip
Should the random access case work? Amusing at least.

Imho, no. 24.1 p6-p7 defines a valid range denoted by two iterators and
states:

Quote:
The result of the application of functions in the library to invalid ranges
is undefined.

Except for the case that s == 0, [s, 0) is not a valid range. Therefore
your use of find is undefined.

-Howard

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Andrew Koenig
Guest





PostPosted: Sat Sep 24, 2005 2:32 am    Post subject: Re: strlen via algorithm Reply with quote

"John Potter" <jpotter (AT) lhup (DOT) edu> wrote


Quote:
Now it gives the expected result. An optimization for random access
iterators prevented the original from working. It was not the
optimization itself that caused the failure, but the coding style used
in the algorithm. A count down loop was implemented with x > 0 rather
than x != 0.

Very entertaining.

I would say that it is not required to work because if p is a pointer to
const char, [p, 0) is not a well-formed range, in the sense that you are not
assured of being able to increment p a finite number of times to reach 0.

So here's my question for you: Suppose I decide to implement find for
random-access iterators by precomputing the number of elements in the range
to be searched and unrolling the search loop. The precomputation is
presumably what causes the failure you noticed, because it involves
evaluating end-begin and end is a 0 pointer in this case. And yet the loop
unrolling might make a significant difference in execution time on some
processors.

Is that an acceptable optimization?


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Carl Barron
Guest





PostPosted: Sun Sep 25, 2005 1:43 am    Post subject: Re: strlen via algorithm Reply with quote

In article <LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:

Quote:
Thinking about the failure of the STL model to implement this simple
function, why not try it?

int cstrlen (char const* s) {
return std::find(s, (char const*)0, 0) - s;
}

Well this is ub as (const char *)0 is not reachable from s, without

overflow... [reachable a finite # of operator ++()'s applied to s
yields 0]
Quote:
Gives a rather strange result. Let's fool it with a simple input
iterator.

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
Siter& operator++ () {
++ p;
return *this;
}
char operator* () const {
return *p;
}
char const* base () const {
return p;
}
};
bool operator== (Siter lhs, Siter rhs) {
return lhs.p == rhs.p;
}
bool operator!= (Siter lhs, Siter rhs) {
return lhs.p != rhs.p;
}

int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

same problem same ub. [same result which happens to yield correct

answer with cw9.6 but that is a matter of luck]
if you want an input iterator for a '' terminated string. here is
one using boost::iterator_facade:
#include <boost/iterator/iterator_facade.hpp>

class null_terminated:public boost::iterator_facade
<
null_terminated,
const char,
std::input_iterator_tag
Quote:

{

friend class boost::iterator_core_access;
bool done;
const char *p;
const char &dereference() const
{
return *p;
}
void increment()
{
if(!done)
done = (*++p == '');
}
bool equal(const null_terminated &r)const
{
return done == r.done;
}
public:
null_terminated():p(0),done(true){}
null_terminated(const char *a):p(a),done(false){}
};

using this iterator with std::distance()
std::distance(null_terminated(s),null_terminated()); gives the result
with no ub.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
John Potter
Guest





PostPosted: Sun Sep 25, 2005 6:58 pm    Post subject: Re: strlen via algorithm Reply with quote

On 23 Sep 2005 22:32:37 -0400, "Andrew Koenig" <ark (AT) acm (DOT) org> wrote:

Quote:
"John Potter" <jpotter (AT) lhup (DOT) edu> wrote in message
news:LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net...

Now it gives the expected result. An optimization for random access
iterators prevented the original from working. It was not the
optimization itself that caused the failure, but the coding style used
in the algorithm. A count down loop was implemented with x > 0 rather
than x != 0.

Very entertaining.

I would say that it is not required to work because if p is a pointer to
const char, [p, 0) is not a well-formed range, in the sense that you are
not
assured of being able to increment p a finite number of times to reach 0.

Agreed in general. The question here is when an implementation provides
the ability to increment any p a finite number of times and reach 0, is
it also required to accept the range defined as valid by ++ and produce
reasonable results?

Quote:
So here's my question for you: Suppose I decide to implement find for
random-access iterators by precomputing the number of elements in the
range
to be searched and unrolling the search loop. The precomputation is
presumably what causes the failure you noticed, because it involves
evaluating end-begin and end is a 0 pointer in this case. And yet the
loop
unrolling might make a significant difference in execution time on some
processors.

Is that an acceptable optimization?

I'm not sure. The problem is size_t_MAX being larger than ptrdiff_t_MAX
for most implementations. Is it possible to have an array of char
larger than ptrdiff_t_MAX? If so, the above problem can be translated
to valid begin and past_end pointers. The standard requires that
addition and subtraction of integrals to pointers within the range must
not overflow. It also states that subtraction of pointers is undefined
behavior if the result will not fit in ptrdiff_t. Since the result will
not fit, is the library allowed to perform the subtraction without
taking precautions to define the behavior?

The amusing coding problem was attempting to apply relational operators
to the number circle which has no order. The case of [p,0) is
entertaining; however, the [p,p+big) case is a serious question.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
John Potter
Guest





PostPosted: Sun Sep 25, 2005 6:59 pm    Post subject: Re: strlen via algorithm Reply with quote

On 23 Sep 2005 22:18:41 -0400, "msalters"
<Michiel.Salters (AT) logicacmg (DOT) com> wrote:

Quote:
John Potter schreef:

Let's fool it with a simple input iterator.

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
...
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

Now it gives the expected result.

Too bad, it's still undefined behavior. But I don't have to tell you
that UB can include expected behavior, do I?

Nope. No UB here. Siter is an input iterator and it must work just
like it must for istream_iterator.

std::find(istream_iterator<char>(cin >> noskipws),
istream_iterator<char>, 'n');

The only UB for Siter would be the same as for strlen, no nul character
found prior to a system violation. The random case may be interesting,
but the Siter case is simply correct and must work everywhere.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Carl Barron
Guest





PostPosted: Mon Sep 26, 2005 1:37 am    Post subject: Re: strlen via algorithm Reply with quote

In article <pIjZe.3547$QE1.3479 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:

Quote:
On 23 Sep 2005 22:18:41 -0400, "msalters"
[email]Michiel.Salters (AT) logicacmg (DOT) com[/email]> wrote:

John Potter schreef:

Let's fool it with a simple input iterator.

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
...
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

Now it gives the expected result.

Too bad, it's still undefined behavior. But I don't have to tell you
that UB can include expected behavior, do I?

Nope. No UB here. Siter is an input iterator and it must work just
like it must for istream_iterator.

std::find(istream_iterator<char>(cin >> noskipws),
istream_iterator<char>, 'n');

The only UB for Siter would be the same as for strlen, no nul character
found prior to a system violation. The random case may be interesting,
but the Siter case is simply correct and must work everywhere.

Siter() is not reachable from Siter(p) with operator ++() and thus

it still is not a range and the results are undefined. Garbage in
garbage out, whether Siter is an input iterator [after modifications,
it could be].

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
kanze
Guest





PostPosted: Tue Sep 27, 2005 8:27 am    Post subject: Re: strlen via algorithm Reply with quote

Carl Barron wrote:
Quote:
In article
pIjZe.3547$QE1.3479 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:

On 23 Sep 2005 22:18:41 -0400, "msalters"
[email]Michiel.Salters (AT) logicacmg (DOT) com[/email]> wrote:

John Potter schreef:

Let's fool it with a simple input iterator.

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
...
int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

Now it gives the expected result.

Too bad, it's still undefined behavior. But I don't have to
tell you that UB can include expected behavior, do I?

Nope. No UB here. Siter is an input iterator and it must
work just like it must for istream_iterator.

std::find(istream_iterator<char>(cin >> noskipws),
istream_iterator<char>, 'n');

The only UB for Siter would be the same as for strlen, no
nul character found prior to a system violation. The random
case may be interesting, but the Siter case is simply
correct and must work everywhere.

Siter() is not reachable from Siter(p) with operator ++()
and thus it still is not a range and the results are
undefined. Garbage in garbage out, whether Siter is an input
iterator [after modifications, it could be].

That's an interesting point.

The problem, of course, is that John is trying to emulate poor
programming practice, and the STL just doesn't cope with things
as stupid or as dangerous as doing a look-up on a buffer without
knowing how big the buffer is.

IMHO, the fact that the STL does not allow us to write such code
is a feature, not a flaw.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
John Potter
Guest





PostPosted: Tue Sep 27, 2005 2:33 pm    Post subject: Re: strlen via algorithm Reply with quote

On 24 Sep 2005 21:43:06 -0400, Carl Barron <cbarron413 (AT) adelphia (DOT) net>
wrote:

Quote:
In article <LuJYe.2864$QE1.2651 (AT) newsread2 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:

struct Siter : public std::iterator<std::input_iterator_tag, char> {
char const* p;
Siter (char const* p = 0) : p(p) {
}
Siter& operator++ () {
++ p;
return *this;
}
char operator* () const {
return *p;
}
char const* base () const {
return p;
}
};
bool operator== (Siter lhs, Siter rhs) {
return lhs.p == rhs.p;
}
bool operator!= (Siter lhs, Siter rhs) {
return lhs.p != rhs.p;
}

int cstrlen (char const* s) {
return std::find(Siter(s), Siter(), 0).base() - s;
}

same problem same ub. [same result which happens to yield correct
answer with cw9.6 but that is a matter of luck]

No luck involved. The question of reachable can not be answered for an
input iterator (Yes I know the postfix++ is missing). If there is a nul
character in the array starting at s, the find algorithm will find it
with no ub. If not, nothing can work.

Quote:
if you want an input iterator for a '' terminated string. here is
one using boost::iterator_facade:

Nice alternative.

Quote:
#include
class null_terminated:public boost::iterator_facade

null_terminated,
const char,
std::input_iterator_tag

{
friend class boost::iterator_core_access;
bool done;
const char *p;
const char &dereference() const
{
return *p;
}

Why did you write this? What is wrong with the default?

Quote:
void increment()
{
if(!done)
done = (*++p == '');
}

Why do you prefer an infinite loop when incrementing end? I think that
either not checking to allow a crash, or asserting/throwing would be
prefered.

Quote:
bool equal(const null_terminated &r)const
{
return done == r.done;
}
public:
null_terminated():p(0),done(true){}
null_terminated(const char *a):p(a),done(false){}

done(*a == '')

To handle the empty string.

Quote:
};

using this iterator with std::distance()
std::distance(null_terminated(s),null_terminated()); gives the result
with no ub.

Same ub as above. It is a race for a nul character or buffer overrun.
The theoritical ub has no play in code which is not allowed to test for
it or malfunction.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Carl Barron
Guest





PostPosted: Wed Sep 28, 2005 9:51 am    Post subject: Re: strlen via algorithm Reply with quote

In article <088_e.5742$q1.1596 (AT) newsread3 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:

Quote:
On 24 Sep 2005 21:43:06 -0400, Carl Barron wrote:

No luck involved. The question of reachable can not be answered for an
input iterator (Yes I know the postfix++ is missing). If there is a nul
character in the array starting at s, the find algorithm will find it
with no ub. If not, nothing can work.

if you want an input iterator for a '' terminated string. here is
one using boost::iterator_facade:

Nice alternative.

#include
class null_terminated:public boost::iterator_facade

null_terminated,
const char,
std::input_iterator_tag

{
friend class boost::iterator_core_access;
bool done;
const char *p;
const char &dereference() const
{
return *p;
}

Why did you write this? What is wrong with the default?

what default?
void increment()
{
if(!done)
done = (*++p == '');
}

Why do you prefer an infinite loop when incrementing end? I think that
either not checking to allow a crash, or asserting/throwing would be
prefered.

if done increment does nothing!!


Quote:
bool equal(const null_terminated &r)const
{
return done == r.done;
}
public:
null_terminated():p(0),done(true){}
null_terminated(const char *a):p(a),done(false){}

done(*a == '')

To handle the empty string.

ok quick programming error... thanks,
};

using this iterator with std::distance()
std::distance(null_terminated(s),null_terminated()); gives the result
with no ub.


Same ub as above. It is a race for a nul character or buffer overrun.
The theoritical ub has no play in code which is not allowed to test for
it or malfunction.
Gigo will kill either approach but Siter as written requires the

pointer to overflow to 0 to terminate the loop. On some systems
this never happens.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
John Potter
Guest





PostPosted: Sat Oct 01, 2005 1:28 am    Post subject: Re: strlen via algorithm Reply with quote

On 28 Sep 2005 05:51:13 -0400, Carl Barron <cbarron413 (AT) adelphia (DOT) net>
wrote:

Quote:
In article <088_e.5742$q1.1596 (AT) newsread3 (DOT) news.atl.earthlink.net>, John
Potter <jpotter (AT) lhup (DOT) edu> wrote:

On 24 Sep 2005 21:43:06 -0400, Carl Barron wrote:

const char &dereference() const
{
return *p;
}

Why did you write this? What is wrong with the default?

what default?

Sorry, I confused it with iterator adaptor which provides defaults. In
this example, there is no dereference use and it could be commented out
with no effect.

Quote:
void increment()
{
if(!done)
done = (*++p == '');
}

Why do you prefer an infinite loop when incrementing end? I think that
either not checking to allow a crash, or asserting/throwing would be
prefered.

if done increment does nothing!!

Done indicates an end iterator. It is undefined to increment an end
iterator. The user tried to do it anyway. Since they tried to
increment it and the code did nothing, it is likely that they will be in
an infinite loop. As a design question, why did you choose that form of
undefined behavior rather than not testing which will likely result in a
protection violation or testing and making noise? Maybe I can learn
something from you reasons.

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Carl Barron
Guest





PostPosted: Sat Oct 01, 2005 4:07 pm    Post subject: Re: strlen via algorithm Reply with quote

John Potter <jpotter (AT) lhup (DOT) edu> wrote:

Quote:
Done indicates an end iterator. It is undefined to increment an end
iterator. The user tried to do it anyway. Since they tried to
increment it and the code did nothing, it is likely that they will be in
an infinite loop. As a design question, why did you choose that form of
undefined behavior rather than not testing which will likely result in a
protection violation or testing and making noise? Maybe I can learn
something from you reasons.

class null_terminated:public boost::iterator_facade
<
null_terminated,
char,
std::forward_iterator_tag
Quote:

{

friend class boost::iterator_core_access;
char *p;
char & dereference( const {return *p;}
void increment() { if(*++p=='') p=0;}
bool equals(const null_terminated &r)const {return p==r.p;}
public:
null_terminated(char *a=0):p(*a?a:0){}
};

this is even shorter and will fault if user increments an 'end'
iterator.

Thanks this is cleaner and a forward iterator as well.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
John Potter
Guest





PostPosted: Sun Oct 02, 2005 3:34 pm    Post subject: Re: strlen via algorithm Reply with quote

On 1 Oct 2005 12:07:59 -0400, [email]cbarron3 (AT) ix (DOT) netcom.com[/email] (Carl Barron)
wrote:

Nicer.

Quote:
null_terminated(char *a=0):p(*a?a:0){}

Just in case it proves useful.

null_terminated (char* a = 0) : p(a && *a ? a : 0) { }

John

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Display posts from previous:   
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated) All times are GMT
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.