C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Vindicated? Sutter and COW Strings
Goto page 1, 2, 3, 4, 5  Next
 
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated)
View previous topic :: View next topic  
Author Message
Joshua Lehrer
Guest





PostPosted: Tue Sep 07, 2004 10:34 pm    Post subject: Vindicated? Sutter and COW Strings Reply with quote



I was happy as I read Sutter's article about COW strings. We here at
FactSet wrote a COW string some time ago. I have always claimed it
was thread safe, when written properly, using atomic integral
operations:

http://snipurl.com/8wsm
http://snipurl.com/8wsj

As I read Sutter's article, I felt vindicated. Our code is very
similar, and behaves in almost exactly the same way.

In fact, our string::swap even returns a reference. This is something
that I threw in because I thought it would be convenient. I'm glad to
see that I was "ahead" of the curve. :)

joshua lehrer
factset research systems
NYSE:FDS

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Wed Sep 08, 2004 10:00 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote



[email]usenet_cpp (AT) lehrerfamily (DOT) com[/email] (Joshua Lehrer) wrote in message
news:<31c49f0d.0409070901.4e7a0aa6 (AT) posting (DOT) google.com>...

Quote:
I was happy as I read Sutter's article about COW strings. We here at
FactSet wrote a COW string some time ago. I have always claimed it
was thread safe, when written properly, using atomic integral
operations:

http://snipurl.com/8wsm
http://snipurl.com/8wsj

As I read Sutter's article, I felt vindicated. Our code is very
similar, and behaves in almost exactly the same way.

The problem is that Herb's code isn't thread safe. At least, not using
the definition of thread safety he uses, and supposing that his class
supports an interface compatible with std::string.

It is quite possible to write a thread safe String class using atomic
increment and atomic decrement. I don't think that it can be done if
the class returns raw non-const references, or if it has an iterator
which returns raw non-const eferences, however. To date, all attempts
that I've seen have failed. (Herb's code is almost exactly like that of
the g++ implementation, and that is only thread-safe for the perverted
definition of thread-safe that g++ uses -- that I need to protect access
even if no thread ever modifies the object.)

In the second message you quote above, there is one point on which I am
not clear. You say "If you have a global std::string, that multiple
threads may be modifying and reading, then it must be locked upon
access." The problems occur when you have a global non-const string
that nobody modifies. My typical example is some configuration
parameter which is initialized in main, from the command line arguments
or from a configuration file, before any threads have been started, and
then is never modified. Logically, IMHO, several threads should be able
to access this variable without external locking. If the string class
is not COW, they can. According to Posix, they can (or could, supposing
that std::string were a type known to Posix). Regretfully, it doesn't
work with Herb's implementation, nor with the g++ implementation.

For an example of the problem, consider the following scenario: the
string has been initialized, as above. You then start three threads.

Thread 1 makes a local copy of the string. The reference count is thus
two.

Thread 2 decides to read a character in the string, by calling
operator[]. Since the string is declared non-const, thread 2 calls
the non-const version of operator[], even though it is not modifying
the string. However, because the implementation of String cannot
know that no modification is intended, it must isolate the
implementation (i.e. execute the copy on write) -- in Herb's code,
this means call ensureUnique(), in the g++ implementation, call
_M_mutate(). Both of these functions do basically the same thing:
they check whether the string is shared (it is), and if (since) it
is, they allocate a new copy, and copy the data.

At this point, however, thread 2 is interrupted by thread 3.

Thread 3 does exactly the same thing as thread 2. Since the string is
still shared, it also allocates a new image, and starts copying the
data. At this point, it gets interrupted.

Thread 1 comes back to life. Its local copy of the string goes out of
scope, decrementing the reference count. The reference count is now
one.

Thread 2 comes back to life. It finishes copying the data, and
"disconnects" the old image -- i.e. it decrements the reference
count (which was one), and if the results are zero, it deletes the
old data. And thread 2 goes on to use the new, reserved data.

Thread 3 comes back to life. By a bit of bad luck, it already had the
pointer to the old data loaded in a register, ready to copy. So it
copies the old data that thread 2 has just deleted. The standard
qualifies this as undefined behavior: that's bad number 1.

When thread 3 finishs copying, it disconnects the "old" image:
depending on whether the compiler has kept the pointer in a
register, or rereads it from the object, it disconnects the image
that thread 2 has already deleted, possibly deleting it twice
(depending on whether the reference counter is signed or not -- this
is bad 2), or it disconnects the image thread 2 has just installed,
deleting it (since its reference count was necessarily one).

Thread 2 comes back to life, and uses the reference which the operator[]
returned. If thread 3 disconnected the image it just installed,
this points to deleted memory -- yet another bad.

Note that if you are doing a deep copy, none of this is any problem. As
long as you don't really modify the string, no part of its
implementation is modified, and the Posix rules apply. Note too that if
all modifications pass through a function in the string (rather than
having a function which returns a pointer or a reference allowing a
possible modification), there is no problem -- the string class knows
for sure when it will be modified (and we all agree that IF any thread
is modifying the object, all threads must protect all of their
accesses). One way of doing this is to have operator[] (and operator*
of the iterator) return a proxy. The standard doesn't allow this, but
I've used it quite successfully on some of my own, pre-standard string
classes -- it also has the advantage that operator[] or obtaining a
non-const iterator don't inhibit copy on write for all of the future.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Kai-Uwe Bux
Guest





PostPosted: Fri Sep 10, 2004 2:01 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote



[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:

Quote:
usenet_cpp (AT) lehrerfamily (DOT) com (Joshua Lehrer) wrote in message
news:<31c49f0d.0409070901.4e7a0aa6 (AT) posting (DOT) google.com>...

I was happy as I read Sutter's article about COW strings. We here at
FactSet wrote a COW string some time ago. I have always claimed it
was thread safe, when written properly, using atomic integral
operations:

http://snipurl.com/8wsm
http://snipurl.com/8wsj

As I read Sutter's article, I felt vindicated. Our code is very
similar, and behaves in almost exactly the same way.

The problem is that Herb's code isn't thread safe. At least, not using
the definition of thread safety he uses, and supposing that his class
supports an interface compatible with std::string.

It is quite possible to write a thread safe String class using atomic
increment and atomic decrement. I don't think that it can be done if
the class returns raw non-const references, or if it has an iterator
which returns raw non-const eferences, however. To date, all attempts
that I've seen have failed. (Herb's code is almost exactly like that of
the g++ implementation, and that is only thread-safe for the perverted
definition of thread-safe that g++ uses -- that I need to protect access
even if no thread ever modifies the object.)

In the second message you quote above, there is one point on which I am
not clear. You say "If you have a global std::string, that multiple
threads may be modifying and reading, then it must be locked upon
access." The problems occur when you have a global non-const string
that nobody modifies. My typical example is some configuration
parameter which is initialized in main, from the command line arguments
or from a configuration file, before any threads have been started, and
then is never modified. Logically, IMHO, several threads should be able
to access this variable without external locking. If the string class
is not COW, they can. According to Posix, they can (or could, supposing
that std::string were a type known to Posix). Regretfully, it doesn't
work with Herb's implementation, nor with the g++ implementation.

For an example of the problem, consider the following scenario: the
string has been initialized, as above. You then start three threads.

Thread 1 makes a local copy of the string. The reference count is thus
two.

Thread 2 decides to read a character in the string, by calling
operator[]. Since the string is declared non-const, thread 2 calls
the non-const version of operator[], even though it is not modifying
the string. However, because the implementation of String cannot
know that no modification is intended, it must isolate the
implementation (i.e. execute the copy on write) -- in Herb's code,
this means call ensureUnique(), in the g++ implementation, call
_M_mutate(). Both of these functions do basically the same thing:
they check whether the string is shared (it is), and if (since) it
is, they allocate a new copy, and copy the data.

At this point, however, thread 2 is interrupted by thread 3.

Thread 3 does exactly the same thing as thread 2. Since the string is
still shared, it also allocates a new image, and starts copying the
data. At this point, it gets interrupted.

Thread 1 comes back to life. Its local copy of the string goes out of
scope, decrementing the reference count. The reference count is now
one.

Thread 2 comes back to life. It finishes copying the data, and
"disconnects" the old image -- i.e. it decrements the reference
count (which was one), and if the results are zero, it deletes the
old data. And thread 2 goes on to use the new, reserved data.

Thread 3 comes back to life. By a bit of bad luck, it already had the
pointer to the old data loaded in a register, ready to copy. So it
copies the old data that thread 2 has just deleted. The standard
qualifies this as undefined behavior: that's bad number 1.

When thread 3 finishs copying, it disconnects the "old" image:
depending on whether the compiler has kept the pointer in a
register, or rereads it from the object, it disconnects the image
that thread 2 has already deleted, possibly deleting it twice
(depending on whether the reference counter is signed or not -- this
is bad 2), or it disconnects the image thread 2 has just installed,
deleting it (since its reference count was necessarily one).

Thread 2 comes back to life, and uses the reference which the operator[]
returned. If thread 3 disconnected the image it just installed,
this points to deleted memory -- yet another bad.


Thank you very much for this detailed account.

What about using more powerful atomic integer operations like atomic
compare_and_swap.

atomic
bool
atomic_compare_and_swap ( size_t& addr,
size_t expected, size_t put_there ) {
if ( addr == expected ) {
addr = put_there;
return( true );
} else {
return( false );
}
}

Now, one could implement ensureUnique() as follows:

void EnsureUnique ( void ) {
std::size_t current_count = atomic_read( data_ptr->ref_count );
// This object has a unique handle on the buffer.
// Nothing to do:
if ( current_count == 1 ) {
return;
}
// Copy the data into a privately held new buffer object.
// Be sure *not* to change the handle yet.
/*
This line really is a dummy for what needs to be done here
to create a local copy with refcount 1.
*/
Data* new_data_ptr = new Data ( data->buffer, this->size(), 1 );
do {
if ( atomic_compare_and_swap( data_ptr->ref_count,
current_count, current_count-1 ) ) {
// The reference count in *data_ptr has been adjusted.
// Note that it did not drop to 0!
// Release the handle:
data_ptr = new_data_ptr;
return;
}
// Something strange happened in the outside world
// and changed the ref_count. We should update our
// knowledge:
current_count = atomic_read( data_ptr->ref_count );
} while ( current_count != 1 );
// Leaving the loop this way, we know that by some events
// happening outside, we became the sole owner of *data_ptr.
// Our copying was not necessary:
delete new_data_ptr;
}


Now your story would read as follows:


Thread 1 makes a local copy of the string. The reference count is 2.

Thread 2 decides to read a character in the string, by calling
operator[]. This triggers EnsureUnique(). New data is allocated and the
buffer is copied. The value for current_count(thread 2) is 2.

At this point, however, thread 2 is interrupted by thread 3.

Thread 3 does exactly the same thing as thread 2. Since the string is
still shared, it also allocates a new image, and starts copying the
data.

current_count(thread 3) == 2

At this point, it gets interrupted.

Thread 1 comes back to life. Its local copy of the string goes out of
scope, decrementing the reference count. The reference count is now
one.

Thread 2 comes back to life. It finishes copying the data, and
enters the loop. Calling

atomic_compare_and_swap( data_ptr->ref_count,
current_count, current_count-1 )

returns *false* since the ref_count has changed to 1. This value
will be read, the loop is left at the bottom exit and thread 2
continues using the old buffer.

Thread 3 comes back to life and faces the same fate as thread 2.


Would that work?


Best

Kai-Uwe Bux

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Bill Wade
Guest





PostPosted: Fri Sep 10, 2004 2:09 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote in message news:<d6652001.0409080501.15bb8721 (AT) posting (DOT) google.com>...

Quote:
... The problems occur when you have a global non-const string
that nobody modifies. ...

I'd say that the problem is the false belief that it is possible to
call a non-const member (such as nc-begin or nc-[]) and pretend that
it is just a read access. I agree that COW std::string cannot be
thread safe. It cannot be thread safe because so many developers
believe the fallacy.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
James Kanze
Guest





PostPosted: Sun Sep 12, 2004 10:06 am    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

[email]wade (AT) stoner (DOT) com[/email] (Bill Wade) writes:

Quote:
kanze (AT) gabi-soft (DOT) fr wrote in message
news:<d6652001.0409080501.15bb8721 (AT) posting (DOT) google.com>...

... The problems occur when you have a global non-const string
that nobody modifies. ...

I'd say that the problem is the false belief that it is possible to
call a non-const member (such as nc-begin or nc-[]) and pretend that
it is just a read access. I agree that COW std::string cannot be
thread safe. It cannot be thread safe because so many developers
believe the fallacy.

The fallacy is more fundamental than that. The problem is that you have
non-const functions that don't modify the string. If I don't modify the
string, I've maintained my end of the contract; the const-ness of the
actual function the compiler choses to call is irrelevant.

The real problem is in the interface defined by the standard. An
interface in which it is fully natural to call non-const functions but
not to modify the object.

--
James Kanze
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Herb Sutter
Guest





PostPosted: Mon Sep 13, 2004 7:58 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

Let me jump in to note that most of James' bug report is actually a FAQ,
well known to me and not a bug. The rest is also not a bug in my code, but
a legitimate request for a std::string redesign where the typical solution
to what he wants to do is to use an immutable string type.

On 8 Sep 2004 18:00:05 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
Quote:
The problem is that Herb's code isn't thread safe. At least, not using
the definition of thread safety he uses, and supposing that his class
supports an interface compatible with std::string.

That's untrue. Two points:

First, the code is thread-safe. FWIW I've received this particular "bug
report" so often, including even from multiple fellow CUJ columnists, that
this summer I finally decided to write an aricle about this non-bug so
that I wouldn't have to answer it again. Smile It appeared as my September
2004 CUJ column under the title "'Just Enough' Thread Safety."

Second, what you actually want (as you noted) is not a more thread-safe
implementation, but rather a more thread-friendly design. Specifically,
you're really asking for a redesigned std::string that offers operations
that can be knowably non-mutating (unlike op[], which falls into the
category of possibly-mutating operations), so that you can document
guarantees like "you don't need to do external locking if there are only
reader threads and on (possibly-)writer threads" for more useful common
cases.

Incidentally (and I know James probably knows this, but for completeness),
note that if you have _all_ non-mutating operations (i.e., an immutable
string type) then the whole issue of locking disappears. That's why other
languages/frameworks have gone that way. Of course, there are
corresponding disadvantages (e.g., every string operation makes a new
string, which has interesting performance consequences as well as
interesting implementation consequences such as that you implement += in
terms of + instead of the std::string-"normal" other way around).

Finally, it's worth noting that std::string has even more constraints for
COW than you or I have mentioned here, and that are entirely unrelated to
thread safety. In particular, it's not clear to me how any COW
implementation of std::string can be standards-conforming because of this
situation (which I've been pointing out for years in hallway conversations
and talks, but it now occurs to me I don't think I've written about):

string s1( "Hello" );
string s2( s1 );

According to the standard, I don't believe a conforming implementation may
do a shallow copy here. The reason is that there's no way to know whether
the very next use of the string will mention op[]:

s1[0]; // or, s2[0]

If string did use COW and did a shallow copy when constructing s2, then
str[0] must do a deep copy, which means it will allocate memory, which
means it could throw a bad_alloc exception, and that is nonconforming
because when the index is within bounds no exception may be thrown. The
standard doesn't allow an exception to be thrown here, but no COW string
implementation I know of could realistically avoid this possibility.

Here COW is nonconforming, but, worse, this violation of the explicit
contract of the type is highly surprising to users when it does happen.

Herb

---
Herb Sutter (www.gotw.ca)

Convener, ISO WG21 (C++ standards committee) (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal (www.gotw.ca/cuj)
Architect, Developer Division, Microsoft (www.gotw.ca/microsoft)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Mon Sep 13, 2004 8:30 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

Kai-Uwe Bux <jkherciueh (AT) gmx (DOT) net> wrote

Quote:
kanze (AT) gabi-soft (DOT) fr wrote:
[email]usenet_cpp (AT) lehrerfamily (DOT) com[/email] (Joshua Lehrer) wrote in message
news:<31c49f0d.0409070901.4e7a0aa6 (AT) posting (DOT) google.com>...

[...]
Quote:
What about using more powerful atomic integer operations like atomic
compare_and_swap.

I don't think that a thread safe version could be implemented using this
either. The basic problem is that when you isolate, you must update two
variables, the use count AND the pointer to the image. Compare and swap
won't prevent another thread from interrupting between the two, finding
that the image is isolated, but reading the old, pre-isolated pointer.

While I'm at it, I should mention that it's important to take into
account one or two things that Herb and I have not yet mentioned, for
reasons of simplification. Basically, if the function called is
returning a reference or an iterator, not only does it need a unique
copy of the image, it needs to ensure that this copy remains unique
after leaving the function. G++, for example, uses a special flag value
(-1, I think) in the reference count to signal this. This flag can only
be reset by a function which is documented to possibly invalidate
iterators or references. If we ignore for the moment c_str() and
data(), these are all functions which really do modify the string --
which in our problem case, won't be called. This means that once this
flag is set, no more problems can occur. (This also means that in my
scenario, only the first call do operator[] can cause any problems.)

This introduces some added complexity to any proposed solution.
However, even without this problem, your code fails.

Quote:
atomic
bool
atomic_compare_and_swap ( size_t& addr,
size_t expected, size_t put_there ) {
if ( addr == expected ) {
addr = put_there;
return( true );
} else {
return( false );
}
}

Now, one could implement ensureUnique() as follows:

void EnsureUnique ( void ) {
std::size_t current_count = atomic_read( data_ptr->ref_count );
// This object has a unique handle on the buffer.
// Nothing to do:
if ( current_count == 1 ) {
return;
}
// Copy the data into a privately held new buffer object.
// Be sure *not* to change the handle yet.
/*
This line really is a dummy for what needs to be done here
to create a local copy with refcount 1.
*/
Data* new_data_ptr = new Data ( data->buffer, this->size(), 1 );
do {
if ( atomic_compare_and_swap( data_ptr->ref_count,
current_count, current_count-1 ) ) {
// The reference count in *data_ptr has been adjusted.
// Note that it did not drop to 0!
// Release the handle:

Here's where you might have a problem. See below. (Note that it is the
new pointer which is isolated, not the old one.)

Quote:
data_ptr = new_data_ptr;

return;
}
// Something strange happened in the outside world
// and changed the ref_count. We should update our
// knowledge:
current_count = atomic_read( data_ptr->ref_count );
} while ( current_count != 1 );
// Leaving the loop this way, we know that by some events
// happening outside, we became the sole owner of *data_ptr.
// Our copying was not necessary:
delete new_data_ptr;
}

Now your story would read as follows:

Thread 1 makes a local copy of the string. The reference count is 2.

Nope.

Of course, in real life, we'd need the isolated state. But regardless.
I've still not found any way of eliminating the race condition between
the atomic_compare_and_swap and assigning to the data pointer. In the
end, with four threads, it always fails: 2 threads make copies, pushing
the count up to three. Third thread intervenes, gets through the
atomic_compare_and_swap, which decrements the count to two, but is
interrupted before assigning the data_ptr. Fourth thread comes in, does
the same. Then both threads set the data_ptr. Of course, the last
one overwrites the first; so the first has leaked memory.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Bill Wade
Guest





PostPosted: Mon Sep 13, 2004 8:38 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

James Kanze <kanze (AT) gabi-soft (DOT) fr> wrote


Quote:
The real problem is in the interface defined by the standard. An
interface in which it is fully natural to call non-const functions but
not to modify the object.

No argument there. The std::string interface is dangerous even for
single-threaded COW. About all an implementation can safely do is
copy-on-ptr-or-write (COPOW), where "ptr" is any operation that gives
out an iterator, pointer, or reference (including const_iterator,...).

COPOW works with any std::container<POD>. It has all of the
performance penaly of COW, and not quite all of the performance
benefits of COW. In particluar, for std::string (and certainly for
other std::containers) allocation must occur even in cases where the
deep-copy is avoided. I can't see using it by default for any
std::container (strings are too short, and the other containers rely
too much on the iterator interface). I could see an implementation
making it optionally available (perhaps via an allocator_trait), but I
suspect that most developers with an application that needs COPOW (or
COW) will write their own containers. If I am sure that I need COW
(or I am sure that COW must not be used), then I don't use std::string
for "real" development.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
David Abrahams
Guest





PostPosted: Tue Sep 14, 2004 12:22 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

Herb Sutter <hsutter (AT) gotw (DOT) ca> writes:

Quote:
Finally, it's worth noting that std::string has even more constraints for
COW than you or I have mentioned here, and that are entirely unrelated to
thread safety. In particular, it's not clear to me how any COW
implementation of std::string can be standards-conforming because of this
situation (which I've been pointing out for years in hallway conversations
and talks, but it now occurs to me I don't think I've written about):

string s1( "Hello" );
string s2( s1 );

According to the standard, I don't believe a conforming implementation may
do a shallow copy here. The reason is that there's no way to know whether
the very next use of the string will mention op[]:

s1[0]; // or, s2[0]

If string did use COW and did a shallow copy when constructing s2, then
str[0] must do a deep copy, which means it will allocate memory, which
means it could throw a bad_alloc exception, and that is nonconforming
because when the index is within bounds no exception may be thrown.

Since when? Standard references please!

I think there may be reasons that a conforming COW string is
impossible, but this ain't one of them, AFAICT.

--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
news@pgxml.net
Guest





PostPosted: Tue Sep 14, 2004 12:23 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

[email]kanze (AT) gabi-soft (DOT) fr[/email] writes:

[...COW strings]

Quote:

Thread 1 makes a local copy of the string. The reference count is
2.

Nope.

Of course, in real life, we'd need the isolated state. But
regardless. I've still not found any way of eliminating the race
condition between the atomic_compare_and_swap and assigning to the
data pointer. In the end, with four threads, it always fails: 2
threads make copies, pushing the count up to three. Third thread
intervenes, gets through the atomic_compare_and_swap, which decrements
the count to two, but is interrupted before assigning the data_ptr.
Fourth thread comes in, does the same. Then both threads set the
data_ptr. Of course, the last one overwrites the first; so the first
has leaked memory.

*sigh*

<my_oafishness>
man stackframe
in the name of KISS, do we need COW-strings?
</my_oafishness>

greets, scnr

chr- "jeder bitte nur ein Kreuz" -istoph

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Tue Sep 14, 2004 10:46 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

Herb Sutter <hsutter (AT) gotw (DOT) ca> wrote


Quote:
Let me jump in to note that most of James' bug report is actually a
FAQ, well known to me and not a bug. The rest is also not a bug in my
code, but a legitimate request for a std::string redesign where the
typical solution to what he wants to do is to use an immutable string
type.

On 8 Sep 2004 18:00:05 -0400, [email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
The problem is that Herb's code isn't thread safe. At least, not
using the definition of thread safety he uses, and supposing that his
class supports an interface compatible with std::string.

That's untrue.

Which part is untrue? Obviously, when talking about thread safety, we
have to define what we mean by the term. Your article clearly said that
it should behave as a non-COW string does with regards to thread-safety.
I interpret that to mean that anything that would reliably work for non
COW strings would work with your proposed implementation. You
specifically spoke about an operator[] which returned a reference --
that raises most, if not all of the problems the standard string class
has. In particular, if operator[] returns a reference, you must call
ensureUnique (or something equivalent) in it.

Given that, I presented an example of code which would work with a non
COW string, but which would fail with your implementation. So where is
my error: in my interpretation of the guarantees which you intended, in
my suppositions concerning the interface which you wanted to implement,
or in my example of where it failed. (If it is the latter, you'll have
to show me the exact flaw in my reasoning, because I've poured over that
one out a lot.)

Quote:
Two points:

First, the code is thread-safe.

For what definition of thread-safe. It does not offer the same
guarantees as a non-COW implementation would, which is the definition
you seem to offer. It doesn't work even in some cases where the user
does not actually modify the string, but does use the non-const
operator[]. You even suggested that the const operator[] should return
a reference, and that the user should be able to cast away const on it
and modify the string -- if that is the case, it doesn't even work in
some cases where the user doesn't call any non-const functions.

I spent considerable time in analysing your code, because I know that
you are aware of the issues, and you are capable of finding solutions
where others can't. I took the time to explain my analysis in detail,
including my pre-suppositions concerning the definition (contract) of
thread-safety. (I personally felt that that was a weak point of the
article, that a simple phrase like "the same guarantees as a non COW
implementation" is rather vague, and that those guarantees should have
been spelled out.)

Just saying that I was wrong is not enough. I may be wrong, but if so,
you are going to have to be more explicit, and point out exactly where I
went astray.

Quote:
FWIW I've received this particular "bug report" so often, including
even from multiple fellow CUJ columnists, that this summer I finally
decided to write an aricle about this non-bug so that I wouldn't have
to answer it again. Smile It appeared as my September 2004 CUJ column
under the title "'Just Enough' Thread Safety."

That's exactly the article we are criticizing. See above: you gave a
definition of the desired thread safety (the same as a non-COW class
would offer), you speak at least a little about an operator[] which
returns a reference, and you present code which, if used with an
operator[] which returns a reference, requires an external, user
provided lock in a context where a non COW thread doesn't require one.

Quote:
Second, what you actually want (as you noted) is not a more
thread-safe implementation, but rather a more thread-friendly
design. Specifically, you're really asking for a redesigned
std::string that offers operations that can be knowably non-mutating
(unlike op[], which falls into the category of possibly-mutating
operations), so that you can document guarantees like "you don't need
to do external locking if there are only reader threads and on
(possibly-)writer threads" for more useful common cases.

That too. What I think we really need is that the standard address the
threading issue. My own preference is for the guarantees given by the
SGI implementation and Posix -- maybe because they are the only
guarantees I've actually seen clearly specified. And I'd like to see an
interface to std::string which would allow these guarantees with a COW
implementation, at least in the presence of an atomic increment or some
such.

Quote:
Incidentally (and I know James probably knows this, but for
completeness), note that if you have _all_ non-mutating operations
(i.e., an immutable string type) then the whole issue of locking
disappears. That's why other languages/frameworks have gone that
way. Of course, there are corresponding disadvantages (e.g., every
string operation makes a new string, which has interesting performance
consequences as well as interesting implementation consequences such
as that you implement += in terms of + instead of the
std::string-"normal" other way around).

My personal feeling is that in a correctly *designed* string interface,
the only non const functions would be assignment operators. But that's
from a design point of view -- as you say, it may have certain
implications with regards to performance. (My own personal feeling is
that the performance problems can be addressed and solved without
compromizing the design. But in my own work, performance of strings has
always been more or less irrelevant, so I've not actually studied the
question in detail. So my feeling isn't really much more than simple
intuition.)

The alternative is to use proxy classes for the modification operators.
In theory, at least, a good compiler can optimize the proxy classes out,
so the performance overhead should be minimal. But that's just in
theory. I don't know how good real compilers do in practice.

Quote:
Finally, it's worth noting that std::string has even more constraints
for COW than you or I have mentioned here, and that are entirely
unrelated to thread safety. In particular, it's not clear to me how
any COW implementation of std::string can be standards-conforming
because of this situation (which I've been pointing out for years in
hallway conversations and talks, but it now occurs to me I don't think
I've written about):

string s1( "Hello" );
string s2( s1 );

According to the standard, I don't believe a conforming implementation
may do a shallow copy here. The reason is that there's no way to know
whether the very next use of the string will mention op[]:

s1[0]; // or, s2[0]

If string did use COW and did a shallow copy when constructing s2,
then str[0] must do a deep copy, which means it will allocate memory,
which means it could throw a bad_alloc exception, and that is
nonconforming because when the index is within bounds no exception may
be thrown. The standard doesn't allow an exception to be thrown here,
but no COW string implementation I know of could realistically avoid
this possibility.

§19.4.4.8/3: "No destructor operation defined in the C++ Standard
Library will throw an exception. Any other fucntions defined in the C++
Standard Library that do not have an exception-specification may throw
implementation-defined exceptions unless otherwise specified."

There is no exception specification on operator[].

With regards to whether this is a good thing, or what a quality
implementation might do... But the standard does formally allow it.

Quote:
Here COW is nonconforming, but, worse, this violation of the explicit
contract of the type is highly surprising to users when it does
happen.

Here COW is conforming. But that's all you can say for it -- for the
rest, I totally agree. If we're talking about quality implementations,
I don't think that COW is viable for any class as fundamental as
string -- IMHO, independantly of threading, I have a right to expect
that at the very least, a normal read operation cannot throw. I'm of
two minds with regards to writing, but there are certainly strong
arguments that a normal write operation to an existing object should not
throw if the value written is legal.

Ideally, what we would like is that std::string be as robust and as
simple to use as int. That's not attainable, but we should definitly
consider just how close we can come.

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Bill Wade
Guest





PostPosted: Tue Sep 14, 2004 11:06 pm    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

Herb Sutter <hsutter (AT) gotw (DOT) ca> wrote


Quote:
Finally, it's worth noting that std::string has even more constraints for
COW than you or I have mentioned here, and that are entirely unrelated to
thread safety. In particular, it's not clear to me how any COW
implementation of std::string can be standards-conforming because of this
situation (which I've been pointing out for years in hallway conversations
and talks, but it now occurs to me I don't think I've written about):

string s1( "Hello" );
string s2( s1 );

According to the standard, I don't believe a conforming implementation may
do a shallow copy here. The reason is that there's no way to know whether
the very next use of the string will mention op[]:

s1[0]; // or, s2[0]

If string did use COW and did a shallow copy when constructing s2, then
str[0] must do a deep copy, which means it will allocate memory, which
means it could throw a bad_alloc exception, and that is nonconforming
because when the index is within bounds no exception may be thrown. The
standard doesn't allow an exception to be thrown here,

I don't believe the words in the standard caught the intent of the
writers. It is pretty clear that the writers intended COW to be
legal. I expect that the writers understood that typical COW
implementations could throw here. I agree that the appropriate
license was not included in the standard.

Quote:
but no COW string
implementation I know of could realistically avoid this possibility.

No existing COW std::string implementation I know of avoids this
possibility. Realistically, it is to avoid.

In cases where COW is a big win the saving are in the copy (O(N)), not
in the allocation (expect fast O(1) for single-threaded). An
implementation may defer the copy without deferring the allocation.

Spare (allocated, but not yet used) heap objects (string
representations) may be maintained by either the built-in allocator
(on a per-size basis), by string objects directly, or by string
representations.

It may be most efficient (excellent memory-access behavior) to
maintain the spares in the built-in allocator, but that doesn't work
for custom allocators.

The spares can be maintained as a single-linked list in the
representation object. This is space-efficient, since the pointer can
replace the reference count.

Even more space-efficient (but less copy-efficient), is to put all
pointers in the string objects (not in the shared-heap objects). With
small-string optimizations, a modern string object typically has room
for all the pointers needed:
charT* __begin, __end;
mutable charT* __begin_owned, __end_owned;
mutable stringT* __prev, __next; // strings sharing __begin
The shared heap object consists of nothing but the actual characters,
no reference count or size information. Optionally you might add
another pointer so that the shared buffer and owned buffer could have
different allocated sizes. With this implementation, the mutable
basic_string members prevent shareable strings from living in
read-only memory.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Joshua Lehrer
Guest





PostPosted: Wed Sep 15, 2004 12:49 am    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote in message news:<d6652001.0409130518.14bbb35d (AT) posting (DOT) google.com>...
Quote:
While I'm at it, I should mention that it's important to take into
account one or two things that Herb and I have not yet mentioned, for
reasons of simplification. Basically, if the function called is
returning a reference or an iterator, not only does it need a unique
copy of the image, it needs to ensure that this copy remains unique

We do that as well. We use a word to keep track of the reference
count, with a special bit meaning that the String is no longer
referenceable.

We solved the operator[] problem by returning proxies. Sure, it isn't
std compliant, but being std compliant was not part of our spec. In
fact, our spec required us to be compliant with an old dec::compap::hp
string class.

Our current issue that can not be fixed is with "begin()" and "end()"
which suffer from the same problems as operator[].

joshua lehrer
factset research systems
NYSE:FDS

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Ben Hutchings
Guest





PostPosted: Wed Sep 15, 2004 12:55 am    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

Herb Sutter wrote:
<snip>
Quote:
Finally, it's worth noting that std::string has even more constraints for
COW than you or I have mentioned here, and that are entirely unrelated to
thread safety. In particular, it's not clear to me how any COW
implementation of std::string can be standards-conforming because of this
situation (which I've been pointing out for years in hallway conversations
and talks, but it now occurs to me I don't think I've written about):

string s1( "Hello" );
string s2( s1 );

According to the standard, I don't believe a conforming implementation may
do a shallow copy here. The reason is that there's no way to know whether
the very next use of the string will mention op[]:

s1[0]; // or, s2[0]

If string did use COW and did a shallow copy when constructing s2, then
str[0] must do a deep copy, which means it will allocate memory, which
means it could throw a bad_alloc exception, and that is nonconforming
because when the index is within bounds no exception may be thrown. The
standard doesn't allow an exception to be thrown here, but no COW string
implementation I know of could realistically avoid this possibility.
snip


How do you figure that? The standard doesn't specify whether any
particular members of basic_string call the allocator. None of the
"Throws:" paragraphs include bad_alloc among the possible exceptions.

--
Ben Hutchings
If at first you don't succeed, you're doing about average.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Herb Sutter
Guest





PostPosted: Wed Sep 15, 2004 12:56 am    Post subject: Re: Vindicated? Sutter and COW Strings Reply with quote

On 14 Sep 2004 08:22:11 -0400, David Abrahams <dave (AT) boost-consulting (DOT) com>
wrote:
Quote:
Herb Sutter <hsutter (AT) gotw (DOT) ca> writes:
Finally, it's worth noting that std::string has even more constraints for
COW than you or I have mentioned here, and that are entirely unrelated to
thread safety. In particular, it's not clear to me how any COW
implementation of std::string can be standards-conforming because of this
situation (which I've been pointing out for years in hallway conversations
and talks, but it now occurs to me I don't think I've written about):

string s1( "Hello" );
string s2( s1 );

According to the standard, I don't believe a conforming implementation may
do a shallow copy here. The reason is that there's no way to know whether
the very next use of the string will mention op[]:

s1[0]; // or, s2[0]

If string did use COW and did a shallow copy when constructing s2, then
str[0] must do a deep copy, which means it will allocate memory, which
means it could throw a bad_alloc exception, and that is nonconforming
because when the index is within bounds no exception may be thrown.

Since when? Standard references please!

I think there may be reasons that a conforming COW string is
impossible, but this ain't one of them, AFAICT.

Ah, I forgot string []/at mirrored vector's. (We must fix that, but that's
a separate issue.) So just change [] to at:

s1.at(0); // or, s2.at(0)

Now it's clear that the above is not allowed to throw, but could do so on
a COW implementation.

For those keeping score at home, this is because of wording in the
standard that Dave himself was instrumental in crafting:

17.4.4.8(1)
Any of the functions defined in the C + + Standard Library can
report a failure by throwing an exception of the type(s) described
in their Throws: paragraph and/or their exception-specification
(15.4). An implementation may strengthen the exception-
specification for a non-virtual function by removing listed
exceptions. 175)

Footnote 175:
That is, an implementation of the function will have an explicit
exception-specification that lists fewer exceptions than those
specified in this International Standard. It may not, however,
change the types of exceptions listed in the exception-
specification from those specified, nor add others.

In short, the standard library, any operation with a "Throws:" paragraph
may throw only listed exceptions. For basic_string<>::at, we have:

21.3.4(3):
Throws: out_of_range if pos >= size().

That is, at may not throw an exception of any type besides out_of_range,
and it may throw no exceptions if pos < size(). That being the case here,
COW cannot be a conforming implementation of std::string AFAICS because of
the given sequence:

string s1( "Anything" );
string s2( s1 ); // this cannot be a shallow copy
s2.at(0); // because this might be the immediately following
// operation, and it cannot throw anything and must
// return a true reference into the string's buffer
s2.at(0) = 'x'; // and in particular this must not mutate s1

I know of no way of making all the above standard-required behavior
guaranteed to be always true in the presence of COW.

Herb

---
Herb Sutter (www.gotw.ca)

Convener, ISO WG21 (C++ standards committee) (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal (www.gotw.ca/cuj)
Architect, Developer Division, Microsoft (www.gotw.ca/microsoft)

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Display posts from previous:   
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated) All times are GMT
Goto page 1, 2, 3, 4, 5  Next
Page 1 of 5

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.