C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
passing data between threads
 
wkaras@yahoo.com
Guest





Posted: Fri Jun 03, 2005 7:50 am    Subject: passing data between threads



Here's a ridiculously over-simplified example of data passing
between two threads:

// Show changes to 't' to other threads.
template <typename T>
inline void observable(T &t)
{
char *p = reinterpret_cast<char *>(&t);
for (unsigned i = 0; i < unsigned(sizeof(T)); p++, i++)
    *reinterpret_cast<volatile char *>(p) = *reinterpret_cast<const char *>(p);
}

// See changes to 't' made by other threads.
template <typename T>
inline void observe(T &t)
{ static_cast<void>(static_cast<volatile T &>(t)); }

class Data
{
private:
int a_, b_;

public:

Data(int aa = 0, int bb = 0) : a_(aa), b_(bb) { }

int a(void) const { return(a_); }

int b(void) const { return(b_); }
};

Data data;

volatile bool ready = false;

void producer_thread(void)
{
data = Data(5, 10);

observable(data);

ready = true;
}

void consumer_thread(Data &d)
{
while (!ready)
;

observe(data);

d = data;
}

GCC compiled this fairly optimally, but, of course, the 'data'
variable was unnecessarily written byte-by-byte in producer_thread().

The obvious alternative is to make 'data' volatile as well
as 'ready'. But then I'd need to write the member function:

Data & Data::operator=(const Data &d) volatile;

because the compiler doesn't generate a default version.
(Is this going to change in the next Standard revision?)
But aside from that issue, making 'data' volatile could
unnecessarily prevent a lot of other optimization.

Anyone found a better solution (that's safe and
Standard-compliant)?


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Maxim Yegorushkin
Guest





Posted: Fri Jun 03, 2005 12:52 pm    Subject: Re: passing data between threads



On Fri, 03 Jun 2005 11:50:11 +0400, wkaras@yahoo.com wrote:

Quote:
Here's a ridiculously over-simplified example of data passing
between two threads:

// Show changes to 't' to other threads.
template <typename T>
inline void observable(T &t)
{
char *p = reinterpret_cast<char *>(&t);
for (unsigned i = 0; i < unsigned(sizeof(T)); p++, i++)
    *reinterpret_cast<volatile char *>(p) = *reinterpret_cast<const char *>(p);
}

// See changes to 't' made by other threads.
template <typename T>
inline void observe(T &t)
{ static_cast<void>(static_cast<volatile T &>(t)); }

[]

Googling Groups, you can find long threads about misusing volatile for
multithreaded synchronization / forcing memory visibility. The bottom line
is that if your code requires volatile to work in a multithreaded
environment, it is nonportable and might be brittle.

Quote:
GCC compiled this fairly optimally, but, of course, the 'data'
variable was unnecessarily written byte-by-byte in producer_thread().

[]

Quote:
Anyone found a better solution (that's safe and
Standard-compliant)?

The standard is silent on the subject of multithreading.

You probably should use another standard, like POSIX, which provides
certain guarantees regarding multithreading. Under a POSIX-compliant system,
volatile is neither necessary nor sufficient for ensuring proper
synchronization and memory visibility. Use mutexes in portable code to
achieve the goal.

http://www.opengroup.org/onlinepubs/009695399/basedefs/pthread.h.html

--
Maxim Yegorushkin



Maciej Sobczak
Guest





Posted: Fri Jun 03, 2005 12:56 pm    Subject: Re: passing data between threads



Hi,

wkaras@yahoo.com wrote:

Quote:
Here's a ridiculously over-simplified example of data passing
between two threads:

Anyone found a better solution (that's safe and
Standard-compliant)?

You cannot really reason about how your code behaves in a
multithreaded environment, because the (current) C++ standard does not
specify it. Instead (or rather in addition), you have to rely on other
standards and specifications. If you use GCC, then there is a chance
that you are targeting some kind of Unix system, where pthreads can be
used for threading. If that's the case, then your code is bad and may
not work at all. Post your code on comp.programming.threads to learn why.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/



Torsten Robitzki
Guest





Posted: Fri Jun 03, 2005 12:56 pm    Subject: Re: passing data between threads

wkaras@yahoo.com wrote:
Quote:
Here's a ridiculously over-simplified example of data passing
between two threads:
snip
Anyone found a better solution (that's safe and
Standard-compliant)?

Busy waiting is in most cases really evil. You should have a Google for
mutexes and condition variables. With these two devices one can quite
easily build a consumer/producer queue. Have a look at
http://www.boost.org/doc/html/threads.html for a platform-independent
implementation.

regards
Torsten



simont
Guest





Posted: Fri Jun 03, 2005 12:59 pm    Subject: Re: passing data between threads

wkaras@yahoo.com wrote:
Quote:
Here's a ridiculously over-simplified example of data passing
between two threads:

snipped: use of volatile to synchronise data between threads

Quote:
Anyone found a better solution (that's safe and
Standard-compliant)?

Since the current standard doesn't touch on threads at all, you're
already outside its remit.

The only way to use multiple threads is to use a system interface (such
as POSIX threads or Win32) which provides MT and synchronisation
primitives and also specifies the rules for memory visibility, or one
of the wrapper libraries that build on these.

Using volatile doesn't achieve anything very useful with respect to memory
visibility or synchronisation (just search c.l.c++.m or
comp.programming.threads for 'volatile' if you want to see all the
other people who suggested it being shot down). You may find it works
for you on a uniprocessor, but it is certainly neither safe nor
standard-compliant.

The synchronisation aspect of the volatile keyword in Java could, in
principle, be mandated by a C or C++ threading API too (this would
require the compiler to insert extra code around volatile data accesses
when using such an API). I don't know of any that do it, though, and
volatile certainly doesn't do that out of the box in C++.


HTH,
Simon.




wkaras@yahoo.com
Guest





Posted: Sat Jun 04, 2005 2:33 am    Subject: Re: passing data between threads



Maxim Yegorushkin wrote:
Quote:
On Fri, 03 Jun 2005 11:50:11 +0400, wkaras@yahoo.com wrote:

Here's a ridiculously over-simplified example of data passing
between two threads:

// Show changes to 't' to other threads.
template <typename T>
inline void observable(T &t)
{
char *p = reinterpret_cast<char *>(&t);
for (unsigned i = 0; i < unsigned(sizeof(T)); p++, i++)
    *reinterpret_cast<volatile char *>(p) = *reinterpret_cast<const char *>(p);
}

// See changes to 't' made by other threads.
template <typename T>
inline void observe(T &t)
{ static_cast<void>(static_cast<volatile T &>(t)); }
[]

Googling Groups, you can find long threads about misusing volatile for
multithreaded synchronization / forcing memory visibility. The bottom line
is that if your code requires volatile to work in a multithreaded
environment, it is nonportable and might be brittle.

GCC compiled this fairly optimally, but, of course, the 'data'
variable was unnecessarily written byte-by-byte in producer_thread().

[]

Anyone found a better solution (that's safe and
Standard-compliant)?

The standard is silent on the subject of multithreading.

You probably should use another standard, like POSIX, which provides
certain guaranties regarding multithreading. Under POSIX compliant system,
volatile is neither necessary nor sufficient for ensuring proper
synchronization and memory visibility. Use mutexes in portable code to
achieve the goal.

http://www.opengroup.org/onlinepubs/009695399/basedefs/pthread.h.html

--
Maxim Yegorushkin

In a strict sense, all of your comments are correct. Now let me start
another flame tangent by saying that there are situations where polling
loops are the best or at least the most expedient design. When using
polling loops, data structures such as ping-pong buffers or circular
queues that rely on volatile variables for proper mutual exclusion
and data passing are often the most efficient and portable solution.

Let's use this definition of threads: Multiple C++ virtual machines
but with the same memory and program load, with simultaneous
reads/writes atomic and arbitrarily serialized. Each thread
starts execution of some function at an arbitrary time after the
completion of static initialization.

Could a C++ implementation define "volatile read" and "volatile write"
in such a way that they would not be usable for data passing between
threads? The Standard's pretty wishy-washy about defining the
virtual machine it depends on. You might be able to create C++
implementation that was technically compliant but completely useless
for anything.




Maxim Yegorushkin
Guest





Posted: Sat Jun 04, 2005 12:29 pm    Subject: Re: passing data between threads

On Sat, 04 Jun 2005 06:33:44 +0400, wkaras@yahoo.com wrote:

[]

Quote:
In a strict sense, all of your comments are correct. Now let me start
another flame tangent by saying that there are situations where polling
loops are the best or at least the most expedient design. When using
polling loops, data structures such as ping-pong buffers or circular
queues that rely on volatile variables for proper mutual exclusion
and data passing are often the most efficient and portable solution.

The problem is that volatile does not guarantee change visibility between
threads. You should use something like __exchange_and_add to ensure you
see variable updates at the proper time.

Quote:
Let's use this definition of threads: Multiple C++ virtual machines
but with the same memory and program load, with simultaneous
reads/writes atomic and arbitrarily serialized. Each thread
starts execution of some function at an arbitrary time after the
completion of static initialization.

Read/write atomicity does not imply visibility. That's why you need memory
barriers.

--
Maxim Yegorushkin



wkaras@yahoo.com
Guest





Posted: Sat Jun 04, 2005 6:09 pm    Subject: Meaning of "volatile" (Was: passing data between threads)

Maxim Yegorushkin wrote:
Quote:
On Sat, 04 Jun 2005 06:33:44 +0400, wkaras@yahoo.com wrote:

[]

In a strict sense, all of your comments are correct. Now let me start
another flame tangent by saying that there are situations where polling
loops are the best or at least the most expedient design. When using
polling loops, data structures such as ping-pong buffers or circular
queues that rely on volatile variables for proper mutual exclusion
and data passing are often the most efficient and portable solution.

The problem is that volatile does not guarantee change visibility between
threads. You should use something like __exchange_and_add to ensure you
see variable updates at the proper time.

Let's use this definition of threads: Multiple C++ virtual machines
but with the same memory and program load, with simultaneous
reads/writes atomic and arbitrarily serialized. Each thread
starts execution of some function at an arbitrary time after the
completion of static initialization.

Read/write atomicity does not imply visibility. That's why you need memory
barriers.

Here are some highly portable assumptions people, at least
in the embedded world, typically make about what "volatile"
does:

1) It forces the compiler to reserve memory for a variable
(that is, to not implicitly make it a "register" variable).

2) At any given sequence point, if the value of the variable
has been used since the last sequence point, the value will
have been read once from the variable's reserved location in
memory.

3) At any given sequence point, if the value of the variable
has been changed since the last sequence point, the value will
have been written once to the variable's reserved location in
memory.

4) If thread A writes to a memory location, and thread B
then reads that memory location, B will read the value
written by A.

5) Simultaneous memory reads/writes by multiple threads all
complete properly but in arbitrary order.

Consider this code:

semaphore_t my_sem(0);
int i;

void producer_thread(void)
{
// ...
i = 5;
sem_incr(my_sem);
// ...
}

void consumer_thread(void)
{
// ...
sem_decr(my_sem);
do_something(2 * i);
// ...
}

Since sem_incr() probably is (or calls) a function outside the
implementation unit, the compiler will likely generate code
to write all non-stack variables to memory before calling sem_incr().
This is because it's theoretically possible that sem_incr()'s
implementation unit contains "extern int i", and sem_incr()
uses i's value. But there's nothing in the Standard to stop
the compiler from optimizing over the entire program load,
not just over each implementation unit. I have not yet seen
any compilers that optimize over the whole load. But if the
current stagnation in increase of CPU clock speeds continues,
the incentive to write such compilers will increase. If
optimization over the entire program load is possible, it
would be necessary to make "i" volatile in the above example.




James Kanze
Guest





Posted: Sat Jun 04, 2005 6:14 pm    Subject: Re: passing data between threads

wkaras@yahoo.com wrote:
Quote:
Maxim Yegorushkin wrote:

[...]
Quote:
In a strict sense, all of your comments are correct. Now let
me start another flame tangent by saying that there are
situations where polling loops are the best or at least the
most expedient design. When using polling loops, data
structures such as ping-pong buffers or circular queues that
rely on volatile variables for proper mutual exclusion and
data passing are often the most efficient and portable
solution.

As I pointed out in my response (which doesn't seem to have
appeared yet), polling loops don't always work. They can be an
efficient solution in some embedded applications, where what is
being polled depends on a hardware interupt, or something
similar. They don't work at all under Posix; depending on the
scheduling model and the relative priorities of the thread, you
can easily end up with an endless loop. And even when the
scheduling model and priorities mean that the polling loop will
work, it is exceptionally inefficient. (I'm talking here of a
polling loop such as yours. It is quite frequent to "poll" by
setting a timeout on a socket read, for example, to wake up once
a second to look at a flag.)

The second point is that, as has been pointed out, the full
definition of volatile is implementation-dependent, and the
actual definitions in most compilers for general-purpose
machines (VC++, Sun CC and g++, at least, to name those I've
verified) are simply useless. For all intents and purposes, at
least with these compilers, you may consider that declaring a
variable volatile (or accessing it as volatile) has no practical
effect in a multiprocessor environment.

Quote:
Let's use this definition of threads: Multiple C++ virtual
machines but with the same memory and program load, with
simultaneous reads/writes atomic and arbitrarily serialized.
Each thread starts execution of some function at an arbitrary
time after the completion of static initialization.

Interesting definition, but what use is it if it doesn't
correspond to any implementation? Neither Windows nor Posix nor
Linux guarantees anything with regard to atomicity. And I'm not
sure what you mean by "simultaneous" and "arbitrarily
serialized"; neither Windows nor Posix makes any guarantees
concerning the order in which writes from one thread are seen
when reading in another thread. Probably for the simple reason
that most hardware today doesn't make any guarantees.

Quote:
Could a C++ implementation define "volatile read" and
"volatile write" in such a way that they would not be usable
for data passing between threads?

"Could", according to what criteria? One could, I think, argue
very strongly that such would be against the intent of the
standard. The fact remains that every C++ compiler I've
personally verified (VC++, Sun CC and g++) defines volatile
access in such a way as to make it completely useless. Not just
for threading, but also for things it was definitely designed
for, like memory mapped IO. Or more exactly: all compilers
define an "access" as the execution of a hardware load or store
instruction, with nothing further. The Sparc architecture
manual explicitly says that this is *not* sufficient if the
memory is accessed by another "processor" (where whatever is
controlled by memory mapped IO is considered a processor). From
what I understand, this is pretty much the case with Intel 32
bit architecture as well.

Quote:
The Standard's pretty wishy-washy about defining the virtual
machine it depends on. You might be able to create C++
implementation that was technically compliant but completely
useless for anything.

That's an entirely different issue. And obviously, the standard
cannot require usability, and conformance doesn't guarantee
usefulness.

With regards to volatile, I think that there is a very strong
argument that a compiler has not met the intent of the standard
when volatile isn't sufficient for memory mapped IO. Where
threads are concerned, however, both the C and the C++ standards
are totally silent. Posix, however, does specify how a C
compiler should behave (and that specification does not involve
volatile in any way), and it seems reasonable to me to extend
these requirements to C++. Of course, that still leaves some
open points: dynamic initialization of objects with static
lifetime, or how pthread_cancel is mapped. But none of these
concern any issue raised by your code.

--
James Kanze    mailto:james.kanze@free.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34



Maxim Yegorushkin
Guest





Posted: Sat Jun 04, 2005 10:30 pm    Subject: Re: Meaning of "volatile" (Was: passing data between threads)

On Sat, 04 Jun 2005 22:09:15 +0400, wkaras@yahoo.com wrote:

[]

Quote:
Let's use this definition of threads: Multiple C++ virtual machines
but with the same memory and program load, with simultaneous
reads/writes atomic and arbitrarily serialized. Each thread
starts execution of some function at an arbitrary time after the
completion of static initialization.

Read/write atomicity does not imply visibility. That's why you need memory
barriers.

Here are some highly portable assumptions people, at least
in the embedded world, typically make about what "volatile"
does:

[]

Quote:
4) If thread A writes to a memory location, and thread B
then reads that memory location, B will read the value
written by A.

This one is a wrong assumption. At least the Intel x86 processor
documentation says that you need to use memory barriers for another
processor to see the change.

Quote:
5) Simultaneous memory reads/writes by multiple threads all
complete properly but in arbitrary order.

This is also wrong.

[]

Quote:
Since sem_incr() probably is (or calls) a function outside the
implementation unit, the compiler will likely generate code
to write all non-stack variables to memory before calling sem_incr().
This is because it's theoretically possible that sem_incr()'s
implementation unit contains "extern int i", and sem_incr()
uses i's value. But there's nothing in the Standard to stop
the compiler from optimizing over the entire program load,
not just over each implementation unit. I have not yet seen
any compilers that optimize over the whole load. But if the
current stagnation in increase of CPU clock speeds continues,
the incentive to write such compilers will increase. If
optimization over the entire program load is possible, it
would be necessary to make "i" volatile in the above example.

Cross-module optimization has already been implemented. Check out
interprocedural optimization (IPO) from Intel and link-time code
generation from MS.

--
Maxim Yegorushkin



wkaras@yahoo.com
Guest





Posted: Sun Jun 05, 2005 10:23 am    Subject: Re: passing data between threads

James Kanze wrote:
Quote:
wkaras@yahoo.com wrote:
Maxim Yegorushkin wrote:

[...]
In a strict sense, all of your comments are correct. Now let
me start another flame tangent by saying that there are
situations where polling loops are the best or at least the
most expedient design. When using polling loops, data
structures such as ping-pong buffers or circular queues that
rely on volatile variables for proper mutual exclusion and
data passing are often the most efficient and portable
solution.

As I pointed out in my response (which doesn't seem to have
appeared yet), polling loops don't always work. They can be an
efficient solution in some embedded applications, where what is
being polled depends on a hardware interupt, or something
similar. They don't work at all under Posix; depending on the
scheduling model and the relative priorities of the thread, you
can easily end up with an endless loop. And even when the
scheduling model and priorities mean that the polling loop will
work, it is exceptionally inefficient. (I'm talking here of a
polling loop such as yours. It is quite frequent to "poll" by
setting a timeout on a socket read, for example, to wake up once
a second to look at a flag.)

I was wrong to indicate that volatile is only important when
using polling loops. It's likely to be valuable in any
embedded system using a low-copy design where significant
amounts of data have to be processed by multiple threads.

Quote:
The second point is that, as has been pointed out, the full
definition of volatile is implementation-dependent, and the
actual definitions in most compilers for general-purpose
machines (VC++, Sun CC and g++, at least, to name those I've
verified) are simply useless. For all intents and purposes, at
least with these compilers, you may consider that declaring a
variable volatile (or accessing it as volatile) has no practical
effect in a multiprocessor environment.

For many years in the life of C and C++ the entire language
was technically implementation dependent. But that didn't
stop careful coders from writing a lot of highly portable
code. I almost get the feeling that you and Mr. Yegorushkin
think that the English language would cease to exist if
all English dictionaries were burned.

Quote:

Let's use this definition of threads: Multiple C++ virtual
machines but with the same memory and program load, with
simultaneous reads/writes atomic and arbitrarily serialized.
Each thread starts execution of some function at an arbitrary
time after the completion of static initialization.

Interesting definition, but what use is it if it doesn't
correspond to any implementation? Neither Windows nor Posix nor
Linux guarantees anything with regard to atomicity. And I'm not
sure what you mean by "simultaneous" and "arbitrarily
serialized"; neither Windows nor Posix makes any guarantees
concerning the order in which writes from one thread are seen
when reading in another thread. Probably for the simple reason
that most hardware today doesn't make any guarantees.

Operating systems don't worry about memory op atomicity because
any (commercially viable) CPU will take care of this in hardware.

Quote:

Could a C++ implementation define "volatile read" and
"volatile write" in such a way that they would not be usable
for data passing between threads?

"Could", according to what criteria? One could, I think, argue
very strongly that such would be against the intent of the
standard. The fact remains that every C++ compiler I've
personally verified (VC++, Sun CC and g++) defines volatile
access in such a way as to make it completely useless. Not just
for threading, but also for things it was definitely designed
for, like memory mapped IO. Or more exactly: all compilers
define an "access" as the execution of a hardware load or store
instruction, with nothing further. The Sparc architecture
manual explicitly says that this is *not* sufficient if the
memory is accessed by another "processor" (where whatever is
controlled by memory mapped IO is considered a processor). From
what I understand, this is pretty much the case with Intel 32
bit architecture as well.

The Standard's pretty wishy-washy about defining the virtual
machine it depends on. You might be able to create C++
implementation that was technically compliant but completely
useless for anything.

That's an entirely different issue. And obviously, the standard
cannot require usability, and conformance doesn't guarantee
usefulness.

With regards to volatile, I think that there is a very strong
argument that a compiler has not met the intent of the standard
when volatile isn't sufficient for memory mapped IO. Where
threads are concerned, however, both the C and the C++ standards
are totally silent. Posix, however, does specify how a C
compiler should behave (and that specification does not involve
volatile in any way), and it seems reasonable to me to extend
these requirements to C++. Of course, that still leaves some
open points: dynamic initialization of objects with static
lifetime, or how pthread_cancel is mapped. But none of these
concern any issue raised by your code.
....


foo.cpp:

#ifdef VOLATILE

volatile int i;

#else

int i;

#endif

void foo(void)
{
i;
i;
}

gcc -S -O foo.cpp

foo.s:

.file "foo.cpp"
.section .text
.p2align 4
.globl __Z3foov
__Z3foov:
LFB1:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
popl %ebp
ret
LFE1:
.globl _i
.section .bss
.p2align 2
_i:
.space 4
.ident "GCC: (GNU) 3.0.2"

gcc -S -O -DVOLATILE foo.cpp

foo.s:

.file "foo.cpp"
.section .text
.p2align 4
.globl __Z3foov
__Z3foov:
LFB1:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
movl _i, %eax
movl _i, %eax
popl %ebp
ret
LFE1:
.globl _i
.section .bss
.p2align 2
_i:
.space 4
.ident "GCC: (GNU) 3.0.2"

This looks correct to me. Can you give an example
of code using volatile that compiles in a way you
think is bad?

If the target only has one processor core, I
think my list of assumptions will apply without
complications to threads within a single process, or
to processes and ISRs in a real-time OS with no virtual
memory.

If the target has multiple processing cores, each with
its own data cache, then typically the cache is write-
through, or each core snoops the writes by the other cores
to their caches. I think SMP Linux requires this sort
of "transparent" caching if each symmetric processor
has its own data cache.

If there are multiple processing cores with separate
but non-transparent data caching, this is a problem.
You'd need to try to put all volatile variables used
by multiple cores (or that correspond to registers of
memory-mapped peripherals) in a certain range of
memory, and turn off caching for that range of
memory.

"Bus arbitration" is typically how atomic memory
operations with automatic serialization are achieved
when there are multiple cores (or DMA-capable devices).
It's pretty typical from what I've seen.




axter
Guest





Posted: Sun Jun 05, 2005 10:25 am    Subject: Re: passing data between threads

wkaras@yahoo.com wrote:
Quote:
Here's a ridiculously over-simplified example of data passing
between two threads:
The obvious alternative is to make 'data' volatile as well
as 'ready'. But then I'd need to write the member function:

Data & Data::operator=(const Data &d) volatile;

because the compiler doesn't generate a default version.
(Is this going to change in the next Standard revision?)
But aside from that issue, making 'data' volatile could
unnecessarily prevent a lot of other optimization.

Anyone found a better solution (that's safe and
Standard-compliant)?

I recommend using a method like that in the following link:
http://code.axter.com/sync_ptr.h Also http://code.axter.com/sync_ctrl.h

The sync_ptr class is a smart pointer class with built-in
synchronization logic.

The logic is similar to that found in Modern C++ Design by Andrei
Alexandrescu. He has a chapter on smart pointers, and talks about
using smart pointers for synchronization in multithread applications.

Another method that you might want to consider is the atomic_counter.
See the following link:
http://www.dlugosz.com/Repertoire/refman/Classics/atomic_counter_whitepaper.html




Uenal Mutlu
Guest





Posted: Sun Jun 05, 2005 10:26 am    Subject: Re: passing data between threads

"James Kanze" <kanze@none.news.free.fr> wrote:
Quote:
wkaras@yahoo.com wrote:
Maxim Yegorushkin wrote:

[...]
The second point is that, as has been pointed out, the full
definition of volatile is implementation-dependent, and the
actual definitions in most compilers for general-purpose
machines (VC++, Sun CC and g++, at least, to name those I've
verified) are simply useless. For all intents and purposes, at
least with these compilers, you may consider that declaring a
variable volatile (or accessing it as volatile) has no practical
effect in a multiprocessor environment.
....
The fact remains that every C++ compiler I've
personally verified (VC++, Sun CC and g++) defines volatile
access in such a way as to make it completely useless. Not just
for threading, but also for things it was definitely designed
for, like memory mapped IO. Or more exactly: all compilers
define an "access" as the execution of a hardware load or store
instruction, with nothing further. The Sparc architecture
manual explicitly says that this is *not* sufficient if the
memory is accessed by another "processor" (where whatever is
controlled by memory mapped IO is considered a processor). From
what I understand, this is pretty much the case with Intel 32
bit architecture as well.

'volatile' is IMO useful in the following MT scenarios:

1) On multiple CPUs (SMP):
In a locked region, i.e., if one already holds a lock on the shared
data. Here the role of volatile is to force the variable's contents
to be reread. Otherwise the variable could be held in a register,
and no reread from memory would happen, which in this MT scenario
is obviously not desirable. The most important point is the first
re-read done after each thread switch.
In this scenario it is not even strictly necessary that the variable
be naturally aligned (though that is recommended for performance),
since the operations occur under the lock.

2) On a single CPU:
If the CPU does atomic reads/writes on a naturally aligned integer
variable, then on a single CPU a naturally aligned volatile variable
can be used for atomic operations. This is true for x86 and others.
In this scenario (single CPU, naturally aligned volatile integer) it
would even be sufficient to use such a bool or int variable as a
cheap and very fast mutex, counter, or flag, etc.

I personally am using the first case very extensively.





James Kanze
Guest





PostPosted: Sun Jun 05, 2005 6:48 pm    Post subject: Re: Meaning of "volatile" (Was: passing data between threads Reply with quote

[email]wkaras (AT) yahoo (DOT) com[/email] wrote:
Quote:
Maxim Yegorushkin wrote:

On Sat, 04 Jun 2005 06:33:44 +0400, wkaras (AT) yahoo (DOT) com
wrote:

[]

In a strict sense, all of your comments are correct. Now let
me start another flame tangent by saying that there are
situations where polling loops are the best or at least the
most expedient design. When using polling loops, data
structures such as ping-pong buffers or circular queues that
rely on volatile variables for proper mutual exclusion and
data passing are often the most efficient and portable
solution.

The problem is that volatile does not guarantee change
visibility between threads. You should use something like
__exchange_and_add to ensure you see variable updates in
proper time.

Let's use this definition of threads: Multiple C++ virtual
machines but with the same memory and program load, with
simultaneous reads/writes atomic and arbitrarily serialized.
Each thread starts execution of some function at an arbitrary
time after the completion of static initialization.

Read/write atomicity does not imply visibility. That's why you
need memory barriers.

Here are some highly portable assumptions people, at least in
the embedded world, typically make about what "volatile" does:

Attention: the embedded world is not the universe, and embedded
processors tend to be considerably simpler than the processors
in a general purpose machine.

Quote:
1) It forces the compiler to reserve memory for a variable
(that is, to not implicitly make it a "register" variable).

Reserving the memory isn't a problem.

Quote:
2) At any given sequence point, if the value of the variable
has been used since the last sequence point, the value will
have been read once from the variable's reserved location in
memory.

Here, the semantics are somewhat less clear. What does it mean,
"the value will have been read once from the variable's reserved
location in memory"? In a modern general purpose processor,
different processors (and thus different threads) have different
views of the memory. Unless you take specific actions, two
processors, reading the same memory location may see different
values.

Quote:
3) At any given sequence point, if the value of the variable
has been changed since the last sequence point, the value will
have been written once to the variable's reserved location in
memory.

Same question: what does "written to a location in memory" mean?

Quote:
4) If thread A writes to a memory location, and thread B then
reads that memory location, B will read the value written by
A.

That depends on a lot of factors. If thread A and thread B are
running on different processors, there's no guarantee that
processor B will not recognize that it, in fact, read the value
earlier, and use the earlier read value. (Typically, memory
loads will always be a full cache line, and if the variable
happens to be in a cache line that was read earlier, the
processor will recover the value there, and not from the main
memory.)

Quote:
5) Simultaneous memory reads/writes by multiple threads all
complete properly but in arbitrary order.

What does "complete" mean, in this context?

Note that if it means actually accessing the global memory
shared by the different processors in the system, there are not
even any guarantees concerning the order of accesses in a single
thread.

Quote:
consider this code:

semaphore_t my_sem(0);
int i;

void producer_thread(void)
{
// ...
i = 5;
sem_incr(my_sem);
// ...
}

void consumer_thread(void)
{
// ...
sem_decr(my_sem);
do_something(2 * i);
// ...
}

Since sem_incr() probably is (or calls) a function outside the
implementation unit, the compiler will likely generate code to
write all non-stack variables to memory before calling
sem_incr().

That depends on what you mean by "write all non-stack variables
to memory". Most compilers I know will generate store
instructions for all non-stack variables (in an arbitrary order, of
course).

Quote:
This is because it's theoretically possible that sem_incr()'s
implementation unit contains "extern int i", and sem_incr()
uses i's value. But there's nothing in the Standard to stop
the compiler from optimizing over the entire program load, not
just over each implementation unit. I have not yet seen any
compilers that optimize over the whole load.

Compilers which do intermodule optimization aren't really that
rare today, although there are still a number of compilers which
don't do it. (I think that VC++, at least in its latest
versions, are capable of doing the optimization in certain
circumstances.)

Quote:
But if the current stagnation in increase of CPU clock speeds
continues, the incentive to write such compilers will
increase. If optimization over the entire program load is
possible, it would be necessary to make "i" volatile in the
above example.

As we've been trying to point out to you, unless the compiler
implementors change their attitude vis-à-vis volatile, making
the variable volatile will not be sufficient. Just generating a
simple load or store instruction does not guarantee
synchronization with the main memory; you must do more. Use an
additional membar instruction on a Sparc, prefix the instruction
with a lock prefix on an IA-32, or whatever.

--
James Kanze mailto: [email]james.kanze (AT) free (DOT) fr[/email]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34



James Kanze
Guest





PostPosted: Sun Jun 05, 2005 6:52 pm    Post subject: Re: passing data between threads Reply with quote

[email]wkaras (AT) yahoo (DOT) com[/email] wrote:
Quote:
James Kanze wrote:

[email]wkaras (AT) yahoo (DOT) com[/email] wrote:

Maxim Yegorushkin wrote:

[...]

In a strict sense, all of your comments are correct. Now let
me start another flame tangent by saying that there are
situations where polling loops are the best or at least the
most expedient design. When using polling loops, data
structures such as ping-pong buffers or circular queues that
rely on volatile variables for proper mutual exclusion and
data passing are often the most efficient and portable
solution.

As I pointed out in my response (which doesn't seem to have
appeared yet), polling loops don't always work. They can be
an efficient solution in some embedded applications, where
what is being polled depends on a hardware interrupt, or
something similar. They don't work at all under Posix;
depending on the scheduling model and the relative priorities
of the thread, you can easily end up with an endless loop.
And even when the scheduling model and priorities mean that
the polling loop will work, it is exceptionally inefficient.
(I'm talking here of a polling loop such as yours. It is
quite frequent to "poll" by setting a timeout on a socket
read, for example, to wake up once a second to look at a
flag.)

I was wrong to indicate that volatile is only important when
using polling loops. It's likely to be valuable in any
embedded system using a low-copy design where significant
amounts of data have to be processed by multiple threads.

Embedded systems represent a particular domain where I would
expect (hope) volatile to be useful. Most embedded processors
have significantly simpler architectures than the typical
general purpose processor, and given the targeted applications,
I would expect a compiler to do whatever was necessary if they
didn't. (Of course, I would also expect a compiler for a Sparc
to do whatever was necessary for memory mapped IO to work if the
access was volatile. Neither Sun CC nor g++ do, however.)

Quote:
The second point is that, as has been pointed out, the full
definition of volatile is implementation dependent, and the
actual definitions in most compilers for general purpose
machines (VC++, Sun CC and g++, at least, to name those I've
verified) are simply useless. For all intents and purposes, at
least with these compilers, you may consider that declaring a
variable volatile (or accessing it as volatile) has no
practical effect in a multiprocessor environment.

For many years in the life of C and C++ the entire language
was technically implementation dependent. But that didn't
stop careful coders from writing a lot of highly portable code.
I almost get the feeling that you and Mr. Yegorushkin think
that the English language would cease to exist if all English
dictionaries were burned.

Not at all. What we're trying to point out to you is that 1) it
is implementation defined, and 2) most implementations don't
define it in a manner sufficient for what you are trying to do.
(Strictly speaking, I'm being a bit sloppy in my use of most,
since I really only know the details of three implementations,
on two platforms. But in past discussions here, comments from
others lead me to believe that there is nothing unusual about
these three implementations.)

Quote:
Let's use this definition of threads: Multiple C++ virtual
machines but with the same memory and program load, with
simultaneous reads/writes atomic and arbitrarily serialized.
Each thread starts execution of some function at an arbitrary
time after the completion of static initialization.

Interesting definition, but what use is it if it doesn't
correspond to any implementation? Neither Windows nor Posix
nor Linux guarantees anything with regard to atomicity. And
I'm not sure what you mean by "simultaneous" and "arbitrarily
serialized"; neither Windows nor Posix makes any guarantees
concerning the order in which writes from one thread are seen
when reading in another thread. Probably for the simple
reason that most hardware today doesn't make any guarantees.

Operating systems don't worry about memory op atomicity because
any (commercially viable) CPU will take care of this in
hardware.

It's not just a problem of atomicity (although that can also be
an issue). It's a problem of synchronization, of going all the
way to the main, shared memory before allowing any other
accesses. Sparc doesn't guarantee this. Nor does Intel IA-32.
(And I can't imagine any definition of "commercially viable"
which would exclude Intel IA-32.) Both processors provide
special hardware mechanisms to ensure synchronisation, when it
is important, but such mechanisms (membar instruction, lock
prefix) must be explicitly used.

And none of the compilers I know for these two processors
generate code which uses the necessary mechanisms. Even for
volatile accesses.

Quote:
Could a C++ implementation define "volatile read" and
"volatile write" in such a way that they would not be usable
for data passing between threads?

"Could", according to what critera? One could, I think, argue
very strongly that such would be against the intent of the
standard. The fact remains that at every C++ compiler I've
personally verified (VC++, Sun CC and g++) defines volatile
access in such a way as to make it completely useless. Not
just for threading, but also for things it was definitely
designed for, like memory mapped IO. Or more exactly: all
compilers define an "access" as the execution of a hardware
load or store instruction, with nothing further. The Sparc
architecture manual explicitly says that this is *not*
sufficient if the memory is accessed by another "processor"
(where whatever is controlled by memory mapped IO is
considered a processor). From what I understand, this is
pretty much the case with Intel 32 bit architecture as well.

The Standard's pretty wishy-washy about defining the virtual
machine it depends on. You might be able to create a C++
implementation that was technically compliant but completely
useless for anything.

That's an entirely different issue. And obviously, the
standard cannot require usability, and conformance doesn't
guarantee usefulness.

With regards to volatile, I think that there is a very strong
argument that a compiler has not met the intent of the
standard when volatile isn't sufficient for memory mapped IO.
Where threads are concerned, however, both the C and the C++
standards are totally silent. Posix, however, does specify
how a C compiler should behave (and that specification does
not involve volatile in any way), and it seems reasonable to
me to extend these requirements to C++. Of course, that still
leaves some open points: dynamic initialization of objects
with static lifetime, or how pthread_cancel is mapped. But
none of these concern any issue raised by your code.

...

foo.cpp:

#ifdef VOLATILE

volatile int i;

#else

int i;

#endif

void foo(void)
{
i;
i;
}

gcc -S -O foo.cpp

foo.s:

.file "foo.cpp"
.section .text
.p2align 4
.globl __Z3foov
__Z3foov:
LFB1:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
popl %ebp
ret
LFE1:
.globl _i
.section .bss
.p2align 2
_i:
.space 4
.ident "GCC: (GNU) 3.0.2"

gcc -S -O -DVOLATILE foo.cpp

foo.s:

.file "foo.cpp"
.section .text
.p2align 4
.globl __Z3foov
__Z3foov:
LFB1:
pushl %ebp
LCFI0:
movl %esp, %ebp
LCFI1:
movl _i, %eax
movl _i, %eax
popl %ebp
ret
LFE1:
.globl _i
.section .bss
.p2align 2
_i:
.space 4
.ident "GCC: (GNU) 3.0.2"

This looks correct to me. Can you give an example of code
using volatile that compiles in a way you think is bad?

The above. Where are the lock prefixes, which are necessary to
ensure memory synchronization?

Quote:
If the target only has one processor core, I think my list of
assumptions will apply without complications to threads within
a single process, or to processes and ISRs in a real-time OS
with no virtual memory.

I know of no processor which doesn't guarantee coherence within
a single processor. So you're probably safe with a single
processor system. Although I'm not sure what the implications of
"hyper-threading", or whatever it is called, are -- I think in
part they make one processor look like two.

The problem is that most general purpose architectures are
designed to support multiprocessor configurations. So that even
if your application is single processor today, it might not be
tomorrow.

Quote:
If the target has multiple processing cores, each with its
own data cache, then typically the cache is write-through, or
each core snoops the writes by the other cores to their
caches. I think SMP Linux requires this sort of "transparent"
caching if each symmetric processor has its own data cache.

It's not a question of cache, alone. At least not what is
traditionally considered cache. The fact is that when I execute
an instruction like "movl %eax, i", nothing takes place on the
external processor bus immediately; the write command is placed
in a sort of a waiting list of outstanding memory accesses. If
I do another write to another address in the same cache line,
most modern processors will merge the two, and generate only one
physical access outside the processor. Similarly, if I execute
"movl i, %eax", the processor will mark %eax as dirty
(possibly remapping it to a different hardware register, since
some earlier instructions using the old value may not yet have
finished), and try to acquire the targeted value. If the
processor happens to have read data from the same cache line
earlier, and still has this data in one of its read registers,
it will use this data, without any external accesses, even to
cache.

(Actually, I don't think that the IA-32 goes quite this far.
But Alpha processors definitely do, and the Sparc processor model
allows it. From what I've been told, Intel IA-64 also does.)

Quote:
If there are multiple processing cores with separate but
non-transparent data caching, this is a problem.

If you consider the memory access command pipeline as a type of
cache, I don't think there is a general purpose processor around
today which guarantees full transparent caching.

Quote:
You'd need to try to put all volatile variables used by
multiple cores (or that correspond to registers of
memory-mapped peripherals) in a certain range of memory, and
turn off caching for that range of memory.

You can't turn off pipelining in the main processor. You don't
want to, either; that would slow you down by a factor of maybe
10. You can take specific steps (membar instruction, lock
prefix) to ensure synchronization at specific points in the
program, when it is important.

Quote:
"Bus arbitration" is typically how atomic memory operations
with automatic serialization are achieved when there are
multiple cores (or DMA-capable devices). It's pretty typical
from what I've seen.

But bus arbitration doesn't come into play until there is an
access cycle on the bus. The problem is that some accesses get
reordered, or even merged, before going out to the bus.

I'm most familiar with the Sparc architecture -- back when I
worked regularly on Intel platforms, the processor was an 8086,
about the only pipelining involved instruction reads, and except
for self modifying code, you didn't have to worry about such
issues. Still, in this regards, I think that the Sparc is
typical; if you want all of the gory details, you can download
the architecture manual from
http://www.sparc.org/standards.html. (The manual in question is
the "V9 (64-Bit SPARC) Architecture Book". Check out the
chapters concerning the memory model.)

--
James Kanze mailto: [email]james.kanze (AT) free (DOT) fr[/email]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34


