C++Talk.NET C++ language newsgroups
Matthew Wilson Guest
|
Posted: Thu Aug 21, 2003 12:02 am Post subject: Initialisation of local static variables; details |
|
|
I was wondering whether anyone could shed any light on the precise
nature of the ordering of initialisation of local static variables.
Precisely, I want to know whether the hidden flag variable is
initialised _after_ the correct and complete construction of the
static instance, or whether it can be done first. (I guess I'm also
intrigued to know whether there are alternate mechanisms to a hidden
flag in use.)
My assumption is that it is after, and this appears to be borne out by
the standard (6.7.4), which says "such an object is initialized the
first time control passes through its declaration; such an object is
considered initialised upon the completion of its initialisation."
Indeed, if the flag to say "you don't need to enter here" is set
before initialisation, how would initialisation failure be handled?
Nonetheless, it appears "implementation-dependent", so I'm seeking
enlightenment.
In other words, assume that there is an OS function get_pid(), that
will always return the same value within a given process. Can we then
(thread-)safely write the following:
class X
{
    . . .
// Implementation
private:
    static int _get_pid()
    {
        static int s_pid = ::get_pid();
        return s_pid;
    }
    . . .
};
Naturally, in this currently simple form it could not be done safely
for a type with a constructor, but I'm only interested in fundamental
types that are written atomically (in other words not a double on a
32-bit processor) for the purpose of this discussion. If the "flag" is
marked after the result of get_pid() is written to s_pid, then
re-entrancy by another thread (attempting to reinitialise) is safe,
since we said that get_pid() always returns the same value.
The worst case with the expected implementation sequence would be
(roughly)
thread 1 check flag, and enter
thread 1 call get_pid()
thread 2 check flag, and enter
thread 1 set s_pid
thread 1 set flag to "No Entry"
thread 2 call get_pid()
thread 1 returns the value of s_pid
thread 2 set s_pid
thread 2 set flag to "No Entry"
thread 2 returns the value of s_pid
which is benign, since both s_pid and the "flag" will be unchanged by
the second assignment.
However, if the flag is set after entry, but before initialising the
local static variable, it could (it would be extremely rare, but those
are just the kind of bugs we want to avoid at all costs, no?) be:
thread 1 check flag, and enter
thread 1 set flag to "No Entry"
thread 2 check flag, and skip
thread 1 call get_pid()
thread 2 returns the value of s_pid
thread 1 set s_pid
thread 1 returns the value of s_pid
in which case thread 2 is returning a PID of 0 (the initial value
of the memory in which s_pid will be initialised).
This would be bad, but I cannot see a specific part of the standard
that stipulates the first behaviour.
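For anyone reading this with a C++11 or later compiler (an assumption; the standard quoted above is the 1998 one), the language itself now says that if control enters the declaration while another thread is initialising the variable, the second thread waits for the initialisation to finish. A minimal sketch of the function in question under that assumption follows; get_pid() here is a hypothetical stand-in for the OS call, and init_calls and count_correct() exist only to make the behaviour checkable.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the OS call; counts how often it runs.
std::atomic<int> init_calls{0};

int get_pid()
{
    ++init_calls;
    return 1234;
}

int cached_pid()
{
    // Since C++11, if several threads reach this declaration concurrently,
    // one performs the initialisation and the others wait for it.
    static int s_pid = get_pid();
    return s_pid;
}

// Calls cached_pid() from n threads and returns how many saw 1234.
int count_correct(int n)
{
    std::atomic<int> ok{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&ok] { if (cached_pid() == 1234) ++ok; });
    for (auto& t : threads)
        t.join();
    return ok;
}
```

Under the 1998 standard none of this is promised, which is why the rest of the question still matters for older compilers.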
Thanks in advance.
Matthew Wilson
STLSoft moderator and C++ monomaniac
mailto:matthew (AT) stlsoft (DOT) org
http://www.stlsoft.org
news://news.digitalmars.com/c++.stlsoft
"I can't sleep nights till I found out who hurled what ball through
what apparatus" -- Dr Niles Crane
-------------------------------------------------------------------------------
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
 |
Ben Hutchings Guest
|
Posted: Thu Aug 21, 2003 5:03 pm Post subject: Re: Initialisation of local static variables; details |
|
|
In article <5d33192c.0308192011.758c93aa (AT) posting (DOT) google.com>,
Matthew Wilson wrote:
| Quote: | I was wondering whether anyone could shed any light on the precise
nature of the ordering of initialisation of local static variables.
Precisely, I want to know whether the hidden flag variable is
initialised _after_ the correct and complete construction of the
static instance, or whether it can be done first. (I guess I'm also
intrigued to know whether there are alternate mechanisms to a hidden
flag in use.)
My assumption is that it is after, and this appears to be borne out by
the standard (6.7.4), which says "such an object is initialized the
first time control passes through its declaration; such an object is
considered initialised upon the completion of its initialisation."
Indeed, if the flag to say "you don't need to enter here" is set
before initialisation, how would initialisation failure be handled?
Nonetheless, it appears "implementation-dependent", so I'm seeking
enlightenment.
|
The standard doesn't say there is a flag, but there must be some kind
of indicator. Logically this is set after initialisation, because the
standard says that if construction of the static variable throws an
exception then the initialisation will be attempted again next time
the function is called.
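The retry-after-exception rule can be observed directly in a single thread. A minimal sketch, in which flaky_source() and attempts are hypothetical names invented for the illustration:

```cpp
#include <stdexcept>

// Hypothetical helper: fails on the first call, succeeds afterwards.
int attempts = 0;

int flaky_source()
{
    if (++attempts == 1)
        throw std::runtime_error("first attempt fails");
    return 42;
}

int get_value()
{
    // If flaky_source() throws, the initialisation is not complete, so it
    // is attempted again the next time control reaches this declaration.
    static int s_value = flaky_source();
    return s_value;
}
```

The first call to get_value() propagates the exception; the second runs the initialiser again and succeeds, and later calls skip it. Whatever indicator the implementation uses therefore cannot be marked "done" before the initialiser has returned normally.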
<snip>
[Static initialisation of types that are written atomically.]
| Quote: | The worst case with the expected implementation sequence would be
(roughly)
thread 1 check flag, and enter
thread 1 call get_pid()
thread 2 check flag, and enter
thread 1 set s_pid
thread 1 set flag to "No Entry"
thread 2 call get_pid()
thread 1 returns the value of s_pid
thread 2 set s_pid
thread 2 set flag to "No Entry"
thread 2 returns the value of s_pid
snip
However, if the flag is set after entry, but before initialising the
local static variable, it could (it would be extremely rare, but those
are just the kind of bugs we want to avoid at all costs, no?) be:
snip |
I'm afraid that even if the generated code sets the flag afterwards,
the processor may re-order writes so that the change to the flag is
visible to other processors before the change to the variable. Even
the volatile qualifier is not sufficient to prevent this, apparently.
Besides which, you should not make assumptions about the way the
logical flag is implemented.
So you cannot rely on static local initialisation being safe in any
multi-threaded program without using synchronisation.
 |
Matthew Wilson Guest
|
Posted: Sat Aug 23, 2003 3:13 am Post subject: Re: Initialisation of local static variables; details |
|
|
| Quote: | [Static initialisation of types that are written atomically.]
The worst case with the expected implementation sequence would be
(roughly)
thread 1 check flag, and enter
thread 1 call get_pid()
thread 2 check flag, and enter
thread 1 set s_pid
thread 1 set flag to "No Entry"
thread 2 call get_pid()
thread 1 returns the value of s_pid
thread 2 set s_pid
thread 2 set flag to "No Entry"
thread 2 returns the value of s_pid
snip
However, if the flag is set after entry, but before initialising the
local static variable, it could (it would be extremely rare, but those
are just the kind of bugs we want to avoid at all costs, no?) be:
snip
I'm afraid that even if the generated code sets the flag afterwards,
the processor may re-order writes so that the change to the flag is
visible to other processors before the change to the variable. Even
the volatile qualifier is not sufficient to prevent this, apparently.
|
I read this and had a "lights-on" moment. However, in re-reading, I
remain dubious. Isn't what we're talking about equivalent to the
following bit of C:
int get_pid();
bool flag;
int pid;
int func()
{
    if(!flag)
    {
        pid = get_pid();
        flag = true;
    }
    return pid;
}
Given that we agree on that (which I know is not mandated in the std,
but let's just assume), you're saying that an arbitrary is allowed to
write back to memory (accessible to another thread) the value of flag
before the value of pid. Am I correct in understanding this to be what
you're saying?
And you're further saying that volatile does not help. What if we
rewrote func() to the following:
bool flag;
volatile int pid;
int func()
{
    if(!flag)
    {
        pid = get_pid();
        flag = (0 != pid);
    }
    return pid;
}
Surely in this case the processor *must*, for any thread, write out pid
(and read it back again) before it can use it to evaluate flag?
So looking back at the C++, what about if it was:
class X
{
    . . .
// Implementation
private:
    static int _get_pid()
    {
        static volatile bool b_init;
        static volatile int s_pid;
        if(!b_init)
        {
            s_pid = ::get_pid();
            b_init = 0 != s_pid;
        }
        return s_pid;
    }
    . . .
};
Surely that's safe?
 |
Matthew Wilson Guest
|
Posted: Sat Aug 23, 2003 3:13 am Post subject: Re: Initialisation of local static variables; details |
|
|
| Quote: | Uuuh, "thread safety" is not guaranteed for this mechanism, simply because
"multiple threads" are not defined in our Standard.
|
Yes, the fact that the standard does virtually nothing to address
threading leads to precisely this conundrum. I'm moved by Ben's
statements regarding ordering, but I am skeptical that it is not
possible to acquire safety in the circumstances with the volatile
keyword. AFAIUI volatile prevents the implementation from assuming a
particular variable (i.e. piece of memory) is not modified externally,
and therefore each time it is written to/read from it must not be
cached. If that is correct, then it seems to me we can achieve safety
(see my other post, where I enhance the function def to incorporate
volatile).
It's interesting reading, but I think it's over-engineering in certain
circumstances. If you want to keep the lazy-evaluation aspect of your
singleton, then it may well be the best solution, but if all you care
about is the singleton, then ensuring a safe creation can be as simple
as
// in Singleton.h
class Singleton
{
public:
    static Singleton &instance()
    {
        return ... your singleton here.
    }
private:
    . . .
};
// also in Singleton.h
class Singleton_Initialiser
{
public:
    Singleton_Initialiser()
    {
        Singleton::instance();
    }
};
static Singleton_Initialiser s_init_Singleton;
/* or within ns, if your compiler supports them */
namespace
{
    Singleton_Initialiser s_init_Singleton;
}
(You can make it more unique by putting in a macro and incorporating
the line# in it, but if you have a simple naming convention, or use a
namespace, then there's no need. I have to confess I've never tested
the namespace-version, as I always have to maintain compatibility with
crappy old compilers.)
s_init_Singleton will be a separate instance in every compilation
unit, and will be initialised before any client code that will call
Singleton::instance().
Of course, it's not guaranteed that the main thread may not initialise
another object of static linkage within which is called a function in
another compilation unit that itself uses Singleton::instance(), but
this is very unlikely. Still, "unlikely" translates to "not good
enough" in some circumstances, so it has to be an informed choice.
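A filled-in sketch of the eager-initialisation idea, for anyone who wants something that compiles. The body of instance(), the m_value member and the g_constructed counter are all assumptions added for illustration; only the initialiser-object trick is what the text above describes.

```cpp
// Counter used only to show when construction happens (illustrative).
int g_constructed = 0;

class Singleton
{
public:
    static Singleton& instance()
    {
        static Singleton s_instance;  // constructed on the first call
        return s_instance;
    }
    int value() const { return m_value; }
private:
    Singleton() : m_value(7) { ++g_constructed; }
    Singleton(const Singleton&);             // not copyable: declared,
    Singleton& operator=(const Singleton&);  // never defined
    int m_value;
};

namespace
{
    // One object per translation unit that includes this code. Its
    // constructor forces the singleton into existence during static
    // initialisation, normally before main() starts any threads.
    struct Singleton_Initialiser
    {
        Singleton_Initialiser() { Singleton::instance(); }
    } s_init_Singleton;
}
```

By the time main() begins, g_constructed is already 1, so later callers only ever take the "already initialised" path. The caveat about other static objects in other compilation units still applies.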
Matthew Wilson
STLSoft moderator and C++ monomaniac
mailto:matthew (AT) stlsoft (DOT) org
http://www.stlsoft.org
news://news.digitalmars.com/c++.stlsoft
"If i'm curt with you it's because time is a factor. I think fast, I
talk fast, and I need you guys to act fast" -- Mr Wolf
-------------------------------------------------------------------------------
 |
Wil Evers Guest
|
Posted: Sat Aug 23, 2003 2:09 pm Post subject: Re: Initialisation of local static variables; details |
|
|
In article <5d33192c.0308212126.123f422c (AT) posting (DOT) google.com>, Matthew Wilson
wrote:
| Quote: | So looking back at the C++, what about if it was:
class X
{
. . .
// Implementation
private:
static int _get_pid()
{
static volatile bool b_init;
static volatile int s_pid;
if(!b_init)
{
s_pid = ::get_pid();
b_init = 0 != s_pid;
}
return s_pid;
}
. . .
};
Surely that's safe?
|
No. There are two write operations here: one to s_pid and one to b_init.
'volatile' does not guarantee that some other thread will see these write
operations in the same order. It may therefore see a b_init that is set to
'true' and an s_pid that is still 0.
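For comparison, here is how the ordering requirement can be stated explicitly where std::atomic is available. This is a sketch that assumes a C++11 compiler, which postdates this discussion; get_pid() is a hypothetical stand-in for the OS call.

```cpp
#include <atomic>

// Hypothetical stand-in for the OS call discussed in the thread.
int get_pid() { return 1234; }

int cached_pid()
{
    static std::atomic<bool> b_init{false};
    static std::atomic<int>  s_pid{0};

    // Acquire load: if it reads true, the earlier store to s_pid is visible.
    if (!b_init.load(std::memory_order_acquire))
    {
        // Several threads may get here; all of them store the same value.
        s_pid.store(get_pid(), std::memory_order_relaxed);
        // Release store: orders the s_pid store before the flag store.
        b_init.store(true, std::memory_order_release);
    }
    return s_pid.load(std::memory_order_relaxed);
}
```

The release/acquire pair is the part 'volatile' does not provide: a thread that observes b_init as true is guaranteed to observe the value written to s_pid before it.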
- Wil
--
Wil Evers, DOOSYS R&D, Utrecht, Holland
[Wil underscore Evers at doosys dot com]
 |
Matthew Wilson Guest
|
Posted: Sun Aug 24, 2003 11:25 pm Post subject: Re: Initialisation of local static variables; details |
|
|
Wil Evers <bouncer (AT) dev (DOT) null> wrote
| Quote: | In article <5d33192c.0308212126.123f422c (AT) posting (DOT) google.com>, Matthew Wilson
wrote:
So looking back at the C++, what about if it was:
class X
{
. . .
// Implementation
private:
static int _get_pid()
{
static volatile bool b_init;
static volatile int s_pid;
if(!b_init)
{
s_pid = ::get_pid();
b_init = 0 != s_pid;
}
return s_pid;
}
. . .
};
Surely that's safe?
No. There are two write operations here: one to s_pid and one to b_init.
'volatile' does not guarantee that some other thread will see these write
operations in the same order. It may therefore see a b_init that is set to
'true' and an s_pid that is still 0.
- Wil
|
Ok, let me spin a last gambit:
class X
{
    . . .
// Implementation
private:
    static int _get_pid()
    {
        static volatile int s_pid = ::get_pid();
        return (0 == s_pid) ? _get_pid() : s_pid;
    }
    . . .
};
Naturally this grates against one's instincts, but the loop is only
going to occur at one epoch (in the life of the process). (I am aware
that we might have to insert a sleep prior to the recursive call, to
handle different thread priorities, which increases the ugliness of
course ;)
If there is a flaw (and we can be satisfied with nothing less than
multi-thread _and_ multi-processor safety), can you explain (or give
reference) as to why? I'd be especially interested in differences
between processors.
Thanks to everyone so far. You've all been most illuminating.
Yours hardwarely-challenged.
Matthew Wilson
STLSoft moderator and C++ monomaniac
mailto:matthew (AT) stlsoft (DOT) org
http://www.stlsoft.org
news://news.digitalmars.com/c++.stlsoft
"I can't sleep nights till I found out who hurled what ball through
what apparatus" -- Dr Niles Crane
-------------------------------------------------------------------------------
 |
Jerry Feldman Guest
|
Posted: Sun Aug 24, 2003 11:42 pm Post subject: Re: Initialisation of local static variables; details |
|
|
On 23 Aug 2003 13:49:42 -0400
[email]jtorjo (AT) yahoo (DOT) com[/email] (John Torjo) wrote:
| Quote: | So looking back at the C++, what about if it was:
class X
{
. . .
// Implementation
private:
static int _get_pid()
{
static volatile bool b_init;
static volatile int s_pid;
apply mutex lock here
if(!b_init)
{
s_pid = ::get_pid();
b_init = 0 != s_pid;
}
unlock mutex here
return s_pid;
}
. . .
};
Surely that's safe?
thread-safe maybe, but multi-processor safe, no.
Specifically, |
You need to make this atomic. The lock needs to be set BEFORE the test.
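A minimal sketch of "lock before the test", assuming std::mutex from C++11 in place of whatever platform mutex is actually in use; get_pid() and g_pid_mutex are hypothetical names for the illustration.

```cpp
#include <mutex>

// Hypothetical stand-in for the OS call discussed in the thread.
int get_pid() { return 1234; }

namespace
{
    // Namespace scope: std::mutex is constant-initialised, so it is
    // usable before the first call to cached_pid().
    std::mutex g_pid_mutex;
}

int cached_pid()
{
    static bool b_init = false;
    static int  s_pid  = 0;

    // Lock first, then test: the check and the update form one
    // critical section, so no thread can see a half-finished state.
    std::lock_guard<std::mutex> lock(g_pid_mutex);
    if (!b_init)
    {
        s_pid  = get_pid();
        b_init = true;
    }
    return s_pid;  // read while the lock is still held
}
```

The cost is one lock acquisition per call, which is the price of not relying on any ordering between the two plain writes.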
--
Jerry Feldman <gaf-nospam-at-blu.org>
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9
 |
Daniel Spangenberg Guest
|
Posted: Mon Aug 25, 2003 10:35 am Post subject: Re: Initialisation of local static variables; details |
|
|
Hello, Matthew Wilson!
Matthew Wilson schrieb:
[snip]
| Quote: | I think my posting "Singletons in multi-thread environments" will be interesting
for you. I proposed the canonical "Meyers" singleton (which in fact is an application
of a local, static variable) under thread-safe conditions by means of the
boost::call_once() function.
http://groups.google.de/groups?hl=de&lr=&ie=UTF-8&selm=3F1397FA.3910343B%40bdal.de
It's interesting reading, but I think it's over-engineering in certain
circumstances. If you want to keep the lazy-evaluation aspect of your
singleton, then it may well be the best solution, but if all you care
about is the singleton, then ensuring a safe creation can be as simple
as
// in Singleton.h
class Singleton
{
public:
static Singleton &instance()
{
return ... your singleton here.
}
private:
. . .
};
// also in Singleton.h
class Singleton_Initialiser
{
public:
Singleton_Initialiser()
{
Singleton::instance();
}
};
static Singleton_Initialiser s_init_Singleton;
/* or within ns, if your compiler supports them */
namespace
{
Singleton_Initialiser s_init_Singleton;
}
(You can make it more unique by putting in a macro and incorporating
the line# in it, but if you have a simple naming convention, or use a
namespace, then there's no need. I have to confess I've never tested
the namespace-version, as I always have to maintain compatibility with
crappy old compilers.)
s_init_Singleton will be a separate instance in every compilation
unit, and will be initialised before any client code that will call
Singleton::instance().
Of course, it's not guaranteed that the main thread may not initialise
another object of static linkage within which is called a function in
another compilation unit that itself uses Singleton::instance(), but
this is very unlikely. Still, "unlikely" translates to "not good
enough" in some circumstances, so it has to be an informed choice.
|
If your requirements allow the singleton to be initialized
during the initialization part of the program, then your implementation is
sufficient. Please note that boost provides a general class template for
this. Have a look at boost/pool/detail/singleton.hpp.
My proposed implementation is just the extension of the behaviour
of Meyers' (Stroustrup's, etc.) functional singleton in multithreading
environments. This means, the first requirement is "Initialize the singleton
when it is needed the first time".
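A rough sketch of the "initialise on first use, exactly once" requirement. It uses std::call_once and std::once_flag from C++11 rather than the boost::call_once mentioned in the thread, and it is not the implementation from the linked posting; the class layout, value() and the deliberately leaked instance are assumptions for illustration.

```cpp
#include <mutex>

class Singleton
{
public:
    static Singleton& instance()
    {
        // std::call_once runs create() exactly once, even if several
        // threads call instance() at the same time.
        std::call_once(s_once, &Singleton::create);
        return *s_instance;
    }
    int value() const { return 7; }
private:
    Singleton() {}
    // The instance is never deleted in this sketch; cleanup is left out.
    static void create() { s_instance = new Singleton; }
    static std::once_flag s_once;
    static Singleton*     s_instance;
};

std::once_flag Singleton::s_once;
Singleton*     Singleton::s_instance = nullptr;
```

The once-flag takes over the role of the hidden indicator of a local static, but with defined behaviour when several threads arrive together.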
Concerning the additional thread-safe cleanup method for the singleton
that I proposed (and which is missing in your implementation, as far as I can see):
threads are currently not standardized for C++, and I observed in the
C++ implementations we use (VC6/VC7.1) that more than one
thread may exist after main, so I needed this somewhat complicated-looking
method.
Greetings from Bremen,
Daniel Spangenberg
 |
Wil Evers Guest
|
Posted: Mon Aug 25, 2003 10:16 pm Post subject: Re: Initialisation of local static variables; details |
|
|
In article <5d33192c.0308231355.4784ea0 (AT) posting (DOT) google.com>, Matthew Wilson
wrote:
| Quote: | Ok, let me spin a last gambit:
class X
{
. . .
// Implementation
private:
static int _get_pid()
{
static volatile int s_pid = ::get_pid();
return (0 == s_pid) ? _get_pid() : s_pid;
}
. . .
};
|
This is getting more and more complicated. You're still using dynamic
initialisation of a local static, and making assumptions about how that
mechanism would behave in an MT-environment. This is all unspecified, and
therefore I don't like it.
My preferred solution is to (1) avoid dynamic initialization of local
statics, or better yet, avoid local statics completely and (2) take
advantage of the target platform's documented MT-behaviour. So for now,
I'd simply accept there is no portable solution and deal with that
accordingly.
| Quote: | If there is a flaw (and we can be satisfied with nothing less than
multi-thread _and_ multi-processor safety), can you explain (or give
reference) as to why? I'd be especially interested in differences
between processors.
|
The real experts on this subject - and some others - hang out on
comp.programming.threads. And of course, since they're experts, they
sometimes disagree.
- Wil
--
Wil Evers, DOOSYS R&D BV, Utrecht, Holland
[Wil underscore Evers at doosys dot com]
 |
Ben Hutchings Guest
|
Posted: Tue Aug 26, 2003 7:01 pm Post subject: Re: Initialisation of local static variables; details |
|
|
In article <5d33192c.0308212126.123f422c (AT) posting (DOT) google.com>,
Matthew Wilson wrote:
<snip>
| Quote: | I'm afraid that even if the generated code sets the flag afterwards,
the processor may re-order writes so that the change to the flag is
visible to other processors before the change to the variable. Even
the volatile qualifier is not sufficient to prevent this, apparently.
I read this and had a "lights-on" moment. However, in re-reading, I
remain dubious. Isn't what we're talking about equivalent to the
following bit of C:
int get_pid();
bool flag;
int pid;
int func()
{
if(!flag)
{
pid = get_pid();
flag = true;
}
return pid;
}
Given that we agree on that (which I know is not mandated in the std,
but let's just assume), you're saying that an arbitrary is allowed to
write back to memory (accessible to another thread) the value of flag
before the value of pid. Am I correct in understanding this to be what
you're saying?
|
Add "implementation" after "arbitrary" and yes, that's what I'm saying.
| Quote: | And you're further saying that volatile does not help.
|
Right. A naive reading of the standard suggests that it will help, as
it says that access to volatile storage is "observable behaviour" and
that observable behaviour happens in the specified order (whereas other
behaviour may be reordered under the as-if rule). However, the
standard only talks about a single thread of execution which can be
interrupted by signals. So in practice "observable" is interpreted as
"observable to memory-mapped I/O devices [1] and to signal handlers in
the same thread". This seems to me to be an unhelpful interpretation
as it removes the one potentially portable means of synchronisation in
the standard language, but that's just how it is.
[1] The OS should configure the processor to serialise access to
memory addresses used for MMIO. However, this cannot normally be done
for arbitrary regions of program memory.
| Quote: | What if we rewrote func() to the following:
bool flag;
volatile int pid;
int func()
{
if(!flag)
{
pid = get_pid();
flag = (0 != pid);
}
return pid;
}
Surely in this case the processor *must*, for any thread, write out
pid (and read it back again) before it can use it to evaluate flag?
|
The generated instructions must write, read and write in that order
but a processor can and quite possibly will reorder the writes. This
can be prevented by generating a "write memory barrier" after the
write to pid, but this is not done.
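To make the barrier placement concrete, here is a sketch using std::atomic_thread_fence. It assumes a C++11 compiler, which no implementation under discussion provides; get_pid() is a hypothetical stand-in, and pid and flag are made atomic so that concurrent duplicate initialisation is not a data race.

```cpp
#include <atomic>

// Hypothetical stand-in for the OS call discussed in the thread.
int get_pid() { return 1234; }

std::atomic<bool> flag{false};
std::atomic<int>  pid{0};

int func()
{
    if (!flag.load(std::memory_order_relaxed))
    {
        pid.store(get_pid(), std::memory_order_relaxed);
        // Write barrier: the store to pid becomes visible no later than
        // the store to flag that follows.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(true, std::memory_order_relaxed);
    }
    else
    {
        // Matching read barrier on the path that skipped initialisation.
        std::atomic_thread_fence(std::memory_order_acquire);
    }
    return pid.load(std::memory_order_relaxed);
}
```

Note that the reader needs its own barrier as well; a write barrier alone constrains only the writing processor.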
| Quote: | So looking back at the C++, what about if it was:
class X
{
. . .
// Implementation
private:
static int _get_pid()
{
static volatile bool b_init;
static volatile int s_pid;
if(!b_init)
{
s_pid = ::get_pid();
b_init = 0 != s_pid;
}
return s_pid;
}
. . .
};
Surely that's safe?
|
I'm afraid not.
 |
kanze@gabi-soft.fr Guest
|
Posted: Thu Aug 28, 2003 10:31 am Post subject: Re: Initialisation of local static variables; details |
|
|
Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> wrote
| Quote: | Right. A naive reading of the standard suggests that it will help, as
it says that access to volatile storage is "observable behaviour" and
that observable behaviour happens in the specified order (whereas
other behaviour may be reordered under the as-if rule). However, the
standard only talks about a single thread of execution which can be
interrupted by signals. So in practice "observable" is interpreted as
"observable to memory-mapped I/O devices [1] and to signal handlers in
the same thread". This seems to me to be an unhelpful interpretation
as it removes the one potentially portable means of synchronisation in
the standard language, but that's just how it is.
|
The real "problem" with volatile is that while the standard makes
certain guarantees concerning the order of accesses, etc., the actual
definition of "access" is left up to the implementation. With all of
the compilers I know of, the definition is that the CPU executes a read
or a write instruction on the data. In a typical multiprocessor
environment today, of course, this means absolutely nothing -- it
doesn't even guarantee ordering of the accesses in the local cache.
The real question is what volatile should mean in such cases. To ensure
visibility of the accesses in the correct order in another thread, it
would be necessary to wrap each access (of a volatile) with some sort of
barrier instruction; in the worst case (although I am unaware of any
such implementations), the barrier instruction will be protected, and a
system call will be needed both before and after the access.
The results could be *very* expensive with regards to performance. On
some (most?) machines, the barrier instructions work by flushing the
cache and marking it as invalid. The entire cache. So each access to
a volatile variable will result in a series of cache misses afterwards.
The results are generally not satisfactory, either, since in typical
cases, just protecting one variable isn't sufficient -- you need to
protect a sequence of operations (and remember as well that volatile
doesn't guarantee atomic).
In sum, you can't give volatile sufficient semantics to make it useful,
and going halfway just isn't worth the effort.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
 |
Ben Hutchings Guest
|
Posted: Thu Aug 28, 2003 8:33 pm Post subject: Re: Initialisation of local static variables; details |
|
|
In article <d6652001.0308270532.20704886 (AT) posting (DOT) google.com>,
[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
| Quote: | Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> wrote in message
news:<slrnbkmvqa.1hs.do-not-spam-benh (AT) tin (DOT) bwsint.com>...
Right. A naive reading of the standard suggests that it will help, as
it says that access to volatile storage is "observable behaviour" and
that observable behaviour happens in the specified order (whereas
other behaviour may be reordered under the as-if rule). However, the
standard only talks about a single thread of execution which can be
interrupted by signals. So in practice "observable" is interpreted as
"observable to memory-mapped I/O devices [1] and to signal handlers in
the same thread". This seems to me to be an unhelpful interpretation
as it removes the one potentially portable means of synchronisation in
the standard language, but that's just how it is.
The real "problem" with volatile is that while the standard makes
certain guarantees concerning the order of accesses, etc., the actual
definition of "access" is left up to the implementation. With all of
the compilers I know of, the definition is that the CPU executes a read
or a write instruction on the data.
|
Not for a byte read/write on an Alpha, I suspect.
| Quote: | In a typical multiprocessor
environment today, of course, this means absolutely nothing -- it
doesn't even guarantee ordering of the accesses in the local cache.
|
Yes, and even if there is no cache there can be types wider than the
memory bus (e.g. long on a 16-bit processor) for which access involves
more than one memory cycle even if it's only a single instruction.
This would be atomic w.r.t signals but not atomic w.r.t threads on any
other processors.
| Quote: | The real question is what volatile should mean in such cases. To ensure
visibility of the accesses in the correct order in another thread, it
would be necessary to wrap each access (of a volatile) with some sort of
barrier instruction; in the worst case (although I am unaware of any
such implementations), the barrier instruction will be protected, and a
system call will be needed both before and after the access.
The results could be *very* expensive with regards to performance. On
some (most?) machines, the barrier instructions work by flushing the
cache and marking it as invalid. The entire cache. So each access to
a volatile variable will result in a series of cache misses afterwards.
|
There may well be machines on which memory barriers are very expensive,
but since they are required for all synchronisation between threads
those machines cannot reasonably be used for multithreaded programs.
Memory barriers normally only affect the memory read and write buffers
in the processor core. There should be no need to flush the cache.
| Quote: | The results are generally not satisfactory, either, since in typical
cases, just protecting one variable isn't sufficient -- you need to
protect a sequence of operations (and remember as well that volatile
doesn't guarantee atomic).
|
True, it doesn't guarantee atomicity. There's no thr_atomic_t
counterpart to sig_atomic_t.
Supposing there is a type of object which can be accessed atomically
though, I think any synchronisation operation can be built up from
serialised accesses to such objects.
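As an illustration of building a synchronisation operation from one atomically accessible object, here is a sketch of a spin lock on top of std::atomic_flag. It assumes C++11, which is not available to anyone in this discussion; SpinLock and count_with_spinlock() are hypothetical names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

class SpinLock
{
public:
    void lock()
    {
        // test_and_set returns the previous value; spin while it was set.
        while (m_flag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { m_flag.clear(std::memory_order_release); }
private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

// Increments a plain int from several threads under the spin lock.
int count_with_spinlock(int threads, int per_thread)
{
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i)
            {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    for (auto& th : pool)
        th.join();
    return counter;
}
```

The plain int is protected only because every access to the flag is both atomic and ordered, which is the "serialised access" requirement in miniature.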
| Quote: | In sum, you can't give volatile sufficient semantics to make it useful,
and going halfway just isn't worth the effort.
|
I'm afraid you're right.
 |
kanze@gabi-soft.fr Guest
|
Posted: Fri Aug 29, 2003 3:01 pm Post subject: Re: Initialisation of local static variables; details |
|
|
Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> wrote
[...]
| Quote: | The results could be *very* expensive with regards to performance.
On some (most?) machines, the barrier instructions work by flushing
the cache and marking it as invalid. The entire cache. So each
access to a volatile variable will result in a series of cache
misses afterwards.
There may well be machines on which memory barriers are very
expensive, but since they are required for all synchronisation between
threads those machines cannot reasonably be used for multithreaded
programs.
|
Memory barriers are almost by definition expensive. Otherwise, they
would be implicitly present around every instruction. How expensive
will vary, but putting them at every sequence point is bound to slow
things down.
They are of course necessary within the synchronization primitives, like
pthread_mutex_lock. But in well designed code, these primitives aren't
invoked that often, so the effect of the barrier on performance is
negligible.
| Quote: | Memory barriers normally only affect the memory read and write buffers
in the processor core. There should be no need to flush the cache.
|
I'm not sure I understand. Processor A writes a word to the address
0x10000. The write actually goes to its cache. Processor A then issues
a memory barrier instruction. Processor B has previously read the word
at 0x10000, so it is in its cache. Processor B then issues a memory
barrier instruction, then reads the word at 0x10000. The requirements
are that it read the value which was previously written by processor A.
How can this not affect the cache?
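For concreteness, the scenario can be sketched with C++11 fences, which did not exist when this was posted; the names payload, ready and pass_message are invented for the illustration. Note one assumption that goes beyond the description above: processor B only knows that A's write has happened because it also observes a flag that A sets after its barrier. The two barriers alone say nothing about which processor runs first.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // plays the role of the word at 0x10000
std::atomic<bool> ready{false};  // hypothetical flag telling B that A is done

// Returns the value thread B reads from payload.
int pass_message() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread a([] {
        payload = 42;                                         // A writes the word
        std::atomic_thread_fence(std::memory_order_release);  // A's barrier
        ready.store(true, std::memory_order_relaxed);
    });
    std::thread b([&seen] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();                        // wait for A's flag
        }
        std::atomic_thread_fence(std::memory_order_acquire);  // B's barrier
        seen = payload;                                       // B must read 42
    });

    a.join();
    b.join();
    return seen;
}
```

How the hardware makes B's read return 42 (invalidating one line, updating it, or something cruder) is exactly the question being asked; the sketch only pins down the required result.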
Obviously, the hardware can be more intelligent: both Intel IA32
architectures and the Sun Sparc implementations that I know actually
write through, with all writes going immediately to main memory. In
addition, I think that the local caches "track" writes to main memory,
so as to invalidate just the single cache line, and not the entire
cache. But this sort of hardware logic carries a performance penalty:
as far as I know, it will not be present in the IA64, and the Sparc
architecture specification doesn't require it in all modes.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
Ben Hutchings Guest
|
Posted: Sat Aug 30, 2003 12:48 pm Post subject: Re: Initialisation of local static variables; details |
|
|
In article <d6652001.0308290101.5f0353ea (AT) posting (DOT) google.com>,
kanze (AT) gabi-soft (DOT) fr wrote:
| Quote: | Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> wrote in message
news:<slrnbks0eo.1rc.do-not-spam-benh (AT) tin (DOT) bwsint.com>...
[...]
The results could be *very* expensive with regards to performance.
On some (most?) machines, the barrier instructions work by flushing
the cache and marking it as invalid. The entire cache. So each
access to a volatile variable will result in a series of cache
misses afterwards.
There may well be machines on which memory barriers are very
expensive, but since they are required for all synchronisation between
threads those machines cannot reasonably be used for multithreaded
programs.
Memory barriers are almost by definition expensive.
|
But not so expensive as you think.
<snip>
| Quote: | They are of course necessary within the synchronization primitives, like
pthread_mutex_lock. But in well designed code, these primitives aren't
invoked that often, so the effect of the barrier on performance is
negligible.
|
Sure, synchronisation should be done as rarely as possible without
compromising correctness. However, lock-free synchronisation (which
volatile may appear to support, but doesn't) should be cheaper than
locking synchronisation (such as with mutexes).
| Quote: | Memory barriers normally only affect the memory read and write buffers
in the processor core. There should be no need to flush the cache.
I'm not sure I understand. Processor A writes a word to the address
0x10000. The write actually goes to its cache. Processor A then issues
a memory barrier instruction.
|
It's the memory barrier instruction that forces the write to cache before
any memory access is done for the following instructions. So you have
this in the wrong order.
| Quote: | Processor B has previously read the word at 0x10000, so it is in its
cache.
|
The change in processor A's cache forces processor B to invalidate or
update the relevant line of its cache. If this were not so, objects
sharing a cache line could not be updated asynchronously, so objects in
a multithreaded program would have to be padded to fill an entire cache
line (as much as 64 bytes).
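A small sketch of what that means in practice, assuming a C++11 compiler on cache-coherent hardware. Pair, PaddedPair and update_both are hypothetical names, and the 64-byte alignment simply reuses the line size mentioned above rather than any measured value. Two threads each update their own member; the unpadded members will normally share a cache line, yet both versions must give the same result.

```cpp
#include <thread>

// Two small objects that will normally land in the same cache line.
struct Pair {
    int first;   // written only by thread A
    int second;  // written only by thread B
};

// The same pair with each member forced onto its own (assumed 64-byte) line.
struct PaddedPair {
    alignas(64) int first;
    alignas(64) int second;
};

// Each thread increments only its own member n times.
// Returns true if neither thread's updates were lost.
template <class P>
bool update_both(int n) {
    P p{};
    std::thread a([&p, n] { for (int i = 0; i < n; ++i) ++p.first; });
    std::thread b([&p, n] { for (int i = 0; i < n; ++i) ++p.second; });
    a.join();
    b.join();
    return p.first == n && p.second == n;
}
```

Both update_both<Pair>(100000) and update_both<PaddedPair>(100000) should return true. Padding may change how fast the unpadded version runs, but it is not needed for correctness.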
| Quote: | Processor B then issues a memory
barrier instruction, then reads the word at 0x10000. The requirements
are that it read the value which was previously written by processor A.
How can this not affect the cache?
|
It doesn't affect the whole cache.
| Quote: | Obviously, the hardware can be more intelligent: both Intel IA32
architectures and the Sun Sparc implementations that I know actually
write through, with all writes going immediately to main memory.
|
IA32 hasn't done write-through by default for a while. Perhaps you
mean they write-through shared cache lines. I suspect the Sparc is
the same, as write-through tends to be a performance drag.
| Quote: | In addition, I think that the local caches "track" writes to main
memory, so as to invalidate just the single cache line, and not the
entire cache. But this sort of hardware logic carries a performance
penalty: as far as I know, it will not be present in the IA64, and
the Sparc architecture specification doesn't require it in all
modes.
|
Practically, they have to have some kind of cache coherency protocol
to support a reasonable multi-threaded implementation of C (i.e. one
that doesn't require huge padding for everything that's shared).
There are a number of options for how to do this, with a trade-off
between speed and complexity (which correlates somewhat with cost
and power consumption).
kanze@gabi-soft.fr Guest
|
Posted: Tue Sep 02, 2003 6:00 am Post subject: Re: Initialisation of local static variables; details |
|
|
Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> wrote
| Quote: | In article <d6652001.0308290101.5f0353ea (AT) posting (DOT) google.com>,
kanze (AT) gabi-soft (DOT) fr wrote:
Ben Hutchings <do-not-spam-benh (AT) bwsint (DOT) com> wrote in message
news:<slrnbks0eo.1rc.do-not-spam-benh (AT) tin (DOT) bwsint.com>...
[...]
The results could be *very* expensive with regards to
performance. On some (most?) machines, the barrier instructions
work by flushing the cache and marking it as invalid. The
entire cache. So each access to a volatile variable will result
in a series of cache misses afterwards.
There may well be machines on which memory barriers are very
expensive, but since they are required for all synchronisation
between threads those machines cannot reasonably be used for
multithreaded programs.
Memory barriers are almost by definition expensive.
But not so expensive as you think.
|
I think we're basically disagreeing on the meaning of "very". :-)
| Quote: | <snip>
They are of course necessary within the synchronization primitives,
like pthread_mutex_lock. But in well designed code, these
primitives aren't invoked that often, so the effect of the barrier
on performance is negligible.
Sure, synchronisation should be done as rarely as possible without
compromising correctness. However, lock-free synchronisation (which
volatile may appear to support, but doesn't) should be cheaper than
locking synchronisation (such as with mutexes).
|
Certainly. Locking synchronization must do a lock-free synchronization,
plus manage the lock. And doing more cannot take less time than doing
less.
The real problem is that I don't know of a standardized interface for
lock-free synchronization. I know how to do it on a Sparc, using asm,
but obviously, that won't work anywhere else. I do know how to lock
a mutex in a way which is portable to any machine implementing
Posix. I suspect that I am capable of learning to do it under Windows.
Together, that handles a lot of machines, including all of those I am
currently interested in. Managing two variants is easier than managing
how many? And the probability of the next machine supporting one of
those variants is fairly high. (In some ways, I wish this wasn't the
case; I'd love to do some hard real-time work, for example. But at my
level of experience, they always insist on experience with the actual OS
being used, and in my case, that generally means Posix.)
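For reference, the portable Posix variant looks roughly like the following. This is a minimal sketch only: g_lock, g_total, add_many and run_two_threads are made-up names, and the return codes of the pthread calls are ignored for brevity, which real code should not do.

```cpp
#include <pthread.h>

// Hypothetical shared state protected by a statically initialised mutex.
pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
long g_total = 0;

// Thread entry point: adds 1 to the shared total *arg times.
extern "C" void* add_many(void* arg) {
    long n = *static_cast<long*>(arg);
    for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&g_lock);    // includes whatever barrier is needed
        ++g_total;
        pthread_mutex_unlock(&g_lock);  // likewise on release
    }
    return nullptr;
}

// Runs two threads that each add n; returns the resulting total.
long run_two_threads(long n) {
    g_total = 0;
    pthread_t a, b;
    pthread_create(&a, nullptr, add_many, &n);
    pthread_create(&b, nullptr, add_many, &n);
    pthread_join(a, nullptr);
    pthread_join(b, nullptr);
    return g_total;
}
```

run_two_threads(5000) should return 10000 on any conforming Posix system, with no platform-specific assembler anywhere in the source.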
| Quote: | Memory barriers normally only affect the memory read and write
buffers in the processor core. There should be no need to flush
the cache.
I'm not sure I understand. Processor A writes a word to the
address 0x10000. The write actually goes to its cache. Processor
A then issues a memory barrier instruction.
It's the memory barrier instruction that forces the write to cache
before any memory access is done for the following instructions. So
you have this in the wrong order.
Processor B has previously read the word at 0x10000, so it is in
its cache.
The change in processor A's cache forces processor B to invalidate or
update the relevant line of its cache. If this were not so, objects
sharing a cache line could not be updated asynchronously, so objects
in a multithreaded program would have to be padded to fill an entire
cache line (as much as 64 bytes).
|
I understand what you mean. I presume that I have thus misunderstood
something else. Perhaps where I am saying cache, I should be referring
to something even smaller? (Which would also be a cache of sorts, but
not what we usually mean by cache.)
On the other hand, how can processor B update its cache if processor A
hasn't written through to main memory? (Supposing processor B doesn't
have access to processor A's cache.)
| Quote: | Processor B then issues a memory barrier instruction, then reads
the word at 0x10000. The requirements are that it read the value
which was previously written by processor A. How can this not
affect the cache?
It doesn't affect the whole cache.
Obviously, the hardware can be more intelligent: both Intel IA32
architectures and the Sun Sparc implementations that I know
actually write through, with all writes going immediately to main
memory.
IA32 hasn't done write-through by default for a while. Perhaps you
mean they write-through shared cache lines.
|
Perhaps. I was basing my statement on the guarantees they give in a
multiprocessor environment.
| Quote: | I suspect the Sparc is the same, as write-through tends to be a
performance drag.
|
Cache consistency in general tends to be a performance drag. How much,
I've not been able to measure -- I don't regularly have access to a
multiprocessor machine.
| Quote: | In addition, I think that the local caches "track" writes to main
memory, so as to invalidate just the single cache line, and not the
entire cache. But this sort of hardware logic carries a
performance penalty: as far as I know, it will not be present in
the IA64, and the Sparc architecture specification doesn't require
it in all modes.
Practically, they have to have some kind of cache coherency protocol
to support a reasonable multi-threaded implementation of C (i.e. one
that doesn't require huge padding for everything that's shared).
There are a number of options for how to do this, with a trade-off
between speed and complexity (which correlates somewhat with cost and
power consumption).
|
OK. I see that I'm going to have to read some more modern literature on
hardware architectures.
What I do know, of course, is that the memory barriers are necessary,
and that they are not free -- just flushing the processor's pipeline can
be an expensive operation. Beyond that, of course, the effective cost
depends on what else you are doing, and how often you are issuing the
barriers.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16