C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

C++ inlining as a multithreading optimization technique?
gianguz
Posted: Mon Dec 13, 2004 10:43 am    Post subject: C++ inlining as a multithreading optimization technique?



The question is about the possible use of inlining to improve
performance in a heavily multithreaded environment (200+ threads).
If we work with applications in which the threads aren't I/O-bound
or user-time bound (e.g. windows-based applications) but are
concurrently executing the same parallelized task (e.g. a
matrix-matrix multiplication), we must ensure that the advantage
gained from the parallelization is not smaller than the overhead of
the context switching needed to suspend, wake up, and run threads.
My opinion is that we could use inline as a mechanism to improve
context-switching performance.
An inlined function doesn't need to save its call, with its
parameters, on the stack. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread's
execution at a point where an inlined function was called simply means
restoring the program counter to that code line, instead of popping
the current function call and its parameters from the stack. That
seems to me a great saving of memory space and time.
Does anyone have comments on this, or has anyone already tried such a
solution and seen results?

Gianguglielmo


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Catalin Marinas
Posted: Tue Dec 14, 2004 3:31 am    Post subject: Re: C++ inlining as a multithreading optimization technique?



"gianguz" <gianguglielmo.calvi (AT) noze (DOT) it> writes:
Quote:
My opinion is that we could use inline as a mechanism to improve contex
switching performance.

Inlining doesn't have anything to do with context switching. It
optimises function calls, but the context switch overhead stays the
same. It can even be greater on some architectures where the cache is
virtually indexed/virtually tagged and switching to a different
process requires flushing the whole cache: inlined code is larger, so
it uses more of the instruction cache, which then takes more time to
refill.

Quote:
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at that
code line instead of popping from the stack the current function called
with its own parameter. It seems to me a great save of memory space and
time.

At a context switch, the operating system automatically saves all the
current registers in its internal structures and restores them when
returning to that thread. It always returns to the address of the next
instruction to be executed (I can't really say the same code line, since
a line can take more than one CPU instruction); it _doesn't_ restart
the interrupted function. The OS has no knowledge of which registers
need to be saved, so it saves them all, and it doesn't touch the user
stack.

Catalin

Ulrich Eckhardt
Posted: Tue Dec 14, 2004 3:31 am    Post subject: Re: C++ inlining as a multithreading optimization technique?



gianguz wrote:
Quote:
The question is about the possible use of inlining to improve
performance in a heavy multithreading environment (200+ threads).
If we have to work with applications in which threads aren't I/O
bounded or user-time bounded (i.e. windows based applications) but
they are concurrently involved in the execution of the same
parallelized task (i.e. a matrix-matrix multiplication), we must ensure
that the advantage obtained by the parallelization is not lesser than
the overhead introduced by the context switching needed to suspend/wake
up/run threads.
My opinion is that we could use inline as a mechanism to improve contex
switching performance.
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at that
code line instead of popping from the stack the current function called
with its own parameter. It seems to me a great save of memory space and
time.
Anyone has comments about that or has already experience such solution
with some result?

1. You are mistaken to think that context switching requires putting
anything onto the stack. The thread implementations I know always store
the current registers at a constant position in thread-specific storage.
So, inlining might make your program faster, but that has nothing to do
with multithreading.
2. Using too many threads (200 sounds pretty high) might itself be your
problem.
3. Before you optimize, you need to benchmark/profile your application.
Don't even bother tweaking here or there before you have finished that.
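
To make point 3 concrete, here is a minimal timing sketch. It is only an illustration and not code from this thread: `sample_work` is a hypothetical stand-in for the real task, and `time_us` is an assumed helper name. It uses `std::chrono::steady_clock`, which postdates this discussion.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload: a cheap arithmetic loop standing in for the real task.
std::uint64_t sample_work(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t j = 1; j <= n; ++j) {
        acc += (j * j) % 7;
    }
    return acc;
}

// Runs `fn` once and returns the elapsed wall-clock time in microseconds.
template <typename Fn>
long long time_us(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling `time_us` around the same workload in two builds (with and without inlining) gives a baseline before any tuning. Store the workload's result somewhere the compiler cannot discard, or the loop may be optimized away.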

Uli


Antoun Kanawati
Posted: Tue Dec 14, 2004 7:53 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

gianguz wrote:
Quote:
My opinion is that we could use inline as a mechanism to improve contex
switching performance.
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at that
code line instead of popping from the stack the current function called
with its own parameter. It seems to me a great save of memory space and
time.

At the time of thread creation, you create a context for the thread;
the context includes the program counter, the stack pointer, the frame
pointer, and some other registers, as well as a runtime stack allocated
for the particular thread.

Suspending/resuming threads does not depend on inlining or lack thereof.
It is essentially an operation on a bunch of registers, similar in
spirit to longjmp().
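
The longjmp() comparison can be sketched as follows. This is an analogy only, under the assumption that a user-level `setjmp`/`longjmp` pair is a fair stand-in for what a scheduler does with a thread's registers; `resume_demo` and the `level` functions are hypothetical names.

```cpp
#include <csetjmp>

// Saved register context (program counter, stack pointer, callee-saved registers).
static std::jmp_buf g_saved;

// A few nested, non-inlined-looking calls. How deep they go does not change
// what setjmp stored in g_saved.
static void level3() { std::longjmp(g_saved, 1); }
static void level2() { level3(); }
static void level1() { level2(); }

// Returns 1 when control came back through longjmp, 0 otherwise.
int resume_demo() {
    if (setjmp(g_saved) == 0) {
        level1();   // never returns normally: level3 jumps back to setjmp
        return 0;
    }
    return 1;       // reached only by restoring the saved context
}
```

The saved context has the same size whether the call chain is three deep or fully inlined, which is the point being made about suspend/resume.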

Furthermore, since most thread creation APIs require a function pointer,
or an object with a specific virtual method, as the thread's entry
point, inlining will not be of any benefit to that particular activity.

The internal mechanics of threads are not as you imagined them.
--
A. Kanawati
NO.antounk.SPAM (AT) comcast (DOT) net

Jonathan Bartlett
Posted: Tue Dec 14, 2004 8:06 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

Quote:
My opinion is that we could use inline as a mechanism to improve contex
switching performance.
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at that
code line instead of popping from the stack the current function called
with its own parameter. It seems to me a great save of memory space and
time.
Anyone has comments about that or has already experience such solution
with some result?

This has nothing to do with multithreading and nothing to do with
context switching. A context switch is where an application switches
from user-mode to kernel-mode, or changes which thread is active.

The short answer is yes, inlining is an optimization technique that is
useful in pretty much any environment -- multithreaded or
single-threaded, and for the reasons you mention.

Jon
----
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017

kanze@gabi-soft.fr
Posted: Tue Dec 14, 2004 8:34 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

Ulrich Eckhardt wrote:
Quote:
gianguz wrote:
The question is about the possible use of inlining to improve
performance in a heavy multithreading environment (200+ threads).
If we have to work with applications in which threads aren't I/O
bounded or user-time bounded (i.e. windows based applications) but
they are concurrently involved in the execution of the same
parallelized task (i.e. a matrix-matrix multiplication), we must
ensure that the advantage obtained by the parallelization is not
lesser than the overhead introduced by the context switching needed
to suspend/wake up/run threads.
My opinion is that we could use inline as a mechanism to improve
contex switching performance.
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at
that code line instead of popping from the stack the current
function called with its own parameter. It seems to me a great save
of memory space and time.
Anyone has comments about that or has already experience such
solution with some result?

1. You are mistaken to think that context-switching requires putting
anything onto the stack. The implementation of threads I know always
stores the current registers at a constant position in
thread-specific
storage.

I think he was talking about the registers saved automatically at the
beginning of each function call, with the idea of generating the
context switch code inline. Of course, the context switch is not a
typical function, but a system request, so 1) you could easily be right
about the registers (at any rate, they've got to be stored somewhere),
and 2) the call ultimately goes through to privileged code in the
kernel (where the scheduler is located), so there just isn't any way of
inlining it.

Quote:
So, inlining might make your program faster, but that has nothing to
do with multithreading.

2. Using too many threads (200 sounds pretty high) itself might be
your problem.

I suppose it depends on the system, but I regularly use more threads
than that in my application (one thread per client connection, plus a
few service threads); I suspect that this is the case most of the time
for a server.

On the other hand, if I understood the original posting correctly, the
goal was to increase the speed of a specific, CPU-intensive
calculation, not to maintain so many distinct "contexts" as such. In
that case, any threads beyond the number of CPUs available are just
wasted, and will almost surely slow things down.
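
As a small sketch of that rule of thumb in later standard C++ (not available when this was written), the worker count for a CPU-bound job can be capped at the hardware thread count. `cpu_bound_worker_count` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <thread>

// Caps a requested worker count at the number of hardware threads.
// hardware_concurrency() may return 0 when the value is unknown,
// so fall back to a single worker in that case.
unsigned cpu_bound_worker_count(unsigned requested) {
    const unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    return std::max(1u, std::min(requested, hw));
}
```

A request for 200 workers would then be reduced to whatever the machine reports, while a server with one thread per connection would not use this cap at all.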

--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


GianGuz
Posted: Tue Dec 14, 2004 8:37 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

Ulrich Eckhardt wrote:
Quote:
1. You are mistaken to think that context-switching requires putting
anything onto the stack.


I think you have to revise your knowledge about that. Threads - also
known as LWPs (Light Weight Processes, not a random name!) - have to
change context exactly as processes do, even if this operation is an
order of magnitude faster than for a process, for the simple reason
that a process has to save all the data (such as globals) currently
involved in the computation, while threads share that data and so don't
need to save and restore it. Moreover, I hope you know that when a
function is called the processor has to perform some well-known
operations:

SCHEME I
1) pushing the parameters onto the stack so they can be recalled on
entering the function body
2) saving the current PC as a return point, so that the function does
not return to a random position in the code
3) entering the function body, popping the parameters, and executing
the body
4) returning to the caller

Obviously the whole scheme above applies recursively to nested
function calls.

Quote:
The implementation of threads I know always stores
the current registers at a constant position in thread-specific
storage.


Why should saving registers and data to restart the computation later
not be called context switching?

Quote:
So, inlining might make your program faster, but that has nothing to
do
with multithreading.

I'm not sure about that. In Scheme I, an inlined function (i.e. a
piece of contiguous code) needs only to save the PC.

Quote:
2. Using too many threads (200 sounds pretty high) itself might be
your
problem.
3. Before you optimize, you need to benchmark/profile your
application.
Don't even bother tweaking here or there before you finished that.


This is not optimization before testing; it is simply a discussion
about an idea. But if you are curious you can try the following:

#include <zthread/Thread.h>
#include <zthread/ThreadedExecutor.h>
#include <string>
#include <iostream>

#include <cstdlib>
#include <ctime>

using namespace std;
using namespace ZThread;

#define MAX_VALUE 32000
#define STRESSING_TEST_LIMIT 10000000

#ifdef _INLINE_

#define CURRENT_INLINING inline

#else

#define CURRENT_INLINING

#endif

class MyTask : public Runnable {

private:

long* _x;
long* _y;
long* _result;

explicit MyTask() : _x(NULL) , _y(NULL) , _result(NULL) { }

CURRENT_INLINING const long k(const long a,const long b,const long c);

CURRENT_INLINING const long h(const long a,const long b,const long c);

CURRENT_INLINING const long g(const long a,const long b,const long c);

CURRENT_INLINING const long f(const long a,const long b,const long c);

CURRENT_INLINING void stressTest(const long i);

long _run(const unsigned long i) {

stressTest(i);

(*_result) = (*_x) + (*_y);

return (*_result);
}

public :

MyTask(long* x,long* y,long* result) : _x(x) , _y(y) , _result(result)
{ }

void run() {

_run(STRESSING_TEST_LIMIT);

return;
}

~MyTask() { }
};

const long MyTask::k(const long a,const long b,const long c) {

return a - b + c;
}

const long MyTask::h(const long a,const long b,const long c) {

return k((a / b) / c,(c + a) - b,(a * c) / b);
}

const long MyTask::g(const long a,const long b,const long c) {

return h(a / b, b / a, (a + b) * c);
}

const long MyTask::f(const long a,const long b,const long c) {

return g(a*b,b*a,c);
}

void MyTask::stressTest(const long i) {

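// c must be nonzero: with c == 0, h() ends up dividing by zero.
// The products a*b in f() also assume a 64-bit long.
for(long j=1; j<i; j++) f(i,j,1);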

return;
}

int main(int argc, char* argv[]) {

if (argc > 1) {

int size;

size = atoi(argv[1]);

long* inputX = new long[size];
long* inputY = new long[size];

long* output = new long[size];

MyTask** tasks = new MyTask*[size];

time_t T;

time(&T);

srand(static_cast<unsigned int>(T));

ThreadedExecutor executor;

cout << "Creating tasks... " << endl;

for(int i=0; i<size; i++) {
inputX[i] = rand() % MAX_VALUE;

inputY[i] = rand() % MAX_VALUE;

output[i] = 0;

tasks[i] = new MyTask(&(inputX[i]),&(inputY[i]),&(output[i]));

cout << "Input " << i << " : " << inputX[i] << "," << inputY[i] <<
endl;
}

cout << "Executing tasks... " << endl;

time_t start0,end0;

clock_t start1,end1;

time(&start0);

start1 = clock();

for(int i=0; i<size; i++) {
executor.execute(tasks[i]);
}

executor.wait();

time(&end0);
end1 = clock();

cout << "Collecting results... " << endl;

for(int i=0; i<size; i++) {
cout << "Output " << i << " : " << output[i] << endl;
}

cout << "Computation time (sec): " << (end0) - (start0) << endl;

// clock() counts processor time in ticks; convert to milliseconds.
cout << "CPU time (msec): " << (end1 - start1) * 1000.0 / CLOCKS_PER_SEC << endl;
}
}

Andrei Alexandrescu (See Website for Email)
Posted: Tue Dec 14, 2004 8:39 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

Gianguglielmo,


(I know, I know, I owe you two emails.)

"gianguz" <gianguglielmo.calvi (AT) noze (DOT) it> wrote

Quote:
The question is about the possible use of inlining to improve
performance in a heavy multithreading environment (200+ threads).
If we have to work with applications in which threads aren't I/O
bounded or user-time bounded (i.e. windows based applications) but
they are concurrently involved in the execution of the same
parallelized task (i.e. a matrix-matrix multiplication), we must ensure
that the advantage obtained by the parallelization is not lesser than
the overhead introduced by the context switching needed to suspend/wake
up/run threads.

According to results shared with me by Ami Tavori, the performance of
computation-bound multithreading peaks at about 4 threads/processing unit,
and then starts to degrade because it's being flooded by context switches.

Quote:
My opinion is that we could use inline as a mechanism to improve contex
switching performance.
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at that
code line instead of popping from the stack the current function called
with its own parameter. It seems to me a great save of memory space and
time.
Anyone has comments about that or has already experience such solution
with some result?

The description is not really accurate. First, you can't create a thread
without also creating a stack for it, no matter the amount of inlining.
Then, the cost of a thread switch is largely unaffected by the size of its
stack (aside, of course, from the costs of occupying memory in general).

A thread switch means swapping the entire execution context (including the
special registers stack pointer and instruction pointer) of a thread out,
and of another thread in. A thread with a slightly smaller stack won't do
better.


Andrei



Ben Hutchings
Posted: Wed Dec 15, 2004 4:40 am    Post subject: Re: C++ inlining as a multithreading optimization technique?

Andrei Alexandrescu (See Website for Email) wrote:
<snip>
Quote:
According to results shared with me by Ami Tavori, the performance of
computation-bound multithreading peaks at about 4 threads/processing unit,
and then starts to degrade because it's being flooded by context switches.
snip


That's surprising. Can you say why it is useful to have more than one
such thread per processor (or per virtual processor, when hardware
multithreading is used)?

--
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source

Allan W
Posted: Wed Dec 15, 2004 4:44 am    Post subject: Re: C++ inlining as a multithreading optimization technique?

It's possible that GianGuz meant "context switching" in a different
sense than everyone else here is using the expression.

If the operating system doesn't provide threads, and especially if the
CPU doesn't provide support for this context, then some other type of
code has to handle this support. There is a programming technique
called "concurrent threads" where a master scheduler has a list of
tasks to do, and repeatedly calls functions to accomplish the next
piece of one of those tasks. I'm not talking about the way that Windows
3 had "cooperative multi-tasking" built into the OS -- I'm talking
about something that is done completely within user code.

I read about this technique many years ago. I've never personally used
it, so I've probably already said some things about it that are
inaccurate.

Anyway, if GianGuz meant "context switching" in the sense of jumping
from one user-written function directly to another, and then back
again... then making those functions inline might make sense...?

On the other hand, if there are 200 of these functions, and most of
them call each other... he might see the program size bloat up to 200
times its original size or more, depending on how deeply his compiler
is willing to recurse inlines.
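
For readers unfamiliar with the user-level technique described above, here is a rough sketch under my own assumptions, not a reconstruction of any particular library: a "master scheduler" that repeatedly calls each task's next step. `Step`, `run_round_robin`, and `make_counter` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A task performs one small step per call and returns true while it has more work.
using Step = std::function<bool()>;

// Round-robin scheduler done entirely in user code: it calls each task's
// next step in turn and drops a task once it reports that it is finished.
void run_round_robin(std::vector<Step> tasks) {
    while (!tasks.empty()) {
        for (std::size_t i = 0; i < tasks.size();) {
            if (tasks[i]()) {
                ++i;
            } else {
                tasks.erase(tasks.begin() + static_cast<std::ptrdiff_t>(i));
            }
        }
    }
}

// Hypothetical task factory: appends `tag` to `log` once per step, `count` times.
Step make_counter(std::string& log, char tag, int count) {
    return [&log, tag, count]() mutable {
        if (count <= 0) return false;
        log.push_back(tag);
        --count;
        return count > 0;
    };
}
```

Here the "switch" between tasks is an ordinary function call and return, so call overhead is at least part of the switching cost, which is not the case for an OS context switch.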


George Neuner
Posted: Wed Dec 15, 2004 4:46 am    Post subject: Re: C++ inlining as a multithreading optimization technique?

On 14 Dec 2004 15:39:37 -0500, "Andrei Alexandrescu (See Website for
Email)" <SeeWebsiteForEmail (AT) moderncppdesign (DOT) com> wrote:


Quote:
According to results shared with me by Ami Tavori, the performance of
computation-bound multithreading peaks at about 4 threads/processing unit,
and then starts to degrade because it's being flooded by context switches.

That's architecture dependent but I've seen similar analysis.
Performance degradation is usually a function of cache operation and
whether the threads' working sets can be held simultaneously. A large
multiway associative cache goes a long way.


Quote:
Then, the cost of a thread switch is largely unaffected by the size of its
stack (aside, of course, from the costs of occupying memory in general).

Also architecture dependent. Remember tasking on 8 bit CPUs?

Embedded systems still use many small chips that require stacks be
within limited address ranges. If all your stacks won't fit within
the allotted range, you have to overlay some or all of them. Then
context switching may require stack copying.

Of course, one wonders why any sane person would want to multiprogram
such devices ... other than to say "look what I did" or "someone paid
me to" 8-)

George
--
for email reply remove "/" from address

GianGuz
Posted: Wed Dec 15, 2004 1:06 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

Andrei Alexandrescu (See Website for Email) wrote:
Quote:
Gianguglielmo,


(I know, I know, I owe you two emails.)

"gianguz" <gianguglielmo.calvi (AT) noze (DOT) it> wrote in message
news:1102932410.962685.195250 (AT) f14g2000cwb (DOT) googlegroups.com...
The question is about the possible use of inlining to improve
performance in a heavy multithreading environment (200+ threads).
If we have to work with applications in which threads aren't I/O
bounded or user-time bounded (i.e. windows based applications) but
they are concurrently involved in the execution of the same
parallelized task (i.e. a matrix-matrix multiplication), we must
ensure
that the advantage obtained by the parallelization is not lesser
than
the overhead introduced by the context switching needed to
suspend/wake
up/run threads.

According to results shared with me by Ami Tavori, the performance of

computation-bound multithreading peaks at about 4 threads/processing
unit,
and then starts to degrade because it's being flooded by context
switches.


Interesting, a sort of Greek pi for multithreaded computation! ;-)
But is that also true for architectures with hyperthreading or similar
technologies?

Quote:
My opinion is that we could use inline as a mechanism to improve
contex
switching performance.
An inlined function doesn't need to save into the stack its calling
with its own parameter. It is simply expanded into the code with a
temporary copy of any parameter it carries. Restoring a thread
execution in a point when an inlined
function was called simply means restoring the program counter at
that
code line instead of popping from the stack the current function
called
with its own parameter. It seems to me a great save of memory space
and
time.
Anyone has comments about that or has already experience such
solution
with some result?

The description is not really accurate. First, you can't create a
thread
without also creating a stack for it, no matter the amount of
inlining.
Then, the cost of a thread switch is largely unaffected by the size
of its
stack (aside, of course, from the costs of occupying memory in
general).


I have to be more accurate. My problem does not concern thread
creation (with its stack, etc.) but the fact that the context
switching procedure (with its own backup information) executed by the
O/S kernel during the preemption of a thread could perform better if
inlining were used to expand a function chain
like F(G(H(K(x,y,z)))). My experience says that this is true (with good
results) under certain conditions, such as:

1) code bloat is avoided
2) threads execute almost the same 'piece of' inlined code (i.e. they
call the same function chain) with their own different parameters
3) the functions executed have long parameter lists

1 and 3 also hold for inlined code in a single-threaded application,
but as I said before, I saw that the performance gain with the same code
and the same rules seems to be better in the multithreaded case.

Quote:
A thread switch means swapping the entire execution context
(including the
special registers stack pointer and instruction pointer) of a thread
out,
and of another thread in. A thread with a slightly smaller stack
won't do
better.


Another question arises for me at this point. Does an increased amount
of shared memory between threads (i.e. globals) also reduce the context
switching cost!? ;-)
More globals should mean less memory to save and restore during a
Thread Out / Thread In procedure!

Gianguglielmo
Andrei Alexandrescu (See Website for Email)
Posted: Thu Dec 16, 2004 1:19 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

"GianGuz" <gianguglielmo.calvi (AT) noze (DOT) it> wrote

Quote:
Interesting, a sort of PI Greek in multithreading computation! ;-)
But is also true for architecture suited with hyperthreading or similar
technologies?

Yes. Hyperthreading is a technique that improves ALU utilization and reduces
memory latency. As such, it can be quite accurately seen as "classic"
parallel processing, except that the wiring is more efficient and that less
computing resources are wasted.

Quote:
I have to be more accurate. My problem does not concern threads
creation (with its stack, etc...) but the fact that the context
switching procedure (with its own backup information) executed by the
O/S Kernel during the preemption of a thread could perform better if
inlining was used to expand a functions chain
like F(G(H(K(x,y,z)))). My experience said that this is true (with good
results) under certain conditions like :

1) code bloat avoidance
2) threads executes almost the same 'piece of' inlined code (i.e. they
call the same function chain) with their own different parameters
3) functions executed have long parameters list

1,3 are also valid for inlined code into a single-threaded application,
but has I said before, I saw that performance gain with the same code
and the same rules seems to be better in the multithreading case.

Except for exceptional cases such as the one Allan W describes, the
context switching that needs to be done is constant. There is a fixed
amount of state that needs to be swapped, and that is independent of
the amount of inlining.

You need to write your experiments carefully, because what you are
measuring might be improvements brought by inlining *anyway*, orthogonal
to threading. Then, let's say you still measure some improvement even
after you normalize the results. Still, that might come from better
utilization of some other resource (disk, memory hierarchy, I/O) brought
about synergistically by the speedup of each thread, and not from better
context switching. For example, I'd claim that (2) is better cache
utilization and not better thread switching.
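
One way to set up such a normalized experiment, sketched here under my own assumptions rather than taken from anyone's actual benchmark, is to time the same kernel on one thread and on N threads in each build. `kernel` and `timed_run` are hypothetical names, and `std::thread` postdates this thread.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CPU-bound kernel standing in for the F(G(H(K(...)))) chain.
std::uint64_t kernel(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t j = 1; j <= n; ++j) acc += (j * 3 + 1) % 11;
    return acc;
}

// Runs kernel(n) on `threads` threads at once and returns the wall time in
// microseconds. Each thread writes only its own element of `out`.
long long timed_run(unsigned threads, std::uint64_t n, std::vector<std::uint64_t>& out) {
    out.assign(threads, 0);
    std::vector<std::thread> pool;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threads; ++t)
        pool.emplace_back([&out, t, n] { out[t] = kernel(n); });
    for (auto& th : pool) th.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Comparing `timed_run(1, n, out)` with `timed_run(N, n, out)` in a build with inlining and in one without gives two ratios. If inlining speeds up both runs by about the same factor, the gain is orthogonal to threading.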

Quote:
Another question arise to me at this point. Does an incresead amount of
shared memory between threads (i.e. globals) reduce also the context
switching!? ;-)
More globals should mean lesser memory to save and restore during a
Thread Out / Thread In procedure!

A qualified "sometimes". More sharing brings more contention. Read-only
sharing is lovely. Read-write sharing is problematic.

And again: this "lesser memory to save" is a total red herring. There's no
saving, there's only caching. What you mean is: "better memory locality
across threads" and "better cache coherence".
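
The read-only versus read-write distinction can be illustrated with a short sketch. This is my own example in later standard C++, with `parallel_sum` as a hypothetical name: the input is shared read-only, and each thread writes exactly one private slot.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `workers` threads. The input is shared read-only; each
// thread accumulates into a local variable and writes its own slot once,
// so no locking is needed and contention on shared writable data is minimal.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&data, &partial, w, chunk] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[w] = local;   // single write per thread
        });
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Had every thread added directly into one shared total, that variable would need synchronization and would bounce between caches, which is the read-write contention described above.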


Andrei



Andrei Alexandrescu (See Website for Email)
Posted: Thu Dec 16, 2004 1:20 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

"Ben Hutchings" <ben-public-nospam (AT) decadentplace (DOT) org.uk> wrote

Quote:
Andrei Alexandrescu (See Website for Email) wrote:
snip
According to results shared with me by Ami Tavori, the performance of
computation-bound multithreading peaks at about 4 threads/processing
unit,
and then starts to degrade because it's being flooded by context
switches.
snip

That's surprising. Can you say why it is useful to have more than one
such thread per processor (or per virtual processor, when hardware
multithreading is used)?

I've reread Ami's email and he wasn't referring to computation-bound MT,
but rather to MT that involved I/O. So the reason is that more than one
thread per processor improved utilization of a resource other than the
processor itself.

Andrei



Allan W
Posted: Thu Dec 16, 2004 1:41 pm    Post subject: Re: C++ inlining as a multithreading optimization technique?

Quote:
"gianguz" <gianguglielmo.calvi (AT) noze (DOT) it> wrote
My opinion is that we could use inline as a mechanism to
improve contex switching performance.
An inlined function doesn't need to save into the stack
its calling with its own parameter. It is simply expanded
into the code with a temporary copy of any parameter it
carries. Restoring a thread execution in a point when an
inlined function was called simply means restoring the
program counter at that code line instead of popping from
the stack the current function called with its own parameter.
It seems to me a great save of memory space and time.
Anyone has comments about that or has already experience
such solution with some result?

Andrei Alexandrescu (See Website for Email) wrote:
The description is not really accurate. First, you can't
create a thread without also creating a stack for it, no
matter the amount of inlining.
Then, the cost of a thread switch is largely unaffected by
the size of its stack (aside, of course, from the costs of
occupying memory in general).

GianGuz wrote:
Quote:
I have to be more accurate. My problem does not concern threads
creation (with its stack, etc...) but the fact that the context
switching procedure (with its own backup information) executed by the
O/S Kernel during the preemption of a thread

So much for my theory that GianGuz wasn't talking about OS context
switches...

Quote:
could perform better if inlining was used to expand a functions
chain like F(G(H(K(x,y,z)))).

I don't see how. Judging from other responses, nobody else does either.

Remember that context switching is supposed to be invisible to the
programs being swapped, at an assembly-language level. The OS is
responsible for preserving ALL memory owned by the process (including
the stack, all read/write data, and of course the program itself)
and all of the registers (including the program counter and stack
pointer). Before the program is resumed, all of this has to be
exactly as it was before. How does this relate to
inline function calls?
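
As a rough illustration of the point (not from the original thread), the sketch below writes a chain like F(G(H(K(x,y,z)))) once as small inline functions and once expanded by hand. The function bodies are invented for the example. Both forms compute the same value, and in both cases a preempted thread is resumed from its saved registers (program counter, stack pointer and so on), so inlining changes call overhead inside the thread, not what the kernel has to save.

```cpp
#include <cassert>

// Hypothetical stand-ins for the F, G, H, K chain; the bodies are made up.
inline int K(int x, int y, int z) { return x + y + z; }
inline int H(int v) { return v * 2; }
inline int G(int v) { return v - 1; }
inline int F(int v) { return v * v; }

// Nested calls; the compiler may or may not expand them in place.
int chain(int x, int y, int z) { return F(G(H(K(x, y, z)))); }

// The same computation expanded by hand, roughly what full inlining yields.
int chain_expanded(int x, int y, int z) {
    int v = x + y + z;  // K
    v = v * 2;          // H
    v = v - 1;          // G
    return v * v;       // F
}

// Small self-check: both forms agree for a sample input.
void check_chain() { assert(chain(1, 2, 3) == chain_expanded(1, 2, 3)); }
```

Calling check_chain() from a test or from main exercises both versions. Whether a compiler actually inlines the first form depends on its optimization settings; the inline keyword is only a hint.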

Quote:
My experience said that this is
true (with good results) under certain conditions like :

1) code bloat avoidance

When you use inlines extensively, you usually make the program
larger, not smaller.

Quote:
2) threads execute almost the same 'piece of' inlined code (i.e. they
call the same function chain) with their own different parameters

Not sure I understand this, but even if I do... how does this help?

Maybe you're thinking about using shared memory segments for multiple
processes? For instance, in many (not all) OSes, if two different
programs both use the same dynamically loaded library, it only has
to be loaded into physical memory once.

This ends up using less physical memory, but it has nothing to do
with context switching.

Quote:
3) functions executed have long parameters list

Again, this is just data on the stack... and the stack is just
read/write memory, not unlike any other read/write memory that has
to be retained between context switches.
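
A minimal sketch of that idea, assuming C++11 std::thread (the names here are invented for the example): each thread gets its own stack inside the same address space, while a global exists once and is visible to all threads. The workers wait for each other so that both stacks are alive at the same moment when their addresses are compared.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <thread>

std::atomic<int> arrived{0};  // a global: one copy shared by every thread

// Each worker records where one of its locals lives, then waits until the
// other worker is also running, so both stacks exist at the same time.
void worker(std::uintptr_t& local_address) {
    int local = 0;  // lives on this thread's own stack
    local_address = reinterpret_cast<std::uintptr_t>(&local);
    arrived.fetch_add(1);
    while (arrived.load() < 2) std::this_thread::yield();
}

// Returns true if both threads updated the same global but used different stacks.
bool threads_share_globals_not_stacks() {
    arrived = 0;
    std::uintptr_t a = 0, b = 0;
    std::thread t1(worker, std::ref(a));
    std::thread t2(worker, std::ref(b));
    t1.join();
    t2.join();
    return arrived.load() == 2 && a != b;
}

// Small self-check of the claim above.
void check_threads() { assert(threads_share_globals_not_stacks()); }
```

Nothing in this sketch copies either stack anywhere when a thread is preempted; the memory simply stays where it is until the thread runs again.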

Quote:
Another question arises for me at this point. Does an increased
amount of shared memory between threads (i.e. globals) also reduce
the context switching!? ;)

If I understand you right... no, it doesn't, at least not on
systems with virtual memory.

Quote:
More globals should mean lesser memory to save and restore
during a Thread Out / Thread In procedure!

First, it doesn't mean that, not unless the globals were in a
shared read/write section. Normally each process has its own
copy of read/write sections, so that it has its own values...
if process A adds one to a global variable, you wouldn't want
it to affect process B!
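
The process A / process B case can be sketched as follows, assuming a POSIX system with fork and waitpid (the function and variable names are invented for the example). The child process increments its own copy of an ordinary global, and the parent's copy is left untouched.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int counter = 0;  // ordinary global in a private read/write data section

// Forks a child that increments its copy of `counter`, then returns the
// parent's value of `counter` (or -1 if fork/waitpid failed).
int parent_counter_after_child_increment() {
    counter = 0;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                   // child: "process A adds one"
        ++counter;
        _exit(counter == 1 ? 0 : 1);  // report that the child saw its own change
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return counter;                   // parent: "process B" is unaffected
}

// Small self-check: the parent's global is still 0.
void check_processes() { assert(parent_counter_after_child_increment() == 0); }
```

Threads inside one process behave differently: they would all see the same `counter`, which is why shared globals between threads need synchronization but do not change what a context switch has to do.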

Second, just because some data being used by a process happens
to be loaded into physical memory, doesn't mean that you don't
have to do a context switch! You have to re-load the registers
that point to this memory, etc.

At this point it might make sense to ask you what type of computer
and OS you're using. Does your system have virtual memory?

You seem to think that a "context switch" means that absolutely
every byte of the program has to be written out to a hard
disk, and then reloaded later. But even on systems without
virtual memory, this usually isn't true... you simply allocate
space for process A, and it promises not to touch any memory
not allocated for it (because this might belong to process B).

