 |
C++Talk.NET C++ language newsgroups
|
| Author |
Message |
Ingo Nolden Guest
|
Posted: Sun Apr 24, 2005 10:58 am Post subject: performance of std::vector<double>, double[] and uBlas::vect |
|
|
Dear Group,
I am a little confused by the result of some code that should give me
some information about CPU cache effects.
I wrote a function that performs some flops on a vector/array of doubles
or floats of varying size. While playing around and trying different
things, I compared a standard C array with a std::vector and the vector
from the Boost uBLAS library.
The program was compiled with the VC++ 7.1 compiler with the default
release settings, and later also with whole program optimization and
global optimization activated ( which made no difference ).
On my laptop ( Intel P4 2.6 GHz, 512 MB RAM ) the result was extremely
surprising: the raw C array performed as expected, in a range between
300 and 350 MFlops ( if my flops calculation is right ).
The other arrays, however, were about 80 times slower!
I had expected them to be perhaps a few percent slower.
The asm code ( as far as I can guess what it means ) seemed to be doing
the same thing in both cases, however.
This made me think that it must be a memory access issue. It cannot be
due to main memory size, because the difference occurs at any array
size, beginning from 1 MB.
Next I wanted to prove that it is not a compiler/optimization-dependent
issue. I ran the same executable on a different machine, an AMD Athlon
2400+ desktop with 1 GB RAM. On this machine I got an even more
surprising result:
The std::vector and uBlas::vector performed well and even outperformed
the plain C array.
I usually don't care so much about performance, but 8000% is worth
thinking about.
Below is my source. If you have no uBLAS at hand, you can comment out
the two lines that use it and it should still work.
Also, to get back to my original intention, I want to change from
sequential access to arbitrary access of the vector items. Does anyone
know a good/standard way to do so? It should put additional load on
the CPU.
So, here goes my code:
#include <iostream>
#include <fstream>
#include <vector>
#include <list>
#include <windows.h>
#include <cmath> // pow( ) is used in main2( )
//#include <float.h>
#include <boost/numeric/ublas/vector.hpp>
using namespace std;
ofstream trash( "trash.txt" );
namespace my
{
template< typename ValueT >
inline ValueT const& max( ValueT const& l, ValueT const& r )
{
return ( l > r ) ? l : r;
}
template< typename ValueT >
inline ValueT const& min( ValueT const& l, ValueT const& r )
{
return ( l < r ) ? l : r;
}
}
template< typename ValueT, typename ArrayT > inline
void InitializeArray( ArrayT array, unsigned &length, ValueT &initializer )
{
for( unsigned i = 0; i < length; ++i )
array[ i ] = initializer;
}
template< typename ValueT > inline
ValueT ProcessMixedOps( ValueT &value )
{
return static_cast<ValueT>( (1.0 + value) * (1.5 - value) / value );
}
template< typename ValueT, typename ArrayT > inline
ValueT ProcessMixedArray( ArrayT &array, unsigned &length, unsigned &loops )
{
ValueT result = 1;
for( unsigned j = 0; j < loops; ++j )
for( unsigned i = 0; i < length; ++i )
result *= ProcessMixedOps( array[ i ] );
return result;
}
template< typename ValueT >
class Memory
{
public:
template< typename ArrayT >
ArrayT Alloc( unsigned &length )
{
return ArrayT( length );
}
template<>
ValueT* Alloc< ValueT* >( unsigned &length )
{
return new ValueT[ length ];
}
template< typename ArrayT >
void Dealloc( ArrayT &array )
{
//array.clear( );
}
template<>
void Dealloc< ValueT* >( ValueT* &array )
{
delete[] array; // memory from new[] must be released with delete[]
}
};
template< typename ValueT, typename ArrayT >
double Test( ValueT init, unsigned memLength )
{
unsigned length = memLength / sizeof( ValueT );
ArrayT Vector = Memory<ValueT>( ).Alloc<ArrayT>( length );
InitializeArray( Vector, length, init );
unsigned loops = my::max( 10000000 / length, (unsigned)1 );
unsigned tick = GetTickCount( );
double res = ProcessMixedArray<ValueT>( Vector, length, loops );
tick = GetTickCount( ) - tick;
Memory<ValueT>( ).Dealloc<ArrayT>( Vector );
double dSec = (double) tick / (double) 1000;
trash << res << endl; // output and forget result
double dFlops = (double) length * 5.0 * loops;
double dMFlops = dFlops / 1000000.0;
return dMFlops / dSec;
}
int main2( )
{
//unsigned min_size_p = 10; // 2 ^ 10 = 1.024
//unsigned min_size_p = 12; // 2 ^ 12 = 4.096
unsigned min_size_p = 1000000; // 2 ^ 3 = 16.384
unsigned max_size_p = 200000000; // 2 ^ 25 = 33.554.432
//unsigned max_size_p = 10; // 2 ^ 25 = 33.554.432
cout << "Max Vector memory length: ";
cout << (unsigned)pow( 2, max_size_p );
cout << endl;
DWORD dwNumber = GetTickCount( );
dwNumber = GetTickCount( ) / dwNumber;
short number = static_cast<short>( dwNumber );
cout << "number: " << number << endl;
//cout << "double \t float \t int\n";
cout << "\tdouble* 1\tvector<double>\n"; // rest of this header string is cut off in the post
for( unsigned v_size_p = min_size_p ; v_size_p < max_size_p; v_size_p
+= 2000000 )
{
unsigned size = v_size_p;//(unsigned)pow( 2, v_size_p );
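cout << fixed << size << "\t";
// Template arguments were lost in the post; reconstructed from its description.
cout << Test< double, double* >( number, size ) << "\t";
cout << Test< double, vector< double > >( number, size ) << "\t";
cout << Test< double, boost::numeric::ublas::vector< double > >( number,
size ) << "\t";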
//cout << Test( number, size ) << "t";
//cout << Test
//cout << Test( number, size ) << "t";
//cout << Test
//cout << Test
//cout << Test
//cout << Test
//cout << Test
//cout << Test
cout << "\n";
}
//cout << dMFlops << " / " << dSec << " = " << dMFlops / dSec;
cout << endl;
return 0;
}
int main( )
{
return main2( );
}
|
|
|
 |
Axter Guest
|
Posted: Sun Apr 24, 2005 11:51 am Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
Ingo Nolden wrote:
| Quote: | [...]
The program was compiled with a VC++ 7.1 compiler with the default
release settings, and later also with whole program optimization and
global optimization activated ( which made no difference ).
[...]
|
When you did the test that showed vector being slower, did you do that
test in DEBUG mode?
If you did, then your test is invalid.
You should perform all performance tests in release mode only.
I've performed tests with vector vs. C-style array, and in my tests the
vector outperformed the C-style array.
My tests used VC++ 6.0 and VC++ 7.1.
|
|
|
 |
Ingo Nolden Guest
|
Posted: Sun Apr 24, 2005 12:13 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
| Quote: |
When you did the test that showed vector being slower, did you do that
test in DEBUG mode?
If you did, then your test is invalid.
You should perform all performance tests in release mode only.
I've performed tests with vector vs. C-style array, and in my tests the
vector outperformed the C-style array.
My tests used VC++ 6.0 and VC++ 7.1.
|
Hi Axter,
thank you for your reply.
As I wrote, I did the test in Release mode, and I explained in detail
what settings I used, so I didn't leave any room for guesses.
If it had been in debug mode, the result wouldn't have surprised me.
Your result doesn't surprise me much either. As I wrote, on my AMD CPU I
got the same result as you, *** with the same exe build as on the Intel
machine ***.
So what I would like to know is: what CPU do you have?
As long as nobody comes up with an idea of what's going wrong, I could
try to find out on which types of CPU I get which behaviour.
thanks
Ingo
|
|
|
 |
Uenal Mutlu Guest
|
Posted: Mon Apr 25, 2005 5:45 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
"Ingo Nolden" wrote
| Quote: |
[...]
Now I wanted to prove that it is not a compiler/optimization-dependent
issue. I ran the executable on a different machine, an AMD Athlon 2400+
desktop with 1 GB RAM. On this machine I got an even more surprising
result:
The std::vector and uBlas::vector performed well and even outperformed
the plain C array.
|
How is that ever possible? I guess it is mostly due to your code and/or CPU caching.
| Quote: | I usually don't care so much about performance, but 8000% is worth
thinking about.
Below is my source. If you have no uBLAS at hand, you can comment out
the two lines that use it and it should still work.
Also, to get back to my original intention, I want to change from
sequential access to arbitrary access of the vector items. Does anyone
know a good/standard way to do so? It should put additional load on
the CPU.
.... |
Try this framework:
/*
Measuring array access overhead STL vs. RAW
Written by uenal.mutlu at t-online.de
Compiler: VC++6, but should work with any compiler
AppType: Console
Compile and Link: CL /GX /W3 /Od PerfTest.cpp
Sample output:
int clkTicksSTL: 1492 clkTicksRAW: 1412
float clkTicksSTL: 1482 clkTicksRAW: 1412
double clkTicksSTL: 1492 clkTicksRAW: 1412
Result: the performance penalty for STL is about 5%.
This is IMO negligible.
*/
#include <iostream>
#include <vector>
#include <cstdlib> // rand()
#include <ctime>
template <typename T>
void PerfTestArrayAccess_STL_vs_RAW(const char* const pszTypename,
const size_t& nelems,
const size_t& niterations,
clock_t& retClkTicksSTL,
clock_t& retClkTicksRAW,
bool AfDump = true)
{
retClkTicksSTL = retClkTicksRAW = 0;
size_t i, j;
// Timing STL array access:
std::vector<T> vect(nelems);
clock_t clkTicksStart = clock();
unsigned dummycounter = 0;
for (i = 0; i < niterations; i++)
for (j = 0; j < nelems; j++)
{
unsigned ix = (rand() * rand()) % nelems;
if (!(unsigned(vect[ix]) % 2))
dummycounter++;
}
retClkTicksSTL = clock() - clkTicksStart;
vect.clear();
// Timing RAW array access:
T* pa = new T[nelems];
clkTicksStart = clock();
dummycounter = 0;
for (i = 0; i < niterations; i++)
for (j = 0; j < nelems; j++)
{
unsigned ix = (rand() * rand()) % nelems;
if (!(unsigned(pa[ix]) % 2))
dummycounter++;
}
retClkTicksRAW = unsigned(clock() - clkTicksStart);
delete[] pa; // memory from new[] must be released with delete[]
if (AfDump)
std::cout << pszTypename << " "
<< "clkTicksSTL: " << retClkTicksSTL << " "
<< "clkTicksRAW: " << retClkTicksRAW << std::endl;
}
int main(int argc, char* argv[])
{
const size_t nelems = 1000000;
const size_t niterations = 5;
clock_t clkTicksSTL, clkTicksRAW;
PerfTestArrayAccess_STL_vs_RAW<int>( "int ", nelems, niterations, clkTicksSTL, clkTicksRAW, true);
PerfTestArrayAccess_STL_vs_RAW<float>( "float ", nelems, niterations, clkTicksSTL, clkTicksRAW, true);
PerfTestArrayAccess_STL_vs_RAW<double>("double", nelems, niterations, clkTicksSTL, clkTicksRAW, true);
return 0;
}
|
|
|
 |
block111@mail.ru Guest
|
Posted: Mon Apr 25, 2005 7:11 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
Not sure if I'm right, but I think the reason the C array appears to be
slower when the size is > 1 MB is that the C array is stack-based and the
default stack size for Windows apps compiled with VC is 1 MB. A vector
doesn't store its values on the stack (except perhaps for short arrays,
but that's not important in this case), so with large C arrays there
might be some sort of overhead from handling the extra stack size. I
didn't check your long source code, but from what I read I have no other
idea for such strange results...
|
|
|
 |
block111@mail.ru Guest
|
Posted: Mon Apr 25, 2005 7:35 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
Your code doesn't compile well (at least for me).
It seems that you use dynamic C arrays, so I was wrong about the
stack-based overhead.
Why don't you want to use std::min and std::max defined in <algorithm>?
|
|
|
 |
Uenal Mutlu Guest
|
Posted: Mon Apr 25, 2005 8:16 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
<block111 (AT) mail (DOT) ru> wrote
| Quote: | Your code doesn't compile well (at least for me)
|
Which compiler?
What error does it report?
| Quote: | Why don't you want to use std::min and std::max defined in <algorithm>?
|
Sorry, I don't know what you mean. There was no necessity
to use them in the posted code of mine.
BTW, in case you don't know: there is a possibility to quote the relevant
portions of a posting one replies to. This helps to understand what
the writer might have meant.
|
|
|
 |
block111@mail.ru Guest
|
Posted: Mon Apr 25, 2005 8:32 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
Does it matter which compiler, as long as windows.h defines macros for
min and max and you DO use my::max in your code?
Some portions of the code also seem quite strange.
What does this part of the code do:
DWORD dwNumber = GetTickCount( );
dwNumber = GetTickCount( ) / dwNumber;
Perhaps, for questions completely unrelated to any platform, you would
want to avoid windows.h and use <ctime> instead.
For example, a timer could be:
#include <ctime>
class timer {
std::clock_t t;
public:
timer() : t(std::clock()){}
double stop(){ return ( (static_cast<double>(std::clock() -t))/CLOCKS_PER_SEC
); }
};
And when I compiled the code I didn't get any unexpected results; they
were all in a reasonable range, with vector/ublas::vector being a bit
better than C-style arrays. This is probably a result of your coding
and not of any sort of optimization, IMO.
|
|
|
 |
block111@mail.ru Guest
|
Posted: Mon Apr 25, 2005 8:35 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
And yes, I know about quoting.
I just use google::groups and do not subscribe to any Usenet etc.
With the Google Groups interface, to have quoting I'd need to manually
copy your message and add "> " to each line.
|
|
|
 |
Uenal Mutlu Guest
|
Posted: Mon Apr 25, 2005 9:27 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
<block111 (AT) mail (DOT) ru> wrote
| Quote: | does it matter which compiler as long as windows.h defines macros for
min and max and you DO use my::max in your code.
There were some portions of code that seem quite strange.
What does this part of code do:
DWORD dwNumber = GetTickCount( );
dwNumber = GetTickCount( ) / dwNumber;
|
You are referring not to my code; it is the code of Ingo Nolden.
Mine does not use Windows stuff.
Here is a slightly updated version of my code:
/*
PerfTest.cpp v1.01
Measuring array access overhead STL vs. RAW
Written by uenal.mutlu at t-online.de
Compiler: VC++6, but should work with any compiler
AppType: Console
Compile and Link: CL /GX /W3 /Od PerfTest.cpp
Sample output:
int clkTicksSTL: 1492 clkTicksRAW: 1412
float clkTicksSTL: 1482 clkTicksRAW: 1412
double clkTicksSTL: 1492 clkTicksRAW: 1412
Result: the performance penalty for STL is about 5%.
This is IMO negligible.
*/
#include <iostream>
#include <vector>
#include <cstdlib> // rand()
#include <ctime>
template <typename T>
void PerfTestArrayAccess_STL_vs_RAW(const char* const pszTypename,
const size_t& nelems,
const size_t& niterations,
clock_t& retClkTicksSTL,
clock_t& retClkTicksRAW,
bool AfDump = true)
{
retClkTicksSTL = retClkTicksRAW = 0;
size_t i, j;
// Timing STL array access:
std::vector<T> vect(nelems);
clock_t clkTicksStart = clock();
unsigned dummycounter = 0;
for (i = 0; i < niterations; i++)
for (j = 0; j < nelems; j++)
{
unsigned ix = (rand() * rand()) % nelems;
if (!(unsigned(vect[ix]) % 2))
dummycounter++;
}
retClkTicksSTL = clock() - clkTicksStart;
vect.clear();
// Timing RAW array access:
T* pa = new T[nelems];
clkTicksStart = clock();
dummycounter = 0;
for (i = 0; i < niterations; i++)
for (j = 0; j < nelems; j++)
{
unsigned ix = (rand() * rand()) % nelems;
if (!(unsigned(pa[ix]) % 2))
dummycounter++;
}
retClkTicksRAW = clock() - clkTicksStart;
delete[] pa; // memory from new[] must be released with delete[]
if (AfDump)
std::cout << pszTypename << " "
<< "clkTicksSTL: " << retClkTicksSTL << " "
<< "clkTicksRAW: " << retClkTicksRAW << std::endl;
}
int main(int argc, char* argv[])
{
const size_t nelems = 1000000;
const size_t niterations = 5;
clock_t clkTicksSTL, clkTicksRAW;
PerfTestArrayAccess_STL_vs_RAW<int>( "int ", nelems, niterations, clkTicksSTL, clkTicksRAW, true);
PerfTestArrayAccess_STL_vs_RAW<float>( "float ", nelems, niterations, clkTicksSTL, clkTicksRAW, true);
PerfTestArrayAccess_STL_vs_RAW<double>("double", nelems, niterations, clkTicksSTL, clkTicksRAW, true);
return 0;
}
|
|
|
 |
block111@mail.ru Guest
|
Posted: Mon Apr 25, 2005 9:42 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
Yeah, I thought I was talking with the original poster. I didn't
try/look at your code and cannot comment on that :)
|
|
|
 |
Richard Herring Guest
|
Posted: Tue Apr 26, 2005 8:58 am Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
In message <1114461343.139440.246400 (AT) o13g2000cwo (DOT) googlegroups.com>,
block111 (AT) mail (DOT) ru writes
| Quote: | and yes, I know about quoting
I just use google::groups and do not subscribe to any usenet etc etc.
With google groups interface to have quoting I'd need to manually copy
your message and add "> " for each line
|
Not true. Unless they've changed the interface *again*, if you click
on "show options" at the top of the message and then on the "followup"
that is revealed there, Google will quote the message for you.
--
Richard Herring
|
|
|
 |
Ingo Nolden Guest
|
Posted: Fri Apr 29, 2005 8:25 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
Uenal Mutlu wrote:
| Quote: | <block111 (AT) mail (DOT) ru> wrote
does it matter which compiler as long as windows.h defines macros for
min and max and you DO use my::max in your code.
There were some portions of code that seem quite strange.
What does this part of code do:
DWORD dwNumber = GetTickCount( );
dwNumber = GetTickCount( ) / dwNumber;
|
I wanted to get a 1 without the compiler being aware.
But true, I should use standard stuff like <ctime> and will do so from now on.
| Quote: |
You are refering not to my code, it is the code of Ingo Nolden.
Mine does not use Windows stuff.
Here is a slightly updated version of my code:
/*
PerfTest.cpp v1.01
Measuring array access overhead STL vs. RAW
Written by uenal.mutlu at t-online.de
Compiler: VC++6, but should work with any compiler
AppType: Console
Compile and Link: CL /GX /W3 /Od PerfTest.cpp
|
And compiled with optimization disabled, what are you going to do with
it? The performance penalty of anything is quite irrelevant if it is
not compiled with the settings that I use for production code, is it?
The assembly of my code looked quite similar for both the dynamic C array
and std::vector. This is what I actually hoped to see, but the type of
memory used seems different.
| Quote: | Sample output:
int clkTicksSTL: 1492 clkTicksRAW: 1412
float clkTicksSTL: 1482 clkTicksRAW: 1412
double clkTicksSTL: 1492 clkTicksRAW: 1412
Result: the performance penalty for STL is about 5%.
This is IMO negligible.
|
Which runtime are you linking against? I get results comparable to yours
if I link against the non-debug runtime but, as you did, with
optimization disabled ( /Od ).
Ingo
|
|
|
 |
Ingo Nolden Guest
|
Posted: Fri Apr 29, 2005 8:27 pm Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
| Quote: |
And when I compiled the code I didn't have any unexpected results they
were all in a reasonable range, with vector/ublas::vector being a bit
better than c-style arrays. This probably is a result of your coding
and not any sort of optimization, IMO
|
Can you tell me what CPU you have?
|
|
|
 |
Uenal Mutlu Guest
|
Posted: Sat Apr 30, 2005 7:59 am Post subject: Re: performance of std::vector<double>, double[] and uBlas:: |
|
|
"Ingo Nolden" wrote
| Quote: | [...]
and compiled with optimization disabled, what are you going to do with
it? The performance penalty of anything is quite irrelevant, if it is
not compiled with the settings that I use for production code, is it?
[...]
Which runtime are you linking against? I get results comparable to yours
if I link against the non-debug runtime but, as you did, with
optimization disabled ( /Od ).
|
I tested on W2kP.
Using /Ox (Full Optimization) gives:
int clkTicksSTL: 1191 clkTicksRAW: 1202
float clkTicksSTL: 1202 clkTicksRAW: 1201
double clkTicksSTL: 1202 clkTicksRAW: 1202
So, using full optimization there is virtually no overhead.
|
|
|
 |
|
|
|
|