C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

Compared valarray and blitz. Surprise.

 
C++Talk.NET Forum Index -> C++ Language (Moderated)
Piotr L.
Guest





Posted: Tue Sep 21, 2004 11:39 am    Post subject: Compared valarray and blitz. Surprise.



Hello

I'd like to write a small program which reads and manipulates fairly
large data (scanned pictures, approx. 100MB each).

So first I wanted to choose some kind of container to store the
image data in memory.

I have written a simple benchmark for a native C array, std::vector,
std::valarray, boost::array and blitz::Array. The test does nothing more than
assign a value to each element of the container (in several different ways:
loops, algorithms, assignment operators).
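For reference, the standard-library fill variants from the table below can be sketched roughly as follows. This is not the code from the linked file: the element type (`Pixel` as `float`) and the function names are assumptions for illustration, and the Boost and Blitz++ variants are left out so the sketch needs only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <valarray>
#include <vector>

// Assumed element type; the linked benchmark's actual type is not shown here.
using Pixel = float;

// std::vector filled with the std::fill algorithm ("fill vect").
std::vector<Pixel> fill_with_std_fill(std::size_t n, Pixel value) {
    std::vector<Pixel> v(n);
    std::fill(v.begin(), v.end(), value);
    return v;
}

// std::vector filled via std::for_each with an assigning lambda.
std::vector<Pixel> fill_with_for_each(std::size_t n, Pixel value) {
    std::vector<Pixel> v(n);
    std::for_each(v.begin(), v.end(), [value](Pixel& p) { p = value; });
    return v;
}

// std::vector filled via std::generate, which calls the generator once per element.
std::vector<Pixel> fill_with_generate(std::size_t n, Pixel value) {
    std::vector<Pixel> v(n);
    std::generate(v.begin(), v.end(), [value] { return value; });
    return v;
}

// std::valarray: assigning a scalar sets every element.
std::valarray<Pixel> fill_valarray(std::size_t n, Pixel value) {
    std::valarray<Pixel> va(n);
    va = value;
    return va;
}

// Index loop over a raw pointer, as one would write for a plain C array.
std::vector<Pixel> fill_with_c_loop(std::size_t n, Pixel value) {
    std::vector<Pixel> v(n);
    Pixel* p = v.data();
    for (std::size_t i = 0; i < n; ++i) p[i] = value;
    return v;
}

// Sanity check: true if every element equals value.
template <typename Container>
bool all_equal(const Container& c, Pixel value) {
    for (std::size_t i = 0; i < c.size(); ++i)
        if (c[i] != value) return false;
    return true;
}
```

Each function returns a filled container; wrapping the calls in a timing loop gives a benchmark in the spirit of the one described here, and `all_equal` confirms that every variant produces the same contents.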

Here is the source of my "benchmarks" ;-)
http://main.ams.edu.pl/~piotrlg/vector-test.cc
Compiled on Linux using g++ 3.3.3.
Both the Boost and Blitz++ libraries were fairly recent versions.

I wanted to find the fastest one. The planned operations on the image data
are rather simple ones, so sophisticated functions of the container aren't
the top priority.

I was very surprised by the results.

On my P4 2GHz the winner is valarray/vector! I have seen rather bad
opinions about the valarray class, not only about its design but also about
its performance. The advice was to use the Blitz++ library instead of valarray.
So could someone explain to me what I should make of the following table?

Sorted results
-----------------------
fill vect time: 0.108265
for_each vect time: 0.108998
valarray time: 0.111239
vect C loop time: 0.115843
boost array for_each: 0.123644
boost array assign: 0.124048
C array time: 0.205496
boost array C loop: 0.211093
generate vect time: 0.256162
blitz fill array time: 0.329419
blitz for_each array : 0.33393
blitz array time: 1.13735


Please note that the blitz::Array fill variants are about 3 times slower than
vector or valarray (and the plain 'blitz array time' loop about 10 times
slower). Please also note that the poor C array is not the fastest ;-)

So valarray is not that bad, and Blitz is, hm, slow. I had hoped that the
library recommended as the "valarray killer" would be at least as good.

Could someone explain to me why I see such poor Blitz performance? Is it
because of the nature of my very primitive "test suite"?

So I'm confused now. What to choose?

BTW, I'm not a programmer. I just want to write a program for my own use.
Please also note that this is one of my first C++ programs.

One more thing. Please note that when I run this test a second time (right
after the first one finishes), the 'blitz array time' result improves
a lot. Strange (?).

Sorted results
-----------------------
fill vect time: 0.10828
vect C loop time: 0.10846
valarray time: 0.10907
for_each vect time: 0.109135
boost array for_each: 0.115181
boost array assign: 0.117081
C array time: 0.196561
boost array C loop: 0.206079
blitz array time: 0.209327 <-- previously over 1 sec.
generate vect time: 0.25631
blitz for_each array : 0.322798
blitz fill array time: 0.323558


Regards
Piotr Legiecki

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Thomas Richter
Guest





Posted: Tue Sep 21, 2004 6:24 pm    Post subject: Re: Compared valarray and blitz. Surprise.



Hi,

Quote:
I have written a simple benchmark for a native C array, std::vector,
std::valarray, boost::array and blitz::Array. The test does nothing more than
assign a value to each element of the container (in several different ways:
loops, algorithms, assignment operators).

On my P4 2GHz the winner is valarray/vector! I have seen rather bad
opinions about the valarray class, not only about its design but also about
its performance. The advice was to use the Blitz++ library instead of valarray.
So could someone explain to me what I should make of the following table?

IMHO, your tests are not very realistic. You typically have to do more
with the vectors than just fill them with constant values, so you're
currently testing only one operation, and one that could even end up as a
compiler built-in.

My personal test case (because it was realistic for my applications) was
the implementation of a FIR filter on arrays, or rather, an elementary
filter step: take the sum of two vectors, multiply it by a constant, add the
result to another vector, and measure the time. The result was that the plain
C array was the fastest option (short of using the vector formats, i.e.
SIMD instructions, that the platform offers). The code the compiler generated
for the valarray looked pretty bad, worse than the C code, and no compiler
I tried was able to generate the SIMD form "out of the box". My personal
opinion is that I really don't see a use for valarrays; if it really
matters, I have to do it by hand anyway.
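A sketch of the elementary filter step described above, y += k * (a + b), written once as a plain C-array loop and once as a valarray expression. The function names are hypothetical and this is not Thomas's actual test code; whether the valarray version compiles down to a single loop depends on the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Plain C-array version: one fused loop, no temporary arrays.
void filter_step_c(const float* a, const float* b, float k, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        y[i] += k * (a[i] + b[i]);
}

// valarray version: the same step written as one expression. With an
// expression-template implementation this can become a single loop; without
// one, each operator may allocate a temporary valarray.
void filter_step_valarray(const std::valarray<float>& a, const std::valarray<float>& b,
                          float k, std::valarray<float>& y) {
    assert(a.size() == b.size() && b.size() == y.size());
    y += k * (a + b);
}
```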

Quote:
So I'm confused now. What to choose?

Write a test case that mimics the typical operations you'll perform on your
data later, then test again. Unless, of course, your typical operation
is filling vectors with constants, your benchmark doesn't really help.

Quote:
One more thing. Please note that when I run this test a second time (right
after the first one finishes), the 'blitz array time' result improves
a lot. Strange (?).

As a general hint: always run your main program loop several times, discard
the timings of the first iterations, then take the average of the last twenty
iterations or so. The first iterations might not only allocate memory, they may
also move data into the cache. Depending on the size of your vectors, the data
might then be partially in the cache and the code will perform much
better. This is especially true for the P4 architecture and its rather
"microscopic" first-level cache of 8K. If you can arrange your data to
fit in there, you win. Everything else is rather secondary, as branch
prediction and the code cache do a good job on the P4.
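A minimal measurement helper following that advice, assuming a C++11 compiler for `std::chrono` and lambdas (both postdate this thread). The name `average_seconds` and the default counts (3 warm-up runs, 20 timed runs) are illustrative choices, not prescribed values.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Runs `work` warmup_runs times untimed (these runs may allocate memory and
// pull data into the cache), then returns the mean elapsed time in seconds
// over timed_runs further runs.
double average_seconds(const std::function<void()>& work,
                       std::size_t warmup_runs = 3,
                       std::size_t timed_runs = 20) {
    assert(timed_runs > 0);
    using Clock = std::chrono::steady_clock;
    for (std::size_t i = 0; i < warmup_runs; ++i)
        work();
    double total = 0.0;
    for (std::size_t i = 0; i < timed_runs; ++i) {
        const Clock::time_point start = Clock::now();
        work();
        const Clock::time_point stop = Clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    return total / static_cast<double>(timed_runs);
}
```

For example, `average_seconds([&] { std::fill(v.begin(), v.end(), 1.0f); })` would time one fill variant with the first runs discarded.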

So long,
Thomas


Brooks Moses
Guest





Posted: Tue Sep 21, 2004 10:51 pm    Post subject: Re: Compared valarray and blitz. Surprise.



Thomas Richter wrote:
[attribution lost]
Quote:
One more thing. Please note that when I run this test a second time (right
after the first one finishes), the 'blitz array time' result improves
a lot. Strange (?).

As a general hint: always run your main program loop several times, discard
the timings of the first iterations, then take the average of the last twenty
iterations or so. The first iterations might not only allocate memory, they may
also move data into the cache.

Another thing potentially worth doing is a second test with the outer and
inner loop indices swapped. Some array implementations (most notably those
that attempt to mimic Fortran arrays) store elements in column-major order
rather than the row-major order of a standard C array, so it's also worth
making sure you traverse the data in the right order.
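To illustrate the loop-order point: with a row-major (C-order) buffer, the two functions below compute the same sum, but the second walks memory with a stride of `cols` elements. Swapping which loop is inner is exactly the second test suggested above; for a column-major (Fortran-order) library the fast and slow orders are reversed. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Image of rows x cols stored row-major: element (r, c) is at r * cols + c.

// Inner loop over columns: consecutive iterations touch adjacent memory.
double sum_row_order(const std::vector<double>& img, std::size_t rows, std::size_t cols) {
    assert(img.size() == rows * cols);
    double s = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += img[r * cols + c];
    return s;
}

// Loops swapped: each inner iteration jumps `cols` elements ahead, which
// typically uses the cache poorly once the image is large.
double sum_column_order(const std::vector<double>& img, std::size_t rows, std::size_t cols) {
    assert(img.size() == rows * cols);
    double s = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            s += img[r * cols + c];
    return s;
}
```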

Which reminds me: while the original poster is testing implementations,
the Template Numerical Toolkit that NIST distributes
(http://math.nist.gov/tnt/) might be worth checking out. It looks to me
like it should be a fairly straightforward wrapper around the standard C
array, but may be a little easier to work with.
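For a sense of what such a wrapper involves, here is a hypothetical minimal 2-D array class over a single contiguous buffer. It is not TNT's interface (the TNT documentation describes that), only a sketch of the general "thin wrapper around a C array" idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D array: one contiguous row-major buffer plus an
// (row, col) accessor with bounds checks in debug builds.
template <typename T>
class Image2D {
public:
    Image2D(std::size_t rows, std::size_t cols, T init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    T* data() { return data_.data(); }  // raw buffer for C-style loops

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};
```

Usage would look like `Image2D<unsigned char> img(rows, cols); img(r, c) = 255;`, with `data()` available when a plain C-style loop is preferred.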

Quote:
Depending on the size of your vectors, the data
might then be partially in the cache and the code will perform much
better. This is especially true for the P4 architecture and its rather
"microscopic" first-level cache of 8K. If you can arrange your data to
fit in there, you win. Everything else is rather secondary, as branch
prediction and the code cache do a good job on the P4.

Indeed -- but this is rather difficult to do with the original poster's
100MB image files!
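One common workaround when the data is far larger than the cache is blocking (strip mining): instead of running each pass over the whole image, apply all passes to one small chunk before moving on, so later passes find the chunk still cached. The thread does not spell this technique out; the two passes (`scale`, `offset`) and the default chunk of 2048 floats (8 KB, assuming 4-byte floats) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical per-pixel passes.
void scale(float* p, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i) p[i] *= gain;
}
void offset(float* p, std::size_t n, float delta) {
    for (std::size_t i = 0; i < n; ++i) p[i] += delta;
}

// Naive: each pass streams the whole buffer through the cache.
void process_two_pass(std::vector<float>& img, float gain, float delta) {
    scale(img.data(), img.size(), gain);
    offset(img.data(), img.size(), delta);
}

// Blocked: both passes run on one chunk before moving to the next chunk.
void process_blocked(std::vector<float>& img, float gain, float delta,
                     std::size_t chunk_elems = 2048) {
    assert(chunk_elems > 0);
    for (std::size_t start = 0; start < img.size(); start += chunk_elems) {
        const std::size_t n = std::min(chunk_elems, img.size() - start);
        scale(img.data() + start, n, gain);
        offset(img.data() + start, n, delta);
    }
}
```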

- Brooks



Glen Low
Guest





Posted: Wed Sep 22, 2004 10:06 am    Post subject: Re: Compared valarray and blitz. Surprise.

Quote:
My personal test case (because it was realistic for my applications) was
the implementation of a FIR filter on arrays, or rather, an elementary
filter step: take the sum of two vectors, multiply it by a constant, add the
result to another vector, and measure the time. The result was that the plain
C array was the fastest option (short of using the vector formats, i.e.
SIMD instructions, that the platform offers). The code the compiler generated
for the valarray looked pretty bad, worse than the C code, and no compiler
I tried was able to generate the SIMD form "out of the box". My personal
opinion is that I really don't see a use for valarrays; if it really
matters, I have to do it by hand anyway.

My implementation of valarray transparently uses SIMD to accelerate
such code. The PowerPC Altivec implementation is fairly complete now
but I'm working on an SSE/SSE2 version and cleaning up the code
somewhat. I have gotten close to the theoretical limit for SIMD
acceleration and the generated code looks sharp. Stay tuned.

http://www.pixelglow.com/macstl/

IIRC, only one other valarray implementation besides mine uses
expression templates, namely libstdc++ (GCC). Metrowerks, SGI STL and
Visual C++ (Dinkumware) don't use this technique, which means their
code slows down significantly with longer expressions, as large temporary
valarrays get allocated.
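A stripped-down sketch of the expression-template technique, to show why it avoids temporaries: `a + b` returns a small node holding references to its operands, and assignment evaluates the whole expression in one loop. The types `Vec`, `AddExpr` and `is_expr` are hypothetical and far simpler than what libstdc++ or macstl actually do.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Node for "l + r": stores references to its operands, computes on demand.
template <typename L, typename R>
struct AddExpr {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// A concrete array that owns its storage.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning any expression evaluates it element by element in a single
    // loop; no intermediate arrays are created for sub-expressions.
    template <typename E>
    Vec& operator=(const E& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Restrict operator+ to our own types so it cannot hijack other additions.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R> struct is_expr<AddExpr<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = typename std::enable_if<is_expr<L>::value && is_expr<R>::value>::type>
AddExpr<L, R> operator+(const L& l, const R& r) {
    return AddExpr<L, R>{l, r};
}
```

Because the nodes hold references, an expression such as `y = a + b + c;` must be consumed within the same statement; storing it for later would leave references to destroyed temporaries. An implementation without expression templates instead returns a full array from each `+`, which is where the large temporaries come from.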

Cheers,
Glen Low, Pixelglow Software
www.pixelglow.com

Tom Widmer
Guest





Posted: Wed Sep 22, 2004 6:37 pm    Post subject: Re: Compared valarray and blitz. Surprise.

On 21 Sep 2004 14:24:32 -0400, Thomas Richter
<thor (AT) cleopatra (DOT) math.tu-berlin.de> wrote:
Quote:
The code the compiler generated
for the valarray looked pretty bad, worse than the C code, and no compiler
I tried was able to generate the SIMD form "out of the box". My personal
opinion is that I really don't see a use for valarrays; if it really
matters, I have to do it by hand anyway.

There are valarray implementations around that generate vector
assembler instructions. I don't know of an x86 (SSE) one, but there's an
AltiVec one (for PowerPC) here:

http://www.pixelglow.com/macstl/

I have a feeling that the GCC and libstdc++ team are working on adding
vector instructions to their valarray implementation. I don't know how
far they've got.
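To show what "vector instructions" means for the filter step discussed earlier in the thread, here is a hand-vectorized sketch using the GCC/Clang vector extensions. This is compiler-specific and an assumption, not how macstl or libstdc++ implements it; it relies on the compiler mapping the 16-byte vector type onto the target's SIMD registers (SSE, AltiVec, NEON) where available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GCC/Clang extension: a vector of four floats, operated on element-wise.
typedef float v4sf __attribute__((vector_size(16)));

// y[i] += k * (a[i] + b[i]), four elements per vector operation.
void filter_step_simd(const float* a, const float* b, float k, float* y, std::size_t n) {
    const v4sf kv = {k, k, k, k};  // broadcast the constant
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf va, vb, vy;
        // memcpy avoids alignment assumptions about the input pointers.
        std::memcpy(&va, a + i, sizeof va);
        std::memcpy(&vb, b + i, sizeof vb);
        std::memcpy(&vy, y + i, sizeof vy);
        vy += kv * (va + vb);
        std::memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; ++i)  // scalar tail for the leftover elements
        y[i] += k * (a[i] + b[i]);
}
```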

Tom

Gabriel Dos Reis
Guest





Posted: Tue Nov 23, 2004 10:53 am    Post subject: Re: Compared valarray and blitz. Surprise.

Tom Widmer <tom_usenet (AT) hotmail (DOT) com> writes:

Quote:
I have a feeling that the GCC and libstdc++ team are working on adding
vector instructions to their valarray implementation. I don't know how
far they've got.

When I get more time :-)
But measurements I've made in my own applications suggested quite
good performance.

Part of the perceived problem is that array processing means different
things for different people...
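As one illustration of that range: besides whole-array arithmetic, valarray supports strided and masked access. The sketch below, with hypothetical function names, scales one column of a row-major image through `std::slice` and clamps values with a boolean mask.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Multiply column c of a rows x cols row-major image by gain, in place.
// std::slice(start, size, stride) selects elements c, c + cols, c + 2*cols, ...
void scale_column(std::valarray<float>& img, std::size_t rows, std::size_t cols,
                  std::size_t c, float gain) {
    assert(img.size() == rows * cols && c < cols);
    const std::slice column(c, rows, cols);
    img[column] *= std::valarray<float>(gain, rows);  // (value, count) constructor
}

// Replace every element greater than limit with limit, using a boolean mask.
void clamp_above(std::valarray<float>& img, float limit) {
    const std::valarray<bool> mask(img > limit);
    img[mask] = limit;
}
```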

--
Gabriel Dos Reis
gdr (AT) integrable-solutions (DOT) net

Gabriel Dos Reis
Guest





Posted: Tue Nov 23, 2004 10:54 am    Post subject: Re: Compared valarray and blitz. Surprise.

glenlow (AT) pixelglow (DOT) com (Glen Low) writes:

Quote:
http://www.pixelglow.com/macstl/

IIRC, only one other valarray implementation besides mine uses
expression templates, namely libstdc++ (GCC).

At the time I originally implemented and reworked GCC's valarray, I
also had a use for it and I was quite sensitive to performance loss.

I've read Thomas Richter complain a couple of times, but as far as I
can determine he never gave me a set of test cases I could run in order
to see what the problems might be...

--
Gabriel Dos Reis
gdr (AT) integrable-solutions (DOT) net

All times are GMT