STL vectors, allocators and SSE2 instructions

 
Philip Montrowe
Posted: Mon Sep 05, 2005 11:14 am



Hi,

I have a multi-part question about allocating and copying memory for
STL.

My environment is currently MSVC 7.1 but the code must be portable, ANSI
compliant and 64-bit compliant.

The code uses a lot of STL and manipulates large vectors of
char/short/float types of 1-10 MB in total length each. The STL
implementation is the one supplied with MSVC 7.1, which I believe is
basically Dinkumware.

I wish to process these vectors using Intel SSE2 intrinsics
and optionally allocate, initialize, and copy them using the memory
intrinsics. This departure from portability is permitted.

So, here are my questions.

1) Is there a clean portable way to fulfill the SSE2 requirement that
its data fields be aligned on 16 byte boundaries?

2) Is there a way to use/overload the STL allocator and/or STL raw
memory routines (uninitialized_fill etc) to use the fast SSE2 array
copy/store instructions?

Here are some of the issues I have noted.

a) One could provide an operator new to allocate memory on 16 byte
boundaries in 16 byte chunks but that would be wasteful because it
applies to all memory allocation.

b) I could provide my own allocator for the vectors but the Dinkumware
implementation only calls std::uninitialized_fill/copy etc for those
implementations using its standard allocator. Otherwise it calls the
element-oriented (and thereby unusable by SSE2) construct and destroy
functions.

c) Unless I use a non-portable alignment keyword, I think I would have
to allocate and then round up to a 16 byte boundary, leaving a
requirement to store the original new address somewhere for later delete
in addition to the rounded up "user" address.

There are other issues as well, but I will stick to these big three for
the time being.

So far the only solution I have come up with is to obtain a
copyleft/open source implementation of vector/allocator from somewhere
and modify appropriately to handle the issues above.

Any comments welcome.

Philip

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

P.J. Plauger
Posted: Tue Sep 06, 2005 10:54 am



"Philip Montrowe" <montrowe (AT) hotmail (DOT) com> wrote


Quote:
I have a multi-part question about allocating and copying memory for
STL.

My environment is currently MSVC 7.1 but the code must be portable, ANSI
compliant and 64-bit compliant.

The code uses a lot of STL and manipulates large vectors of
char/short/float types of 1-10 MB in total length each. The STL
implementation is the one supplied with MSVC 7.1, which I believe is
basically Dinkumware.

I wish to process these vectors using Intel SSE2 intrinsics
and optionally allocate, initialize, and copy them using the memory
intrinsics. This departure from portability is permitted.

So, here are my questions.

1) Is there a clean portable way to fulfill the SSE2 requirement that
its data fields be aligned on 16 byte boundaries?

Mostly. First, I suspect that anything you get from malloc will be
aligned on at least an eight-byte boundary, so you could probably
just make your vectors one element larger and choose an offset of
zero or one as needed for each vector. If that imposes too much
messy logic on your program proper, then write your own allocator
to do much the same thing. (Bootleg a copy of template class allocator
from <xmemory> and doctor it up.)
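
A minimal sketch of the "make the vector slightly larger and pick an offset" idea, assuming the target alignment is a multiple of sizeof(T) and that the buffer is already aligned for T. The helper names aligned_offset and aligned_begin are hypothetical, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: how many leading elements to skip so that
// p + skip lands on an `alignment`-byte boundary. Assumes `alignment`
// is a multiple of sizeof(T) and that p is already aligned for T.
template <typename T>
std::size_t aligned_offset(const T* p, std::size_t alignment = 16) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t misalign = static_cast<std::size_t>(addr % alignment);
    return misalign == 0 ? 0 : (alignment - misalign) / sizeof(T);
}

// Hypothetical convenience wrapper: first 16-byte-aligned float in a
// vector that was created with 16 / sizeof(float) - 1 spare elements.
inline float* aligned_begin(std::vector<float>& v) {
    assert(v.size() >= 16 / sizeof(float));
    return &v[0] + aligned_offset(&v[0]);
}
```

With an allocator that only guarantees 8-byte alignment, a vector of 8-byte elements needs an offset of zero or one; for 4-byte floats the offset can be up to three, so the padding has to grow accordingly.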

Quote:
2) Is there a way to use/overload the STL allocator and/or STL raw
memory routines (uninitialized_fill etc) to use the fast SSE2 array
copy/store instructions.

Overload on what? Unless you wrap the scalar types in classes
of your own devising, there's no way to distinguish them.

Quote:
Here are some of the issues I have noted.

a) One could provide an operator new to allocate memory on 16 byte
boundaries in 16 byte chunks but that would be wasteful because it
applies to all memory allocation.

Probably not all that wasteful, given the overheads of malloc anyway.
Remember, vector allocates *arrays* of T not individual T objects.

Quote:
b) I could provide my own allocator for the vectors but the Dinkumware
implementation only calls std::uninitialized_fill/copy etc for those
implementations using its standard allocator. Otherwise it calls the
element-oriented (and thereby unusable by SSE2) construct and destroy
functions.

See above about defining your own types.

Quote:
c) Unless I use a non-portable alignment keyword, I think I would have
to allocate and then round up to a 16 byte boundary, leaving a
requirement to store the original new address somewhere for later delete
in addition to the rounded up "user" address.

Yep.
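
The allocate-then-round-up scheme described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names aligned_malloc16 and aligned_free16 are hypothetical, and the original malloc pointer is stashed in the bytes just below the address handed to the user:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns a 16-byte-aligned block of `bytes` bytes,
// or a null pointer on failure.
void* aligned_malloc16(std::size_t bytes) {
    const std::size_t alignment = 16;
    const std::size_t extra = alignment + sizeof(void*);
    if (bytes > static_cast<std::size_t>(-1) - extra) return nullptr;

    // Over-allocate: worst-case padding plus room for one stashed pointer.
    void* raw = std::malloc(bytes + extra);
    if (raw == nullptr) return nullptr;

    // Leave space for the stashed pointer, then round up to the boundary.
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t user =
        (start + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    assert(user - reinterpret_cast<std::uintptr_t>(raw) >= sizeof(void*));

    // Remember the original address just below the "user" address.
    reinterpret_cast<void**>(user)[-1] = raw;
    return reinterpret_cast<void*>(user);
}

// Hypothetical counterpart: recovers the original pointer and frees it.
void aligned_free16(void* user) {
    if (user != nullptr) std::free(static_cast<void**>(user)[-1]);
}
```

Blocks obtained this way must only be released through the matching free function, never through plain free or delete.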

Quote:
There are other issues as well, but I will stick to these big three for
the time being.

So far the only solution I have come up with is to obtain a
copyleft/open source implementation of vector/allocator from somewhere
and modify appropriately to handle the issues above.

Again, see above. It's not all that hard.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



kanze
Posted: Tue Sep 06, 2005 11:10 am



Philip Montrowe wrote:

Quote:
I have a multi-part question about allocating and copying
memory for STL.

My environment is currently MSVC 7.1 but the code must be
portable, ANSI compliant and 64-bit compliant.

The code uses a lot of STL and manipulates large vectors of
char/short/float types of 1-10 MB in total length each. The
STL implementation is the one supplied with MSVC 7.1, which I
believe is basically Dinkumware.

I wish to process these vectors using Intel SSE2 intrinsics
and optionally allocate, initialize, and copy them
using the memory intrinsics. This departure from portability
is permitted.

So, here are my questions.

1) Is there a clean portable way to fulfill the SSE2
requirement that its data fields be aligned on 16 byte
boundaries?

For what definition of clean? And portable?

I don't think that there is anything in the standard which says
that an implementation has to start its actual array at the
first address returned; an implementation could insert some special,
hidden information at the start of the buffer returned from the
allocator, so that &v[0] would not be the address returned from
the allocator. In practice, I can't imagine an implementation
doing this with std::vector, so you should be safe.

Whether you can write and use such an allocator in a way that
could be considered clean and portable is another question.

Quote:
2) Is there a way to use/overload the STL allocator and/or STL
raw memory routines (uninitialized_fill etc) to use the fast
SSE2 array copy/store instructions.

You can (and probably will have to) instantiate std::vector on
your own allocator type. The default allocator used by the STL
is thus not a problem.
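
As a rough illustration of instantiating std::vector on your own allocator type, here is a minimal sketch. It assumes a C++11 (or later) library that goes through std::allocator_traits, which postdates MSVC 7.1; the name aligned_allocator is hypothetical, and the alignment is obtained by over-allocating with malloc and rounding up:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical minimal allocator returning Alignment-aligned storage.
template <typename T, std::size_t Alignment = 16>
struct aligned_allocator {
    static_assert((Alignment & (Alignment - 1)) == 0 &&
                  Alignment >= sizeof(void*),
                  "Alignment must be a power of two >= sizeof(void*)");
    typedef T value_type;

    // Needed because of the non-type template parameter.
    template <typename U>
    struct rebind { typedef aligned_allocator<U, Alignment> other; };

    aligned_allocator() noexcept {}
    template <typename U>
    aligned_allocator(const aligned_allocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        const std::size_t extra = Alignment + sizeof(void*);
        if (n > (static_cast<std::size_t>(-1) - extra) / sizeof(T))
            throw std::bad_alloc();
        void* raw = std::malloc(n * sizeof(T) + extra);
        if (raw == nullptr) throw std::bad_alloc();
        // Skip room for the stashed pointer, then round up.
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        const std::uintptr_t user =
            (start + Alignment - 1) & ~static_cast<std::uintptr_t>(Alignment - 1);
        assert(user % Alignment == 0);
        reinterpret_cast<void**>(user)[-1] = raw;  // original address
        return reinterpret_cast<T*>(user);
    }

    void deallocate(T* p, std::size_t) noexcept {
        if (p != nullptr) std::free(reinterpret_cast<void**>(p)[-1]);
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&,
                const aligned_allocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&,
                const aligned_allocator<U, A>&) noexcept { return false; }
```

A vector would then be declared as std::vector<float, aligned_allocator<float> >. Whether &v[0] is exactly the pointer the allocator returned is, as noted above, an assumption about the implementation rather than a guarantee.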

The STL algorithms (all of them) basically only know the
iterator type. Again, although I don't think the standard
actually requires it, the actual iterator type must, in some
way, depend on the Allocator::pointer. It must explicitly
depend on Allocator::reference, since this is what is returned
from operator*. But I think that Allocator::reference must be a
real reference. So you define Allocator::pointer to be a
special, user-defined type, and specialize the algorithms you
need over std::vector< X, MyAllocatorType >::iterator.
(Depending on the implementation, this will either be
Allocator::pointer, or some class type which is distinct for
different Allocator::pointer types.)

That's in theory, at least. Quite frankly, the use seems exotic
enough that I doubt many implementations have actually tested
it. Which means that you have effectively become a beta tester,
or maybe even an alpha tester, for the STL implementation.
Without, however, the typical quick-fix procedures which a
normal beta tester would have. (For that matter, there is a
clause in the standard which says that "Implementations of
containers described in this International Standard are
permitted to assume that their Allocator template parameter
meets the following two additional requirements [...] -- The
typedef members pointer [...] are required to be T* [...]." If
an implementation takes advantage of this phrase, you're stuck.
From what I've read in this forum, however, I think that
Dinkumware tries to go considerably farther in its support of
allocators.)

Quote:
Here are some of the issues I have noted.

a) One could provide an operator new to allocate memory on 16
byte boundaries in 16 byte chunks but that would be wasteful
because it applies to all memory allocation.

And it doesn't necessarily help, since there is no guarantee
that &v[0] is an address returned by an allocator function.

But that's a theoretical problem, more than a real one. In
practice, it's actually likely that the standard allocator
aligns this much; I'd be very surprised if it aligned less than
8, and there could be internal reasons why it aligns at 16. I
doubt that the waste would be measurable.

Quote:
b) I could provide my own allocator for the vectors but the
Dinkumware implementation only calls
std::uninitialized_fill/copy etc for those implementations
using its standard allocator. Otherwise it calls the
element-oriented (and thereby unusable by SSE2) construct and
destroy functions.

There is a requirement on uninitialized_fill/copy, that no
exceptions are thrown from the iterator. Presumably,
Allocator::pointer could throw on dereferencing; if
std::vector<>::iterator uses it, you could have a problem. So
either the implementation ensures that uninitialized_fill/copy
does work correctly even if the iterator throws, or it cannot
use uninitialized_fill/copy except with allocators which it
knows cannot throw.

Of course, there is no requirement for std::vector to use
uninitialized_fill/copy, or any other particular algorithm in
the standard library.

Quote:
c) Unless I use a non-portable alignment keyword, I think I
would have to allocate and then round up to a 16 byte
boundary, leaving a requirement to store the original new
address somewhere for later delete in addition to the rounded
up "user" address.

Right. For that matter, you may not even be able to guarantee
that memory allocated directly by the system is sufficiently
aligned.

Whether this is a problem in practice is a different question.

Quote:
There are other issues as well, but I will stick to these big
three for the time being.

So far the only solution I have come up with is to obtain a
copyleft/open source implementation of vector/allocator from
somewhere and modify appropriately to handle the issues above.

I don't think that there is a solution which is 100% conformant
with the standard. There are too many loopholes.

What I would do in your place would be to start by contacting
Dinkumware; you might be able to get a custom implementation
from them at a reasonable price, which does exactly what you
want. (The advantage, of course, is that 1) they could probably
do it in a way that remains compatible with the rest of the
library, in case you have to link in components compiled with
the standard library, and 2) since they know the ins and outs of
the library thoroughly, the risk of an error due to an
unintentional incompatibility creeping in is much lower than for
anything you could do yourself.)

If for some reason that doesn't work, I'd probably implement the
needed subset from scratch; vector isn't all that difficult, and
trying to maintain a generic version, when all you need is your
specific allocator, is probably a lot more work than writing
everything from scratch (unless you already know the internals
of the implementation thoroughly).

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


msalters
Posted: Tue Sep 06, 2005 4:10 pm

kanze wrote:
...
Quote:
I don't think that there is anything in the standard which says
that an implementation has to start its actual array at the
first address returned; an implementation could insert some special,
hidden information at the start of the buffer returned from the
allocator, so that &v[0] would not be the address returned from
the allocator. In practice, I can't imagine an implementation
doing this with std::vector, so you should be safe.

It could, but it has to ensure the alignment requirements are
retained. If an allocator<T> allocates aligned memory,
it is possible to use any integer multiple of sizeof(T) bytes
for that hidden information. The problem is that SSE2 requires
alignment at a multiple of sizeof(T). Using 3*sizeof(T) bytes
of hidden information would break SSE2.
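
The arithmetic behind that last point can be checked with a small sketch, assuming a 4-byte float and a buffer that starts on a 16-byte boundary. The function name header_keeps_alignment is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: does a hidden header of `count` elements of
// `elem_size` bytes each leave the following data on an
// `alignment`-byte boundary, given that the block itself starts on one?
constexpr bool header_keeps_alignment(std::size_t count,
                                      std::size_t elem_size,
                                      std::size_t alignment = 16) {
    return (count * elem_size) % alignment == 0;
}

// 3 * 4 = 12 bytes: aligned for float, but not for 16-byte SSE2 loads.
static_assert(!header_keeps_alignment(3, 4), "12-byte header misaligns data");
// 4 * 4 = 16 bytes: the data stays on a 16-byte boundary.
static_assert(header_keeps_alignment(4, 4), "16-byte header keeps alignment");
```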

HTH,
Michiel Salters


kanze
Posted: Wed Sep 07, 2005 10:07 am

msalters wrote:
Quote:
kanze wrote:
...
I don't think that there is anything in the standard which
says that an implementation has to start its actual array at
the first address returned; an implementation could insert some
special, hidden information at the start of the buffer
returned from the allocator, so that &v[0] would not be the
address returned from the allocator. In practice, I can't
imagine an implementation doing this with std::vector, so
you should be safe.

It could, but it has to ensure the alignment requirements are
retained. If an allocator<T> allocates aligned memory, it is
possible to use any integer multiple of sizeof(T) bytes for
that hidden information. The problem is that SSE2 requires
alignment at a multiple of sizeof(T). Using 3*sizeof(T) bytes
of hidden information would break SSE2.

Which alignment requirements must it ensure are retained? If I
understand the problem correctly, it is really that SSE2 has
stricter alignment requirements than would otherwise be
necessary. (I'm not at all familiar with SSE, but the original
poster spoke of an alignment requirement of 16 bytes, which is
definitely more than is needed for most, if not all types on the
architecture.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Philip
Posted: Wed Sep 07, 2005 2:00 pm

Thanks for the clarifying answers...

What do you think about subclassing vector and overloading the
constructors and assignment operator as well as insert, assign, and
resize? I would also write a custom allocator which stores the
additional aligned memory data just before the "user" start.

I believe this would allow me to insert my scalar SSE2 routines.

Since this is a performance option, I can live with the possibility of
slicing or lost polymorphism because of missing virtual qualifiers on
the insert, assign, and resize.

But I have always stayed away from doing this to STL because of the
lack of a virtual destructor. In this case there is no additional
destructor function that needs to be performed in the child destructor,
but I always think it is bad style to do this.

Your thoughts?

Philip


Philip
Posted: Wed Sep 07, 2005 2:02 pm

Thanks for the reply.

Initial testing shows that malloc on WinXp aligns on 8-byte boundaries,
which probably has something to do with their free-chain management
structure.
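
A quick probe along these lines might look like the sketch below. It only reports what a handful of malloc calls happened to return, which is an observation and not a guarantee; the function name observed_malloc_alignment and the sample sizes are arbitrary choices for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical probe: largest power of two that divides every address
// returned by a series of malloc calls. Returns 0 if nothing could be
// allocated.
std::size_t observed_malloc_alignment(std::size_t samples = 64) {
    std::vector<void*> blocks;
    std::uintptr_t seen = 0;  // bitwise OR of all returned addresses
    for (std::size_t i = 0; i < samples; ++i) {
        void* p = std::malloc(64 + 24 * i);  // vary the request size
        if (p == nullptr) break;
        blocks.push_back(p);
        seen |= reinterpret_cast<std::uintptr_t>(p);
    }
    for (std::size_t i = 0; i < blocks.size(); ++i) std::free(blocks[i]);

    // The lowest set bit of the OR is the common alignment.
    const std::uintptr_t lowest = seen & (~seen + 1);
    assert(seen == 0 || (lowest & (lowest - 1)) == 0);
    return static_cast<std::size_t>(lowest);
}
```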

I like the tip to store hidden information at the front of the memory
allocation and then step the "user" start beyond that. If I do that in
conjunction with my own allocator then I will always be able to find
the address of the original malloc returned by simply backing up the
address supplied by vector.

I am pretty certain that having an allocator<_Type>::pointer type that
is not actually _Type* will open up a huge can of worms, although
writing the implementation has its attractions. Also I believe Scott
Meyers had something to say about the inadvisability of doing this.

Philip


André Kempe
Posted: Wed Sep 07, 2005 5:19 pm

kanze wrote:
Quote:
Which alignment requirements must it ensure are retained? If I
understand the problem correctly, it is really that SSE2 has
stricter alignment requirements than would otherwise be
necessary. (I'm not at all familiar with SSE, but the original
poster spoke of an alignment requirement of 16 bytes, which is
definitely more than is needed for most, if not all types on the
architecture.)


SSE2 requires alignment on a clean 16-byte boundary; otherwise you get a
general-protection fault from your CPU. But some/most of the
SSE2 intrinsics have an unaligned counterpart, which Philip might try
first. Intel states that these carry a performance penalty, but I have no
idea how many clock cycles it is.
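
To make the aligned/unaligned distinction concrete, here is a minimal sketch that picks between the two intrinsic families at run time. It assumes an x86 compiler that provides <emmintrin.h> and falls back to a plain loop elsewhere; HAVE_SSE2, is_aligned16 and add_i32 are hypothetical names, and the sums are assumed not to overflow:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// HAVE_SSE2 is a hypothetical macro for this sketch, not a standard one.
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// out[i] = a[i] + b[i] for i in [0, n).
void add_i32(const std::int32_t* a, const std::int32_t* b,
             std::int32_t* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if HAVE_SSE2
    if (is_aligned16(a) && is_aligned16(b) && is_aligned16(out)) {
        // Aligned loads/stores: these fault on a misaligned address.
        for (; i + 4 <= n; i += 4) {
            __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a + i));
            __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b + i));
            _mm_store_si128(reinterpret_cast<__m128i*>(out + i),
                            _mm_add_epi32(va, vb));
        }
    } else {
        // Unaligned counterparts: accept any address.
        for (; i + 4 <= n; i += 4) {
            __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
            __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                             _mm_add_epi32(va, vb));
        }
    }
#endif
    // Scalar tail (and the whole loop when SSE2 is unavailable).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

Timing the two branches on the target hardware is the only reliable way to learn what the unaligned penalty actually is.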

Dave Harris
Posted: Thu Sep 08, 2005 9:14 am

Philip (AT) Montrowe (DOT) com (Philip) wrote (abridged):
Quote:
What do you think about subclassing vector and overloading the
constructors and assignment operator as well as insert, assign, and
resize? I would also write a custom allocator which stores the
additional aligned memory data just before the "user" start.

I'd much rather write my own vector from scratch. For something like this
I'd prefer to have complete control. A basic vector-like class isn't hard
to write, and you may not need the full std interface.

-- Dave Harris, Nottingham, UK.

kanze
Posted: Thu Sep 08, 2005 11:29 am

Philip wrote:

Quote:
Initial testing shows that malloc on WinXp aligns on 8-byte
boundaries, which probably has something to do with their
free-chain management structure.

There are two factors involved: internal overhead, and alignment
requirements. On an Intel architecture, there are, strictly
speaking, no alignment requirements, but performance takes a hit
if one of the basic object types (those addressed directly by
hardware) spans a cache line, or something like that; ensuring
an alignment of at least the object's size (rounded up to the
next power of 2) ensures that this never happens. Of the basic
object types, the only one that might be larger than 8 is a long
double. If WinXp is returning aligned at 8, it is either
ignoring the performance hit on long double, or supposing that
the hardware long double is never used (e.g. because the
compilers implement the C++ long double type as a hardware level
8 byte double).

Quote:
I like the tip to store hidden information at the front of the
memory allocation and then step the "user" start beyond that.
If I do that in conjunction with my own allocator then I will
always be able to find the address of the original malloc
returned by simply backing up the address supplied by vector.

It's the usual solution for this problem. Regardless of the
type of information you want to hide.

Quote:
I am pretty certain that having an allocator<_Type>::pointer
type that is not actually _Type* will open up a huge can of
worms,

I suspect that just about anything you try to do with a custom
allocator will open a huge can of worms. But this is probably
worse than most things; I think that Dinkumware has made some
effort to make allocators useful (more than the minimum required
by the standard, anyway), but I doubt that all of the
theoretically possible combinations have been fully tested.

Quote:
although writing the implementation has its attractions. Also
I believe Scott Meyers had something to say about the
inadvisability of doing this.

Probably.

If the goal is simply to write correct code, which works
portably, the advice can only be, don't do it. If you have
concrete performance requirements for a specific platform,
however, you often have to do things that aren't really nice.

All in all, however, I think that the best solution is either to
write your own, or to pay someone to do it for you.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Simon Bone
Posted: Fri Sep 09, 2005 3:23 pm

On Thu, 08 Sep 2005 07:29:39 -0400, kanze wrote:

Quote:
Philip wrote:

Initial testing shows that malloc on WinXp aligns on 8-byte
boundaries, which probably has something to do with their
free-chain management structure.

There are two factors involved: internal overhead, and alignment
requirements. On an Intel architecture, there are, strictly
speaking, no alignment requirements, but performance takes a hit
if one of the basic object types (those addressed directly by
hardware) spans a cache line, or something like that; ensuring
an alignment of at least the object's size (rounded up to the
next power of 2) ensures that this never happens. Of the basic
object types, the only one that might be larger than 8 is a long
double. If WinXp is returning aligned at 8, it is either
ignoring the performance hit on long double, or supposing that
the hardware long double is never used (e.g. because the
compilers implement the C++ long double type as a hardware level
8 byte double).


IIRC VC++ does do that (implement long double as a hardware level double),
which always seems particularly daft to me. Why bother typing the extra
4 characters if you don't want the extra precision guaranteed? But I know
some Windows compilers do implement long double as long double. G++ for
one, and I think Digital Mars, from posts seen here.
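
Which choice a given compiler makes can be checked directly. A minimal sketch using std::numeric_limits, where the function name long_double_is_wider is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: true when long double carries more mantissa
// digits than double on this compiler, false when the two types share
// the same precision.
constexpr bool long_double_is_wider() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}

// Storage size and alignment, both relevant to what an allocator must
// guarantee.
constexpr std::size_t long_double_size() { return sizeof(long double); }
constexpr std::size_t long_double_alignment() { return alignof(long double); }
```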

There is no guarantee those compilers use the same heap allocators as the
Microsoft compilers either. That would be worth investigating before
beginning to write a special allocator for the platform. This could be one
of the differences between mingw-g++ and cygwin-g++, which are advertised
as using the native and ported C libraries respectively.

Simon Bone

All times are GMT