C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

speed issues with resize() and POD
Goto page 1, 2, 3  Next
 
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated)
View previous topic :: View next topic  
Author Message
Paul Rosen
Guest





PostPosted: Fri Sep 02, 2005 11:06 am    Post subject: speed issues with resize() and POD Reply with quote



I have a vector like this:

std:vector<unsigned char>

I have an object of this type that is used over and over. Conceptually
it is kind of like this (NOTE: the following is trivial code, don't take
the exact code seriously.):

std:vector<unsigned char> Coll;

while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to. I stepped through it
and can see that it is going through the elements one by one, which is
unnecessary in this case because the object contains POD.

What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum along really
fast.

In my implementation (the STL that comes with MSVC 7.1), I think I could
just set the variable Coll._Mylast if I'm changing the size to something
under the capacity, but that is unportable and just seems wrong.

Does anyone have a better solution?

I don't really want to create my own vector class since I pass it around
to other routines that shouldn't know about any of this.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
andysemail@runbox.com
Guest





PostPosted: Fri Sep 02, 2005 2:30 pm    Post subject: Re: speed issues with resize() and POD Reply with quote



Have you thought about changing the container you are using?

vectors are often implemented as wrappers around a single array so if
there is not enough space to grow the array in place a new array must
be allocated and all the contents copied over.

the dequeue class has a very similar protocol but is often implemented
as a series of arrays, if the dequeue needs to grow it can just
allocate some more memory without having to copy the whole data
structure.

Forgive me if this is not helpful as I am fairly new to cpp.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
mzdude
Guest





PostPosted: Fri Sep 02, 2005 2:31 pm    Post subject: Re: speed issues with resize() and POD Reply with quote



Paul Rosen said
Quote:
std:vector<unsigned char> Coll;
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}
Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to.

Rather than resize and passing the entire container to Process, use
iterators to mark the boundries

while( lots of times through )
{
int iSize = SizeThisTime();
if( i > Coll.size() )
Coll.resize( iSize );
Process( col.begin(), col.begin()+iSize);
}


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Fri Sep 02, 2005 9:41 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

Paul Rosen wrote:
Quote:
I have a vector like this:

std:vector<unsigned char

I have an object of this type that is used over and
over. Conceptually it is kind of like this (NOTE: the
following is trivial code, don't take the exact code
seriously.):

std:vector
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is
taking up a lot of time, but it theoretically wouldn't have
to. I stepped through it and can see that it is going through
the elements one by one, which is unnecessary in this case
because the object contains POD.

What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum
along really fast.

I'm not sure I understand. Technically, this is the required
behavior of resize() -- it is not allowed to invalidate
iterators, and thus cannot deallocate any memory. The only
thing it would normally do in addition to just setting the new
size is 1) call destructors on removed objects, and 2) call
constructors on added objects. In the former case, a good
compiler will detect that the loop in the code doesn't do
anything, and surpress it. In the second, that is part of the
specification of the container (all containers) -- a container
never contains an uninitialized object, POD or no.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Howard Hinnant
Guest





PostPosted: Fri Sep 02, 2005 9:42 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

In article <BYFRe.5139$9i4.1794 (AT) newsread2 (DOT) news.atl.earthlink.net>,
Paul Rosen <prosen (AT) fte (DOT) com> wrote:

Quote:
I have a vector like this:

std:vector<unsigned char

I have an object of this type that is used over and over. Conceptually
it is kind of like this (NOTE: the following is trivial code, don't take
the exact code seriously.):

std:vector
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to. I stepped through it
and can see that it is going through the elements one by one, which is
unnecessary in this case because the object contains POD.

What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum along really
fast.

In my implementation (the STL that comes with MSVC 7.1), I think I could
just set the variable Coll._Mylast if I'm changing the size to something
under the capacity, but that is unportable and just seems wrong.

Does anyone have a better solution?

I don't really want to create my own vector class since I pass it around
to other routines that shouldn't know about any of this.

Fwiw, the Freescale implementation of vector<unsigned char> does what
you suggest, except instead of:

2) else do the normal stuff.

It is more like:

2) expand using memset to "construct" the new elements.

That being said, this is not a clear-cut case. A major question is
whether such optimizations are desirable or not in the face of
allocator::construct and allocator::destroy which may have side effects
beyond those listed in the standard.

I.e. Are allocator::construct/destroy convenience functions or
customization points?

I believe the standard is less than clear on this issue. And
furthermore I believe the "right" answer is less than clear as well.
The Freescale implementation currently has a hybrid approach (for better
or worse):

1. allocator::construct is consistently used as long as the contained
type has a non-trivial copy constructor, else it is not.

2. allocator::destroy is consistently used as long as the contained
type has a non-trivial destructor, else it is not.

Although I do not do this (yet), it would be possible to further
specialize these optimizations on whether or not std::allocator was
being used. But then my customers using vector<char,
my_allocator may rightly complain that vector slows to a crawl
with their allocator but not with std::allocator.

This is a classic performance / functionality tradeoff, and as you note,
the performance hit can be significant. So we do not want to dictate
this functionality if it is not needed.

-Howard

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Greg Herlihy
Guest





PostPosted: Fri Sep 02, 2005 9:45 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

Paul Rosen wrote:
Quote:
I have a vector like this:

std:vector<unsigned char

I have an object of this type that is used over and over. Conceptually
it is kind of like this (NOTE: the following is trivial code, don't take
the exact code seriously.):

std:vector
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to. I stepped through it
and can see that it is going through the elements one by one, which is
unnecessary in this case because the object contains POD.

What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum along really
fast.


Does anyone have a better solution?

You should "preflight" this operation by first calling
std::vector::reserve() to increase Coll's capacity to some reasonably
large number that is unlikely to be exceeded for the duration of this
function. Reserving memory for the vector's contents up front ensures
that resizing the vector to any size smaller than the vector's capacity
will not entail reallocating and copying the vector's contents. It's
the reallocations and copying of the vector's elements as its size
gradually increases that is almost certainly the cause of the poor
performance.

Note that if SizeThisTime() returns a value that exceeds Coll's
capacity, then the routine should double the amount of memory Coll has
in reserve and do the same on all such future occurrences (of course if
Coll's initial capacity was well chosen, having to increase it
subsequently should not be a frequent event).

Greg


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Paul Rosen
Guest





PostPosted: Fri Sep 02, 2005 9:45 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

mzdude wrote:
Quote:
Paul Rosen said

std:vector<unsigned char> Coll;
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}
Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to.


Rather than resize and passing the entire container to Process, use
iterators to mark the boundries

while( lots of times through )
{
int iSize = SizeThisTime();
if( i > Coll.size() )
Coll.resize( iSize );
Process( col.begin(), col.begin()+iSize);
}

That's a possibility, but I'd prefer to just pass one parameter. A
really nice feature of using vector instead of a plain old unsigned
char* and a length is that the vector is a single parameter for a single
concept.

If I were to propagate the parameter change all the way through the
Process() routine and everything it called, I'd be tempted to just go
back to C:

unsigned char* puc = new unsigned char[MAX_BUFF];

while( lots of times through )
{
int iSize = SizeThisTime();
Process(puc, iSize);
}

delete [] puc;

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Tony Delroy
Guest





PostPosted: Fri Sep 02, 2005 9:47 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

Hi Paul,

If your STL isn't doing what you want, then you've provided the logical
solution in your question, and you can implement it yourself in code:

std:vector<unsigned char> Coll;

int largest_size = 0;
while (lots of times through)
{
int iSize = SizeThisTime();
if (iSize > largest_size)
{
largest_size = iSize;
Coll.resize(iSize);
}
Process(Coll);
}

Anyway, I guess the STL implementation's iteration/empty-destructor
calls should be optimised away if you set your optimisation level
appropriately.

Cheers, Tony


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Carlos Moreno
Guest





PostPosted: Fri Sep 02, 2005 9:49 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

Paul Rosen wrote:
Quote:
I have a vector like this:

std:vector<unsigned char

I have an object of this type that is used over and over. Conceptually
it is kind of like this (NOTE: the following is trivial code, don't take
the exact code seriously.):

std:vector
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to. I stepped through it
and can see that it is going through the elements one by one, which is
unnecessary in this case because the object contains POD.

What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum along really
fast.

In my implementation (the STL that comes with MSVC 7.1), I think I could
just set the variable Coll._Mylast if I'm changing the size to something
under the capacity, but that is unportable and just seems wrong.

Does anyone have a better solution?

How about don't resize unless the object is growing?

Quote:
I don't really want to create my own vector class since I pass it around
to other routines that shouldn't know about any of this.

This should not be a valid argument -- your other routines should not be
attached (tightly coupled) to the fact that you're using one particular
container. If you make those templates, or have them receive iterators
(which would mean template as well), that would solve both problems:
you don't have to resize the vector, since you only have to pass a pair
of iterators specifying the sequence (so, you pass the second iterator
pointing where you know the end should be, but without making that the
end).

If your other functions are not up for negotiation, then perhaps you're
out of luck -- you could try raising the optimization level and see if
the compiler inlines and optimizes away the do-nothing destructor?

HTH,

Carlos
--

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Maxim Yegorushkin
Guest





PostPosted: Fri Sep 02, 2005 9:49 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

Paul Rosen wrote:
Quote:
I have a vector like this:

std:vector<unsigned char

I have an object of this type that is used over and over. Conceptually
it is kind of like this (NOTE: the following is trivial code, don't take
the exact code seriously.):

std:vector
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to. I stepped through it
and can see that it is going through the elements one by one, which is
unnecessary in this case because the object contains POD.

It does not do that for PODs. For PODs it uses memcpy/memmove to copy
and memset to initialize new elements. Check the sources carefully.

Quote:
What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.

This is what happens for PODs.

Quote:
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum along really
fast.

[]

Quote:
Does anyone have a better solution?

I don't really want to create my own vector class since I pass it around
to other routines that shouldn't know about any of this.

I experienced a similar problem. I used a vector<char> as a buffer for
reading data from a socket. Having profiled my server under heavy load
it turned out that 30% of its runtime was spent in zeroing out buffer
when I grew it (vector::resize) to read data in.

I threw std::vector<> away and wrote my own low cholesterol
pod_vector<> and have been happy ever since.


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Ivan Vecerina
Guest





PostPosted: Fri Sep 02, 2005 9:50 pm    Post subject: Re: speed issues with resize() and POD Reply with quote

"Paul Rosen" <prosen (AT) fte (DOT) com> wrote

Quote:
I have a vector like this:
....
std:vector<unsigned char> Coll;

while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is taking up a
lot of time, but it theoretically wouldn't have to. I stepped through it
and can see that it is going through the elements one by one, which is
unnecessary in this case because the object contains POD.

Unfortunately, without compiler support, the library code cannot
automatically determine if a template parameter is POD or not.
Some library implementations would have a default specialization
for built-in types, in release mode at least. Maybe not the one you use?

Quote:
What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum along really
fast.

In my implementation (the STL that comes with MSVC 7.1), I think I could
just set the variable Coll._Mylast if I'm changing the size to something
under the capacity, but that is unportable and just seems wrong.

Does anyone have a better solution?

Question: do you need to preserve the contents of Coll between
iterations of the loop?
If not, you could try using the pair of calls:
Coll.resize(0);
Coll.resize(iSize);
This will save std::vector from copying the previous contents of
the collection when it needs to allocate a larger memory block.
You could also call vector::reserve initially with a
"reasonable" size guess.

I hope this helps,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Maxim Yegorushkin
Guest





PostPosted: Sat Sep 03, 2005 9:31 am    Post subject: Re: speed issues with resize() and POD Reply with quote


[email]andysemail (AT) runbox (DOT) com[/email] wrote:

Quote:
vectors are often implemented as wrappers around a single array so if
there is not enough space to grow the array in place a new array must
be allocated and all the contents copied over.

Then what is the "rare" implementation which does not wrap a single
array?


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Greg Herlihy
Guest





PostPosted: Sat Sep 03, 2005 9:34 am    Post subject: Re: speed issues with resize() and POD Reply with quote


[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
Quote:
Paul Rosen wrote:
I have a vector like this:

std:vector<unsigned char

I have an object of this type that is used over and
over. Conceptually it is kind of like this (NOTE: the
following is trivial code, don't take the exact code
seriously.):

std:vector
while (lots of times through)
{
int iSize = SizeThisTime();
Coll.resize(iSize);
Process(Coll);
}

Now, my profiler is showing that the resize() function is
taking up a lot of time, but it theoretically wouldn't have
to. I stepped through it and can see that it is going through
the elements one by one, which is unnecessary in this case
because the object contains POD.

What I'd like resize() to do in this case is:

1) if the new size is less than the capacity, just set the size.
2) else do the normal stuff.

Very quickly Coll would reach a maximum size and would hum
along really fast.

I'm not sure I understand. Technically, this is the required
behavior of resize() -- it is not allowed to invalidate
iterators, and thus cannot deallocate any memory. The only
thing it would normally do in addition to just setting the new
size is 1) call destructors on removed objects, and 2) call
constructors on added objects. In the former case, a good
compiler will detect that the loop in the code doesn't do
anything, and surpress it. In the second, that is part of the
specification of the container (all containers) -- a container
never contains an uninitialized object, POD or no.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


Certainly callng vector::resize() will deallocate memory and invalidate
the vector's iterators when called to resize the vector beyond its
current capacity. How could it avoid doing so? And what would be the
benefit from not doing so? A quick check of vector::resize()'s
implementation from two different compiler vendors shows that in both
libraries, calling vector::resize() will reallocate memory if
necessary.

Here's a another suggestion in response to the original post. If the
initialization of the elements when resizing the vector turns out to be
time consuming, then eliminate the resize call altogether (but still
call reserve() so that the vector has some sufficiently large
capacity). Then just have the routine push_back() the items as they are
being added to the vector.

Greg


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]


Back to top
Ivan Vecerina
Guest





PostPosted: Sat Sep 03, 2005 9:39 am    Post subject: Re: speed issues with resize() and POD Reply with quote

"Ivan Vecerina" <INVALID_use_webform_instead (AT) vecerina (DOT) com> wrote in
message news:df9nlr$vtu$1 (AT) news (DOT) hispeed.ch...
: Question: do you need to preserve the contents of Coll between
: iterations of the loop?
: If not, you could try using the pair of calls:
: Coll.resize(0);
: Coll.resize(iSize);
: This will save std::vector from copying the previous contents of
: the collection when it needs to allocate a larger memory block.
Woops: while this might help in some cases,
it probably won't do much as the new contents
will still have to be "initialized" individually.
[ although it might save read-access to the original
data, but this isn't likely to do much ].

Sorry for the confusion,
Ivan



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Ivan Vecerina
Guest





PostPosted: Sat Sep 03, 2005 9:39 am    Post subject: Re: speed issues with resize() and POD Reply with quote

"Paul Rosen" <prosen (AT) fte (DOT) com> wrote

....
: > Rather than resize and passing the entire container to Process,
use
: > iterators to mark the boundries
: >
: > while( lots of times through )
: > {
: > int iSize = SizeThisTime();
: > if( i > Coll.size() )
: > Coll.resize( iSize );
: > Process( col.begin(), col.begin()+iSize);
: > }
:
: That's a possibility, but I'd prefer to just pass one parameter. A
: really nice feature of using vector instead of a plain old unsigned
: char* and a length is that the vector is a single parameter for a
single
: concept.
[NB: boost.org has classes to wrap iterator ranges into a single
variable]

: If I were to propagate the parameter change all the way through the
: Process() routine and everything it called, I'd be tempted to just
go
: back to C:
:
: unsigned char* puc = new unsigned char[MAX_BUFF];
:
: while( lots of times through )
: {
: int iSize = SizeThisTime();
: Process(puc, iSize);
: }
:
: delete [] puc;

Yes but this is not exception-safe.
You can pass a buffer the C-way while retaining the safety and
convenience of std::vector:

std::vector<unsigned char> buf(MAX_SIZE_GUESS);
while( lots of times through )
{
unsigned const size = SizeThisTime();
if( size > buf.size() ) buf.resize( size );
Process( &buf.front(), size );
}

Iterators, as suggested by mzdude, give you more flexibility, but you
can use the 'good old way' if you know you won't have to bother with
non-contiguous storage.

Cheers,
Ivan

--
http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact
form



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Display posts from previous:   
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated) All times are GMT
Goto page 1, 2, 3  Next
Page 1 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.