C++Talk.NET C++ language newsgroups
Dave Harris Guest
Posted: Sat Jun 09, 2012 6:20 pm
Post subject: Re: Techniques to avoid minimal scope inefficiency with comp
0xCDCDCDCD (AT) gmx (DOT) at (Martin B.) wrote (abridged):
| Quote: | Have you measured both variants? Which one was more efficient?
[D] is more efficient. It will re-use the buffer from string s
a number of times (except for the pathological case where the
string lengths increase strictly in steps larger than the
allocation granularity of string).
[C] will - must - delete the buffer of the s object each loop step
and do a fresh allocation.
|
True, but have you measured them? The difference may be smaller than
you think.
Much depends on the memory allocator. Nowadays some keep an array of
free lists, indexed by block size. In a case like this, where
essentially the same block is freed and reallocated over and over,
both allocation and deallocation could take O(1) time, operating only
at the head of the appropriate free list.
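To make the free-list idea concrete, here is a minimal sketch of such an allocator. FreeListPool, its 16-byte granularity and its 1024-byte limit are all made up for illustration; real allocators are far more elaborate, and I am not claiming any particular one works exactly like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical size-class allocator: one singly linked free list per
// 16-byte size class. Freed blocks are pushed on the head of their list
// and handed out again from the head, so both operations are O(1).
class FreeListPool {
    struct Node { Node* next; };
    static const std::size_t kGranularity = 16;
    static const std::size_t kClasses = 64;   // serves requests up to 1024 bytes
    Node* heads_[kClasses];

    // Index of the free list serving a request of this size.
    static std::size_t index_for(std::size_t bytes) {
        return (bytes - 1) / kGranularity;
    }

public:
    FreeListPool() {
        for (std::size_t c = 0; c != kClasses; ++c) heads_[c] = nullptr;
    }
    FreeListPool(const FreeListPool&) = delete;
    FreeListPool& operator=(const FreeListPool&) = delete;

    // Returns every cached block to the system. Blocks still held by the
    // caller are not tracked and must be deallocated first.
    ~FreeListPool() {
        for (std::size_t c = 0; c != kClasses; ++c)
            while (Node* n = heads_[c]) { heads_[c] = n->next; std::free(n); }
    }

    void* allocate(std::size_t bytes) {
        assert(bytes > 0 && bytes <= kGranularity * kClasses);
        const std::size_t c = index_for(bytes);
        if (Node* n = heads_[c]) {            // fast path: pop the list head
            heads_[c] = n->next;
            return n;
        }
        void* p = std::malloc((c + 1) * kGranularity);   // slow path
        if (!p) throw std::bad_alloc();
        return p;
    }

    void deallocate(void* p, std::size_t bytes) {
        const std::size_t c = index_for(bytes);
        heads_[c] = ::new (p) Node{heads_[c]};   // push onto the list head
    }
};
```

With this scheme, freeing a block and then asking for the same size again hands back the very same block, which is the pattern the loop in [C] produces.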
Whereas your string processing is O(N) in the length of the string.
First the characters have to be read from their source and written to
get_text's argument. Then to_lower has to read each character, figure
out its lower-case version, and write it. (And lower-casing can be
non-trivial if non-English languages are supported correctly.) Then
set_lower_text has to read each character and write it to its
destination.
Now let's think about cache locality. Since it is the same block of
memory we are freeing and reallocating each time, it will likely
become resident in cache for the duration. Since we are dealing with
a large number of strings, the memory they take probably won't fit in
cache. You may find your execution time is so dominated by cache misses
that the difference between the optimised and unoptimised versions is
lost in the noise.
And that's without considering where the strings ultimately came from
or are going to. Were they read from disk or over a network? Are they
being rendered to the screen as pixels?
In a separate post I have tried to show how a more idiomatic version,
such as:
    std::transform( begin(), end(), lower_begin(), to_lower );
by exploiting move semantics, might perform fewer string copies. If
your pattern of deallocation and allocation is cheaper than the cost
of copying strings, you may be optimising the wrong thing.
I appreciate your specific example was intended to make a more general
point. My picking at specific points is intended to make a point
equally general. Any optimisation that makes the code significantly
less elegant should be justified by measurement, or else it is, almost
by definition, premature optimisation.
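For anyone who wants to do the measurement, something along these lines would be a starting point. It is only a sketch: Store, the 64-character strings and time_ms() are my own stand-ins, since the real get_text() and set_lower_text() were never shown, and a serious benchmark would repeat the runs and use realistic data.

```cpp
#include <cassert>
#include <cctype>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the real source and destination of the strings.
struct Store {
    std::vector<std::string> text, lower_text;
    explicit Store(std::size_t n)
        : text(n, std::string(64, 'A')), lower_text(n) {}
    void get_text(std::size_t i, std::string& s) const {
        assert(i < text.size());
        s = text[i];
    }
    void set_lower_text(std::size_t i, const std::string& s) {
        assert(i < lower_text.size());
        lower_text[i] = s;
    }
};

void to_lower(std::string& s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// [C]: the string lives inside the loop body (minimal scope).
void variant_c(Store& st) {
    for (std::size_t i = 0; i != st.text.size(); ++i) {
        std::string s;
        st.get_text(i, s);
        to_lower(s);
        st.set_lower_text(i, s);
    }
}

// [D]: the string is hoisted out of the loop so its buffer can be reused.
void variant_d(Store& st) {
    std::string s;
    for (std::size_t i = 0; i != st.text.size(); ++i) {
        st.get_text(i, s);
        to_lower(s);
        st.set_lower_text(i, s);
    }
}

// Wall-clock time of one call to f, in milliseconds.
template <class F>
double time_ms(F f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Call it as time_ms([&] { variant_c(store); }) and likewise for variant_d, each on its own freshly built Store, with optimisation switched on. I am deliberately not quoting numbers; they depend on the allocator, the string lengths and the machine.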
-- Dave Harris, Nottingham, UK.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Dave Harris Guest
Posted: Sat Jun 09, 2012 6:36 pm
Post subject: Re: Techniques to avoid minimal scope inefficiency with comp
0xCDCDCDCD (AT) gmx (DOT) at (Martin B.) wrote (abridged):
| Quote: | I really asked this question because I thought there might be a
technical, elegant solution out of this, but since no one posted
anything that I found particularly enlightening(*), I guess there
really isn't a technical solution to this, but it has to be
addressed on the pedagogical and psychological level
|
Well, move semantics should help. Consider some straightforward
code for expressing what you are doing:
    for (int i = 0; i != n; ++i)
        set_lower_text( i, to_lower( get_text( i ) ) );
I hope this passes your guidelines. Suppose the signatures are:
    string get_text( int i );
    string to_lower( string &&s );
    void set_lower_text( int i, string &&s );
So get_text() allocates and returns a copy. This is passed to to_lower() as an
r-value reference, so it can modify it in-place and return it without copying the
characters or doing an allocation. Set_lower_text() receives another r-value
reference, which it can move to its final destination, again without allocating or
copying. So we have one allocation and one copy, both in get_text(). If instead the
signatures are:
    const string &get_text( int i );
    string to_lower( const string &s );
    void set_lower_text( int i, string &&s );
then the count is the same but now the allocation and copy happen inside
to_lower(). We can provide both overloads of to_lower().
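Here is a small sketch of that chain, with both overloads of to_lower(). The two vectors and buffer_survives() are hypothetical scaffolding of mine. The pointer check relies on the string being too long for the small-string optimisation and on the library's move operations handing over the heap buffer, which the mainstream implementations do but which you should treat as an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical storage behind the accessors.
static std::vector<std::string> text = { std::string(100, 'X') };
static std::vector<std::string> lower_text(1);

// The one allocation and the one copy of the characters happen here.
std::string get_text(std::size_t i) {
    assert(i < text.size());
    return text[i];
}

// R-value overload: lower-cases in place and hands the buffer on.
std::string to_lower(std::string&& s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return std::move(s);   // s is a named reference, so the move is explicit
}

// L-value overload: the copy is made here, then the r-value overload is reused.
std::string to_lower(const std::string& s) {
    return to_lower(std::string(s));
}

void set_lower_text(std::size_t i, std::string&& s) {
    assert(i < lower_text.size());
    lower_text[i] = std::move(s);
}

// True if the buffer allocated by get_text() ends up inside lower_text[i]
// without the characters having been copied again.
bool buffer_survives(std::size_t i) {
    std::string s = get_text(i);
    const char* buffer = s.data();
    set_lower_text(i, to_lower(std::move(s)));
    return lower_text[i].data() == buffer;
}
```

Calling to_lower() on a named string picks the const-reference overload and leaves the argument untouched; calling it on a temporary picks the r-value overload and does no copying.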
Now compare with your optimised code:
| Quote: | string s;
for (int i=0; i!=n; ++i) {
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}
|
with signatures:
    void get_text( int i, string &s );
    void to_lower( string &s );
    void set_lower_text( int i, const string &s );
Your version necessarily copies the string in get_text(), and again in
set_lower_text(). Set_lower_text() will probably have to do an allocation. (If
instead it takes an r-value reference and pilfers its argument's memory, s is left
empty and we gain nothing by moving it outside the loop.) So we have 1 allocation
and 2 copies. You've turned two lines of code into five and not really gained
anything over the plain version. You might have saved an allocation, but you've
definitely copied the characters an extra time, so if the allocator is efficient
(as I've argued elsewhere) you might even be slower.
Just write plain, simple code in C++11 and it will usually be fine.
If you can use iterators, so much the better:
    std::transform( begin(), end(), lower_begin(), to_lower );
What's not to like?
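Spelled out with plain vectors, the transform version might look like the sketch below. The begin(), end() and lower_begin() above are placeholders, so the containers, lower_all() and the by-value to_lower() here are assumptions of mine, not your real interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Takes its argument by value, so it accepts l-values (copied) and
// temporaries (moved) alike, and returns the modified string.
std::string to_lower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;   // a by-value parameter is moved out of implicitly
}

// Writes the lower-cased form of each element of text into lower_text.
void lower_all(const std::vector<std::string>& text,
               std::vector<std::string>& lower_text) {
    assert(lower_text.size() >= text.size());   // transform does not grow it
    std::transform(text.begin(), text.end(), lower_text.begin(), to_lower);
}
```

Each string is copied once on the way into to_lower() and then moved into its slot in lower_text.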
If it really matters, I would look more at what get_text() and
set_lower_text() are actually doing. For example, supposing
set_lower_text() actually stores its argument in a vector called
lower_text:
    void set_lower_text( int i, const string &s ) {
        lower_text[i] = s;
        // Enforce invariants here, if any.
    }
You might consider using in-place modification explicitly. Add:
    void set_lower_text( int i,
            const std::function<void(string &)> fn ) {
        fn( lower_text[i] );
        // Enforce invariants here, if any.
    }
so the caller code looks like:
    for (int i = 0; i != n; ++i)
        set_lower_text( i, [=]( string &lower_text ) {
            get_text( i, lower_text );
            to_lower( lower_text );
        } );
This avoids the need for an intermediate buffer entirely. The work is
done directly in lower_text's own storage. If you can encapsulate the
loop, too, so much the better. You could present it as a kind of
in-place transform:
    template<class Iterator, class Func>
    void set_all_lower_text( Iterator i, Func f ) {
        for (auto j = lower_text.begin(); j != lower_text.end();
                ++j, ++i) {
            f( *j, *i );
            // Enforce invariants here...
        }
        // ... or here.
    }
Used like:
    set_all_lower_text( text_cbegin(),
        []( string &lower_text, const string &text ) {
            lower_text = text;
            to_lower( lower_text );
        } );
This version allows more control and economies of scale. For example,
if lower_text must be kept sorted, you can re-sort it once at the end,
instead of repeatedly after each string is changed. If you need to
notify listeners when the strings are changed, you can do so once
instead of once per string. If you need to lock the data against
multi-threaded access, you can do so with less overhead.
The key thing is for the lambda expression to express what we want to
do to each string in its purest and most efficient form, and then use
that with a custom algorithm that manages everything else. Separation
of concerns.
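As a sketch of the sorted case, assuming a hypothetical class LowerTextStore that owns lower_text and has "kept sorted" as its only invariant:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical owner of lower_text; its invariant is that the vector
// is sorted whenever control is outside the class.
class LowerTextStore {
    std::vector<std::string> lower_text_;

public:
    explicit LowerTextStore(std::size_t n) : lower_text_(n) {}

    // Calls f(destination, source) for every element, reading the sources
    // from i, then restores the invariant once for the whole batch.
    template <class Iterator, class Func>
    void set_all_lower_text(Iterator i, Func f) {
        for (auto j = lower_text_.begin(); j != lower_text_.end(); ++j, ++i)
            f(*j, *i);
        std::sort(lower_text_.begin(), lower_text_.end());
        assert(std::is_sorted(lower_text_.begin(), lower_text_.end()));
    }

    const std::vector<std::string>& lower_text() const { return lower_text_; }
};

void to_lower(std::string& s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
```

The caller passes an iterator to the source strings and a lambda that assigns and lower-cases one string. The sort is paid for once per batch, and the caller never needs to know that the invariant exists.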
The reason I don't think this is a good solution for your particular
case is that, without measurements, it's premature optimisation
again. I don't advocate that every signature like:
    void set_lower_text( int i, const string &s );
also have an in-place version like:
    void set_lower_text( int i,
            const std::function<void(string &)> fn );
However, where there is a proven need for performance, the in-place
version may be worth considering. And just adding the r-value
reference version:
    void set_lower_text( int i, string &&s ) {
        lower_text[i] = std::move( s );
        // Enforce invariants here, if any.
    }
gains efficiency without disrupting caller code at all.
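To illustrate "without disrupting caller code", here is a sketch with the two overloads side by side. The global vector is a stand-in of mine for wherever the strings really live.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical destination storage.
static std::vector<std::string> lower_text(2);

// Existing signature: copies the characters, leaves the argument intact.
void set_lower_text(std::size_t i, const std::string& s) {
    assert(i < lower_text.size());
    lower_text[i] = s;
}

// Added overload: selected automatically when the argument is a temporary
// or has been passed through std::move, so existing call sites still compile.
void set_lower_text(std::size_t i, std::string&& s) {
    assert(i < lower_text.size());
    lower_text[i] = std::move(s);
}
```

A call with a named string still goes to the first overload and the caller's string is unchanged. A call whose argument is a function's return value goes to the second, and nobody had to edit the call.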
(I have reordered these quotes for clarity.)
| Quote: | Of course, iff I'm going to write an algorithm, it's gonna be
as efficient as it gets without contorting myself.
BUT -- that's really missing the point I think: The examples I
gave in the OP were just that: examples. People are *going to*
write loops that don't make much sense as algorithms (-> used
once).
|
We may be saying the same thing here. Perhaps set_all_lower_text() is
what you had in mind for the efficient algorithm.
On the other hand, perhaps you are saying that anything used only
once shouldn't be an algorithm? If so, then I disagree. Even if
set_all_lower_text() is only used in one place, it's still providing
benefits like abstraction, information hiding, de-coupling and
separation of concerns. We have three components here - the source
of get_text(), the destination for set_lower_text(), and the code
that transfers between them - and none of them needs to know
implementation details of the other two.
I would further say that thinking in algorithms often leads to code
that is both elegant and efficient. Elegant because it is more
abstract and higher level. Efficient because modern C++ is all
about zero-cost abstractions, and clean code is easier to optimise.
Whereas a signature like:
    void get_text( int i, string &result );
leads to inelegant code. It uses a side-effect to return its result,
so it can't be composed in a functional way. It forces the string to
be copied; a copy that can't be eliminated. It uses an integer index
instead of iterators so it doesn't play nicely with standard
algorithms. It forces you to use five or six lines of code where only
one or two should be needed.
| Quote: | And summed up over the codebase, there will be a lot of these
(different) loops. And some (few?) of these loops will mess up
performance in one (minor) way or another.
And then I will have to mess around with the guidelines
|
How does my code break your guidelines?
| Quote: | because some took them too literally and now think they have to
micro-optimize everywhere.
|
This does sound like a pedagogical problem.
-- Dave Harris, Nottingham, UK.