 |
C++Talk.NET C++ language newsgroups
|
| View previous topic :: View next topic |
| Author |
Message |
Matte Guest
|
Posted: Fri Nov 26, 2004 1:13 pm Post subject: Which way are fastest to find a string, sorting+binary_searc |
|
|
Which way are fastest to find a string, sorting+binary_search vs.
hashing.
I have only a string that is a key value and I just wont to know if
the key is in the container. My program are in C++.
I have two alternative to find the key in a container:
1. Using the container hash_set and the command find.
2. Using the container set and then sorting the keys. To find the key
I use binary_search.
Is it someone that know the fastest way to find a key. Or is it some
other way that are faster then my two ways.
Meny thanks
/Mattias
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Maxim Yegorushkin Guest
|
Posted: Fri Nov 26, 2004 11:51 pm Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
Matte <berca (AT) home (DOT) se> wrote:
| Quote: | Which way are fastest to find a string, sorting+binary_search vs.
hashing.
|
[]
Hashing may be faster than sorting + binary_search, since the former
compares hashes only rather than strings when there are no hash
collisions, but it depends on your data and your hash function.
| Quote: | Is it someone that know the fastest way to find a key. Or is it some
other way that are faster then my two ways.
|
Ternary tree may pretty much be faster than hashing.
http://home.od.ua/~relayer/algo/data/tern/lead.htm
--
Maxim Yegorushkin
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Antoun Kanawati Guest
|
Posted: Sat Nov 27, 2004 4:08 am Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
Matte wrote:
| Quote: | Which way are fastest to find a string, sorting+binary_search vs.
hashing.
I have only a string that is a key value and I just wont to know if
the key is in the container. My program are in C++.
I have two alternative to find the key in a container:
1. Using the container hash_set and the command find.
2. Using the container set and then sorting the keys. To find the key
I use binary_search.
|
The std::set is sorted, pretty much by definition; the second argument
of the std::set template (defaulted to std::less<T>) must define an
ordering relation. std::set guarantees logarithmic time search, and
insertion.
| Quote: | Is it someone that know the fastest way to find a key. Or is it some
other way that are faster then my two ways.
|
Hash-sets/maps have a higher setup overhead. Best thing to do is to
implement through an abstraction (often a simple typedef will do),
then measure (profile) and compare.
--
A. Kanawati
[email]NO.antounk.SPAM (AT) comcast (DOT) net[/email]
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Alberto Barbati Guest
|
Posted: Sat Nov 27, 2004 4:08 am Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
Matte wrote:
| Quote: | 1. Using the container hash_set and the command find.
|
hash_set is not a standard container (yet), thus the question is
implementation specific. BTW, find is a method (aka member function),
not a command.
| Quote: | 2. Using the container set and then sorting the keys. To find the key
I use binary_search.
|
std::set already sorts the keys for you. You don't need to sort them
again (in fact you can't because keys in a set are immutable) and you
can use (in fact you *should* use) the set::find() method to efficiently
look for the key.
sort+binary_search is the right way to work with vectors, not sets.
| Quote: | Is it someone that know the fastest way to find a key. Or is it some
other way that are faster then my two ways.
|
Don't forget that there are three quantities in play: memory, insertion
time and access time. Different containers provides different trade-offs
among these and only your application knows which trade-off is the best
one. Morale: profiling is your best friend.
HTH,
Alberto
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
jjr2004a Guest
|
Posted: Sat Nov 27, 2004 12:33 pm Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
| Quote: | Hashing may be faster than sorting + binary_search, since the former
compares hashes only rather than strings when there are no hash
collisions, but it depends on your data and your hash function.
A hash lookup with no collisions has no comparisons. The key is |
fed to a hash function which returns an index into a table which
is used to retrieve a value. No comparison are needed.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
jjr2004a Guest
|
Posted: Sat Nov 27, 2004 12:33 pm Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
| Quote: | Which way are fastest to find a string, sorting+binary_search vs.
hashing.
I have only a string that is a key value and I just wont to know if
the key is in the container. My program are in C++.
I have two alternative to find the key in a container:
1. Using the container hash_set and the command find.
2. Using the container set and then sorting the keys. To find the key
I use binary_search.
In general... |
A hash strives to achieve constant time look-up whatever the number of
objects and so will be faster than the search, on average, as the number
of objects increases.
For a small number of objects, you should not notice a difference.
You could illustrate the difference by doing a test on 100, 1,000, and
10,000 strings using the two methods.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Torsten Robitzki Guest
|
Posted: Sat Nov 27, 2004 9:28 pm Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
jjr2004a wrote:
| Quote: | Hashing may be faster than sorting + binary_search, since the former
compares hashes only rather than strings when there are no hash
collisions, but it depends on your data and your hash function.
A hash lookup with no collisions has no comparisons. The key is
fed to a hash function which returns an index into a table which
is used to retrieve a value. No comparison are needed.
|
How will you make sure that the retrieved value is the searched one
without a comparison? The retrieved value and the searched value might
have the same hash value by accident.
regards
Torsten
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Antoun Kanawati Guest
|
Posted: Sun Nov 28, 2004 2:09 am Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
jjr2004a wrote:
| Quote: | For a small number of objects, you should not notice a difference.
|
For a small number of objects, you'll notice that hash table setup may
dominate. The typical hash table has a few hundered buckets, each of
which must be initialized and later destroyed. When the number of
elements is small, this can become significant even if access time
is constant or nearly so.
You will have to characterize the usage of the container, and then
measure to find which data structure is better.
--
A. Kanawati
[email]NO.antounk.SPAM (AT) comcast (DOT) net[/email]
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Tue Nov 30, 2004 11:17 pm Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
Antoun Kanawati <NO.antounk.SPAM (AT) comcast (DOT) net> wrote
| Quote: | jjr2004a wrote:
For a small number of objects, you should not notice a difference.
For a small number of objects, you'll notice that hash table setup may
dominate. The typical hash table has a few hundered buckets, each of
which must be initialized and later destroyed. When the number of
elements is small, this can become significant even if access time is
constant or nearly so.
|
As you say, a lot depends on what he is doing with the container. If
the container is just initialized once at program start-up, and its
contents never change, then setup time is irrelevant. (In practice,
I've not noticed any difference in set up time between hash containers
and std::set -- and when there was, it was in favor of hash containers.)
Most importantly, of course, is the number of elements. Over about
10000, I don't think anything will be a hashed container. For smaller
numbers, std::set, or a binary search on a sorted std::vector may be
advantageous. And for say, 10 or so, a straight linear search on an
unsorted vector could end up the fastest.
Note too that the contents of the strings will have an effect. If the
strings only different in the trailing characters, a hashed container
will end up being more efficient for less elements than if the strings
different in the first one or two characters.
--
James Kanze GABI Software http://www.gabi-soft.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Antoun Kanawati Guest
|
Posted: Wed Dec 01, 2004 9:20 am Post subject: Re: Which way are fastest to find a string, sorting+binary_s |
|
|
[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote:
| Quote: | Antoun Kanawati <NO.antounk.SPAM (AT) comcast (DOT) net> wrote in message
news:<iemdnXPjeetrNDXcRVn-hQ (AT) comcast (DOT) com>...
For a small number of objects, you'll notice that hash table setup may
dominate. The typical hash table has a few hundered buckets, each of
which must be initialized and later destroyed. When the number of
elements is small, this can become significant even if access time is
constant or nearly so.
As you say, a lot depends on what he is doing with the container. If
the container is just initialized once at program start-up, and its
contents never change, then setup time is irrelevant. (In practice,
I've not noticed any difference in set up time between hash containers
and std::set -- and when there was, it was in favor of hash containers.)
|
My experiments were with integer elements, not strings; so, I had very
little overhead related to element construction/copying/comparison.
Also, my container size varied from large to very small, and many
containers were created (and destroyed). So, the simple generlizations
did not apply, and measurement was necessary to find the best "set"
representation. I ended up with a situation where hash_set's
hash_table::clear took nearly 30% of the time; with std::set, container
overhead went to the bottom of the profile (the top was occupied by the
setup of a 1300x1300 lookup table, to cache an expensive and very
frequently used function).
The basic point remains: without a well-characterized load (the mix
of operations used on the container), it is hard to predict which
type will do best. It is often best to measure under a "typical"
load, and compare. In addition to the algorithmic characteristics
of the various containers, inlining (or lack of it) and the nature
of the data itself can have significant effects on performance that
are not easy to characterize.
--
A. Kanawati
[email]NO.antounk.SPAM (AT) comcast (DOT) net[/email]
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|