 |
C++Talk.NET C++ language newsgroups
|
| View previous topic :: View next topic |
| Author |
Message |
Michael Mellor Guest
|
Posted: Sun Feb 08, 2004 12:25 pm Post subject: RTTI Performance |
|
|
I have been comparing a self wrote RTTI vs compiler RTTI and I am
finding the compiler generated one is much slower. The code listed
below, it is separated out in many files to try and stop the optimiser
being too clever, was compiled on g++ 3.3.1 and VC++ 7.0 both with full
optimisation.
Cast base to derived object(cast base* to B*):
my_cast g++ 2.237s
my_cast VC++ 1.537s
dynamic_cast g++ 5.699s
dynamic_cast VC++ 9.429s
When the cast isn't the object(cast A* to B*):
my_cast g++ 2.138s
my_cast VC++ 1.969s
dynamic_cast g++ 12.529s
dynamic_cast VC++ 14.589s
My question is: What makes RTTI at least twice as slow as the custom
implementation below? Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?
The code below is somewhat untested and so if I have made a mistake
please kindly inform me .
Michael Mellor
--------a.h-------------
struct base {
enum { ID = 0 };
virtual bool am_i ( int id ) const;
};
struct A : base {
enum { ID = 1 };
virtual bool am_i ( int id ) const;
};
struct B : base {
enum { ID = 2 };
virtual bool am_i ( int id ) const;
};
void do_nothing ( B * );
base *get_obj ( int num );
--------a.cc-------------
#include "a.h"
bool base::am_i ( int id ) const {
return ID == id;
}
bool A::am_i ( int id ) const {
return ID == id || base::am_i(id);
}
bool B::am_i ( int id ) const {
return ID == id || base::am_i(id);
}
-------main.cc------------
#include "a.h"
static const int LOOPS = 100000000;
template<typename OBJ>
OBJ *my_cast ( base *b ) {
if ( b->am_i ( OBJ::ID ) )
return static_cast<OBJ*>(b);
return 0;
}
int main ( int argc, char *argv[] ) {
base *b = get_obj ( 2 );
if ( argc != 1 ) {
for ( int a = 0; a < LOOPS; ++a ) {
do_nothing ( dynamic_cast
}
} else {
for ( int a = 0; a < LOOPS; ++a ) {
do_nothing ( my_cast
}
}
}
------stuff.cc----------
#include "a.h"
base *get_obj ( int num ) {
return num == 1 ? (base*)new A : (base*)new B;
}
void do_nothing ( B * ) {
}
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Ivan Vecerina Guest
|
Posted: Mon Feb 09, 2004 12:49 am Post subject: Re: RTTI Performance |
|
|
"Michael Mellor" <news-at- (AT) michaelmellor-dot- (DOT) com> wrote
Hi,
| Quote: | I have been comparing a self wrote RTTI vs compiler RTTI and I am
finding the compiler generated one is much slower. The code listed
below, it is separated out in many files to try and stop the optimiser
being too clever, was compiled on g++ 3.3.1 and VC++ 7.0 both with full
optimisation.
....
My question is: What makes RTTI at least twice as slow as the custom
implementation below?
The standard RTTI mechanism defined in the standard needs to do much |
more than your specifically tuned implementation. It needs to support
multiple inheritance, dynamic_casts, and other run-time features --
all this within the context of separate compilation.
Also, RTTI has not been designed to be a replacement of virtual
function calls, but to fulfil other kinds of needs (e.g. the
streaming or "flattening" of objects, where i/o is the main bottleneck).
While it is still possible for an implementation to optimize the
specific scenario you describe, it may create overhead or negatively
impact the performance of other features.
| Quote: | Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?
Not in a portable way. But this will work on some platforms. |
Regards,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- e-mail contact form
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Walter Guest
|
Posted: Mon Feb 09, 2004 12:53 am Post subject: Re: RTTI Performance |
|
|
"Michael Mellor" <news-at- (AT) michaelmellor-dot- (DOT) com> wrote
| Quote: | My question is: What makes RTTI at least twice as slow as the custom
implementation below?
|
An excellent question. The answer is that a simple virtual function call is
very fast. RTTI does not use a virtual function call, however, a library
function is called and types are compared. Because of this, I usually eschew
dynamic_cast and instead write something similar to your example:
struct B;
struct A
{
virtual B* isB() { return (B*)NULL; }
};
struct B : A
{
B* isB() { return this; }
};
void test(A* a)
{
B* b;
b = dynamic_cast<B*>(a); // slower
b = a->isB(); // faster
}
This, of course, requires A to know in advance that B will derive from A,
which is not possible if A is in a general purpose library. But it's smokin'
fast <g>.
| Quote: | Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?
|
Comparing vtable addresses can tell you if two classes are the same, but
will not tell you if one is derived from the other. But since such isn't
explicitly supported in the language, be careful because it is possible for
the compiler to emit multiple vtables for the same class, in which case such
a technique will fail. Comparing the contents of the vtable won't help,
either, since it is unknown at runtime how many entries are in it.
-Walter
www.digitalmars.com free C/C++/D compilers
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Scott Meyers Guest
|
Posted: Tue Feb 10, 2004 11:02 pm Post subject: Re: RTTI Performance |
|
|
On 8 Feb 2004 07:25:59 -0500, Michael Mellor wrote:
| Quote: | My question is: What makes RTTI at least twice as slow as the custom
implementation below? Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?
|
I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed that
each use generated calls to strcmp. I asked why MS didn't just compare
addresses of type_info objects (which is essentially the same as comparing
vtbl addresses), and they pointed out that when you have DLLs, there may be
multiple type_info objects, so a simple address comparison doesn't work.
As a result, a dynamic_cast that looks like this,
dynamic_cast<T*>(pObj)
generates code that basically looks like this:
if (the address of pObj's type_info object ==
the address of T's type_info object)
| Quote: | |
(strcmp(pObj's type's mangled name, T's mangled name))) |
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests
Hence, each failed address comparison leads to a call to strcmp.
I don't know how other compilers implement dynamic_cast, but I think the
DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.
If you read the Performance TR
([url]http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1487.pdf)[/url], you'll see
that RTTI is an area where QOI is all over the board.
Scott
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Hendrik Schober Guest
|
Posted: Wed Feb 11, 2004 8:40 pm Post subject: Re: RTTI Performance |
|
|
Scott Meyers <Usenet (AT) aristeia (DOT) com> wrote:
| Quote: | [...]
I don't know how other compilers implement dynamic_cast, but I think
the DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.
|
Interesting. I wonder if this couldn't be
optimized (at least for most cases) by
hashing the strings?
Schobi
--
[email]SpamTrap (AT) gmx (DOT) de[/email] is never read
I'm Schobi at suespammers dot org
"Sometimes compilers are so much more reasonable than people."
Scott Meyers
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Wed Feb 11, 2004 9:38 pm Post subject: Re: RTTI Performance |
|
|
Scott Meyers <Usenet (AT) aristeia (DOT) com> wrote
| Quote: | On 8 Feb 2004 07:25:59 -0500, Michael Mellor wrote:
My question is: What makes RTTI at least twice as slow as the custom
implementation below? Can RTTI be implemented by comparing addresses
of vtables because they must be unique to each class?
I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed
that each use generated calls to strcmp. I asked why MS didn't just
compare addresses of type_info objects (which is essentially the same
as comparing vtbl addresses), and they pointed out that when you have
DLLs, there may be multiple type_info objects, so a simple address
comparison doesn't work. As a result, a dynamic_cast that looks like
this,
dynamic_cast<T*>(pObj)
generates code that basically looks like this:
if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests
Hence, each failed address comparison leads to a call to strcmp.
I don't know how other compilers implement dynamic_cast, but I think
the DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.
|
It depends on how the implementation handles multiple definitions in a
dynamic link. (Presumably, the type_info object has some sort of
specially mangled global name.) Under Posix, if the name is already
present in the root or in a previously dynamically linked module, the
dynamic linker uses it, and not the instance of the name in the dynamic
object being loaded. (This is the default behavior on my system,
anyway. This is actually controled by an option in the dlopen
function.) Thus, RTTI should end up always seeing the same instance of
type_info, and just comparing addresses should be adequate.
There are advantages and disadvantages with both solutions. A classical
problem with Windows DLL's is that memory allocated in one DLL cannot be
freed in another -- each DLL has its own instance of malloc, with its
own separate free space pool. On the other hand, it provides a great
deal more autonomy and isolation for things like plug-ins.
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Early Ehlinger Guest
|
Posted: Thu Feb 12, 2004 1:25 am Post subject: Re: RTTI Performance |
|
|
"Scott Meyers" <Usenet (AT) aristeia (DOT) com> wrote:
| Quote: | I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed that
each use generated calls to strcmp.
...
a dynamic_cast that looks like this,
dynamic_cast<T*>(pObj)
generates code that basically looks like this:
if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests
Hence, each failed address comparison leads to a call to strcmp.
I don't know how other compilers implement dynamic_cast, but I think the
DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.
|
It seems to me that MS, or anybody else reading this who might be trying to
make dynamic_cast a smidge faster, could optimize the failed test by having
the RTTI info contain a hash of the mangled name *in addition to* the
mangled name. Then the code ends up looking more like:
// If they're the exact same type_info object:
if ( address of Obj's type_info == address of T's type_info object )
return static_cast<T>(Obj);
// They're different, but not necessarily un-equivalent. Check the name
hash...
if ( ( Obj's-type_info->name_hash != T's-type_info->name_hash )
return NULL;
// They're different, but have the same name hash. Check to see if they're
equivalent.
if ( 0 == strcmp( Obj's-type_info->name , T's-type_info->name ) ) )
return static_cast<T>(Obj);
// recurse
--
Best Regards,
- Early Ehlinger - CEO - www.ResPower.com - 866-737-7697
- 500+ GHz Self-Service Super-Computer from $0.50/GHz*Hr
- (yes, five hundred gigahertz. point-five terahertz.)
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Dylan Nicholson Guest
|
Posted: Thu Feb 12, 2004 12:01 pm Post subject: Re: RTTI Performance |
|
|
[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote in message news:<d6652001.0402110205.1f44727e (AT) posting (DOT) google.com>...
| Quote: |
There are advantages and disadvantages with both solutions. A classical
problem with Windows DLL's is that memory allocated in one DLL cannot be
freed in another -- each DLL has its own instance of malloc, with its
own separate free space pool. On the other hand, it provides a great
deal more autonomy and isolation for things like plug-ins.
|
Although of course this isn't true if you link both caller and callee
with the same C runtime dll. I've ended up having to do this, because
I want to be able to pass std::string's between app and dll - and when
they're embedded in structures it's just too messy trying to
circumvent the ref-counted copying.
Personally I think if you really want autonomy and isolation, then
launch a separate process.
Dylan
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Stephen Howe Guest
|
Posted: Fri Feb 13, 2004 1:50 am Post subject: Re: RTTI Performance |
|
|
| Quote: | There are advantages and disadvantages with both solutions. A classical
problem with Windows DLL's is that memory allocated in one DLL cannot be
freed in another -- each DLL has its own instance of malloc, with its
own separate free space pool. On the other hand, it provides a great
deal more autonomy and isolation for things like plug-ins.
Although of course this isn't true if you link both caller and callee
with the same C runtime dll.
|
But this does not alter what James is saying. By doing this there is now
only 1 location where malloc() and free() are located, the C RT DLL and so
the problem is overcome.
Stephen Howe
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Roger Leigh Guest
|
Posted: Fri Feb 13, 2004 11:14 am Post subject: Re: RTTI Performance |
|
|
Scott Meyers <Usenet (AT) aristeia (DOT) com> writes:
| Quote: | I don't know how other compilers implement dynamic_cast, but I think the
DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.
|
AFAICT under GNU/Linux with GCC the vtables and typeinfo data are
implemented as "weak symbols" in the ELF object format, so that the
run-time linker (ld.so) ensures that only one instance is used, even
if instances exist in multiple translation units. This means that
they are guaranteed to be unique, and so pointer comparisons are
presumably perfectly safe.
--
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze@gabi-soft.fr Guest
|
Posted: Fri Feb 13, 2004 3:53 pm Post subject: Re: RTTI Performance |
|
|
"Early Ehlinger" <early (AT) respower (DOT) com> wrote
| Quote: | "Scott Meyers" <Usenet (AT) aristeia (DOT) com> wrote:
I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had
observed that each use generated calls to strcmp.
...
a dynamic_cast that looks like this,
dynamic_cast<T*>(pObj)
generates code that basically looks like this:
if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests
Hence, each failed address comparison leads to a call to strcmp.
I don't know how other compilers implement dynamic_cast, but I think
the DLL issue (and resulting vtlb/type_info replication)
legitimately complicates matters.
It seems to me that MS, or anybody else reading this who might be
trying to make dynamic_cast a smidge faster, could optimize the failed
test by having the RTTI info contain a hash of the mangled name *in
addition to* the mangled name. Then the code ends up looking more
like:
// If they're the exact same type_info object:
if ( address of Obj's type_info == address of T's type_info object )
return static_cast<T>(Obj);
// They're different, but not necessarily un-equivalent. Check the name
hash...
if ( ( Obj's-type_info->name_hash != T's-type_info->name_hash )
return NULL;
// They're different, but have the same name hash. Check to see if they're
equivalent.
if ( 0 == strcmp( Obj's-type_info->name , T's-type_info->name ) ) )
return static_cast<T>(Obj);
// recurse
|
It depends on the ratio of misses. In the case of a match, you still
have to do the string compare, so comparing the hash value first is a
pessimisation. If you tend to get matches most of the time (which I
suspect is the case), then comparing the hash value might be slower.
What will gain a smidgen is ensuring that the strings representing the
name differ near the start. I have the impression that for class types,
the first 9 or 10 characters are always the same. Prefixing a one byte
hash might speed things up; reordering the information in the decorated
name to put the user defined part up front definitly would. (Note that
the somewhat misguided practice of starting all class names with a C
also slows things down a little bit. I'd be simply amazed if the
difference were ever measurable, however.)
--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Stefan Heinzmann Guest
|
Posted: Fri Feb 13, 2004 3:58 pm Post subject: Re: RTTI Performance |
|
|
Early Ehlinger wrote:
| Quote: | "Scott Meyers" <Usenet (AT) aristeia (DOT) com> wrote:
[...]
It seems to me that MS, or anybody else reading this who might be trying to
make dynamic_cast a smidge faster, could optimize the failed test by having
the RTTI info contain a hash of the mangled name *in addition to* the
mangled name.
|
Whilst that is certainly possible, why is it desirable? In other words,
why does RTTI speed matter to such an extent as to warrant consuming
extra memory? I have found RTTI to be most useful in object I/O, and
there the effect of RTTI efficiency is barely noticeable.
Has anyone got an application where RTTI speed is crucial?
--
Cheers
Stefan
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Srik Guest
|
Posted: Fri Feb 13, 2004 3:59 pm Post subject: Re: RTTI Performance |
|
|
Scott Meyers <Usenet (AT) aristeia (DOT) com> wrote
| Quote: | I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed that
each use generated calls to strcmp. I asked why MS didn't just compare
addresses of type_info objects (which is essentially the same as comparing
vtbl addresses), and they pointed out that when you have DLLs, there may be
multiple type_info objects, so a simple address comparison doesn't work.
As a result, a dynamic_cast that looks like this,
dynamic_cast<T*>(pObj)
generates code that basically looks like this:
if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests
Hence, each failed address comparison leads to a call to strcmp.
|
I wonder if this explains a problem I had on VC++ 6.0.
This code deletes a large vector of objects derived from
base class 'Obj'with a virtual destructor.
--
for (vector<Obj*>::iterator a = objVector.begin(); a != objVector.end(); a++)
delete (*a);
--
The segment used to take ages to complete when using DLLs.
I linked with static run-time libraries, and that
cut down execution time from minutes to seconds.
Srikanth
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Carl Daniel Guest
|
Posted: Sat Feb 14, 2004 12:08 pm Post subject: Re: RTTI Performance |
|
|
Srik wrote:
| Quote: | I wonder if this explains a problem I had on VC++ 6.0.
This code deletes a large vector of objects derived from
base class 'Obj'with a virtual destructor.
for (vector<Obj*>::iterator a = objVector.begin(); a !=
objVector.end(); a++) delete (*a);
The segment used to take ages to complete when using DLLs.
I linked with static run-time libraries, and that
cut down execution time from minutes to seconds.
|
I wouldn't think so. There's something else at play there - delete through
a virtual destructor doesn't make any use of RTTI.
-cd
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Hendrik Schober Guest
|
Posted: Sat Feb 14, 2004 12:09 pm Post subject: Re: RTTI Performance |
|
|
[email]kanze (AT) gabi-soft (DOT) fr[/email] <kanze (AT) gabi-soft (DOT) fr> wrote:
| Quote: | [...]
What will gain a smidgen is ensuring that the strings representing the
name differ near the start. [...]
|
The compiler in question doesn't seem to be
very good at this. This program (which uses
an extension of the compiler's std lib)
#include <iostream>
#include <typeinfo>
#include <vector>
#include <list>
template< typename T, class A = std::allocator
class list {};
template< typename T >
void test()
{
const std::type_info& ti = typeid(T);
std::cout << ti.name() << " t " << ti.raw_name() << 'n';
}
int main()
{
test< std::vector();
test< std::vector();
test< std::list ();
test< list ();
test< list ();
return 0 ;
}
produces this output:
class std::vector<int,class std::allocator .?AV?$vector@HV?$allocator@H@std@@@std@@
class std::vector<char,class std::allocator .?AV?$vector@DV?$allocator@D@std@@@std@@
class std::list<char,class std::allocator .?AV?$list@DV?$allocator@D@std@@@std@@
class list<char,class std::allocator .?AV?$list@DV?$allocator@D@std@@@@
class list<short,class std::allocator .?AV?$list@FV?$allocator@F@std@@@@
That's not exactly what I'd call optimial.
| Quote: | [...](Note that
the somewhat misguided practice of starting all class names with a C
also slows things down a little bit. I'd be simply amazed if the
difference were ever measurable, however.)
|
Having another letter in the above code
won't make much of a difference.
Schobi
--
[email]SpamTrap (AT) gmx (DOT) de[/email] is never read
I'm Schobi at suespammers dot org
"Sometimes compilers are so much more reasonable than people."
Scott Meyers
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|