C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 
Archives   FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

RTTI Performance
Goto page 1, 2, 3  Next
 
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated)
View previous topic :: View next topic  
Author Message
Michael Mellor
Guest





PostPosted: Sun Feb 08, 2004 12:25 pm    Post subject: RTTI Performance Reply with quote



I have been comparing a self wrote RTTI vs compiler RTTI and I am
finding the compiler generated one is much slower. The code listed
below, it is separated out in many files to try and stop the optimiser
being too clever, was compiled on g++ 3.3.1 and VC++ 7.0 both with full
optimisation.

Cast base to derived object(cast base* to B*):
my_cast g++ 2.237s
my_cast VC++ 1.537s

dynamic_cast g++ 5.699s
dynamic_cast VC++ 9.429s

When the cast isn't the object(cast A* to B*):
my_cast g++ 2.138s
my_cast VC++ 1.969s

dynamic_cast g++ 12.529s
dynamic_cast VC++ 14.589s

My question is: What makes RTTI at least twice as slow as the custom
implementation below? Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?

The code below is somewhat untested and so if I have made a mistake
please kindly inform me Smile.

Michael Mellor

--------a.h-------------
struct base {
enum { ID = 0 };
virtual bool am_i ( int id ) const;
};

struct A : base {
enum { ID = 1 };
virtual bool am_i ( int id ) const;
};

struct B : base {
enum { ID = 2 };
virtual bool am_i ( int id ) const;
};

void do_nothing ( B * );

base *get_obj ( int num );

--------a.cc-------------
#include "a.h"

bool base::am_i ( int id ) const {
return ID == id;
}

bool A::am_i ( int id ) const {
return ID == id || base::am_i(id);
}

bool B::am_i ( int id ) const {
return ID == id || base::am_i(id);
}

-------main.cc------------
#include "a.h"

static const int LOOPS = 100000000;

template<typename OBJ>
OBJ *my_cast ( base *b ) {
if ( b->am_i ( OBJ::ID ) )
return static_cast<OBJ*>(b);
return 0;
}

int main ( int argc, char *argv[] ) {
base *b = get_obj ( 2 );
if ( argc != 1 ) {
for ( int a = 0; a < LOOPS; ++a ) {
do_nothing ( dynamic_cast }
} else {
for ( int a = 0; a < LOOPS; ++a ) {
do_nothing ( my_cast }
}
}

------stuff.cc----------
#include "a.h"

base *get_obj ( int num ) {
return num == 1 ? (base*)new A : (base*)new B;
}

void do_nothing ( B * ) {
}

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Back to top
Ivan Vecerina
Guest





PostPosted: Mon Feb 09, 2004 12:49 am    Post subject: Re: RTTI Performance Reply with quote



"Michael Mellor" <news-at- (AT) michaelmellor-dot- (DOT) com> wrote

Hi,
Quote:
I have been comparing a self wrote RTTI vs compiler RTTI and I am
finding the compiler generated one is much slower. The code listed
below, it is separated out in many files to try and stop the optimiser
being too clever, was compiled on g++ 3.3.1 and VC++ 7.0 both with full
optimisation.
....
My question is: What makes RTTI at least twice as slow as the custom
implementation below?
The standard RTTI mechanism defined in the standard needs to do much

more than your specifically tuned implementation. It needs to support
multiple inheritance, dynamic_casts, and other run-time features --
all this within the context of separate compilation.

Also, RTTI has not been designed to be a replacement of virtual
function calls, but to fulfil other kinds of needs (e.g. the
streaming or "flattening" of objects, where i/o is the main bottleneck).

While it is still possible for an implementation to optimize the
specific scenario you describe, it may create overhead or negatively
impact the performance of other features.

Quote:
Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?
Not in a portable way. But this will work on some platforms.



Regards,
Ivan
--
http://ivan.vecerina.com/contact/?subject=NG_POST <- e-mail contact form



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Walter
Guest





PostPosted: Mon Feb 09, 2004 12:53 am    Post subject: Re: RTTI Performance Reply with quote




"Michael Mellor" <news-at- (AT) michaelmellor-dot- (DOT) com> wrote

Quote:
My question is: What makes RTTI at least twice as slow as the custom
implementation below?

An excellent question. The answer is that a simple virtual function call is
very fast. RTTI does not use a virtual function call, however, a library
function is called and types are compared. Because of this, I usually eschew
dynamic_cast and instead write something similar to your example:

struct B;
struct A
{
virtual B* isB() { return (B*)NULL; }
};

struct B : A
{
B* isB() { return this; }
};

void test(A* a)
{
B* b;
b = dynamic_cast<B*>(a); // slower
b = a->isB(); // faster
}

This, of course, requires A to know in advance that B will derive from A,
which is not possible if A is in a general purpose library. But it's smokin'
fast <g>.

Quote:
Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?

Comparing vtable addresses can tell you if two classes are the same, but
will not tell you if one is derived from the other. But since such isn't
explicitly supported in the language, be careful because it is possible for
the compiler to emit multiple vtables for the same class, in which case such
a technique will fail. Comparing the contents of the vtable won't help,
either, since it is unknown at runtime how many entries are in it.

-Walter
www.digitalmars.com free C/C++/D compilers


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Scott Meyers
Guest





PostPosted: Tue Feb 10, 2004 11:02 pm    Post subject: Re: RTTI Performance Reply with quote

On 8 Feb 2004 07:25:59 -0500, Michael Mellor wrote:
Quote:
My question is: What makes RTTI at least twice as slow as the custom
implementation below? Can RTTI be implemented by comparing addresses of
vtables because they must be unique to each class?

I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed that
each use generated calls to strcmp. I asked why MS didn't just compare
addresses of type_info objects (which is essentially the same as comparing
vtbl addresses), and they pointed out that when you have DLLs, there may be
multiple type_info objects, so a simple address comparison doesn't work.
As a result, a dynamic_cast that looks like this,

dynamic_cast<T*>(pObj)

generates code that basically looks like this:

if (the address of pObj's type_info object ==
the address of T's type_info object)
Quote:
|
(strcmp(pObj's type's mangled name, T's mangled name)))

// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests

Hence, each failed address comparison leads to a call to strcmp.

I don't know how other compilers implement dynamic_cast, but I think the
DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.

If you read the Performance TR
([url]http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1487.pdf)[/url], you'll see
that RTTI is an area where QOI is all over the board.

Scott

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Hendrik Schober
Guest





PostPosted: Wed Feb 11, 2004 8:40 pm    Post subject: Re: RTTI Performance Reply with quote

Scott Meyers <Usenet (AT) aristeia (DOT) com> wrote:
Quote:
[...]
I don't know how other compilers implement dynamic_cast, but I think
the DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.

Interesting. I wonder if this couldn't be
optimized (at least for most cases) by
hashing the strings?

Quote:
[...]
Scott

Schobi

--
[email]SpamTrap (AT) gmx (DOT) de[/email] is never read
I'm Schobi at suespammers dot org

"Sometimes compilers are so much more reasonable than people."
Scott Meyers



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Wed Feb 11, 2004 9:38 pm    Post subject: Re: RTTI Performance Reply with quote

Scott Meyers <Usenet (AT) aristeia (DOT) com> wrote

Quote:
On 8 Feb 2004 07:25:59 -0500, Michael Mellor wrote:
My question is: What makes RTTI at least twice as slow as the custom
implementation below? Can RTTI be implemented by comparing addresses
of vtables because they must be unique to each class?

I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed
that each use generated calls to strcmp. I asked why MS didn't just
compare addresses of type_info objects (which is essentially the same
as comparing vtbl addresses), and they pointed out that when you have
DLLs, there may be multiple type_info objects, so a simple address
comparison doesn't work. As a result, a dynamic_cast that looks like
this,

dynamic_cast<T*>(pObj)

generates code that basically looks like this:

if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests

Hence, each failed address comparison leads to a call to strcmp.

I don't know how other compilers implement dynamic_cast, but I think
the DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.

It depends on how the implementation handles multiple definitions in a
dynamic link. (Presumably, the type_info object has some sort of
specially mangled global name.) Under Posix, if the name is already
present in the root or in a previously dynamically linked module, the
dynamic linker uses it, and not the instance of the name in the dynamic
object being loaded. (This is the default behavior on my system,
anyway. This is actually controled by an option in the dlopen
function.) Thus, RTTI should end up always seeing the same instance of
type_info, and just comparing addresses should be adequate.

There are advantages and disadvantages with both solutions. A classical
problem with Windows DLL's is that memory allocated in one DLL cannot be
freed in another -- each DLL has its own instance of malloc, with its
own separate free space pool. On the other hand, it provides a great
deal more autonomy and isolation for things like plug-ins.

--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Early Ehlinger
Guest





PostPosted: Thu Feb 12, 2004 1:25 am    Post subject: Re: RTTI Performance Reply with quote

"Scott Meyers" <Usenet (AT) aristeia (DOT) com> wrote:
Quote:
I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed that
each use generated calls to strcmp.
...
a dynamic_cast that looks like this,

dynamic_cast<T*>(pObj)

generates code that basically looks like this:

if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests

Hence, each failed address comparison leads to a call to strcmp.

I don't know how other compilers implement dynamic_cast, but I think the
DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.

It seems to me that MS, or anybody else reading this who might be trying to
make dynamic_cast a smidge faster, could optimize the failed test by having
the RTTI info contain a hash of the mangled name *in addition to* the
mangled name. Then the code ends up looking more like:

// If they're the exact same type_info object:
if ( address of Obj's type_info == address of T's type_info object )
return static_cast<T>(Obj);

// They're different, but not necessarily un-equivalent. Check the name
hash...
if ( ( Obj's-type_info->name_hash != T's-type_info->name_hash )
return NULL;

// They're different, but have the same name hash. Check to see if they're
equivalent.
if ( 0 == strcmp( Obj's-type_info->name , T's-type_info->name ) ) )
return static_cast<T>(Obj);

// recurse

--
Best Regards,
- Early Ehlinger - CEO - www.ResPower.com - 866-737-7697
- 500+ GHz Self-Service Super-Computer from $0.50/GHz*Hr
- (yes, five hundred gigahertz. point-five terahertz.)



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Dylan Nicholson
Guest





PostPosted: Thu Feb 12, 2004 12:01 pm    Post subject: Re: RTTI Performance Reply with quote

[email]kanze (AT) gabi-soft (DOT) fr[/email] wrote in message news:<d6652001.0402110205.1f44727e (AT) posting (DOT) google.com>...
Quote:

There are advantages and disadvantages with both solutions. A classical
problem with Windows DLL's is that memory allocated in one DLL cannot be
freed in another -- each DLL has its own instance of malloc, with its
own separate free space pool. On the other hand, it provides a great
deal more autonomy and isolation for things like plug-ins.

Although of course this isn't true if you link both caller and callee
with the same C runtime dll. I've ended up having to do this, because
I want to be able to pass std::string's between app and dll - and when
they're embedded in structures it's just too messy trying to
circumvent the ref-counted copying.
Personally I think if you really want autonomy and isolation, then
launch a separate process.

Dylan

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Stephen Howe
Guest





PostPosted: Fri Feb 13, 2004 1:50 am    Post subject: Re: RTTI Performance Reply with quote

Quote:
There are advantages and disadvantages with both solutions. A classical
problem with Windows DLL's is that memory allocated in one DLL cannot be
freed in another -- each DLL has its own instance of malloc, with its
own separate free space pool. On the other hand, it provides a great
deal more autonomy and isolation for things like plug-ins.

Although of course this isn't true if you link both caller and callee
with the same C runtime dll.

But this does not alter what James is saying. By doing this there is now
only 1 location where malloc() and free() are located, the C RT DLL and so
the problem is overcome.

Stephen Howe



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Roger Leigh
Guest





PostPosted: Fri Feb 13, 2004 11:14 am    Post subject: Re: RTTI Performance Reply with quote

Scott Meyers <Usenet (AT) aristeia (DOT) com> writes:

Quote:
I don't know how other compilers implement dynamic_cast, but I think the
DLL issue (and resulting vtlb/type_info replication) legitimately
complicates matters.

AFAICT under GNU/Linux with GCC the vtables and typeinfo data are
implemented as "weak symbols" in the ELF object format, so that the
run-time linker (ld.so) ensures that only one instance is used, even
if instances exist in multiple translation units. This means that
they are guaranteed to be unique, and so pointer comparisons are
presumably perfectly safe.


--
Roger Leigh

Printing on GNU/Linux? http://gimp-print.sourceforge.net/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
kanze@gabi-soft.fr
Guest





PostPosted: Fri Feb 13, 2004 3:53 pm    Post subject: Re: RTTI Performance Reply with quote

"Early Ehlinger" <early (AT) respower (DOT) com> wrote

Quote:
"Scott Meyers" <Usenet (AT) aristeia (DOT) com> wrote:
I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had
observed that each use generated calls to strcmp.
...
a dynamic_cast that looks like this,

dynamic_cast<T*>(pObj)

generates code that basically looks like this:

if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests

Hence, each failed address comparison leads to a call to strcmp.

I don't know how other compilers implement dynamic_cast, but I think
the DLL issue (and resulting vtlb/type_info replication)
legitimately complicates matters.

It seems to me that MS, or anybody else reading this who might be
trying to make dynamic_cast a smidge faster, could optimize the failed
test by having the RTTI info contain a hash of the mangled name *in
addition to* the mangled name. Then the code ends up looking more
like:

// If they're the exact same type_info object:
if ( address of Obj's type_info == address of T's type_info object )
return static_cast<T>(Obj);

// They're different, but not necessarily un-equivalent. Check the name
hash...
if ( ( Obj's-type_info->name_hash != T's-type_info->name_hash )
return NULL;

// They're different, but have the same name hash. Check to see if they're
equivalent.
if ( 0 == strcmp( Obj's-type_info->name , T's-type_info->name ) ) )
return static_cast<T>(Obj);

// recurse

It depends on the ratio of misses. In the case of a match, you still
have to do the string compare, so comparing the hash value first is a
pessimisation. If you tend to get matches most of the time (which I
suspect is the case), then comparing the hash value might be slower.

What will gain a smidgen is ensuring that the strings representing the
name differ near the start. I have the impression that for class types,
the first 9 or 10 characters are always the same. Prefixing a one byte
hash might speed things up; reordering the information in the decorated
name to put the user defined part up front definitly would. (Note that
the somewhat misguided practice of starting all class names with a C
also slows things down a little bit. I'd be simply amazed if the
difference were ever measurable, however.)

--
James Kanze GABI Software mailto:kanze (AT) gabi-soft (DOT) fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Stefan Heinzmann
Guest





PostPosted: Fri Feb 13, 2004 3:58 pm    Post subject: Re: RTTI Performance Reply with quote

Early Ehlinger wrote:
Quote:
"Scott Meyers" <Usenet (AT) aristeia (DOT) com> wrote:
[...]
It seems to me that MS, or anybody else reading this who might be trying to
make dynamic_cast a smidge faster, could optimize the failed test by having
the RTTI info contain a hash of the mangled name *in addition to* the
mangled name.

Whilst that is certainly possible, why is it desirable? In other words,
why does RTTI speed matter to such an extent as to warrant consuming
extra memory? I have found RTTI to be most useful in object I/O, and
there the effect of RTTI efficiency is barely noticeable.

Has anyone got an application where RTTI speed is crucial?

--
Cheers
Stefan


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Srik
Guest





PostPosted: Fri Feb 13, 2004 3:59 pm    Post subject: Re: RTTI Performance Reply with quote

Scott Meyers <Usenet (AT) aristeia (DOT) com> wrote

Quote:
I recently exchanged email with some people at MS about their
implementation of dynamic_cast, because a client of mine had observed that
each use generated calls to strcmp. I asked why MS didn't just compare
addresses of type_info objects (which is essentially the same as comparing
vtbl addresses), and they pointed out that when you have DLLs, there may be
multiple type_info objects, so a simple address comparison doesn't work.
As a result, a dynamic_cast that looks like this,

dynamic_cast<T*>(pObj)

generates code that basically looks like this:

if (the address of pObj's type_info object ==
the address of T's type_info object)
||
(strcmp(pObj's type's mangled name, T's mangled name)))
// the cast succeeds
else
// recursively work your way up through pObj's base classes
// performing the same tests

Hence, each failed address comparison leads to a call to strcmp.


I wonder if this explains a problem I had on VC++ 6.0.

This code deletes a large vector of objects derived from
base class 'Obj'with a virtual destructor.
--
for (vector<Obj*>::iterator a = objVector.begin(); a != objVector.end(); a++)
delete (*a);
--

The segment used to take ages to complete when using DLLs.
I linked with static run-time libraries, and that
cut down execution time from minutes to seconds.

Srikanth

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Carl Daniel
Guest





PostPosted: Sat Feb 14, 2004 12:08 pm    Post subject: Re: RTTI Performance Reply with quote

Srik wrote:
Quote:
I wonder if this explains a problem I had on VC++ 6.0.

This code deletes a large vector of objects derived from
base class 'Obj'with a virtual destructor.

for (vector<Obj*>::iterator a = objVector.begin(); a !=
objVector.end(); a++) delete (*a);

The segment used to take ages to complete when using DLLs.
I linked with static run-time libraries, and that
cut down execution time from minutes to seconds.

I wouldn't think so. There's something else at play there - delete through
a virtual destructor doesn't make any use of RTTI.

-cd


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Hendrik Schober
Guest





PostPosted: Sat Feb 14, 2004 12:09 pm    Post subject: Re: RTTI Performance Reply with quote

[email]kanze (AT) gabi-soft (DOT) fr[/email] <kanze (AT) gabi-soft (DOT) fr> wrote:
Quote:
[...]
What will gain a smidgen is ensuring that the strings representing the
name differ near the start. [...]

The compiler in question doesn't seem to be
very good at this. This program (which uses
an extension of the compiler's std lib)

#include <iostream>
#include <typeinfo>
#include <vector>
#include <list>

template< typename T, class A = std::allocator
class list {};

template< typename T >
void test()
{
const std::type_info& ti = typeid(T);
std::cout << ti.name() << " t " << ti.raw_name() << 'n';
}

int main()
{
test< std::vector();
test< std::vector();
test< std::list ();
test< list ();
test< list ();
return 0 ;
}

produces this output:

class std::vector<int,class std::allocator .?AV?$vector@HV?$allocator@H@std@@@std@@
class std::vector<char,class std::allocator .?AV?$vector@DV?$allocator@D@std@@@std@@
class std::list<char,class std::allocator .?AV?$list@DV?$allocator@D@std@@@std@@
class list<char,class std::allocator .?AV?$list@DV?$allocator@D@std@@@@
class list<short,class std::allocator .?AV?$list@FV?$allocator@F@std@@@@

That's not exactly what I'd call optimial.

Quote:
[...](Note that
the somewhat misguided practice of starting all class names with a C
also slows things down a little bit. I'd be simply amazed if the
difference were ever measurable, however.)


Having another letter in the above code
won't make much of a difference.

Schobi

--
[email]SpamTrap (AT) gmx (DOT) de[/email] is never read
I'm Schobi at suespammers dot org

"Sometimes compilers are so much more reasonable than people."
Scott Meyers



[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Back to top
Display posts from previous:   
Post new topic   Reply to topic    C++Talk.NET Forum Index -> C++ Language (Moderated) All times are GMT
Goto page 1, 2, 3  Next
Page 1 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2006 phpBB Group
SEO toolkit © 2004-2006 webmedic.