C++Talk.NET C++ language newsgroups
young Guest
|
Posted: Tue May 01, 2007 9:11 am Post subject: My Quick memcpy |
|
|
typedef unsigned char ylib_byte_t;
typedef unsigned int ylib_word_t;
enum YOUNG_LIBRARY_MEMORY_FUNCTION_CONSTANT
{
OUTSPREAD = 16,
RESIDUE = OUTSPREAD - 1
};
#define CYCLE_OUTSPREAD( expression ) \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression;
void* ylib_memcopy( void* dst, const void* src, size_t count )
{
ylib_byte_t *d, *s;
ylib_word_t* dstword = (ylib_word_t*)dst;
const ylib_word_t* srcword = (const ylib_word_t*)src;
size_t tmp, word_count = count / sizeof(ylib_word_t);
size_t word_outspread = word_count & ~((size_t)RESIDUE);
for( tmp = 0; tmp < word_outspread; tmp += OUTSPREAD )
{
CYCLE_OUTSPREAD( *dstword++ = *srcword++ )
}
for( ; tmp < word_count; ++tmp )
*dstword++ = *srcword++;
d = (ylib_byte_t*)dstword;
s = (ylib_byte_t*)srcword;
for( tmp *= sizeof(ylib_word_t); tmp < count; ++tmp )
*d++ = *s++;
return dst;
}
void test_memcpy(void)
{
double sec;
clock_t begin, end;
int i, dst[10000];
int src[10000];
begin = clock();
for( i = 0; i < 100000; ++i )
ylib_memcopy( dst, src, sizeof(int) * 10000 );
end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nylib_memcopy time = %f\n", sec );
begin = clock();
for( i = 0; i < 100000; ++i )
memcpy( dst, src, sizeof(int) * 10000 );
end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nmemcpy time = %f\n", sec );
}
--
comp.lang.c.moderated - moderation address: clcm (AT) plethora (DOT) net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry. |
Jonathan Leffler Guest
|
Posted: Mon May 07, 2007 9:12 am Post subject: Re: My Quick memcpy |
|
|
young wrote:
| Quote: | typedef unsigned char ylib_byte_t;
typedef unsigned int ylib_word_t;
enum YOUNG_LIBRARY_MEMORY_FUNCTION_CONSTANT
{
OUTSPREAD = 16,
RESIDUE = OUTSPREAD - 1
};
#define CYCLE_OUTSPREAD( expression ) \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression;
void* ylib_memcopy( void* dst, const void* src, size_t count )
[...]
|
Doesn't it perform rather horribly if your src or dst are not
sufficiently well aligned?
For example, suppose src is aligned such that (src % 4) == 1 and (dst %
4) == 3, assuming 4-byte integers. Depending on the architecture, that
could mean a crash (misaligned memory access) or poor performance (while
the CPU fetches two ints, creates one int from the appropriate
sub-sections of the two, and then fetches, assigns and writes to two ints).
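The alignment concern above can be sketched in code. This is only an illustration, not the original poster's routine: the names is_aligned and copy_checked are hypothetical, and the sketch assumes that the alignment requirement of unsigned int equals sizeof(unsigned int), that this size is a power of two, and that converting a pointer to uintptr_t exposes its low address bits (true on common platforms, but implementation-defined). The word path also assumes the buffers may legitimately be accessed as unsigned int, for example because they are int arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: copies by words only when that is safe, else by bytes. */
static void *copy_checked(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t w = sizeof(unsigned int);

    /* Word copies are possible only if src and dst share the same offset
       within a word; otherwise aligning one leaves the other misaligned. */
    if (((uintptr_t)d & (w - 1)) == ((uintptr_t)s & (w - 1))) {
        unsigned int *dw;
        const unsigned int *sw;

        /* Copy leading bytes until dst (and therefore src) is aligned. */
        while (count > 0 && !is_aligned(d, w)) {
            *d++ = *s++;
            --count;
        }

        /* Copy whole words. */
        dw = (unsigned int *)d;
        sw = (const unsigned int *)s;
        for (; count >= w; count -= w)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Trailing bytes, or the whole buffer when the offsets differ. */
    while (count > 0) {
        *d++ = *s++;
        --count;
    }
    return dst;
}
```

With (src % 4) == 1 and (dst % 4) == 3 the offsets differ, so this sketch falls back to a plain byte loop: slower than a word copy, but it avoids the misaligned accesses described above.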
What performance gain do you get with copies of smaller quantities?
Is the memcpy() on your compiler an inline assembler function, or a
plain old C function?
On my old 1 GHz G4 Mac (running MacOS X 10.4.9 and GCC 4.0.1), the
timings I got were:
ylib_memcopy time = 3.840000
memcpy time = 1.570000
ylib_memcopy time = 7.570000
memcpy time = 1.670000
The first pair of timings is from your test, verbatim; the second pair
is from this misaligned variant:
char dst[10000 * sizeof(int) + 8];
char src[10000 * sizeof(int) + 8];
...
for( i = 0; i < 100000; ++i )
ylib_memcopy( dst + 1, src + 3, sizeof(int) * 10000 );
...
for( i = 0; i < 100000; ++i )
memcpy( dst + 1, src + 3, sizeof(int) * 10000 );
The '+ 8' leaves space for trampling; it copies the same amount of data
as your test.
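A self-contained harness in the same spirit might look like the sketch below. It is not the code that produced the timings above: time_copy, copy_fn, N_BYTES and SLACK are hypothetical names, the buffers are filled before timing so that every byte read has a defined value, and the result is checked afterwards, which also makes it harder for an optimizer to discard the copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Same payload as the original test, plus slack bytes for the offsets. */
enum { N_BYTES = 10000 * sizeof(int), SLACK = 8 };

/* Any routine with memcpy's signature can be timed. */
typedef void *(*copy_fn)(void *, const void *, size_t);

static unsigned char src_buf[N_BYTES + SLACK];
static unsigned char dst_buf[N_BYTES + SLACK];

/* Times reps copies of N_BYTES bytes at the given byte offsets.
   Returns CPU seconds, or -1.0 if the destination does not match. */
static double time_copy(copy_fn fn, size_t dst_off, size_t src_off, long reps)
{
    clock_t begin, end;
    long i;

    assert(dst_off <= SLACK && src_off <= SLACK);

    memset(src_buf, 0x5A, sizeof src_buf);  /* defined contents to read */
    memset(dst_buf, 0, sizeof dst_buf);

    begin = clock();
    for (i = 0; i < reps; ++i)
        fn(dst_buf + dst_off, src_buf + src_off, N_BYTES);
    end = clock();

    if (memcmp(dst_buf + dst_off, src_buf + src_off, N_BYTES) != 0)
        return -1.0;
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Calling time_copy(memcpy, 1, 3, 100000) would reproduce the offsets of the misaligned test, and time_copy(memcpy, 0, 0, 100000) the aligned one; a hand-written routine can be passed in place of memcpy.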
The system (built-in) memcpy() will almost inevitably out-perform
anything you write in C because it is usually not written in C.
--
Jonathan Leffler #include <disclaimer.h>
Email: jleffler (AT) earthlink (DOT) net, jleffler (AT) us (DOT) ibm.com
Guardian of DBD::Informix v2007.0226 -- http://dbi.perl.org/
Roberto Waltman Guest
|
Posted: Mon May 07, 2007 9:12 am Post subject: Re: My Quick memcpy |
|
|
young <younghuan1980 (AT) gmail (DOT) com> wrote:
| Quote: | typedef unsigned char ylib_byte_t;
typedef unsigned int ylib_word_t;
enum YOUNG_LIBRARY_MEMORY_FUNCTION_CONSTANT
{
OUTSPREAD = 16,
RESIDUE = OUTSPREAD - 1
};
#define CYCLE_OUTSPREAD( expression ) \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression;
void* ylib_memcopy( void* dst, const void* src, size_t count )
{
..
}
|
... assert(count > 16);
Roberto Waltman
[ Please reply to the group,
return address is invalid ]
Jonas Guest
|
Posted: Mon May 07, 2007 9:12 am Post subject: Re: My Quick memcpy |
|
|
"young" <younghuan1980 (AT) gmail (DOT) com> wrote in message
news:clcm-20070501-0005 (AT) plethora (DOT) net...
| Quote: | typedef unsigned char ylib_byte_t;
typedef unsigned int ylib_word_t;
enum YOUNG_LIBRARY_MEMORY_FUNCTION_CONSTANT
{
OUTSPREAD = 16,
RESIDUE = OUTSPREAD - 1
};
#define CYCLE_OUTSPREAD( expression ) \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression;
void* ylib_memcopy( void* dst, const void* src, size_t count )
{
ylib_byte_t *d, *s;
ylib_word_t* dstword = (ylib_word_t*)dst;
const ylib_word_t* srcword = (const ylib_word_t*)src;
size_t tmp, word_count = count / sizeof(ylib_word_t);
size_t word_outspread = word_count & ~((size_t)RESIDUE);
for( tmp = 0; tmp < word_outspread; tmp += OUTSPREAD )
{
CYCLE_OUTSPREAD( *dstword++ = *srcword++ )
}
for( ; tmp < word_count; ++tmp )
*dstword++ = *srcword++;
d = (ylib_byte_t*)dstword;
s = (ylib_byte_t*)srcword;
for( tmp *= sizeof(ylib_word_t); tmp < count; ++tmp )
*d++ = *s++;
return dst;
}
void test_memcpy(void)
{
double sec;
clock_t begin, end;
int i, dst[10000];
int src[10000];
begin = clock();
for( i = 0; i < 100000; ++i )
ylib_memcopy( dst, src, sizeof(int) * 10000 );
end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nylib_memcopy time = %f\n", sec );
begin = clock();
for( i = 0; i < 100000; ++i )
memcpy( dst, src, sizeof(int) * 10000 );
end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nmemcpy time = %f\n", sec );
}
|
On my system (after including headers necessary to make your code compile):
ylib_memcopy time = 1.328000
memcpy time = 0.672000
In general, trying to improve on a standard library function isn't such a
great idea. You are attempting to create a general solution that improves on
one that is most probably optimized for the system you are compiling for. In
the case of memcpy, optimization may for example include using SIMD
instructions where available.
--
Jonas
WillerZ Guest
|
Posted: Mon May 07, 2007 9:12 am Post subject: Re: My Quick memcpy |
|
|
If you want to challenge both algorithms, that's a bad test. See below.
young wrote:
| Quote: | void test_memcpy(void)
{
double sec;
clock_t begin, end;
int i, dst[10000];
int src[10000];
begin = clock();
for( i = 0; i < 100000; ++i )
ylib_memcopy( dst, src, sizeof(int) * 10000 );
|
Change this to:
ylib_memcopy( ((char *)&dst[0]) + 1, ((char *)&src[0]) + 1,
sizeof(int) * 10000 - 1 );
| Quote: | end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nylib_memcopy time = %f\n", sec );
begin = clock();
for( i = 0; i < 100000; ++i )
memcpy( dst, src, sizeof(int) * 10000 );
|
And this to:
memcpy( ((char *)&dst[0]) + 1, ((char *)&src[0]) + 1,
sizeof(int) * 10000 - 1 );
| Quote: | end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nmemcpy time = %f\n", sec );
}
-- |
Barry Schwarz Guest
|
Posted: Mon May 07, 2007 9:12 am Post subject: Re: My Quick memcpy |
|
|
On 01 May 2007 07:38:48 GMT, young <younghuan1980 (AT) gmail (DOT) com> wrote:
| Quote: | typedef unsigned char ylib_byte_t;
typedef unsigned int ylib_word_t;
enum YOUNG_LIBRARY_MEMORY_FUNCTION_CONSTANT
{
OUTSPREAD = 16,
RESIDUE = OUTSPREAD - 1
};
#define CYCLE_OUTSPREAD( expression ) \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression; \
expression; expression; expression; expression;
void* ylib_memcopy( void* dst, const void* src, size_t count )
{
ylib_byte_t *d, *s;
ylib_word_t* dstword = (ylib_word_t*)dst;
|
Have you done something to ensure that the address in dst is properly
aligned to point to an int?
| Quote: | const ylib_word_t* srcword = (const ylib_word_t*)src;
|
The cast is unnecessary in both preceding statements.
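As a small illustration of the remark about casts: in C, a pointer to void converts implicitly to any other object pointer type, so the initialization compiles without a cast. The function first_word is a hypothetical name used only for this sketch, and the code assumes it is compiled as C (C++ would require the cast) and that the argument really points to an unsigned int.

```c
#include <assert.h>
#include <stddef.h>

/* Reads the first unsigned int that src points to. */
static unsigned int first_word(const void *src)
{
    const unsigned int *srcword = src;  /* implicit conversion, no cast in C */

    assert(src != NULL);
    return *srcword;
}
```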
| Quote: | size_t tmp, word_count = count / sizeof(ylib_word_t);
size_t word_outspread = word_count & ~((size_t)RESIDUE);
for( tmp = 0; tmp < word_outspread; tmp += OUTSPREAD )
{
CYCLE_OUTSPREAD( *dstword++ = *srcword++ )
}
for( ; tmp < word_count; ++tmp )
*dstword++ = *srcword++;
d = (ylib_byte_t*)dstword;
s = (ylib_byte_t*)srcword;
for( tmp *= sizeof(ylib_word_t); tmp < count; ++tmp )
*d++ = *s++;
return dst;
}
void test_memcpy(void)
{
double sec;
clock_t begin, end;
int i, dst[10000];
int src[10000];
|
None of the elements of either array have been assigned values. They
are all indeterminate. Each of the last two for loops in ylib_memcopy
will invoke undefined behavior.
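One way to avoid reading indeterminate values is to fill the source before the first copy. The sketch below is an assumption about how the test could be set up, not the original poster's code: init_buffers, copy_ok and N are hypothetical names, and the arrays are given static storage so that 80,000 bytes (with 4-byte int) do not have to live on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 10000 };

static int src[N];
static int dst[N];

/* Gives every element of src a known value and clears dst. */
static void init_buffers(void)
{
    size_t i;

    for (i = 0; i < (size_t)N; ++i)
        src[i] = (int)i;
    memset(dst, 0, sizeof dst);
}

/* Returns 1 if dst holds an exact copy of src, 0 otherwise. */
static int copy_ok(void)
{
    return memcmp(dst, src, sizeof src) == 0;
}
```

Calling init_buffers() before the timing loops and copy_ok() after each of them would also confirm that the routine being timed copies correctly.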
| Quote: |
begin = clock();
for( i = 0; i < 100000; ++i )
ylib_memcopy( dst, src, sizeof(int) * 10000 );
end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
|
One of the casts is superfluous.
| Quote: | printf( "\nylib_memcopy time = %f\n", sec );
|
Why print it as a float instead of a double?
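For reference, the printf family in standard C defines the %f conversion as taking a double argument; a float passed through the variable argument list is promoted to double first. So the quoted call already prints a double. The helper format_seconds below is a hypothetical name, used only to show this in a self-contained form.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats sec with %f, which expects a double; returns snprintf's result. */
static int format_seconds(char *buf, size_t size, double sec)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%f", sec);
}
```

By default %f prints six digits after the decimal point, which matches the shape of the timings reported in this thread.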
| Quote: |
begin = clock();
for( i = 0; i < 100000; ++i )
memcpy( dst, src, sizeof(int) * 10000 );
end = clock();
sec = (double)(end - begin) / (double)CLOCKS_PER_SEC;
printf( "\nmemcpy time = %f\n", sec );
}
|
Did you have a reason for posting the code, such as a question or
concern?
Remove del for email