 |
C++Talk.NET C++ language newsgroups
|
| View previous topic :: View next topic |
| Author |
Message |
gkoreman@gmail.com Guest
|
Posted: Sun Nov 20, 2005 5:06 pm Post subject: SHA1 |
|
|
I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
The pseudo-code is on:
http://en.wikipedia.org/wiki/SHA-1
My code is working, but is not giving me the correct hashes.
This is being tested (initially) on a big-endian machine, and only
after I have it working on big-endian was I planning on making it work
on both big and little endian.
#include <stdio.h>
#include <vector>
#include <string>
#include <cmath>
using namespace std;
#define ROTL(n,X) ( ( ( X ) << n ) | ( ( X ) >> ( 32 - n ) ) )
typedef unsigned int word32;
class SHA1Hash
{
private:
void Process(std::vector<word32>& message)
{
// Pre-processing:
// 1. append a single "1" bit to message
// 2. append "0" bits until message length = 490 = -32
(mod 512)
// 3. append length of message (before pre-processing),
in bits as 32-bit big-endian integer to message
word32 messageLength = message.size() * 32;
// Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
message.push_back(padding);
// Step 2
padding = 0;
while( (message.size() % 16) != 15 )
message.push_back(padding);
// Step 3
message.push_back(messageLength);
//
// Actual Hash Code
//
unsigned int chunkIndex = 0;
//Initialize variables:
h0 = 0x67452301;
h1 = 0xEFCDAB89;
h2 = 0x98BADCFE;
h3 = 0x10325476;
h4 = 0xC3D2E1F0;
// break message into 512-bit chunks
while( chunkIndex < message.size() )
{
// break chunk into sixteen 32-bit words w(i),
0 = i = 15
vector
for( unsigned int i = 0; i < 16; i++ )
w[i] = message[chunkIndex + i];
for( unsigned int i = 16; i < 80; i++ )
w[i] = ROTL( 1, (w[i-3] ^ w[i-8] ^
w[i-14] ^ w[i-16]));
word32 a = h0;
word32 b = h1;
word32 c = h2;
word32 d = h3;
word32 e = h4;
for( unsigned int i = 0; i < 80; i++ )
{
word32 f = 0;
word32 k = 0;
if( i < 20 )
{
f = (b & c) | ((~b) & d);
k = 0x5A827999;
}
else if( i < 40 )
{
f = b ^ c ^ d;
k = 0x6ED9EBA1;
}
else if( i < 60 )
{
f = (b & c) | (b & d) | (c &
d);
k = 0x8F1BBCDC;
}
else if( i < 80 )
{
f = b ^ c ^ d;
k = 0xCA62C1D6;
}
word32 temp = ROTL(5, a) + f + e + k +
w[i];
e = d;
d = c;
c = ROTL(30, b);
b = a;
a = temp;
}
// Add this chunk's hash to result so far:
h0 = h0 + a;
h1 = h1 + b;
h2 = h2 + c;
h3 = h3 + d;
h4 = h4 + e;
chunkIndex+=16;
}
}
... a public function that calls process
I have tried it with an empty vector and get
5bc0ce0 b2008157 3de49bed 97b3e936 af3f86ce
This is what I should get:
da39a3ee5e6b4b0d3255bfef95601890afd80709
I have looked through this for hours now and can't see anything wrong
with it. Can anyone give me a hand?
btw, this is NOT homework, this is a personal project.
Thanks
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Le Chaud Lapin Guest
|
Posted: Mon Nov 21, 2005 9:51 am Post subject: Re: SHA1 |
|
|
[email]gkoreman (AT) gmail (DOT) com[/email] wrote:
| Quote: | I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
// Pre-processing:
// 1. append a single "1" bit to message
// 2. append "0" bits until message length = 490 = -32
(mod 512)
// 3. append length of message (before pre-processing),
in bits as 32-bit big-endian integer to message
|
You are not allowed to change the size of the 64-bit appendage or any
other part of the alogrithm. The specification calls for padding until
the length is congruent to 448 mod 512.
It looks as if this is the source of your incorrect digest.
| Quote: | From Specification:
"append a single "1" bit to message |
append "0" bits until message length = 448 = -64 (mod 512)
append length of message (before pre-processing), in bits as 64-bit
big-endian integer to message"
| Quote: | word32 messageLength = message.size() * 32;
// Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
message.push_back(padding);
|
Are you sure you are on a big-endian machine? I tested your code and it
seems that youre machine is little endian. In any case, again, you
have to be fastidious in watching the spec in this section. In
general, you cannot slap on an entire 64-bit chunk onto the message.
For example, if the message is 440 bits long, you're only allowed to
add 1 byte for padding: the bit-pattern of a 1 followed by 7 zeros.
This will bring you to congruency mod 448, at which point you can tack
on the 64-bit length to get to a full 512 bits.
| Quote: | // Step 3
message.push_back(messageLength);
|
You punter. :)
Technically, you have to be prepared for crunching a digest whose size
is greater than (2^32). Here you're only taking care of lengths up to
2^32.
NOTES:
1. You want to be able to minimize required changes to code if you
suddendly decide to switch to a different hash algorithm, say SHA-256.
2. You want the output of the hashing to be an object itself so that
you can let it construct, assign it, compare it, store it in
containers, toss it around, etc., let it destruct.
3. Loading a sequence of byes into a vector<> just to hash might be
too slow.
4. You want the hash algorithm itself to be fast.
SUGGESTIONS:
I would define the algorithm itself as a namespace. Call it SHA_1:
namespace SHA_1;
Then you can define other hashing algorithms in their own namespaces:
namespace MD5;
namespace Adler32;
Within your SHA_1 namespace, define a Digest and a Hasher. A Digest
will contain your 160-bit output. A Digest should also be able to
yield itself as a std::string (for printing). The Hasher should
maintain a 64-byte state so that it can be used incrementally to hash
disjoint data buffers, maintaining remnants mod 64 bytes.
Overload two hashing functions in Hasher - one function for incremental
hashing, another that does the same thing as the first but also takes
by reference a Digest to collect the final digest and signal to the
Hasher that it should reset itself to prepare for the next stream to be
hashed:
namespace SHA-1
{
class Digest { unsigned char buffer[20];
operator string () const; // ...
bool operator == (const Digest &that) const;
bool operator != (const Digest &that) const {return !(*this ==
that);}
bool operator > (const Digest &that) const;
bool operator < (const Digest &that) const {return (that >
*this);}
bool operator >= (const Digest &that) const {return (*this >
that) || (*this == that);}
bool operator <= (const Digest &that) const {return (*this <
that) || (*this
} ;
class Hasher
{
unsigned char block[64]; // 512 bits crunched at a time.
unsigned long int lower_word_bit_count : 32;
unsigned long int upper_word_bit_count : 32;
unsigned long int h0 : 32;
unsigned long int h1 : 32;
unsigned long int h2 : 32;
unsigned long int h3 : 32;
unsigned long int h4 : 32;
};
Hasher &hash (const void *buffer, unsigned long int length);
Hasher &hash (const void *buffer, unsigned long int length,
Digest &digest);
}
The first hashing function should hash as many 512-bit chunks of the
incremental data streams as it can, retaining remnants of the
penultimate stream (mod 64-bytes) in its block, until the Digest is
asked for by the second hash function, at which point it can perform
one final hash of the ultimate stream, padding as necessary, tacking on
the 64-bit length, and crunching one final time the 64-byte block that
includes the 64-bit length.
Ron Rivest's implementation of MD5 hints at some of these principles,
but they are not readily apparent because the code is written in C.
See source code at end of: http://www.ietf.org/rfc/rfc1321.txt
-Le Chaud Lapin-
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Thomas J. Gritzan Guest
|
Posted: Mon Nov 21, 2005 9:54 am Post subject: Re: SHA1 |
|
|
[email]gkoreman (AT) gmail (DOT) com[/email] schrieb:
| Quote: | I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
The pseudo-code is on:
http://en.wikipedia.org/wiki/SHA-1
My code is working, but is not giving me the correct hashes.
This is being tested (initially) on a big-endian machine, and only
after I have it working on big-endian was I planning on making it work
on both big and little endian.
[...]
// Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
message.push_back(padding);
[...] |
You forget a zero. Should be:
word32 padding = 0x80000000; // or: 1 << 31;
Works for me on a little-endian machine (only for zero-length messages).
Thomas
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Carlos Moreno Guest
|
Posted: Mon Nov 21, 2005 9:55 am Post subject: Re: SHA1 |
|
|
[email]gkoreman (AT) gmail (DOT) com[/email] wrote:
| Quote: | I have tried it with an empty vector and get
5bc0ce0b20081573de49bed97b3e936af3f86ce
This is what I should get:
da39a3ee5e6b4b0d3255bfef95601890afd80709
I have looked through this for hours now and can't see anything wrong
with it. Can anyone give me a hand?
|
Two things -- your string is shorter than the expected/required
160 bits. I removed the spaces from your output, and the end is
not aligned with the end of what you should get (I'm looking at
it with non-proportional spacing font). Most likely, this is
because of one of your cout statements that is supposed to output
0X (where X is whatever digit), and it's choosing to output X
alone, omiting the leading 0.
Also, pay attention to endianness -- perhaps after adding the
presumably missing 0, you may notice a pattern of reversed bytes;
the SHA1 description talks about concatenated and expressed as
big-endian -- if you ran this on an x86 processor and did not
pay attention to endianness, you might end up getting the wrong
result.
HTH,
Carlos
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
James Kanze Guest
|
Posted: Mon Nov 21, 2005 10:33 am Post subject: Re: SHA1 |
|
|
[email]gkoreman (AT) gmail (DOT) com[/email] wrote:
I doubt that this is the problem here, but I'd generally prefer
the reference site for such things. The Wiki article you site
gives a link, and the reference site
(http://www.ietf.org/rfc/rfc3174.txt) contains a reference
implementation and a lot more explications concerning the
implementation than the Wiki article.
| Quote: | My code is working, but is not giving me the correct hashes.
|
So which is it? Is it working, or isn't it?
| Quote: | This is being tested (initially) on a big-endian machine, and
only after I have it working on big-endian was I planning on
making it work on both big and little endian.
|
Shouldn't make a difference. (I've no experience with SHA-1,
but my implementation of MD5 works equaly well on both big and
little endian machines, without any conditionals or whatever.)
Since I haven't actually written an SHA-1 myself, I'll probably
miss a number of important points, but several things strike me
immediately.
| Quote: | #include <stdio.h
#include
#include
#include
using namespace std;
#define ROTL(n,X) ( ( ( X ) << n ) | ( ( X ) >> ( 32 - n ) ) )
|
Why not an inline function?
| Quote: | typedef unsigned int word32;
|
A better solution might be to use uint32_t, from <stdint.h>.
It's not standard C++ (yet), but it is standard C, is fairly
widespread, and easy to implement in the specific cases where it
is missing.
| Quote: | class SHA1Hash
{
private:
void Process(std::vector<word32>& message)
|
I'm not sure how you handle this. I would have expected a
std::string, or something similar as parameter. You're counting
on the user having packed his bytes into the word32 correctly.
| Quote: | {
// Pre-processing:
// 1. append a single "1" bit to message
// 2. append "0" bits until message length = 490 = -32
(mod 512)
// 3. append length of message (before pre-processing),
in bits as 32-bit big-endian integer to message
word32 messageLength = message.size() * 32;
|
Isn't there a risk of overflow here? (Not in your test case, of
course, but in the general case.) I use a uint64_t in my MD5
code.
| Quote: | // Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
|
Again, I'm not sure about this. In the MD5 implementation, I
buffer bytes, and at this point, I append a *byte* with 0x80.
This may be a difference between the algorithms, however;
looking at my code, I have the impression that MD5 treats the in
little endian order when creating words. (At any rate, if you
are buffering bytes, appending 0x80 is the correct operation in
either case.)
Also, I avoid modifying the user's buffer. (In fact, I never see
the user's buffer; I work on either a pair of iterators, or
bytes passed in by means of the += operator.) I copy the bytes
into a local buffer until I have a block, then process that
block, and only save the resulting interblock state.
| Quote: | message.push_back(padding);
// Step 2
padding = 0;
while( (message.size() % 16) != 15 )
message.push_back(padding);
|
Similar comments. I insert 0x00 bytes into my local buffer
until the total byte count modulo 512/8 is 448/8.
| Quote: | // Step 3
message.push_back(messageLength);
|
And here, I'm sure there is a mistake. The Wiki page says that
the message length to be appended is a 64 bit unsigned integer.
This means two insertions of the value you calculated as an
int64_t. From my own code (for MD5):
appendWord( length & 0xFFFFFFFF ) ;
appendWord( length >> 32 ) ;
You'd probably have to invert the order -- as I said, looking at
my own code, I think that MD5 is LSB first, and the Wiki page
says that SHA-1 is MSB first. (But I'd want to verify all of
the details in the RFC. Because there may be an issue of bit
order in the bytes as well which has to be addresses -- all of
the hardware I've ever used uses LSB bit order within bytes.)
| Quote: | //
// Actual Hash Code
//
|
[I've cut the code here. Without detailed knowledge of the
algorithm, I can't really analyse it to see whether it
contains errors or not.]
| Quote: |
... a public function that calls process
I have tried it with an empty vector and get
5bc0ce0 b2008157 3de49bed 97b3e936 af3f86ce
This is what I should get:
da39a3ee5e6b4b0d3255bfef95601890afd80709
I have looked through this for hours now and can't see
anything wrong with it. Can anyone give me a hand?
|
For starters, append a 64 bit length, and not a 32 bit length.
And verify all of your byte ordering -- which is something that
I definitely wouldn't leave up to the caller.
| Quote: | btw, this is NOT homework, this is a personal project.
|
It would be a pretty strange homework assignment. Unless the
prof needed it for one of his personal projects, of course:-).
--
James Kanze mailto: [email]james.kanze (AT) free (DOT) fr[/email]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 pl. Pierre Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze Guest
|
Posted: Mon Nov 21, 2005 12:35 pm Post subject: Re: SHA1 |
|
|
Thomas J. Gritzan wrote:
| Quote: | gkoreman (AT) gmail (DOT) com schrieb:
I am trying to implement SHA1 based on the pseudo-code on
Wikipedia. The pseudo-code is on:
http://en.wikipedia.org/wiki/SHA-1
My code is working, but is not giving me the correct hashes.
This is being tested (initially) on a big-endian machine,
and only after I have it working on big-endian was I
planning on making it work on both big and little endian.
[...]
// Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
message.push_back(padding);
[...]
You forget a zero. Should be:
word32 padding = 0x80000000; // or: 1 << 31;
Works for me on a little-endian machine (only for zero-length messages).
|
Only for zero-length messages. Of course, his interface only
allows for message lengths which are a multiple of 32 bits, so
this might be OK.
And how does the endian-ness make a difference. I implemented
MD5 on a Sparc (big-endian) -- I've since compiled the code on a
Linux based PC (little-endian), and it works perfectly well. Is
there something fundamentally different between MD5 and SHA-1
which means that SHA-1 is endian dependant? They look very
similar in priciples to me.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
kanze Guest
|
Posted: Mon Nov 21, 2005 5:51 pm Post subject: Re: SHA1 |
|
|
Le Chaud Lapin wrote:
| Quote: | gkoreman (AT) gmail (DOT) com wrote:
I am trying to implement SHA1 based on the pseudo-code on Wikipedia.
// Pre-processing:
// 1. append a single "1" bit to message
// 2. append "0" bits until message length = 490 = -32
(mod 512)
// 3. append length of message (before pre-processing),
in bits as 32-bit big-endian integer to message
You are not allowed to change the size of the 64-bit appendage
or any other part of the alogrithm. The specification calls
for padding until the length is congruent to 448 mod 512. It
looks as if this is the source of your incorrect digest.
From Specification:
"append a single "1" bit to message append "0" bits until
message length = 448 = -64 (mod 512) append length of message
(before pre-processing), in bits as 64-bit big-endian integer
to message"
word32 messageLength = message.size() * 32;
// Step 1
word32 padding = 0x8000000; // 1000 0000 0000 0000
message.push_back(padding);
Are you sure you are on a big-endian machine? I tested your
code and it seems that youre machine is little endian.
|
That shouldn't make a difference, if you've implemnented the
code correctly.
| Quote: | In any case, again, you have to be fastidious in watching the
spec in this section. In general, you cannot slap on an
entire 64-bit chunk onto the message. For example, if the
message is 440 bits long, you're only allowed to add 1 byte
for padding: the bit-pattern of a 1 followed by 7 zeros. This
will bring you to congruency mod 448, at which point you can
tack on the 64-bit length to get to a full 512 bits.
|
His interface forced the user to provide the 32 bit chunks.
Technically, of course, you might have to pad as little as a
single bit. In practice, you can pad using the same multiple of
bits as the user is required to provide (supposing that is a
power of 2). Of course, for most uses, you'll want to support
at least as small as bytes.
| Quote: | // Step 3
message.push_back(messageLength);
You punter. :)
Technically, you have to be prepared for crunching a digest
whose size is greater than (2^32). Here you're only taking
care of lengths up to 2^32.
|
Depends on the application. My own implementation of MD5 is
limited to the SIZE_T_MAX * 8 bits. Perhaps internally, there
are other limits which means that the messages will be small.
The problem here isn't what he wants to support. The problem is
that the algorithm says that you append 64 bits of size, and he
is only appending 32. So something will be wrong somewhere.
| Quote: | NOTES:
1. You want to be able to minimize required changes to code
if you suddendly decide to switch to a different hash
algorithm, say SHA-256.
|
You want to get the algorithm working first:-).
| Quote: | 2. You want the output of the hashing to be an object itself
so that you can let it construct, assign it, compare it, store
it in containers, toss it around, etc., let it destruct.
|
Maybe. In my case, my MD5 calculator is an accumulator. I
suppose that there is some argument for defining a result type,
but in practice, I've almost always read the results using the
<< operator, e.g.:
file << myText
<< std::accumulate( myText.begin(), myText.end(), MD5() ) ;
Internally, the << operator uses a member function:
template< typename OutIter >
void resultsTo( OutIter dest ) const ;
which I made public, more because there wasn't any reason not
too, and I thought it might be useful to someone.
| Quote: | 3. Loading a sequence of byes into a vector<> just to hash
might be too slow.
|
That's why I used the accumulate interface. The first two
parameters to std::accumulate can be any input iterators or
better.
(Because std::accumulate is designed to make things as slow as
possible, I also offer a member function append, which takes two
input iterators -- although the code currently says FwdIter --
and so avoids all of the copying of the object.)
| Quote: | 4. You want the hash algorithm itself to be fast.
|
You want it to be correct first. It isn't, so it's not yet time
to worry about speed.
| Quote: | SUGGESTIONS:
I would define the algorithm itself as a namespace.
|
Why?
After a quick glance at SHA-1, I suspect that you could use some
sort of common machine for the entire family, including SHA-1
and MD-5, with a policy class to customize, much as is standard
practice for CRC's. I'd certainly evaluate this route before
implementing my second digest in the family. (As mentionned
above, I've already implemented MD5.)
| Quote: | Call it SHA_1:
namespace SHA_1;
Then you can define other hashing algorithms in their own namespaces:
namespace MD5;
namespace Adler32;
Within your SHA_1 namespace, define a Digest and a Hasher. A
Digest will contain your 160-bit output. A Digest should also
be able to yield itself as a std::string (for printing). The
Hasher should maintain a 64-byte state so that it can be used
incrementally to hash disjoint data buffers, maintaining
remnants mod 64 bytes.
Overload two hashing functions in Hasher - one function for
incremental hashing, another that does the same thing as the
first but also takes by reference a Digest to collect the
final digest and signal to the Hasher that it should reset
itself to prepare for the next stream to be hashed:
|
This is IMHO the big question with regards to the interface. In
my current implementation, extracting the value would modify the
internal state, so that the accumulator itself could not be
reused. However, all of the functions which one would normally
use to do this work on a copy, and not the original object, so
the orginal object remains unmodified. This copy isn't free, of
course, but the object was designed to be as cheap to copy as
possible -- all data members are arrays of integral types, so
basically, the copy constructor just does a memcpy. And given
the cost of calculating an MD5 digest anyway, I suspect that it
doesn't matter.
For those interested, the code is actually already on the web,
although the cover pages with the links to it aren't complete
yet. If you don't mind browsing raw code, however, you can look
at it at http://gabisoft.free.fr/code/Util/IO/MD5. It won't be
officially there until I've finished the cover pages (with the
disclaimers, etc.), so this is not an official announcement, and
I'm not guaranteeing anything.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
gkoreman@gmail.com Guest
|
Posted: Tue Nov 22, 2005 1:22 am Post subject: Re: SHA1 |
|
|
I know it is shorter, I was just mentally adding 0's when needed. The
digest isn't really going to be printed out.
I am not running this on an x86. I am running it on a big endian
machine to start out, then once I get it working on big-endian, I will
make it work on little-endian.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
gkoreman@gmail.com Guest
|
Posted: Tue Nov 22, 2005 1:23 am Post subject: Re: SHA1 |
|
|
First of all, yes. I am sure this is a big-endian machine.
Also, I am aware that I am severely limiting my message size by only
using 1 word32. This is not a bug, but an early implementation
limitation
I didn't want to introduce any errors when making my own word64, so I
just didn't (yet). I wanted to get the main chunk of the SHA-1 algo
working on small hashes before moving up to large ones.
As for the rest of your suggestions, thanks! I will have to look them
over in more detail, and I will probably modify my design.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
gkoreman@gmail.com Guest
|
Posted: Tue Nov 22, 2005 1:23 am Post subject: Re: SHA1 |
|
|
Is it working?
No, not working. But it is running.
The link at the top looks very useful. Thank you.
The 64bit size issue: An early implementation decision to limit
complexity. This limits the message size, but this was not a bug.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Jakob Bieling Guest
|
Posted: Tue Nov 22, 2005 1:24 am Post subject: Re: SHA1 |
|
|
Le Chaud Lapin <unoriginal_username (AT) yahoo (DOT) com> wrote:
| Quote: | gkoreman (AT) gmail (DOT) com wrote:
// Step 3
message.push_back(messageLength);
Technically, you have to be prepared for crunching a digest whose size
is greater than (2^32). Here you're only taking care of lengths up to
2^32.
|
I might be missing something .. but why 2^32?
regards
--
jb
(reply address in rot13, unscramble first)
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
gkoreman@gmail.com Guest
|
Posted: Tue Nov 22, 2005 1:25 am Post subject: Re: SHA1 |
|
|
That fixed it! Thank you! I checked the initialization numbers over and
over and didn't find any typos, but I totally missed that one.
And yes, I have only been testing on zero length messages to start. I
figured that I should get zero length messages working before I try to
go any further.
Thank you very much.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Le Chaud Lapin Guest
|
Posted: Tue Nov 22, 2005 1:28 am Post subject: Re: SHA1 |
|
|
kanze wrote:
| Quote: | Le Chaud Lapin wrote:
In any case, again, you have to be fastidious in watching the
spec in this section. In general, you cannot slap on an
entire 64-bit chunk onto the message. For example, if the
message is 440 bits long, you're only allowed to add 1 byte
for padding: the bit-pattern of a 1 followed by 7 zeros. This
will bring you to congruency mod 448, at which point you can
tack on the 64-bit length to get to a full 512 bits.
His interface forced the user to provide the 32 bit chunks.
|
Not sure if that is the best way to approach this. To view the input
data to the hasher as a stream of bytes is far more general than
forcing the view as vector of 32 bit chunks, and in most cases
eliminates unnecessary copying.
| Quote: | Technically, of course, you might have to pad as little as a
single bit. In practice, you can pad using the same multiple of
bits as the user is required to provide (supposing that is a
power of 2). Of course, for most uses, you'll want to support
at least as small as bytes.
|
This is true, and it also hints at a fact that many of us overlook in
certain kinds of programming (embbeded, communications, etc): when you
send a byte from one machine to another, there is already a guarantee
that the bit order will be preserved for you. Imagine a world where
you had to worry about bit-order too.
| Quote: | Technically, you have to be prepared for crunching a digest
whose size is greater than (2^32). Here you're only taking
care of lengths up to 2^32.
Depends on the application. My own implementation of MD5 is
limited to the SIZE_T_MAX * 8 bits. Perhaps internally, there
are other limits which means that the messages will be small.
|
Hmm...not sure you can call this an implementation of MD5. I would make
a concession that it is ok to view the data as a string of bytes
instead of a string of bits, which, as you pointed out, is what the
spec calls for, for true compliance. But I think tossing out the
upper-half of the 64-bit word for the length and calling it an
implementation of the algorithm is stretching it. This presumes of
course that you are making a generally resuable component, which you
intend to polish off and make available for others to use.
| Quote: | 2. You want the output of the hashing to be an object itself
so that you can let it construct, assign it, compare it, store
it in containers, toss it around, etc., let it destruct.
Maybe. In my case, my MD5 calculator is an accumulator. I
suppose that there is some argument for defining a result type,
but in practice, I've almost always read the results using the
operator, e.g.:
file << myText
std::accumulate( myText.begin(), myText.end(), MD5() ) ;
Internally, the << operator uses a member function:
template< typename OutIter
void resultsTo( OutIter dest ) const ;
which I made public, more because there wasn't any reason not
too, and I thought it might be useful to someone.
|
Where is the digest? Certainly it would be nice to write
map
class that someone else would want to use. I can hardly think of a
more opportune moment to let the object-oriented features of C++ bear
down on the design of a component.
| Quote: | 4. You want the hash algorithm itself to be fast.
You want it to be correct first. It isn't, so it's not yet time
to worry about speed.
|
Well, when I do my designs, I like to take everything into account all
at once, including performance (macro, not micro). I spend most of my
time mulling. Then I fidget to verify propositions, then back off, then
walk around in woods some, then eat a snack, then tinker, then only
when I think I have a visceral feel for what's going on do I begin to
bear down on the implementation. For me the code always comes out
cleaner than it would have if I had taken the incremental approach, and
takes less time to get a finished product (contratry to what others
claim). You're going to have to think about these things at some point
anyway, so you might as well mull it over as much as possible before
you start typing, because the mind is faster than the hand (though both
work better with the help of the other).
| Quote: | I would define the algorithm itself as a namespace.
Why?
|
Well, to give you a real example, lets say you had a massive software
probject that depended on primitives involved in security, including
hashing. One master #include file contained:
namespace Security_Set
{
using MD5::Hasher;
using MD5::Digest;
using namespace RSA;
typedef RC6::Cipher<32,20,16> Symmetric_Cipher;
typedef RC6::Key<16> Symmetric_Key;
}
Now let's say two years later, someone found a flaw in your chosen
Hasher! Good grief!! No worries, you could simply modify your
namespace:
namespace Security_Set
{
using SHA_256::Hasher;
using SHA_256::Digest;
using namespace RSA;
typedef RC6::Cipher<32,20,16> Symmetric_Cipher;
typedef RC6::Key<16> Symmetric_Key;
}
If you have been using Hasher and Digest objects throughout your
system, all you would have to do is recompile, and everything should
work, data stored on disk notwithstanding. As a matter of fact, this
is precisely what I intend to do, selecting from multiple Hashers,
multiple symmetric cipher systems, and multiple asymmetric cipher
systems, and multiple nonce sizes (128-bit versus 192-bit versus
256-bit) to test performances, and also verify that it is possible to
do a simple recompile of a very large system and have it work with the
new cipher primitives without incident. I cannot express in words how
convenient this is. With this technique, if someone were to find a
flaw in Rijndael 2 days before we shipped our product, we could
actually make the change I just mentioned, recompile, and ship using an
alternative symmetric cipher. The marketing people would have to
update their collateral.
| Quote: | After a quick glance at SHA-1, I suspect that you could use some
sort of common machine for the entire family, including SHA-1
and MD-5, with a policy class to customize, much as is standard
practice for CRC's. I'd certainly evaluate this route before
implementing my second digest in the family. (As mentionned
above, I've already implemented MD5.)
|
Yikes. This is good example, where the whole resusability hunt is too
agressive IMO. Hashers are very well defined. You know exactly what
they are, the sizes of their digests, the expectations on input. They
are small in terms of the code size. Trying to force them to be common
just so you can say, "it took me a week but I finally tweaked these
things enough to use common code" is a bit much, don't you think?
| Quote: | This is IMHO the big question with regards to the interface. In
my current implementation, extracting the value would modify the
internal state, so that the accumulator itself could not be
reused. [snipped]
|
This is the fun part of software design - imagining all the situations
in which the primitive might be used, and then designing it so that it
will feel good for the user once the primitive is wielded in the
construction of a system. Note that "feel good" comes from eliminating
as much uncertainty surrounding the primitive as possible. Said in a
different way, "feel good" comes from a reduction on requisite mental
burden.
1. Hash.
2. Hash.
3. Hash.
4. Hash.
5. Hash, but get the digest and reset the hasher.
This feels right, and it will work in almost any context, including
hashing a sequences of 32-bit words stored in a vector<>. The
programmer does not have to worry about inefficiencies from copying.
| Quote: | However, all of the functions which one would normally
use to do this work on a copy, and not the original object, so
the orginal object remains unmodified. This copy isn't free, of
course, but the object was designed to be as cheap to copy as
possible -- all data members are arrays of integral types, so
basically, the copy constructor just does a memcpy. And given
the cost of calculating an MD5 digest anyway, I suspect that it
doesn't matter.
|
Hmm..I still prefer the idea of letting the programmer supply a const
void * to the data, its length (unsigned of course), then doing a hash,
and asking for the digest if it is time to get the digest, and if not,
letting the hasher maintain internal state until such digest is wanted.
Also, if you allow explicit contrsuction of the Digest from a
std::string, and allow it to yield itself as a std::string, you've
pretty much finsihed off the Digest as a class (unless you are doing
crypto, in which case allowing the Digest to yield itself as a big
integer can be very nice.)
-Le Chaud Lapin
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Thomas J. Gritzan Guest
|
Posted: Tue Nov 22, 2005 1:44 am Post subject: Re: SHA1 |
|
|
kanze schrieb:
| Quote: | Thomas J. Gritzan wrote:
Works for me on a little-endian machine (only for zero-length messages).
Only for zero-length messages. Of course, his interface only
allows for message lengths which are a multiple of 32 bits, so
this might be OK.
And how does the endian-ness make a difference. I implemented
MD5 on a Sparc (big-endian) -- I've since compiled the code on a
Linux based PC (little-endian), and it works perfectly well. Is
there something fundamentally different between MD5 and SHA-1
which means that SHA-1 is endian dependant? They look very
similar in priciples to me.
|
I changed the interface, adding an length-parameter to the function. Now
it works for the text examples on wikipedia as well.
The endian-ness makes no difference except for the order of the
high-part and the low-part of the 64-bit "length of the message". Since
the OP did it as 32-bit, he bypasses this problem.
So the code works without change on little & big endian machines.
Thomas
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
Le Chaud Lapin Guest
|
Posted: Tue Nov 22, 2005 2:10 am Post subject: Re: SHA1 |
|
|
kanze wrote:
| Quote: | And how does the endian-ness make a difference. I implemented
MD5 on a Sparc (big-endian) -- I've since compiled the code on a
Linux based PC (little-endian), and it works perfectly well. Is
there something fundamentally different between MD5 and SHA-1
which means that SHA-1 is endian dependant? They look very
similar in priciples to me.
|
It definitely makes a difference, but first, let me rant:
Big-endian is silly, IMO. It was created as a substitute for
little-endian to help those who, for whatever reason, find
little-endian confusing when it's printed out. Ironically, it doesn't
help, because it is precisely these people are most likely to get
confused no matter what, so what you end up with is two models, one for
people who not confused in the least but have to acknowledge the
nuisance that is big-endian, and one for the people who are still
confused and upset that big-endian didn't rid them of the confusion:
http://groups.google.com/group/comp.protocols.tcp-ip/browse_frm/thread/3f89dd1fb5a3fa66
Getting of my rant box, if the input data is 0x01234567, that's 4
bytes. In a buffer, begining at offset 0, either the 0x01 will be
there or the 0x67 will be there.
When the programmer goes to form a word from this buffer to be used in
the plethora of Boolean operations of the hash algorithm, s/he will
have to choose how to extract the bytes from this stream to form the
word. Note that implicit in all of this is that when the stream of
bytes move from one machine to another, the bit order is preserved for
individual bytes, and the byte order is preserved for a stream. So if
0x01 is the byte at offset 0 (big-endian), it will be the byte at
offset 0 when it is sent as a sequence of bytes. But if a
little-endian receiving machine gets these 4-bytes, and puts in them in
increasing memory locations starting at 0, and then extracts the 4
bytes as a word, the word will become 0x76543210.
So there are two subtleties here. The first is that bit-order is
guranteed no matter what from machine to machine. We take that for
granted, because it is. The second is that no one ever said that we
were allowed to treat the input stream as a sequence of words. The
most that we are allowed is bytes. So during both extraction and
implantation of words, we have to be careful about byte-sequences.
-Le Chaud Lapin-
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|