C++Talk.NET C++ language newsgroups
Guest
|
Posted: Tue Jul 11, 2006 7:25 am Post subject: transforming array of unsigned chars or BYTEs to an int usin |
|
|
I am trying to write some glue code to parse 4 raw bytes into an
integer value. This trick works for reading from an array of register
values, but for some reason I cannot get it to work with an array of
bytes or unsigned chars. I have tried both types of BYTE[] and
unsigned char[]. The Value always ends up as the same number
regardless of the byte array passed in.
void ReadByteArrayIntoInteger(BYTE data[])
{
// Here the data shows up as expected.
for (int i=0; i<4; i++)
{
iprintf("Data: %i\r\n", (int) *data);
data++;
}
int* pValue = (int *) data;
int Value = * pValue;
// Value is always the same regardless of varying contents in data
iprintf("Got %i\r\n", Value);
}
Any insight is much appreciated! Many thanks in advance...
--Alexandra
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ] |
 |
kanze Guest
|
Posted: Wed Jul 12, 2006 1:41 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
alexx.stehman (AT) gmail (DOT) com wrote:
| Quote: | I am trying to write some glue code to parse 4 raw bytes into
an integer value.
|
Which isn't enough specification to work on. A single raw byte
can easily be interpreted as an integer value (provided we agree
to define a raw byte as the arbitrary contents of an unsigned
char). There are any number of ways to interpret four raw bytes
as a single integer value; without knowing what representation
is actually being used, you can't start writing the code.
| Quote: | This trick works for reading from an array of register
values,
|
What's an "array of register values"?
| Quote: | but for some reason I cannot get it to work with an array of
bytes or unsigned chars. I have tried both types of BYTE[] and
unsigned char[].
|
What is type BYTE?
| Quote: | The Value always ends up as the same number
regardless of the byte array passed in.
void ReadByteArrayIntoInteger(BYTE data[])
{
// Here the data shows up as expected.
for (int i=0; i<4; i++)
{
iprintf("Data: %i\r\n", (int) *data);
data++;
}
int* pValue = (int *) data;
int Value = * pValue;
|
This is, of course, undefined behavior. Still, it should give
something dependent on data on most machines. At least if it
doesn't core dump. (Depending on the value passed as data, it
sometimes core dumps on my machine.)
| Quote: | // Value is always the same regardless of varying contents in data
iprintf("Got %i\r\n", Value);
}
Any insight is much appreciated! Many thanks in advance...
|
Well, I can't really explain your results; I would expect either
a core dump, or some random value which did vary according to
data. (Supposing, of course, that the function iprintf
interprets things more or less like printf. When posting code,
it would help if 1) you'd use types and functions that are
generally known, and 2) you'd post a complete example---a
priori, your problem is somehow related to the way you pass
the parameter to this function, but we cannot really know,
because we don't see it.)
Anyway, the usual way of doing this, supposing you are on a 32
bit 2's complement machine, and that the bytes correspond to the
four byte binary integer representation used in Internet
protocols, would be something like:
int
asInt( unsigned char const* source )
{
return source[ 0 ] << 24
| source[ 1 ] << 16
| source[ 2 ] << 8
| source[ 3 ] ;
}
(To be 100% portable, you'd have to use something like:
long
asInt( unsigned char const* source )
{
unsigned long tmp
= static_cast< unsigned long >( source[ 0 ] ) << 24
| static_cast< unsigned long >( source[ 1 ] ) << 16
| static_cast< unsigned long >( source[ 2 ] ) << 8
| static_cast< unsigned long >( source[ 3 ] ) ;
return tmp > 0x7FFFFFFFUL
? -static_cast< long >( 0xFFFFFFFFUL - tmp ) - 1
: static_cast< long >( tmp ) ;
}
)
Of course, this supposes a particular representation of int,
that used in the Internet protocols.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 |
Crosbie Fitch Guest
|
Posted: Wed Jul 12, 2006 1:42 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
<alexx.stehman (AT) gmail (DOT) com> wrote in message
| Quote: | iprintf("Data: %i\r\n", (int) *data);
data++;
}
Any insight is much appreciated! Many thanks in advance...
|
Seems like you're incrementing 'data' there, so by the time you look at
where it's pointing, it's pointing at something that doesn't change.
Try using data[i] to read bytes.
Others will be introducing you to the world of endianness...
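A minimal sketch of that fix, under some assumptions: BYTE is taken to be a typedef for unsigned char, std::printf stands in for iprintf, and the bytes are assumed to arrive most significant byte first (the original post does not say). Indexing with data[i] leaves data pointing at the first byte after the loop.

```cpp
#include <cstdio>

typedef unsigned char BYTE;  // assumption: BYTE is an alias for unsigned char

// Prints the four bytes without moving the pointer, then combines them.
// Assumes the most significant byte comes first (big-endian order).
unsigned long ReadByteArrayIntoInteger(const BYTE data[])
{
    for (int i = 0; i < 4; ++i)
    {
        std::printf("Data: %i\r\n", static_cast<int>(data[i]));
    }

    unsigned long value = 0;
    for (int i = 0; i < 4; ++i)
    {
        value = (value << 8) | data[i];  // data still points at byte 0
    }
    return value;
}
```

Returning the value rather than printing it makes the function easier to check from calling code.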
 |
Ian Wakeling Guest
|
Posted: Wed Jul 12, 2006 1:43 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
alexx.stehman (AT) gmail (DOT) com wrote:
| Quote: | I am trying to write some glue code to parse 4 raw bytes into an
integer value. This trick works for reading from an array of register
values, but for some reason I cannot get it to work with an array of
bytes or unsigned chars. I have tried both types of BYTE[] and
unsigned char[]. The Value always ends up as the same number
regardless of the byte array passed in.
void ReadByteArrayIntoInteger(BYTE data[])
{
// Here the data shows up as expected.
for (int i=0; i<4; i++)
{
iprintf("Data: %i\r\n", (int) *data);
data++;
}
int* pValue = (int *) data;
int Value = * pValue;
// Value is always the same regardless of varying contents in data
iprintf("Got %i\r\n", Value);
}
Any insight is much appreciated! Many thanks in advance...
--Alexandra
|
In your loop, you are incrementing data, so by the time you assign to
pValue, data is pointing to random rubbish.
I suspect there's another problem lurking here, although there's not enough
context to be sure. What order is data in? Are you sure it will always be
the same order as the host endianness?
On an x86, for example, the array { 0x00, 0x01, 0x02, 0x03 } will yield the
integer 0x03020100, whereas on a sparc, that same array will yield the
integer 0x00010203.
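The two results can be reproduced on any host by assembling the value explicitly instead of casting the pointer. This is a sketch only: the function names are made up for illustration, and it assumes 8-bit bytes and a compiler that provides <cstdint>.

```cpp
#include <cstdint>

// Interprets bytes[0] as the least significant byte (x86 memory order).
std::uint32_t fromLittleEndian(const unsigned char* bytes)
{
    return  static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
}

// Interprets bytes[0] as the most significant byte (SPARC memory order).
std::uint32_t fromBigEndian(const unsigned char* bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24)
         | (static_cast<std::uint32_t>(bytes[1]) << 16)
         | (static_cast<std::uint32_t>(bytes[2]) << 8)
         |  static_cast<std::uint32_t>(bytes[3]);
}
```

For the array { 0x00, 0x01, 0x02, 0x03 }, fromLittleEndian gives 0x03020100 and fromBigEndian gives 0x00010203, whatever the host's own byte order.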
There's another assumption as well: what makes you think that either an
integer or a register are necessarily 4 bytes?
Maybe portability doesn't matter for the code you're writing, but you ought
to state your assumptions clearly, if only for the sake of the poor
maintenance programmer who comes after you... after all, it might be you in
six months time when you've forgotten all about it :-)
Ian
 |
Ulrich Eckhardt Guest
|
Posted: Wed Jul 12, 2006 1:46 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
alexx.stehman (AT) gmail (DOT) com wrote:
| Quote: | I am trying to write some glue code to parse 4 raw bytes into an
integer value. This trick works for reading from an array of register
values, but for some reason I cannot get it to work with an array of
bytes or unsigned chars. I have tried both types of BYTE[] and
unsigned char[]. The Value always ends up as the same number
regardless of the byte array passed in.
void ReadByteArrayIntoInteger(BYTE data[])
{
// Here the data shows up as expected.
for (int i=0; i<4; i++)
{
iprintf("Data: %i\r\n", (int) *data);
data++;
^^^^^^
}
int* pValue = (int *) data;
^^^^ |
Don't you want to keep data constant?
Further, the code
- is not endian-safe.
- is not const-correct.
- invalidly assumes an int has size 4.
- should use sized integers (e.g. UINT32 or uint32_t).
Lastly, 'BYTE data[]' is really a pointer. I personally find this syntax
dangerous, as it doesn't pass a copy of the array; instead, you can
modify the original through it.
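One way to address these points together is sketched below. The name readUint32 is hypothetical, std::uint32_t assumes a compiler that provides <cstdint>, and the most-significant-byte-first order is an assumption, since the thread never establishes the data format.

```cpp
#include <cstdint>

// Takes a reference to an array of exactly four const bytes, so the
// argument cannot silently decay to a pointer of unknown length and the
// caller's data cannot be modified through it.
// Assumption: the bytes are stored most significant byte first.
std::uint32_t readUint32(const unsigned char (&data)[4])
{
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
    {
        value = (value << 8) | data[i];
    }
    return value;
}
```

Passing an array of any other size, or a bare pointer, fails to compile, which makes the four-byte assumption visible at the call site.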
Uli
 |
Carl Barron Guest
|
Posted: Wed Jul 12, 2006 1:47 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
In article <1152561650.939336.146400 (AT) h48g2000cwc (DOT) googlegroups.com>,
<alexx.stehman (AT) gmail (DOT) com> wrote:
| Quote: | I am trying to write some glue code to parse 4 raw bytes into an
integer value. This trick works for reading from an array of register
values, but for some reason I cannot get it to work with an array of
bytes or unsigned chars. I have tried both types of BYTE[] and
unsigned char[]. The Value always ends up as the same number
regardless of the byte array passed in.
void ReadByteArrayIntoInteger(BYTE data[])
{
// Here the data shows up as expected.
for (int i=0; i<4; i++)
{
iprintf("Data: %i\r\n", (int) *data);
data++;
}
int* pValue = (int *) data;
int Value = * pValue;
// Value is always the same regardless of varying contents in data
iprintf("Got %i\r\n", Value);
}
Any insight is much appreciated! Many thanks in advance...
--Alexandra
|
Lots of problems. Your code changes data to point beyond the initial 4
bytes [chars], so pValue points to memory just beyond the initial 4 bytes.
Solution: int *pValue = (int *)data; before your loop.
Further, endianness is not guaranteed by the language, nor is there any
guarantee that the cast to an int * will succeed, since ints often have
stricter alignment requirements than chars. The order of the bytes
in the representation of an int is implementation defined. It should
work in that it produces results, and Value will change with different
data. Whether the results are as expected depends on the
implementation.
a more C++ solution that still has implementation dependent results is:
void read_bytes(char data[])
{
union
{
char c[4];
int i;
};
for(int i=0;i!=4;++i)
c[i] = data[i];
int *p = reinterpret_cast<int *>(c);
// now we have the four bytes properly aligned
// and a pointer to an int pointing to them legally.
// ... still depends on byte ordering and sizeof(int).
// at least now it won't crash and burn, but might
// be incorrect...
}
 |
kanze Guest
|
Posted: Thu Jul 13, 2006 3:36 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
Carl Barron wrote:
| Quote: | In article <1152561650.939336.146400 (AT) h48g2000cwc (DOT) googlegroups.com>,
<alexx.stehman (AT) gmail (DOT) com> wrote:
|
[...]
| Quote: | a more C++ solution that still has implementation dependent results is:
void read_bytes(char data[])
{
union
{
char c[4];
int i;
};
for(int i=0;i!=4;++i)
c[i] = data[i];
int *p = reinterpret_cast<int *>(c);
// now we have the four bytes properly aligned
// and a pointer to an int pointing to them legally.
// ... still depends on byte ordering and sizeof(int).
// at least now it won't crash and burn, but might
// be incorrect...
|
Or cause a core dump, or...
According to the language rules, it is illegal to access data in
a union other than the last element written. In practice, this
will usually work (modulo the size of int and byte ordering
issues you mention), but the standard explicitly allows ints to
have trapping representations, and for an implementation to
somehow track assignments to a union, and trap if a read isn't
to the last element assigned. (This last possibility is more
theoretical than real, I think. The standard may formally allow
it, but given that assignments and reading can take place
through pointers, I don't think it is practically
implementable.)
The classical way of handling this is to manipulate the values
of the raw data (normally unsigned char), in order to construct
the value of the target type, according to the definition of how
the data is laid out. Typically, this involves shifting and
or'ing, but I suppose that for particular formats, other
operations can be involved.
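As an illustration of working on values rather than bit images, here is a sketch that decodes a signed 16-bit quantity. The format (two's complement, most significant byte first, 8-bit bytes) and the function name are assumptions chosen for the example. The negative case uses plain arithmetic, so it does not depend on how the host represents int.

```cpp
// Reads a 16-bit two's complement integer stored most significant byte
// first. Only the values of the bytes are used; no pointer casts or unions.
int readInt16BigEndian(const unsigned char* source)
{
    const unsigned int bits =
        (static_cast<unsigned int>(source[0]) << 8) | source[1];

    // Patterns up to 0x7FFF are non-negative and map to themselves.
    // Larger patterns represent bits - 65536, computed here as
    // -(65535 - bits) - 1 so that no intermediate result overflows.
    return bits <= 0x7FFFu
        ? static_cast<int>(bits)
        : -static_cast<int>(0xFFFFu - bits) - 1;
}
```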
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 |
Earl Purple Guest
|
Posted: Thu Jul 13, 2006 3:39 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
Carl Barron wrote:
| Quote: | a more C++ solution that still has implementation dependent results is:
void read_bytes(char data[])
{
union
{
char c[4];
int i;
};
for(int i=0;i!=4;++i)
c[i] = data[i];
int *p = reinterpret_cast<int *>(c);
// now we have the four bytes properly aligned
// and a pointer to an int pointing to them legally.
// ... still depends on byte ordering and sizeof(int).
// at least now it won't crash and burn, but might
// be incorrect...
}
|
It might crash and burn if you try reading the 'i' value from the union
because that's technically undefined behaviour.
 |
kanze Guest
|
Posted: Fri Jul 14, 2006 2:10 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
Frederick Gotham wrote:
| Quote: | posted:
I am trying to write some glue code to parse 4 raw bytes
into an integer value.
Caveat number 1: Watch out for padding within integer types.
|
In theory, yes. In practice, this is probably the least likely
problem you will encounter. The most likely is byte order and
size---both vary a lot over everyday machines, and the second
most likely is the actual representation; at least one machine
still being sold uses 36 bit 1's complement integers, whereas
the only machine I knew which had padding went out of production
some years ago. (It also used signed magnitude, and 48 bit
ints, which meant that you also had to solve those problems as
well.)
In practice, you really don't care about the internal format of
an int, as long as it is large enough to contain all of the
values you're interested in. The format specification of the
input data should indicate how each byte (or even each bit, in
extreme cases) affects the value; your program manipulates the
value (not the bit image) of the internal int to produce the
correct final value.
| Quote: | If I were to do what you are doing, I'd probably write the
code something like:
#include <iostream>
#include <limits>
#include <climits>
#define SomeKindOfCompileTimeAssert(expr) typedef char Compass[(expr)?4:-4]
unsigned Amalg(unsigned char const * const p)
{
/* Firstly, ensure no padding: */
SomeKindOfCompileTimeAssert(
CHAR_BIT * sizeof(unsigned)
== std::numeric_limits<unsigned>::digits
);
return reinterpret_cast<unsigned const&>(*p);
|
I'm not sure I understand this line; it looks like a recipe for
a core dump on my machine. According to the standard, this is
the equivalent of:
return *reinterpret_cast< unsigned const* >( p ) ;
(which seems the clearer and more natural way of writing it to
me). Dereferencing the result of the reinterpret_cast, however,
is undefined behavior, and in fact, doesn't work on most of the
architectures I've used (including the Sun Sparc on which I work
today): with the exception of Intel, all of the architectures
I've used have alignment restrictions.
In addition, it seems to totally ignore any specification as to
the format of the four bytes. Admittedly, the original poster
didn't provide any such format specification (and a simple
"return 0" would also have met all of the requirements he
specified), but it's hard to imagine any useful application of
this where there wasn't some format specification.
| Quote: | }
int main()
{
unsigned char array[sizeof(unsigned)] = { 1, 2, 3, 4 };
/* Assuming 4 bytes per int */
unsigned i = Amalg(array);
std::cout << i;
}
If Endianness is an issue, then things get a little muddier
(I've actually got half-written code somewhere on my harddisk
for doing this).
|
If you don't know the format, you can't convert, that's for
sure. But endianness is only part of the format.
--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 |
Frederick Gotham Guest
|
Posted: Sat Jul 15, 2006 2:01 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
kanze posted:
| Quote: | Frederick Gotham wrote:
posted:
I am trying to write some glue code to parse 4 raw bytes
into an integer value.
Caveat number 1: Watch out for padding within integer types.
In theory, yes. In practice, this is probably the least likely
problem you will encounter. The most likely is byte order and
size---both vary a lot over everyday machines, and the second
most likely is the actual representation; at least one machine
still being sold uses 36 bit 1's complement integers, whereas
the only machine I knew which had padding went out of production
some years ago. (It also used signed magnitude, and 48 bit
ints, which meant that you also had to solve those problems as
well.)
In practice, you really don't care about the internal format of
an int, as long as it is large enough to contain all of the
values you're interested in. The format specification of the
input data should indicate how each byte (or even each bit, in
extreme cases) affects the value; your program manipulates the
value (not the bit image) of the internal int to produce the
correct final value.
If I were to do what you are doing, I'd probably write the
code something like:
#include <iostream>
#include <limits>
#include <climits>
#define SomeKindOfCompileTimeAssert(expr) typedef char Compass[(expr)?4:-4]
unsigned Amalg(unsigned char const * const p)
{
/* Firstly, ensure no padding: */
SomeKindOfCompileTimeAssert(
CHAR_BIT * sizeof(unsigned)
== std::numeric_limits<unsigned>::digits
);
return reinterpret_cast<unsigned const&>(*p);
I'm not sure I understand this line; it looks like a recipe for
a core dump on my machine. According to the standard, this is
the equivalent of:
return *reinterpret_cast< unsigned const* >( p ) ;
(which seems the clearer and more natural way of writing it to
me). Dereferencing the result of the reinterpret_cast, however,
is undefined behavior, and in fact, doesn't work on most of the
architectures I've used (including the Sun Sparc on which I work
today): with the exception of Intel, all of the architectures
I've used have alignment restrictions.
|
You're correct that my code may violate alignment restrictions if "*p"
isn't suitably aligned. This could be overcome by:
#include <cstring> /* for std::memcpy */
unsigned Amalg(unsigned char const * const p)
{
/* Firstly, ensure no padding: */
SomeKindOfCompileTimeAssert(
CHAR_BIT * sizeof(unsigned)
== std::numeric_limits<unsigned>::digits
);
unsigned aligned_buffer;
std::memcpy(&aligned_buffer,p,sizeof aligned_buffer);
return aligned_buffer;
}
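A stand-alone sketch of the same memcpy idea, reading from a deliberately unaligned address. The name loadUnsignedNative is hypothetical. The result still depends on the host's byte order and on sizeof(unsigned), so this only removes the alignment problem, not the format problem.

```cpp
#include <cstring>

// Copies sizeof(unsigned) bytes from an arbitrary address into a properly
// aligned object. No unsigned is ever read through a char pointer, so
// there is no misaligned access.
unsigned loadUnsignedNative(const unsigned char* p)
{
    unsigned aligned_buffer;
    std::memcpy(&aligned_buffer, p, sizeof aligned_buffer);
    return aligned_buffer;
}
```

Reading from buffer + 1, which is generally not aligned for unsigned, is safe with this function, whereas dereferencing a cast pointer at that address could trap on machines with alignment restrictions.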
--
Frederick Gotham
 |
Frederick Gotham Guest
|
Posted: Sun Jul 16, 2006 2:35 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
James Kanze posted:
Okay, how about something like:
#include <climits>
unsigned Amalg(unsigned char const *p)
{
unsigned char const * const p_over = p + sizeof(unsigned);
unsigned value = *p++;
unsigned byte_number = 1;
while(p != p_over) value |= *p++ << CHAR_BIT * byte_number++;
return value;
}
#include <iostream>
int main()
{
unsigned char array[4] = {166,166,166,166};
std::cout << Amalg(array);
}
--
Frederick Gotham
 |
James Kanze Guest
|
Posted: Sun Jul 16, 2006 2:37 am Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
James Kanze wrote:
| Quote: | Frederick Gotham wrote:
kanze posted:
Frederick Gotham wrote:
return reinterpret_cast<unsigned const&>(*p);
I'm not sure I understand this line; it looks like a recipe for
a core dump on my machine. According to the standard, this is
the equivalent of:
return *reinterpret_cast< unsigned const* >( p ) ;
|
Just another point here: not too long ago, there was a
discussion about something similar in one of the newsgroups I
follow (either here or de.comp.lang.iso-c++, I forget which),
where it was pointed out that under certain conditions, this
type of cast (written either way) will cause problems with the
g++ optimizer. Basically, since the standard declares that
accessing a char[] (or unsigned char[]) as an unsigned is
undefined behavior, the compiler assumes that you don't do it,
and simply doesn't take any accesses through an unsigned* into
account when doing live analysis. The result is that under
certain circumstances, it will detect that you never read the
bytes in the array, and suppress the assignment to them.
My personal opinion is that this is not a good policy on the
part of the compiler---IMHO, as soon as the compiler sees a
reinterpret_cast, it should ignore all type information of this
sort in optimizing. But technically, what they are doing IS
legal.
--
James Kanze kanze.james (AT) neuf (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34
 |
James Kanze Guest
|
Posted: Sun Jul 16, 2006 5:43 pm Post subject: Re: transforming array of unsigned chars or BYTEs to an int |
|
|
Frederick Gotham wrote:
| Quote: | James Kanze posted:
Okay, how about something like:
#include <climits>
unsigned Amalg(unsigned char const *p)
{
unsigned char const * const p_over = p + sizeof(unsigned);
unsigned value = *p++;
unsigned byte_number = 1;
while(p != p_over) value |= *p++ << CHAR_BIT * byte_number++;
return value;
}
#include <iostream>
|
Just a nit, but the standard says you also need:
#include <ostream>
| Quote: | int main()
{
unsigned char array[4] = {166,166,166,166};
std::cout << Amalg(array);
}
|
Your program has perfectly defined behavior; it's actually very
close to the solution I posted originally.
The real question, of course, is what are the requirements? Why
are you trying to convert 4 (or N) bytes of raw memory into an
integer like this? And that comes down to where the bytes come
from.
In my code, I used a lot of magic numbers: 4 bytes in the
integer, and 8 bits in the char. In the past, a lot of people
have jumped on me for that, and suggested CHAR_BIT and
sizeof(int). But in this case, that's probably wrong. If you
have an array of raw bytes, it is almost certain that you
obtained them from an external source (e.g. an Internet
connection). In which case, you don't want your shifting based
on the number of bits in a character on your machine, you want
it based on the number of bits in a byte in the protocol you are
reading.
Similar issues hold for things like byte order, and even the
representation of negative values.
One should, arguably, use some symbolic constant like
InternetProtocol::bitsPerByte. But frankly, this is something
stable enough that I'm willing to accept the literal 8, and
count on the context of what I'm doing for people to recognize
what it stands for. The day the internet switches to 9 bit
bytes, so much software will have to be rewritten, that it
really doesn't make a difference. (Similarly, I don't bother
with a symbol hoursPerDay, but just write 24. And use names for
the variables which make it clear what I'm dealing with.)
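For readers who do prefer named constants, a sketch of the symbolic-constant variant might look like this. The namespace contents and the function name are hypothetical (the post only floats InternetProtocol::bitsPerByte as a possibility), and network byte order is assumed.

```cpp
namespace InternetProtocol
{
    // Properties of the wire format, not of the host machine.
    const unsigned bitsPerByte = 8;  // bits per octet in the protocol
    const unsigned bytesPerInt = 4;  // octets in a 32-bit protocol integer
}

// Assembles an unsigned value from protocol octets, most significant
// octet first. The shift width comes from the protocol definition, not
// from CHAR_BIT or sizeof(int) on the host.
unsigned long readProtocolUInt(const unsigned char* source)
{
    unsigned long value = 0;
    for (unsigned i = 0; i != InternetProtocol::bytesPerInt; ++i)
    {
        value = (value << InternetProtocol::bitsPerByte)
              | (source[i] & 0xFFu);
    }
    return value;
}
```

The mask with 0xFFu keeps only the low eight bits of each element, in case the host's char is wider than a protocol octet.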
--
James Kanze kanze.james (AT) neuf (DOT) fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34