C++Talk.NET Forum Index C++Talk.NET
C++ language newsgroups
 

single double precision question ....... more

 
ma740988@gmail.com
Guest

Posted: Fri Sep 30, 2005 1:44 am    Post subject: single double precision question ....... more

I've got an unpacker that unpacks a 32-bit word into three 10-bit samples.
Bits 0 and 1 are don't-cares. For the purposes of performing an FFT and
an inverse FFT, I cast the 10-bit values into doubles. I'm told:

"floats and doubles have bits for mantissa and exponent. (I knew that).
If you subtract bits that partly belong to mantissa and partly
belong to exponent, that hardly makes sense. Anyway, looking on bits
only, a float has 32 bits and doesn't behave differently than an integer."

My advisor went on to say:
If I unpacked the samples into, say, an array of floats, it's quite
possible that I could get a left-side value whose bits were
*very* different from the right-hand integer *and* most likely wouldn't
match the integer exactly.

So now:
float[0] = 0x2AA;
or better:
float f = (float)0x2AA;
cout << f << endl;

would show perhaps
682.00000123

Makes absolutely no sense to me. All machines I ran the source on
produced 682.00000000000 (depending of course on your precision).

Am I being misled here?

David White
Guest

Posted: Fri Sep 30, 2005 2:36 am    Post subject: Re: single double precision question ....... more

ma740988@gmail.com wrote:
Quote:
I've got an unpacker that unpacks a 32-bit word into three 10-bit
samples. Bits 0 and 1 are don't-cares. For the purposes of performing
an FFT and an inverse FFT, I cast the 10-bit values into doubles.
I'm told:

"floats and doubles have bits for mantissa and exponent. (I knew
that). If you subtract bits that partly belong to mantissa and
partly belong to exponent, that hardly makes sense. Anyway, looking
on bits only, a float has 32 bits and doesn't behave differently than an
integer."

My advisor went on to say:
If I unpacked the samples into, say, an array of floats, it's quite
possible that I could get a left-side value whose bits were
*very* different from the right-hand integer *and* most likely wouldn't
match the integer exactly.

I don't know what you mean here by "left side" and "right side".

Quote:

So now:
float[0] = 0x2AA;
or better:
float f = (float)0x2AA;
cout << f << endl;

would show perhaps
682.00000123

Makes absolutely no sense to me. All machines I ran the source on
produced 682.00000000000 (depending of course on your precision).

All you are doing is a standard conversion from 0x2AA (682) to a float.
Floats are capable of representing integer values of this size exactly, and
that's what's happening. There's no reason to expect a slight error.

Quote:
Am I being misled here?

I don't know because I'm not sure what you are getting at. If you are
talking about reinterpreting the bit pattern of an integer as a float, then
you could end up with any nonsense value, but if you are just doing a
standard conversion, in which the compiler takes care of correctly mapping
the integer bit pattern to the float bit pattern of the equivalent value,
then you got the expected result.

DW



Jack Klein
Guest

Posted: Fri Sep 30, 2005 3:21 am    Post subject: Re: single double precision question ....... more

On 29 Sep 2005 18:44:51 -0700, ma740988@gmail.com wrote in
comp.lang.c++:

Quote:

I've got an unpacker that unpacks a 32-bit word into three 10-bit samples.
Bits 0 and 1 are don't-cares. For the purposes of performing an FFT and
an inverse FFT, I cast the 10-bit values into doubles. I'm told:

Why are you casting? Values with accessible bits are integer types,
and you can just assign them to doubles. A cast is redundant and has
no effect at all in this case.

Quote:
"floats and doubles have bits for mantissa and exponent. (I knew that).
If you subtract bits that partly belong to mantissa and partly
belong to exponent, that hardly makes sense. Anyway, looking on bits
only, a float has 32 bits and doesn't behave differently than an integer."

C++ does not say how many bits a float has. It may have 32, it may
have 64, it can't have as few as 6, on a conforming implementation.

Quote:
My advisor went on to say:
If I unpacked the samples into, say, an array of floats, it's quite
possible that I could get a left-side value whose bits were
*very* different from the right-hand integer *and* most likely wouldn't
match the integer exactly.

There is no way to "unpack" bits into an array of floats defined by
the C++ language. You can extract some of the bits from an integer
type into another suitably sized integer type object. Or you can
extract some of the bits and assign them to a floating-point type; that
is defined.

Quote:
So now:
float[0] = 0x2AA;

This is a syntax error, you can't have an array with a name identical
to a keyword. So let's assume that there is an array of floats named
'my_float'.

Now given:

my_float[0] = 0x2AA;

...then the integer literal '0x2AA' has type int and the value 682.
You assign this value to a float, which causes the compiler to convert
the value 682 into the float representation for 682.0 and assign it to
the float.

Quote:
or better:
float f = (float)0x2AA;

This is no better, it is worse because the cast is redundant and it
shows a lack of understanding.

Quote:
cout << f << endl;

would show perhaps
682.00000123

No it wouldn't, not with the code you posted.

Quote:
Makes absolutely no sense to me. All machines I ran the source on
produced 682.00000000000 (depending of course on your precision).

Am I being misled here?

There are two things you are missing here. The first is that your
advisor is discussing ideas that you are not yet ready for, and may
never need.

But the main thing that you are missing here is that in C++
conversions are based on value, not on differences in the internal
bitwise implementation of the data.

Your advisor is talking about the differences in the internal bitwise
implementation of the int value 682 and the float value 682.0. If
programs are designed correctly, very, very few of them will ever need
to be concerned about the difference.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html

ma740988@gmail.com
Guest

Posted: Fri Sep 30, 2005 11:07 am    Post subject: Re: single double precision question ....... more


Thanks gents. I have a hard time expressing my ideas sometimes but
Jack you pointed out what I was alluding to: internal representation.


Quote:
| Your advisor is talking about the differences in the internal bitwise
| implementation of the int value 682 and the float value 682.0. If
| programs are designed correctly, very, very few of them will ever need
| to be concerned about the difference.

Now can you highlight a case where I would be concerned about the
difference? That was my 'real' question because for some reason I
can't see it.

You see in my mind and for my case:

my_float[0] = 0x2AA;

The value 682 converted from int to float representation would amount
to 682.0. I can't see how it will be 682.00000000000023 or ...


Christian Meier
Guest

Posted: Fri Sep 30, 2005 11:25 am    Post subject: Re: single double precision question ....... more

<ma740988@gmail.com> wrote in message
news:1128078465.905091.142210@g14g2000cwa.googlegroups.com...
Quote:

Thanks gents. I have a hard time expressing my ideas sometimes but
Jack you pointed out what I was alluding to: internal representation.

|| Your advisor is talking about the differences in the internal bitwise
|| implementation of the int value 682 and the float value 682.0. If
|| programs are designed correctly, very, very few of them will ever need
|| to be concerned about the difference.

Now can you highlight a case where I would be concerned about the
difference? That was my 'real' question because for some reason I
can't see it.

You see in my mind and for my case:

my_float[0] = 0x2AA;

The value 682 converted from int to float representation would amount
to 682.0. I can't see how it will be 682.00000000000023 or ...


floating point datatypes have a deviation....



ma740988@gmail.com
Guest

Posted: Fri Sep 30, 2005 11:32 am    Post subject: Re: single double precision question ....... more

Uhm, David and Jack just told me it doesn't. I'll always get the correct
result, i.e. 682.0000000000000000

What deviation are you referring to?

Kai-Uwe Bux
Guest

Posted: Fri Sep 30, 2005 12:17 pm    Post subject: Re: single double precision question ....... more

ma740988@gmail.com wrote:

Quote:

Thanks gents. I have a hard time expressing my ideas sometimes but
Jack you pointed out what I was alluding to: internal representation.

|| Your advisor is talking about the differences in the internal bitwise
|| implementation of the int value 682 and the float value 682.0. If
|| programs are designed correctly, very, very few of them will ever need
|| to be concerned about the difference.

Now can you highlight a case where I would be concerned about the
difference? That was my 'real' question because for some reason I
can't see it.

You see in my mind and for my case:

my_float[0] = 0x2AA;

The value 682 converted from int to float representation would amount
to 682.0. I can't see how it will be 682.00000000000023 or ...

It will not. Small integer values have exact representations in float or
double on any decent C++ implementation. (It is true that this is a
quality-of-implementation issue, as the standard is remarkably shy about
giving any guarantees about floating-point arithmetic.) Floating-point
arithmetic represents numbers by sign, mantissa, and exponent. Since a float
or a double uses only finitely many bits, only finitely many real numbers are
representable as a float. Usually, we deal with the missing reals by
letting a nearby float stand in as their approximate representation.
However, as long as the bit length of the mantissa can hold your integer, it
can be represented as a float exactly, not just approximately.

Now, as for the bit patterns, they will generally look vastly different.
However, that should not be your concern. The compiler will generate the
code that takes care of all the necessary bit-shuffling when converting an
int to a float.


Best

Kai-Uwe Bux

David White
Guest

Posted: Sat Oct 01, 2005 9:09 pm    Post subject: Re: single double precision question ....... more

<ma740988@gmail.com> wrote

Quote:

Thanks gents. I have a hard time expressing my ideas sometimes but
Jack you pointed out what I was alluding to: internal representation.

|| Your advisor is talking about the differences in the internal bitwise
|| implementation of the int value 682 and the float value 682.0. If
|| programs are designed correctly, very, very few of them will ever need
|| to be concerned about the difference.

Now can you highlight a case where I would be concerned about the
difference? That was my 'real' question because for some reason I
can't see it.

You see in my mind and for my case:

my_float[0] = 0x2AA;

The value 682 converted from int to float representation would amount
to 682.0. I can't see how it will be 682.00000000000023 or ...

I can't highlight a case concerning a direct conversion from integer 682, or
a similarly sized value, to a float, because on any implementation you are
likely to come across you'll get an exact conversion. I can, however,
describe a real case I had recently: I was reading the speed of a vacuum
pump that has a maximum speed of 1500 Hz. I needed to display it to the user
as a percentage of maximum speed using a Number object (our own class) that
was created with limits 0 to 100. I read the Hz value and converted it to a
percentage like this:
userDisplayParam.setValue(speedInHz / 15.f);
The problem was that if you started the program with the pump already at
full speed it would sometimes take minutes to show anything but the default
zero value for the speed, even though it was being read every second, and it
strangely would never show 100. The highest you ever saw was 99.9. It turned
out that the "division" in the expression above was turned into a
multiplication by the compiler, i.e., instead of speed / 15.f it was doing
speed * (1/15.f), where 1/15.f was pre-calculated by the compiler. This
makes sense because multiplications can be done faster than divisions.
Unfortunately, 1/15.f cannot be represented exactly in a binary float value,
so there was a slight error in the result even when the speed in Hz is
exactly divisible by 15. My assumed exact 100% result for 1500/15 turned out
to be something like 100.0001, which exceeded the limit of the user-display
Number, so it never showed 100% for the speed.

DW




All times are GMT