A computer works with two main kinds of numeric values: fixed point and floating point. Fixed-point values (integers) are stored in memory directly in binary. Characters are stored the same way, as the binary form of their character code, such as ASCII.
For example:
The character ‘A’ is stored as 1000001, because 65 is the ASCII value of ‘A’.
Floating-point values are stored in memory according to the IEEE 754 standard. Whenever a program declares float a; the value of the variable ‘a’ is stored in memory in IEEE 754 format.
This standard specifies the single-precision and double-precision formats. In C, C++ and Java, the float and double data types correspond to single and double precision, which require 32 bits (4 bytes) and 64 bits (8 bytes) respectively to store the data.
Let's have a look at these precision formats.
Single Precision:-
It requires 32 bits to store: 1 sign bit, 8 exponent bits and 23 mantissa bits, in that order starting from the most significant bit.
In order to store a float value in computer memory, a specified algorithm is followed.
Take the float value 3948.125 as an example.
- Convert 3948 to binary, i.e. 111101101100
- Convert .125 to binary by repeatedly multiplying by 2 and noting the integer part:
0.125 x 2 = 0.25    0
0.25 x 2 = 0.5      0
0.5 x 2 = 1         1
= 0.001
Now 3948.125 = 111101101100.001
- Normalize the number so that the binary point is placed just after the most significant 1, i.e.
111101101100.001 = 1.11101101100001 x 2^11
- Now, for this number s = 0, as the number is positive.
Exponent' = 11 and
Mantissa = 11101101100001
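- The bias used for single precision is 127, so
Final exponent = exponent' + 127, i.e.
E = 11 + 127 = 138 = 10001010 in binary.
- Final value:
0 10001010 11101101100001000000000
(sign, exponent, then the mantissa padded with zeros to 23 bits)
The number 3948.125 is stored in main memory in this format.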
For double-precision values, the following changes apply:
Total bits required – 64
Exponent – 11 bits
Mantissa – 52 bits
Bias value – 1023
Now, if you want to find the IEEE 754 representation of any floating-point number, the following program can be used.
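#include <stdio.h>

/* Print the lowest i bits of n, most significant bit first. */
void binary(unsigned int n, int i)
{
    unsigned int k;
    for (i--; i >= 0; i--)
    {
        k = n >> i;
        if (k & 1)
            printf("1");
        else
            printf("0");
    }
}

/* Overlay a float with bit fields for its three IEEE 754 parts.
   Bit-field layout is implementation-defined; this order (mantissa
   first) matches common little-endian compilers such as GCC on x86. */
typedef union
{
    float f;
    struct
    {
        unsigned int mantissa : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } field;
} myfloat;

int main()
{
    myfloat var;
    printf("Enter any float number: ");
    if (scanf("%f", &var.f) != 1)   /* stop if the input is not a number */
        return 1;
    printf("%d ", var.field.sign);
    binary(var.field.exponent, 8);
    printf(" ");
    binary(var.field.mantissa, 23);
    printf("\n");
    return 0;
}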
Explanation:
The function binary() converts the number ‘n’ into binary format and prints its lowest ‘i’ bits.
In C, structure members can be declared with a size given as a number of bits. These are known as bit fields. Because ‘float f’ and the structure ‘field’ are both declared in the union ‘myfloat’, they share the same memory: mantissa uses 23 bits, exponent uses 8 and sign uses 1. The variable ‘var’ is of type myfloat, so to access the mantissa we can use ‘var.field.mantissa’. Here, field is the name of the internal structure and mantissa is one of its members. In this way, the internal bits of a float value can be accessed separately as sign, exponent and mantissa.
Run the program and see the output for the example above!