Floating Point Number
This page is about standard implementations and the concept. For platform information, please see the relevant page
|Real numbers, coprocessors and vector units|
Floating Point Number
Floating point numbers are a way to represent real numbers inside computer memory which (usually) operates with binary digits. As opposed to fixed point numbers which have a fixed number of digits before and after the decimal mark, floating point numbers can be considered to have a certain amount of significant digits or 'accurate leading numbers' that we consider to carry an accurate approximation to some value. Floating point numbers are inherently not a precise storage medium precisely because they store an approximation to a value and not the precise value itself This requires care when writing code both in low or high-level languages to avoid running into precision errors which may break your program.
Because floating point numbers have a different representation than integers, they can not be treated as ordinary integers with ordinary instructions. Luckily for you, modern processors come with a dedicated on-chip FPU (Floating Point Unit) co-processor that quickly performs floating point calculations in hardware. On x86 the FPU and it's instructions are named 'x87' because of historical reasons. Embedded devices may or may not come with an FPU, depending on the complexity and price range of these devices. Check your spec! (Before FPUs became an affordable upgrade for desktop computers, floating point emulation libraries would be either present or installable on computers. Sometimes these libraries even required manual installation by the end user!)
This article will explain the IEEE 754 standard for floating point number representation. This standard is extremely common, not x86-specific and has multiple 'precisions' or representation sizes which are all similar in how they work except for the number of bits used.
|single precison||32 bits||±1.18x10^-38 to ±3.4x10^38||approx 7 digits|
|double precison||64 bits||±2.23x10^-308 to ±1.8x10^308||approx 15 digits|
|half precison||16 bits||±4.8x10^-4 to ±3.24x10^4||approx 7 digits|
- Float to String Idea - also contains some good general information about floating-point numbers