How are numbers with a decimal point handled in an MCU?

Numbers inside typical microcontrollers don't have decimal points at all. They are binary integers. There is no decimal arithmetic going on inside the machine. The compiler or assembler may let you specify constants with a decimal point, but they get converted to binary before the machine sees them.

However, you can decide whatever units you like for the integer values. For example, suppose you wanted to represent dollars inside a micro. It can't natively do $3.21, but it could do 321 cents. The micro is just operating on the value 321, but you know that it represents units of 1/100 dollars.

That's just one example to illustrate the concept of arbitrary units. Often numbers are represented with several binary fraction bits. That's the same as saying each count represents a value of 2^-N, where N is the number of fraction bits. This representation is called "fixed point". You decide up front how much resolution you need, and pretend there are enough bits to the right of the imagined binary point to support that resolution. For example, let's say you need to represent something to at least a resolution of 1/100. In that case you'd use at least 7 fraction bits, since 2^7 = 128. That will actually give you a resolution of 1/128.

The machine has no idea this is going on. It will add and subtract these numbers as ordinary integers, but everything still works out. It gets a little tricky when you multiply and divide fixed point values. The product of two fixed point values with N fraction bits will have 2N fraction bits. Sometimes you just keep track of the fact that the new number has 2N fraction bits, or sometimes you might shift it right by N bits to get back to the same representation as before.

Floating point is the same thing, but the number of fraction bits is stored along with the integer part so that this adjustment can be made at runtime. Performing math operations on floating point numbers can take a bunch of cycles. Floating point hardware does all this for you so that the operations complete quickly. However, the same manipulations can be performed in software too. There is no reason you can't write a subroutine to add two floating point numbers, just that it would take a lot longer than dedicated hardware doing the same thing.

I have defined a 3-byte floating point format for 8 bit PICs and written a bunch of routines to manipulate them. Microcontrollers are usually dealing with real world values with 10 or 12 bits precision at most. My floating point format uses 16 bits of precision, which is good enough for several intermediate calculations.

I also have a 32-bit format for the 16 bit PICs. This uses one 16-bit word for the mantissa, which speeds calculations since these PICs can operate on 16 bits at a time.

These routines are included in my PIC Development Tools release. After installation, look at files with "fp24" in their name in the SOURCE > PIC directory, and "fp32f" in the SOURCE > DSPIC directory.


Fixed point arithmetic is usually used for performing fractional calculations in MCUs.

The trick is to say that (for example) the upper 16 bits of a uint32_t are before the binary point and the lower 16 are after, i.e. the stored integer is in 1/2^16ths. With some small caveats, regular arithmetic "just works".

Here's an overview.


Unless your MCU is a DSP with a floating point multiplier, everything is stored as 16 bit (or 8 or 32, depending on your platform) integers. That is all the actual MCU knows about.

Above this you have your C code and C compiler. The compiler "knows" about various data types such as char, int, unsigned int, float, double and so on.

The most common representation of floats in hardware is an IEEE format. This separates the mantissa from the exponent and uses two 16 bit words to store the information. Check out the wiki article on IEEE number formats.

So it is the compiler that knows where the mantissa and exponent are and applies the maths to them. Remember learning about logarithms, and how they made maths easier by letting you add when you wanted to multiply? Well, the C compiler does something similar with the exponents and mantissas. For a floating point multiplication, the compiler will generate assembler code that adds the exponents and multiplies the mantissas.

The MCU knows nothing of the number! It just does what it is told: load a value from memory into a register, add another value to the register, set the carry flag if required, and so on until the multiplication is complete.

It is the C compiler and your code that "abstracts" the concept of numbers, decimal points and so on from the MCU.

On a side note, some languages also support a "decimal" data type, which is useful for financial systems. It is not common on embedded platforms, since floats use less memory and perform more efficiently.