
8 <float.h> Floating Point Limits

Macro Minimum Magnitude Description
FLT_ROUNDS Current rounding mode
FLT_EVAL_METHOD Types used for evaluation
FLT_HAS_SUBNORM Subnormal support for float
DBL_HAS_SUBNORM Subnormal support for double
LDBL_HAS_SUBNORM Subnormal support for long double
FLT_RADIX 2 Floating point radix (base)
FLT_MANT_DIG Number of base FLT_RADIX digits in a float
DBL_MANT_DIG Number of base FLT_RADIX digits in a double
LDBL_MANT_DIG Number of base FLT_RADIX digits in a long double
FLT_DECIMAL_DIG 6 Number of decimal digits required to encode a float
DBL_DECIMAL_DIG 10 Number of decimal digits required to encode a double
LDBL_DECIMAL_DIG 10 Number of decimal digits required to encode a long double
DECIMAL_DIG 10 Number of decimal digits required to encode the widest floating point number supported
FLT_DIG 6 Number of decimal digits that can be safely stored in a float
DBL_DIG 10 Number of decimal digits that can be safely stored in a double
LDBL_DIG 10 Number of decimal digits that can be safely stored in a long double
FLT_MIN_EXP FLT_RADIX to the FLT_MIN_EXP-1 power is the smallest normalized float
DBL_MIN_EXP FLT_RADIX to the DBL_MIN_EXP-1 power is the smallest normalized double
LDBL_MIN_EXP FLT_RADIX to the LDBL_MIN_EXP-1 power is the smallest normalized long double
FLT_MIN_10_EXP -37 Minimum exponent such that 10 to this number is a normalized float
DBL_MIN_10_EXP -37 Minimum exponent such that 10 to this number is a normalized double
LDBL_MIN_10_EXP -37 Minimum exponent such that 10 to this number is a normalized long double
FLT_MAX_EXP FLT_RADIX to the FLT_MAX_EXP-1 power is the largest power of FLT_RADIX that is a finite float
DBL_MAX_EXP FLT_RADIX to the DBL_MAX_EXP-1 power is the largest power of FLT_RADIX that is a finite double
LDBL_MAX_EXP FLT_RADIX to the LDBL_MAX_EXP-1 power is the largest power of FLT_RADIX that is a finite long double
FLT_MAX_10_EXP +37 Maximum exponent such that 10 to this number is a finite float
DBL_MAX_10_EXP +37 Maximum exponent such that 10 to this number is a finite double
LDBL_MAX_10_EXP +37 Maximum exponent such that 10 to this number is a finite long double
FLT_MAX 1E+37 Largest finite float
DBL_MAX 1E+37 Largest finite double
LDBL_MAX 1E+37 Largest finite long double
Macro Maximum Value Description
FLT_EPSILON 1E-5 Difference between 1 and the next biggest representable float
DBL_EPSILON 1E-9 Difference between 1 and the next biggest representable double
LDBL_EPSILON 1E-9 Difference between 1 and the next biggest representable long double
FLT_MIN 1E-37 Minimum positive normalized float
DBL_MIN 1E-37 Minimum positive normalized double
LDBL_MIN 1E-37 Minimum positive normalized long double
FLT_TRUE_MIN 1E-37 Minimum positive float
DBL_TRUE_MIN 1E-37 Minimum positive double
LDBL_TRUE_MIN 1E-37 Minimum positive long double

The minimum and maximum values here are from the spec; they should be what you can at least expect across all platforms. Your super-duper machine might do better, still!

8.1 Background

The spec allows a lot of leeway when it comes to how C represents floating point numbers. This header file spells out the limits on those numbers.

It gives a model that can describe any floating point number, and I know you’re going to absolutely love it. It looks like this:

\(\displaystyle x=sb^e\sum_{k=1}^p f_k b^{-k}, e_{min} \le e \le e_{max}\)

where:

Variable Meaning
\(s\) Sign, \(-1\) or \(1\)
\(b\) Base (radix), probably \(2\) on your system
\(e\) Exponent
\(p\) Precision: how many base-\(b\) digits in the number
\(f_k\) The individual digits of the number, the significand

But let’s blissfully ignore all that for a second.

Let’s assume your computer uses base 2 for its floating point (it probably does). And that in the example below the 1s-and-0s numbers are in binary, and the rest are in decimal.

The short of it is you could have floating point numbers like the one shown in this example:

\(-0.10100101 \times 2^5 = -10100.101 = -20.625\)

That’s your fractional part multiplied by the base to the exponent’s power. The exponent controls where the decimal point is. It “floats” around!

8.2 FLT_ROUNDS Details

This tells you the rounding mode. It can be changed with a call to fesetround().

Mode Description
-1 Indeterminable
0 Toward zero
1 To nearest
2 Toward positive infinity
3 Toward negative infinity… and beyond!

Unlike every other macro in this here header, FLT_ROUNDS might not be a constant expression.

8.3 FLT_EVAL_METHOD Details

This basically tells you how floating point values are promoted to different types in expressions.

Method Description
-1 Indeterminable
0 Evaluate all operations and constants to the precision of their respective types
1 Evaluate float and double operations as double and long double ops as long double
2 Evaluate all operations and constants as long double

8.4 Subnormal Numbers

The macros FLT_HAS_SUBNORM, DBL_HAS_SUBNORM, and LDBL_HAS_SUBNORM all let you know if those types support subnormal numbers.

Value Description
-1 Indeterminable
0 Subnormals not supported for this type
1 Subnormals supported for this type

8.5 How Many Decimal Places Can I Use?

It depends on what you want to do.

The safe thing: if you never use more than FLT_DIG base-10 digits in your float, you’re good. (Same for DBL_DIG and LDBL_DIG for their types.)

And by “use” I mean print out, have in code, read from the keyboard, etc.

You can print out that many decimal places with printf() and the %g format specifier:

#include <stdio.h>
#include <float.h>

int main(void)
{
    float pi = 3.1415926535897932384626433832795028841971;

    // With %g or %G, the precision refers to the number of significant
    // digits:

    printf("%.*g\n", FLT_DIG, pi);  // For me: 3.14159

    // But %f prints too many, since the precision is the number of
    // digits to the right of the decimal--it doesn't count the digits
    // to the left of it:

    printf("%.*f\n", FLT_DIG, pi);  // For me: 3.141593 (7 significant digits, past FLT_DIG)
}

That’s the end, but stay tuned for the exciting conclusion of “How Many Decimal Places Can I Use?”

Because base 10 and base 2 (your typical FLT_RADIX) don’t mix very well, you can actually have more than FLT_DIG digits in your float; the bits of storage go out a little farther. But these might round in a way you don’t expect.

But if you want to convert a floating point number to base 10 and then be able to convert it back again to the exact same floating point number, you’ll need FLT_DECIMAL_DIG digits from your float to make sure you get those extra bits of storage represented. (And DBL_DECIMAL_DIG and LDBL_DECIMAL_DIG for those corresponding types.)

Here’s some example output that shows how the value stored might have some extra decimal places at the end.

#include <stdio.h>
#include <math.h>
#include <assert.h>
#include <float.h>

int main(void)
{
    printf("FLT_DIG = %d\n", FLT_DIG);
    printf("FLT_DECIMAL_DIG = %d\n\n", FLT_DECIMAL_DIG);

    assert(FLT_DIG == 6);  // Code below assumes this

    for (float x = 0.123456; x < 0.12346; x += 0.000001) {
        printf("As written: %.*g\n", FLT_DIG, x);
        printf("As stored:  %.*g\n\n", FLT_DECIMAL_DIG, x);
    }
}

And the output on my machine, starting at 0.123456 and incrementing by 0.000001 each time:

FLT_DIG = 6
FLT_DECIMAL_DIG = 9

As written: 0.123456
As stored:  0.123456001

As written: 0.123457
As stored:  0.123457

As written: 0.123458
As stored:  0.123457998

As written: 0.123459
As stored:  0.123458996

As written: 0.12346
As stored:  0.123459995

You can see that the value stored isn’t always the value we’re expecting since base-2 can’t represent all base-10 fractions exactly. The best it can do is store more places and then round.

Also notice that even though we tried to stop the for loop before 0.123460, it actually ran for that value too, since the stored version of that number was 0.123459995, which is still less than 0.123460.

Aren’t floating point numbers fun?

8.6 Comprehensive Example

Here’s a program that prints out the details for a particular machine:

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("FLT_RADIX: %d\n", FLT_RADIX);
    printf("FLT_ROUNDS: %d\n", FLT_ROUNDS);
    printf("FLT_EVAL_METHOD: %d\n", FLT_EVAL_METHOD);
    printf("DECIMAL_DIG: %d\n\n", DECIMAL_DIG);

    printf("FLT_HAS_SUBNORM: %d\n", FLT_HAS_SUBNORM);
    printf("FLT_MANT_DIG: %d\n", FLT_MANT_DIG);
    printf("FLT_DECIMAL_DIG: %d\n", FLT_DECIMAL_DIG);
    printf("FLT_DIG: %d\n", FLT_DIG);
    printf("FLT_MIN_EXP: %d\n", FLT_MIN_EXP);
    printf("FLT_MIN_10_EXP: %d\n", FLT_MIN_10_EXP);
    printf("FLT_MAX_EXP: %d\n", FLT_MAX_EXP);
    printf("FLT_MAX_10_EXP: %d\n", FLT_MAX_10_EXP);
    printf("FLT_MIN: %.*e\n", FLT_DECIMAL_DIG, FLT_MIN);
    printf("FLT_MAX: %.*e\n", FLT_DECIMAL_DIG, FLT_MAX);
    printf("FLT_EPSILON: %.*e\n", FLT_DECIMAL_DIG, FLT_EPSILON);
    printf("FLT_TRUE_MIN: %.*e\n\n", FLT_DECIMAL_DIG, FLT_TRUE_MIN);

    printf("DBL_HAS_SUBNORM: %d\n", DBL_HAS_SUBNORM);
    printf("DBL_MANT_DIG: %d\n", DBL_MANT_DIG);
    printf("DBL_DECIMAL_DIG: %d\n", DBL_DECIMAL_DIG);
    printf("DBL_DIG: %d\n", DBL_DIG);
    printf("DBL_MIN_EXP: %d\n", DBL_MIN_EXP);
    printf("DBL_MIN_10_EXP: %d\n", DBL_MIN_10_EXP);
    printf("DBL_MAX_EXP: %d\n", DBL_MAX_EXP);
    printf("DBL_MAX_10_EXP: %d\n", DBL_MAX_10_EXP);
    printf("DBL_MIN: %.*e\n", DBL_DECIMAL_DIG, DBL_MIN);
    printf("DBL_MAX: %.*e\n", DBL_DECIMAL_DIG, DBL_MAX);
    printf("DBL_EPSILON: %.*e\n", DBL_DECIMAL_DIG, DBL_EPSILON);
    printf("DBL_TRUE_MIN: %.*e\n\n", DBL_DECIMAL_DIG, DBL_TRUE_MIN);

    printf("LDBL_HAS_SUBNORM: %d\n", LDBL_HAS_SUBNORM);
    printf("LDBL_MANT_DIG: %d\n", LDBL_MANT_DIG);
    printf("LDBL_DECIMAL_DIG: %d\n", LDBL_DECIMAL_DIG);
    printf("LDBL_DIG: %d\n", LDBL_DIG);
    printf("LDBL_MIN_EXP: %d\n", LDBL_MIN_EXP);
    printf("LDBL_MIN_10_EXP: %d\n", LDBL_MIN_10_EXP);
    printf("LDBL_MAX_EXP: %d\n", LDBL_MAX_EXP);
    printf("LDBL_MAX_10_EXP: %d\n", LDBL_MAX_10_EXP);
    printf("LDBL_MIN: %.*Le\n", LDBL_DECIMAL_DIG, LDBL_MIN);
    printf("LDBL_MAX: %.*Le\n", LDBL_DECIMAL_DIG, LDBL_MAX);
    printf("LDBL_EPSILON: %.*Le\n", LDBL_DECIMAL_DIG, LDBL_EPSILON);
    printf("LDBL_TRUE_MIN: %.*Le\n\n", LDBL_DECIMAL_DIG, LDBL_TRUE_MIN);
    
    printf("sizeof(float): %zu\n", sizeof(float));
    printf("sizeof(double): %zu\n", sizeof(double));
    printf("sizeof(long double): %zu\n", sizeof(long double));
}

And here’s the output on my machine:

FLT_RADIX: 2
FLT_ROUNDS: 1
FLT_EVAL_METHOD: 0
DECIMAL_DIG: 21

FLT_HAS_SUBNORM: 1
FLT_MANT_DIG: 24
FLT_DECIMAL_DIG: 9
FLT_DIG: 6
FLT_MIN_EXP: -125
FLT_MIN_10_EXP: -37
FLT_MAX_EXP: 128
FLT_MAX_10_EXP: 38
FLT_MIN: 1.175494351e-38
FLT_MAX: 3.402823466e+38
FLT_EPSILON: 1.192092896e-07
FLT_TRUE_MIN: 1.401298464e-45

DBL_HAS_SUBNORM: 1
DBL_MANT_DIG: 53
DBL_DECIMAL_DIG: 17
DBL_DIG: 15
DBL_MIN_EXP: -1021
DBL_MIN_10_EXP: -307
DBL_MAX_EXP: 1024
DBL_MAX_10_EXP: 308
DBL_MIN: 2.22507385850720138e-308
DBL_MAX: 1.79769313486231571e+308
DBL_EPSILON: 2.22044604925031308e-16
DBL_TRUE_MIN: 4.94065645841246544e-324

LDBL_HAS_SUBNORM: 1
LDBL_MANT_DIG: 64
LDBL_DECIMAL_DIG: 21
LDBL_DIG: 18
LDBL_MIN_EXP: -16381
LDBL_MIN_10_EXP: -4931
LDBL_MAX_EXP: 16384
LDBL_MAX_10_EXP: 4932
LDBL_MIN: 3.362103143112093506263e-4932
LDBL_MAX: 1.189731495357231765021e+4932
LDBL_EPSILON: 1.084202172485504434007e-19
LDBL_TRUE_MIN: 3.645199531882474602528e-4951

sizeof(float): 4
sizeof(double): 8
sizeof(long double): 16
