ANSI-C: maximum number of characters printing a decimal int

I don't know if there is any trick to do what you want in plain ANSI C, but in C++ you can easily use template metaprogramming to do it:

#include    <iostream>
#include    <limits>
#include    <climits>

template< typename T, unsigned long N = INT_MAX >
class   MaxLen
{
public:
    enum
    {
        StringLen = MaxLen< T, N / 10 >::StringLen + 1
    };
};

template< typename T >
class   MaxLen< T, 0 >
{
public:
    enum
    {
        StringLen = 1
    };
};

And you can call it from your pure C code by creating an additional C++ function like this:

extern "C"
int int_str_max( )
{
    return  MaxLen< int >::StringLen;
}

This has ZERO execution-time overhead and calculates the exact space needed.


You can test the above templates with something like:

int main( )
{
    std::cout << "Max: " << std::numeric_limits< short >::max( ) << std::endl;
    std::cout << "Digits: " << std::numeric_limits< short >::digits10 << std::endl;
    std::cout << "A \"short\" is " << sizeof( short ) << " bytes." << std::endl
        << "A string large enough to fit any \"short\" is "
        << MaxLen< short, SHRT_MAX >::StringLen << " bytes wide." << std::endl;

    std::cout << "Max: " << std::numeric_limits< int >::max( ) << std::endl;
    std::cout << "Digits: " << std::numeric_limits< int >::digits10 << std::endl;
    std::cout << "An \"int\" is " << sizeof( int ) << " bytes." << std::endl
        << "A string large enough to fit any \"int\" is "
        << MaxLen< int >::StringLen << " bytes wide." << std::endl;

    std::cout << "Max: " << std::numeric_limits< long >::max( ) << std::endl;
    std::cout << "Digits: " << std::numeric_limits< long >::digits10 << std::endl;
    std::cout << "A \"long\" is " << sizeof( long ) << " bytes." << std::endl
        << "A string large enough to fit any \"long\" is "
        << MaxLen< long, LONG_MAX >::StringLen << " bytes wide." << std::endl;

    return  0;
}

The output is:

Max: 32767
Digits: 4
A "short" is 2 bytes.
A string large enough to fit any "short" is 6 bytes wide.
Max: 2147483647
Digits: 9
An "int" is 4 bytes.
A string large enough to fit any "int" is 11 bytes wide.
Max: 9223372036854775807
Digits: 18
A "long" is 8 bytes.
A string large enough to fit any "long" is 20 bytes wide.
Note the slightly different values from std::numeric_limits< T >::digits10 and MaxLen< T, N >::StringLen: the former does not count a digit position if it can't reach '9'. Of course you can use it and simply add two, if you don't mind wasting a single byte in some cases.

EDIT:

Some may have found it weird to include <climits>. If you can count on C++11, you won't need it, and you will gain additional simplicity:

#include    <iostream>
#include    <limits>

template< typename T, unsigned long N = std::numeric_limits< T >::max( ) >
class   MaxLen
{
public:
    enum
    {
        StringLen = MaxLen< T, N / 10 >::StringLen + 1
    };
};

template< typename T >
class   MaxLen< T, 0 >
{
public:
    enum
    {
        StringLen = 1
    };
};

Now you can use

MaxLen< short >::StringLen

instead of

MaxLen< short, SHRT_MAX >::StringLen

Good, isn't it?


The simplest canonical and arguably most portable way is to ask snprintf() how much space would be required:

char sbuf[2];
int ndigits;

ndigits = snprintf(sbuf, (size_t) 1, "%lld", (long long) INT_MIN);

slightly less portable perhaps, using intmax_t and the %jd conversion:

ndigits = snprintf(sbuf, (size_t) 1, "%jd", (intmax_t) INT_MIN);

One might consider that too expensive to do at runtime, but it works for any value, not just the MIN/MAX values of any integer type.
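
For example, a minimal self-contained sketch of this approach (using INT_MIN as the worst case for an int, though any value would do) might look like:

#include <stdio.h>
#include <limits.h>

int main(void)
{
        char sbuf[2];
        int ndigits;

        /*
         * snprintf() returns the number of characters the full conversion
         * would have produced, not counting the terminating NUL
         */
        ndigits = snprintf(sbuf, (size_t) 1, "%lld", (long long) INT_MIN);
        printf("INT_MIN needs %d characters (plus a NUL)\n", ndigits);

        return 0;
}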

You could of course also just directly calculate the number of digits that a given integer would require to be expressed in Base 10 notation with a simple recursive function:

#include <stdint.h>

unsigned int
numCharsB10(intmax_t n)
{
        if (n < 0)
                return numCharsB10((n == INTMAX_MIN) ? INTMAX_MAX : -n) + 1;
        if (n < 10)
                return 1;

        return 1 + numCharsB10(n / 10);
}

but that of course also costs CPU time at runtime, even when inlined, though perhaps a little less than snprintf() does.

@R.'s answer above, though, is more or less wrong, but it is on the right track. Here's the correct derivation of some very widely tested and highly portable macros that implement the calculation at compile time using sizeof(), starting from a slight correction of @R.'s initial wording:

First we can easily see (or show) that sizeof(int) is essentially the log base 2 of UINT_MAX divided by the number of bits represented by one unit of sizeof() (8, aka CHAR_BIT):

sizeof(int) == log2(UINT_MAX) / 8

because UINT_MAX is of course just 2 ^ (sizeof(int) * 8) - 1, and log2(x) is the inverse of 2^x.

We can use the identity "logb(x) = log(x) / log(b)" (where log() is the natural logarithm) to find logarithms of other bases. For example, you could compute the "log base 2" of "x" using:

log2(x) = log(x) / log(2)

and also:

log10(x) = log(x) / log(10)

So, we can deduce that:

log10(v) = log2(v) / log2(10)

Now what we want in the end is the log base 10 of UINT_MAX, so since log2(10) is approximately 3, and since we know from above what log2() is in terms of sizeof(), we can say that log10(UINT_MAX) is approximately:

log10(2^(sizeof(int)*8)) ~= (sizeof(int) * 8) / 3
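
To see how rough that is: for a 4-byte int it gives 32 / 3 = 10 (with integer division), which happens to match the 10 digits of UINT_MAX = 4294967295, but for an 8-byte type it gives 64 / 3 = 21 even though the largest 64-bit unsigned value, 18446744073709551615, has only 20 digits.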

That's not perfect though, especially since what we really want is the ceiling value, but with some minor adjustment to account for the integer rounding of log2(10) to 3, we can get what we need by first adding one to the log2 term, then subtracting 1 from the result for any larger-sized integer, resulting in this "good-enough" expression:

#if 0
#define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
    ((((sizeof(t) * CHAR_BIT) + 1) / 3) - ((sizeof(t) > 2) ? 1 : 0))
#endif
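
As a quick sanity check, here is a small test sketch (the macro name is just illustrative, repeating the expression above so the example compiles on its own); on a typical ILP32/LP64 system it prints 5, 10 and 20, matching the digit counts of the maximum unsigned short, unsigned int and unsigned long long values:

#include <stdio.h>
#include <limits.h>

#define B10DIGITS_FOR_UNSIGNED(t) \
        ((((sizeof(t) * CHAR_BIT) + 1) / 3) - ((sizeof(t) > 2) ? 1 : 0))

int main(void)
{
        printf("unsigned short:     %u\n", (unsigned) B10DIGITS_FOR_UNSIGNED(unsigned short));
        printf("unsigned int:       %u\n", (unsigned) B10DIGITS_FOR_UNSIGNED(unsigned int));
        printf("unsigned long long: %u\n", (unsigned) B10DIGITS_FOR_UNSIGNED(unsigned long long));

        return 0;
}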

Even better, we can multiply our first log2() term by 1/log2(10) (multiplying by the reciprocal of the divisor is the same as dividing by the divisor), and doing so makes it possible to find a better integer approximation. I most recently (re?)encountered this suggestion while reading Sean Anderson's bithacks: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10

To do this with integer math to the best approximation possible, we need to find the ideal ratio representing our reciprocal. This can be found by searching for the smallest fractional part of multiplying our desired value of 1/log2(10) by successive powers of 2, within some reasonable range of powers of 2, such as with the following little AWK script:

    awk 'BEGIN {
            minf=1.0
    }
    END {
            for (i = 1; i <= 31; i++) {
                    a = 1.0 / (log(10) / log(2)) * 2^i
                    if (a > (2^32 / 32))
                            break;
                    n = int(a)
                    f = a - (n * 1.0)
                    if (f < minf) {
                            minf = f
                            minn = n
                            bits = i
                    }
                    # printf("a=%f, n=%d, f=%f, i=%d\n", a, n, f, i)
            }
            printf("%d + %f / %d, bits=%d\n", minn, minf, 2^bits, bits)
    }' < /dev/null

    1233 + 0.018862 / 4096, bits=12

So we can get a good integer approximation of multiplying our log2(v) value by 1/log2(10) by multiplying it by 1233 followed by a right-shift of 12 (2^12 is 4096 of course):

log10(UINT_MAX) ~= ((sizeof(int) * 8) + 1) * 1233 >> 12

and, together with adding one to do the equivalent of finding the ceiling value, that gets rid of the need to fiddle with odd values:

#define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
    (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)

/*
 * for signed types we need room for the sign, except for int64_t
 */
#define __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
    (__MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))

/*
 * NOTE: this gives a warning (for unsigned types of int and larger) saying
 * "comparison of unsigned expression < 0 is always false", and of course it
 * is, but that's what we want to know (if indeed type 't' is unsigned)!
 */
#define __MAX_B10STRLEN_FOR_INT_TYPE(t)                     \
    (((t) -1 < 0) ? __MAX_B10STRLEN_FOR_SIGNED_TYPE(t)      \
                  : __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))

Normally, the compiler will evaluate at compile time the expression that my __MAX_B10STRLEN_FOR_INT_TYPE() macro becomes, in contrast to the runtime cost of the snprintf() and recursive approaches above. Of course my macro always calculates the maximum space required by a given type of integer, not the exact space required by a particular integer value.
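
For instance, a minimal usage sketch of these macros (repeated here so the example stands alone; note that they count only the sign and digits, so one extra byte is added for the terminating NUL) could look like:

#include <stdio.h>
#include <limits.h>

#define __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) \
    (((((sizeof(t) * CHAR_BIT)) * 1233) >> 12) + 1)

#define __MAX_B10STRLEN_FOR_SIGNED_TYPE(t) \
    (__MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t) + ((sizeof(t) == 8) ? 0 : 1))

#define __MAX_B10STRLEN_FOR_INT_TYPE(t)                     \
    (((t) -1 < 0) ? __MAX_B10STRLEN_FOR_SIGNED_TYPE(t)      \
                  : __MAX_B10STRLEN_FOR_UNSIGNED_TYPE(t))

int main(void)
{
        /*
         * for a 4-byte int: (32 * 1233 >> 12) + 1 = 10 digits, plus 1 for
         * the sign = 11, plus 1 here for the terminating NUL = 12 bytes
         */
        char buf[__MAX_B10STRLEN_FOR_INT_TYPE(int) + 1];

        sprintf(buf, "%d", INT_MIN);        /* the worst case, e.g. "-2147483648" */
        printf("\"%s\" fits in a %u-byte buffer\n", buf, (unsigned) sizeof(buf));

        return 0;
}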


If you assume CHAR_BIT is 8 (required on POSIX, so a safe assumption for any code targeting POSIX systems as well as any other mainstream system like Windows), a cheap safe formula is 3*sizeof(int)+2. If not, you can make it 3*sizeof(int)*CHAR_BIT/8+2, or there's a slightly simpler version.

In case you're interested in the reason this works, sizeof(int) is essentially a logarithm of INT_MAX (roughly log base 2^CHAR_BIT), and conversion between logarithms of different bases (e.g. to base 10) is just multiplication. In particular, 3 is an integer approximation/upper bound on log base 10 of 256.

The +2 is to account for a possible sign and null termination.
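
Here is a minimal sketch of how that formula is typically used (the value is just an example):

#include <stdio.h>

int main(void)
{
        int n = -12345;                         /* any int value */
        char buf[3 * sizeof(int) + 2];          /* digits + sign + NUL, per the formula above */

        sprintf(buf, "%d", n);
        printf("%s\n", buf);

        return 0;
}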