Integer cube root

I've adapted the algorithm presented in 1.5.2 (the kth root) in Modern Computer Arithmetic (Brent and Zimmerman). For the case of (k == 3), and given a 'relatively' accurate over-estimate of the initial guess - this algorithm seems to out-perform the 'Hacker's Delight' code above.

Not only that, but MCA as a text provides theoretical background as well as a proof of correctness and terminating criteria.

Provided that we can produce a 'relatively' good initial over-estimate, I haven't been able to find a case that exceeds (7) iterations. (Is this effectively related to 64-bit values having 2^6 bits?) Either way, it's an improvement over the (21) iterations in the HacDel code - with linear O(b) convergence, despite having a loop body that is evidently much faster.

The initial estimate I've used is based on a 'rounding up' of the number of significant bits in the value (x). Given (b) significant bits in (x), we can say: 2^(b - 1) <= x < 2^b. I state without proof (though it should be relatively easy to demonstrate) that: 2^ceil(b / 3) > x^(1/3)


static inline uint32_t u64_cbrt (uint64_t x)
{
    uint64_t r0 = 1, r1;

    /* IEEE-754 cbrt *may* not be exact. */

    if (x == 0) /* cbrt(0) : */
        return (0);

    int b = (64) - __builtin_clzll(x);
    r0 <<= (b + 2) / 3; /* ceil(b / 3) */

    do /* quadratic convergence: */
    {
        r1 = r0;
        r0 = (2 * r1 + x / (r1 * r1)) / 3;
    }
    while (r0 < r1);

    return ((uint32_t) r1); /* floor(cbrt(x)); */
}

A crbt call probably isn't all that useful - unlike the sqrt call which can be efficiently implemented on modern hardware. That said, I've seen gains for sets of values under 2^53 (exactly represented in IEEE-754 doubles), which surprised me.

The only downside is the division by: (r * r) - this can be slow, as the latency of integer division continues to fall behind other advances in ALUs. The division by a constant: (3) is handled by reciprocal methods on any modern optimising compiler.

It's interesting that Intel's 'Icelake' microarchitecture will significantly improve integer division - an operation that seems to have been neglected for a long time. I simply won't trust the 'Hacker's Delight' answer until I can find a sound theoretical basis for it. And then I have to work out which variant is the 'correct' answer.


The book "Hacker's Delight" has algorithms for this and many other problems. The code is online here. EDIT: That code doesn't work properly with 64-bit ints, and the instructions in the book on how to fix it for 64-bit are somewhat confusing. A proper 64-bit implementation (including test case) is online here.

I doubt that your squareroot function works "correctly" - it should be ulong a for the argument, not n :) (but the same approach would work using cbrt instead of sqrt, although not all C math libraries have cube root functions).