Companion to hypot()

The first thing to do is factorize:

b = sqrt(h*h - a*a) = sqrt((h-a)*(h+a))

By factoring we have not only avoided spurious overflow in the squares, but also gained accuracy: when h and a are close, h - a is computed exactly (Sterbenz lemma), so the cancellation that plagues h*h - a*a never occurs.

If either factor is close to 1E+154 ~ sqrt(1E+308) (near the maximum of an IEEE 754 64-bit double), then the product itself can overflow, so we must split further:

sqrt((h-a)*(h+a)) = sqrt(h-a) * sqrt(h+a)

This case is very unlikely, so the two sqrt's are justified, even though they are slower than a single sqrt.

Notice that if h ~ 5E+7 * a or larger (i.e. a/h is below about sqrt(DBL_EPSILON)), then b rounds to h: there are not enough digits in a double to represent b as different from h.


This answer assumes a platform that uses floating-point arithmetic compliant with IEEE-754 (2008) and provides fused multiply-add (FMA) capability. Both conditions are met by common architectures such as x86-64, ARM64, and Power. FMA is exposed in ISO C99 and later C standards as a standard math function fma(). On hardware that does not provide an FMA instruction, this requires emulation, which could be slow and functionally deficient.

Mathematically, the length of one leg (cathetus) in a right triangle, given the length of the hypotenuse and the other leg, is simply computed as √(h²-a²), where h is the length of the hypotenuse. But when computed with finite-precision floating-point arithmetic, we face two problems: Overflow or underflow to zero may take place when computing the squares, and subtraction of the squares gives rise to subtractive cancellation when the squares have similar magnitude.

The first issue is easily taken care of by scaling by 2ⁿ such that the term larger in magnitude is moved closer to unity. As subnormal numbers may be involved, this cannot be accomplished by manipulating the exponent field alone, as there may be a need to normalize / denormalize. But we can compute the required scale factors by exponent-field bit manipulation, then multiply by those factors. We know that the hypotenuse has to be longer than or the same length as the given leg in non-exceptional cases, so we can base the scaling on that argument.

Dealing with subtractive cancellation is harder, but we are lucky that a computation very similar to our h²-a² occurs in other important problems. For example, the grandmaster of floating-point computation looked into the accurate computation of the discriminant of the quadratic formula, b²-4ac:

William Kahan, "On the Cost of Floating-Point Computation Without Extra-Precise Arithmetic", Nov. 21, 2004 (online)

More recently, French researchers addressed the more general case of the difference of two products, ad-bc:

Claude-Pierre Jeannerod, Nicolas Louvet, Jean-Michel Muller, "Further analysis of Kahan's algorithm for the accurate computation of 2 x 2 determinants." Mathematics of Computation, Vol. 82, No. 284, Oct. 2013, pp. 2245-2264 (online)

The FMA-based algorithm in the second paper computes the difference of two products with a proven maximum error of 1.5 ulp. With this building block, we arrive at the straightforward ISO C99 implementation of the cathetus computation below. A maximum error of 1.2 ulp was observed in one billion random trials as determined by comparing with the results from an arbitrary-precision library:

#include <stdint.h>
#include <string.h>
#include <float.h>
#include <math.h>

uint64_t __double_as_uint64 (double a)
{
    uint64_t r;
    memcpy (&r, &a, sizeof r);
    return r;
}

double __uint64_as_double (uint64_t a)
{
    double r;
    memcpy (&r, &a, sizeof r);
    return r;
}

/*
  diff_of_products() computes a*b-c*d with a maximum error < 1.5 ulp

  Claude-Pierre Jeannerod, Nicolas Louvet, and Jean-Michel Muller, 
  "Further Analysis of Kahan's Algorithm for the Accurate Computation 
  of 2x2 Determinants". Mathematics of Computation, Vol. 82, No. 284, 
  Oct. 2013, pp. 2245-2264
*/
double diff_of_products (double a, double b, double c, double d)
{
    double w = d * c;
    double e = fma (-d, c, w);
    double f = fma (a, b, -w);
    return f + e;
}

/* compute sqrt (h*h - a*a) accurately, avoiding spurious overflow */
double my_cathetus (double h, double a)
{
    double fh, fa, res, scale_in, scale_out, d, s;
    uint64_t expo;

    fh = fabs (h);
    fa = fabs (a);

    /* compute scale factors */
    expo = __double_as_uint64 (fh) & 0xff80000000000000ULL;
    scale_in = __uint64_as_double (0x7fc0000000000000ULL - expo);
    scale_out = __uint64_as_double (expo + 0x0020000000000000ULL);

    /* scale fh towards unity */
    fh = fh * scale_in;
    fa = fa * scale_in;

    /* compute sqrt of difference of scaled arguments, avoiding overflow */
    d = diff_of_products (fh, fh, fa, fa);
    s = sqrt (d);

    /* reverse previous scaling */
    res = s * scale_out;

    /* handle special arguments */
    if (isnan (h) || isnan (a)) {
        res = h + a;
    }

    return res;
}