Is there an elegant and fast way to test whether the 1-bits in an integer form a single contiguous region?

static _Bool IsCompact(unsigned x)
{
    return (x & x + (x & -x)) == 0;
}

Briefly:

x & -x gives the lowest bit set in x (or zero if x is zero).

x + (x & -x) converts the lowest string of consecutive 1s to a single 1 (or wraps to zero).

x & x + (x & -x) clears those 1 bits.

(x & x + (x & -x)) == 0 tests whether any other 1 bits remain.

Longer:

-x equals ~x+1 (for the int in the question, we assume two’s complement, but unsigned is preferable). After the bits are flipped in ~x, adding 1 carries so that it flips back the low 1 bits in ~x and the first 0 bit but then stops. Thus, the low bits of -x up to and including its first 1 are the same as the low bits of x, but all higher bits are flipped. (Example: ~10011100 gives 01100011, and adding 1 gives 01100100, so the low 100 are the same, but the high 10011 are flipped to 01100.) Then x & -x gives us the only bit that is 1 in both, which is that lowest 1 bit (00000100). (If x is zero, x & -x is zero.)

Adding this to x causes a carry through all the consecutive 1s, changing them to 0s. It leaves a 1 at the next higher 0 bit (or the carry propagates out the high end, leaving a wrapped total of zero). (Example: 10011100 + 00000100 gives 10100000.)

When this is ANDed with x, there are 0s in the places where the 1s were changed to 0s (and also where the carry changed a 0 to a 1). So the result is not zero only if there is another 1 bit higher up.


There is actually no need to use any intrinsics.

First fill up all the 0s below the lowest 1. Then test whether the new value is a Mersenne number. This algorithm maps zero to true.

bool has_compact_bits( unsigned const x )
{
    // fill up the low order zeroes
    unsigned const y = x | ( x - 1 );
    // test whether the 1s form one solid block
    return not ( y & ( y + 1 ) );
}

Of course, if you want to use intrinsics, here is the popcount method:

bool has_compact_bits( unsigned const x )
{
    size_t const num_bits = CHAR_BIT * sizeof(unsigned);
    size_t const sum = __builtin_ctz(x) + __builtin_popcount(x) + __builtin_clz(x); // note: ctz/clz are undefined for x == 0
    return sum == num_bits;
}

Actually you don't need to count leading zeros. As suggested by pmg in the comments, you can exploit the fact that the numbers you are looking for are those of OEIS sequence A023758, i.e. numbers of the form 2^i - 2^j with i >= j: just count the trailing zeros (i.e. j), set those bits in the original value (equivalent to adding 2^j - 1), and then check whether the result is of the form 2^i - 1. With GCC/clang intrinsics,

bool has_compact_bits(int val) {
    if (val == 0) return true; // __builtin_ctz is undefined if its argument is zero
    int j = __builtin_ctz(val);
    val |= (1u << j) - 1; // set the j trailing zeros, i.e. add 2^j - 1
    val &= (val + 1); // val becomes zero if it was of the form (2^i - 1)
    return val == 0;
}

This version is slightly faster than yours, than the one proposed by KamilCuk, and than the popcount-only one by Yuri Feldman.

If you are using C++20, you may get a portable function by replacing __builtin_ctz with std::countr_zero:

#include <bit>

bool has_compact_bits(int val) {
    if (val == 0) return true; // std::countr_zero(0) is the full bit width, so guard it
    int j = std::countr_zero(static_cast<unsigned>(val)); // ugly cast
    val |= (1u << j) - 1; // set the j trailing zeros, i.e. add 2^j - 1
    val &= (val + 1); // val becomes zero if it was of the form (2^i - 1)
    return val == 0;
}

The cast is ugly, but it warns you that it is better to work with unsigned types when manipulating bits. A pre-C++20 alternative is boost::multiprecision::lsb.

Edit:

The benchmark at the struck-through link was limited by the fact that no popcount instruction was emitted for Yuri Feldman's version. Trying to compile them on my PC with -march=westmere, I've measured the following times for 1 billion iterations with identical sequences from std::mt19937:

  • your version: 5.7 s
  • KamilCuk's second version: 4.7 s
  • my version: 4.7 s
  • Eric Postpischil's first version: 4.3 s
  • Yuri Feldman's version (using explicitly __builtin_popcount): 4.1 s

So, at least on my architecture, the fastest seems to be the one with popcount.

Edit 2:

I've updated my benchmark with Eric Postpischil's new version. As requested in the comments, the code of my test can be found here. I've added a no-op loop to estimate the time needed by the PRNG. I've also added the two versions by KevinZ. The code has been compiled with clang with -O3 -msse4 -mbmi to get the popcnt and blsi instructions (thanks to Peter Cordes).

Results: at least on my architecture, Eric Postpischil's version is exactly as fast as Yuri Feldman's, and at least twice as fast as any other version proposed so far.