Count leading zeroes in an Int32

In .NET Core 3.0 there are BitOperations.LeadingZeroCount() and BitOperations.TrailingZeroCount() which map to the x86's LZCNT/BSR and TZCNT/BSF directly. As a result currently they'll be the most efficient solution


NOTE Using dotnet core >=3.0? Look here.

Let's take the number 20 as an example. It can be stated in binary as follows:

    00000000000000000000000000010100

First we "smear" the most significant bit over the lower bit positions by right shifting and bitwise-ORing over itself.

    00000000000000000000000000010100
 or 00000000000000000000000000001010 (right-shifted by 1)
 is 00000000000000000000000000011110

then

    00000000000000000000000000011110
 or 00000000000000000000000000000111 (right-shifted by 2)
 is 00000000000000000000000000011111

Here, because it's a small number, we've already completed the job, but by continuing the process with shifts of 4, 8 and 16 bits, we can ensure that for any 32-bit number, we have set all of the bits from 0 to the MSB of the original number to 1.

Now, if we count the number of 1s in our "smeared" result, we can simply subtract it from 32, and we are left with the number of leading zeros in the original value.

How do we count the number of set bits in an integer? This page has a magical algorithm for doing just that ("a variable-precision SWAR algorithm to perform a tree reduction"... if you get it, you're cleverer than me!), which translates to C# as follows:

int PopulationCount(int x)
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return (x & 0x0000003f);
}

By inlining this method with our "smearing" method above, we can produce a very fast, loop-free and conditional-free method for counting the leading zeros of an integer.

int LeadingZeros(int x)
{
    const int numIntBits = sizeof(int) * 8; //compile time constant
    //do the smearing
    x |= x >> 1; 
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    //count the ones
    x -= x >> 1 & 0x55555555;
    x = (x >> 2 & 0x33333333) + (x & 0x33333333);
    x = (x >> 4) + x & 0x0f0f0f0f;
    x += x >> 8;
    x += x >> 16;
    return numIntBits - (x & 0x0000003f); //subtract # of 1s from 32
}

If you'd like to mix assembly code in for peak performance. Here's how you do that in C#.

First the supporting code to make it possible:

using System.Runtime.InteropServices;
using System.Runtime.CompilerServices;
using static System.Runtime.CompilerServices.MethodImplOptions;

/// <summary> Gets the position of the right most non-zero bit in a UInt32.  </summary>
[MethodImpl(AggressiveInlining)] public static int BitScanForward(UInt32 mask) => _BitScanForward32(mask);

/// <summary> Gets the position of the left most non-zero bit in a UInt32.  </summary>
[MethodImpl(AggressiveInlining)] public static int BitScanReverse(UInt32 mask) => _BitScanReverse32(mask);


[DllImport("kernel32.dll", SetLastError = true)]
private static extern IntPtr VirtualAlloc(IntPtr lpAddress, uint dwSize, uint flAllocationType, uint flProtect);

private static TDelegate GenerateX86Function<TDelegate>(byte[] x86AssemblyBytes) {
    const uint PAGE_EXECUTE_READWRITE = 0x40;
    const uint ALLOCATIONTYPE_MEM_COMMIT = 0x1000;
    const uint ALLOCATIONTYPE_RESERVE = 0x2000;
    const uint ALLOCATIONTYPE = ALLOCATIONTYPE_MEM_COMMIT | ALLOCATIONTYPE_RESERVE;
    IntPtr buf = VirtualAlloc(IntPtr.Zero, (uint)x86AssemblyBytes.Length, ALLOCATIONTYPE, PAGE_EXECUTE_READWRITE);
    Marshal.Copy(x86AssemblyBytes, 0, buf, x86AssemblyBytes.Length);
    return (TDelegate)(object)Marshal.GetDelegateForFunctionPointer(buf, typeof(TDelegate));
}

Then here's the assembly to generate the functions:

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate Int32 BitScan32Delegate(UInt32 inValue);

private static BitScan32Delegate _BitScanForward32 = (new Func<BitScan32Delegate>(() => { //IIFE   
   BitScan32Delegate del = null;
   if(IntPtr.Size == 4){
      del = GenerateX86Function<BitScan32Delegate>(
         x86AssemblyBytes: new byte[20] {
         //10: int32_t BitScanForward(uint32_t inValue) {
            0x51,                                       //51                   push        ecx  
            //11:    unsigned long i;
            //12:    return _BitScanForward(&i, inValue) ? i : -1;
            0x0F, 0xBC, 0x44, 0x24, 0x08,               //0F BC 44 24 08       bsf         eax,dword ptr [esp+8] 
            0x89, 0x04, 0x24,                           //89 04 24             mov         dword ptr [esp],eax 
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,               //B8 FF FF FF FF       mov         eax,-1               
            0x0F, 0x45, 0x04, 0x24,                     //0F 45 04 24          cmovne      eax,dword ptr [esp]
            0x59,                                       //59                   pop         ecx 
            //13: }
            0xC3,                                       //C3                   ret  
      });
   } else if(IntPtr.Size == 8){
      del = GenerateX86Function<BitScan32Delegate>( 
         //This code also will work for UInt64 bitscan.
         // But I have it limited to UInt32 via the delegate because UInt64 bitscan would fail in a 32bit dotnet process.  
            x86AssemblyBytes: new byte[13] {
            //15:    unsigned long i;
            //16:    return _BitScanForward64(&i, inValue) ? i : -1; 
            0x48, 0x0F, 0xBC, 0xD1,            //48 0F BC D1          bsf         rdx,rcx
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,      //B8 FF FF FF FF       mov         eax,-1 
            0x0F, 0x45, 0xC2,                  //0F 45 C2             cmovne      eax,edx  
            //17: }
            0xC3                              //C3                   ret 
         });
   }
   return del;
}))();


private static BitScan32Delegate _BitScanReverse32 = (new Func<BitScan32Delegate>(() => { //IIFE   
   BitScan32Delegate del = null;
   if(IntPtr.Size == 4){
      del = GenerateX86Function<BitScan32Delegate>(
         x86AssemblyBytes: new byte[20] {
            //18: int BitScanReverse(unsigned int inValue) {
            0x51,                                       //51                   push        ecx  
            //19:    unsigned long i;
            //20:    return _BitScanReverse(&i, inValue) ? i : -1;
            0x0F, 0xBD, 0x44, 0x24, 0x08,               //0F BD 44 24 08       bsr         eax,dword ptr [esp+8] 
            0x89, 0x04, 0x24,                           //89 04 24             mov         dword ptr [esp],eax 
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,               //B8 FF FF FF FF       mov         eax,-1  
            0x0F, 0x45, 0x04, 0x24,                     //0F 45 04 24          cmovne      eax,dword ptr [esp]  
            0x59,                                       //59                   pop         ecx 
            //21: }
            0xC3,                                       //C3                   ret  
      });
   } else if(IntPtr.Size == 8){
      del = GenerateX86Function<BitScan32Delegate>( 
         //This code also will work for UInt64 bitscan.
         // But I have it limited to UInt32 via the delegate because UInt64 bitscan would fail in a 32bit dotnet process. 
            x86AssemblyBytes: new byte[13] {
            //23:    unsigned long i;
            //24:    return _BitScanReverse64(&i, inValue) ? i : -1; 
            0x48, 0x0F, 0xBD, 0xD1,            //48 0F BD D1          bsr         rdx,rcx 
            0xB8, 0xFF, 0xFF, 0xFF, 0xFF,      //B8 FF FF FF FF       mov         eax,-1
            0x0F, 0x45, 0xC2,                  //0F 45 C2             cmovne      eax,edx  
            //25: }
            0xC3                              //C3                   ret 
         });
   }
   return del;
}))();

In order to generate the assembly I started a new VC++ project, created the functions I wanted, then went to Debug-->Windows-->Disassembly. For compiler options I disabled inlining, enabled intrinsics, favored fast code, omitted frame pointers, disabled security checks and SDL checks. The code for that is:

#include "stdafx.h"
#include <intrin.h>  

#pragma intrinsic(_BitScanForward)  
#pragma intrinsic(_BitScanReverse) 
#pragma intrinsic(_BitScanForward64)  
#pragma intrinsic(_BitScanReverse64) 


__declspec(noinline) int _cdecl BitScanForward(unsigned int inValue) {
    unsigned long i;
    return _BitScanForward(&i, inValue) ? i : -1; 
}
__declspec(noinline) int _cdecl BitScanForward64(unsigned long long inValue) {
    unsigned long i;
    return _BitScanForward64(&i, inValue) ? i : -1;
}
__declspec(noinline) int _cdecl BitScanReverse(unsigned int inValue) {
    unsigned long i;
    return _BitScanReverse(&i, inValue) ? i : -1; 
}
__declspec(noinline) int _cdecl BitScanReverse64(unsigned long long inValue) {
    unsigned long i;
    return _BitScanReverse64(&i, inValue) ? i : -1;
}

Look at https://chessprogramming.wikispaces.com/BitScan for good info on bitscanning.

If you're able to mix assembly code then use the modern LZCNT, TZCNT and POPCNT processor commands.

Other than that take a look at Java's implementation for Integer.

/**
 * Returns the number of zero bits preceding the highest-order
 * ("leftmost") one-bit in the two's complement binary representation
 * of the specified {@code int} value.  Returns 32 if the
 * specified value has no one-bits in its two's complement representation,
 * in other words if it is equal to zero.
 *
 * <p>Note that this method is closely related to the logarithm base 2.
 * For all positive {@code int} values x:
 * <ul>
 * <li>floor(log<sub>2</sub>(x)) = {@code 31 - numberOfLeadingZeros(x)}
 * <li>ceil(log<sub>2</sub>(x)) = {@code 32 - numberOfLeadingZeros(x - 1)}
 * </ul>
 *
 * @param i the value whose number of leading zeros is to be computed
 * @return the number of zero bits preceding the highest-order
 *     ("leftmost") one-bit in the two's complement binary representation
 *     of the specified {@code int} value, or 32 if the value
 *     is equal to zero.
 * @since 1.5
 */
public static int numberOfLeadingZeros(int i) {
    // HD, Figure 5-6
    if (i == 0)
        return 32;
    int n = 1;
    if (i >>> 16 == 0) { n += 16; i <<= 16; }
    if (i >>> 24 == 0) { n +=  8; i <<=  8; }
    if (i >>> 28 == 0) { n +=  4; i <<=  4; }
    if (i >>> 30 == 0) { n +=  2; i <<=  2; }
    n -= i >>> 31;
    return n;
}