Finding the total number of set-bits from 1 to n

If n= 2^k-1, then F(n)=k*(n+1)/2

For a general n, let m be the largest number such that m = 2^k-1 and m<=n. F(n) = F(m) + F(n-m-1) + (n-m).

Corner condition: F(0)=0 and F(-1)=0.


consider the below:

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111

If you want to find total number of set bits from 1 to say 14 (1110) Few Observations:

  1. 0th bit (LSB) 1 bit appears once every two bit (see vertically) so number of set bits = n/2 +(1 if n's 0th bit is 1 else 0)
  2. 1st bit : 2 consecutive 1s appear every four bits (see 1st bit vertically along all numbers) number of set bits in 1st bit position = (n/4 *2) + 1 (since 1st bit is a set, else 0)
  3. 2nd bit: 4 consecutive 1s appear every 8 bits ( this one is a bit tricky) number of set bits in 2nd position = (n/8*4 )+ 1( since 2nd bit is set, else 0) + ((n%8)%(8/2)) The last term is to include the number of 1s that were outside first (n/8) group of bits (14/8 =1 considers only 1 group ie. 4 set bits in 8 bits. we need to include 1s found in last 14-8 = 6 bits)
  4. 3rd bit: 8 consecutive 1s appear every 16 bits (similar to above) number of set bits in 3rd position = (n/16*8)+1(since 3rd bit is set, else 0)+ ((n%16)%(16/2))

so we do O(1) calculation for each bit of a number n. a number contains log2(n) bits. so when we iterate the above for all positions of n and add all the set bits at each step, we get the answer in O(logn) steps


The way to solve these sorts of problems is to write out the first few values, and look for a pattern

Number  binary   # bits set   F(n)
1       0001     1            1
2       0010     1            2
3       0011     2            4
4       0100     1            5
5       0101     2            7
6       0110     2            9
7       0111     3            12
8       1000     1            13
9       1001     2            15
10      1010     2            17
11      1011     3            20
12      1100     2            22
13      1101     3            25
14      1110     3            28
15      1111     4            32

It takes a bit of staring at, but with some thought you notice that the binary-representations of the first 8 and the last 8 numbers are exactly the same, except the first 8 have a 0 in the MSB (most significant bit), while the last 8 have a 1. Thus, for example to calculate F(12), we can just take F(7) and add to it the number of set bits in 8, 9, 10, 11 and 12. But that's the same as the number of set-bits in 0, 1, 2, 3, and 4 (ie. F(4)), plus one more for each number!

#    binary
0    0 000
1    0 001
2    0 010
3    0 011
4    0 100
5    0 101
6    0 110
7    0 111

8    1 000  <--Notice that rightmost-bits repeat themselves
9    1 001     except now we have an extra '1' in every number!
10   1 010
11   1 011
12   1 100

Thus, for 8 <= n <= 15, F(n) = F(7) + F(n-8) + (n-7). Similarly, we could note that for 4 <= n <= 7, F(n) = F(3) + F(n-4) + (n-3); and for 2 <= n <= 3, F(n) = F(1) + F(n-2) + (n-1). In general, if we set a = 2^(floor(log(n))), then F(n) = F(a-1) + F(n-a) + (n-a+1)


This doesn't quite give us an O(log n) algorithm; however, doing so is easy. If a = 2^x, then note in the table above that for a-1, the first bit is set exactly a/2 times, the second bit is set exactly a/2 times, the third bit... all the way to the x'th bit. Thus, F(a-1) = x*a/2 = x*2^(x-1). In the above equation, this gives us

F(n) = x*2x-1 + F(n-2x) + (n-2x+1)

Where x = floor(log(n)). Each iteration of calculating F will essentially remove the MSB; thus, this is an O(log(n)) algorithm.

Tags:

Algorithm