Count the number of Hamming distance sequences

Haskell, score 9

import Data.Bits
import Data.List
import qualified Data.IntSet as S

main = mapM_ (print . S.size . S.fromList . hd) [1..9]

hd :: Int -> [Int]
hd n = [foldl' (\s e->s*m+e) 0 [popCount $ xor p (shiftR t i .&. m)|i<-[(0::Int)..n-1]] | let m=2^n-1,p<-[(0::Int)..2^n-1],t<-[(0::Int)..2^(2*n-1)-1]]

Compile with -O3. It takes 6min35s on my 6 year old laptop hardware to run up to n=9, so maybe it's under 5min on the reference hardware.

> time ./113785
2
9
48
297
2040
15425
125232
1070553
9530752

real  6m35.763s
user  6m27.690s
sys   0m5.025s

JavaScript, score 10

findHamming = m => { 
    if (m < 2) return 2;
    let popcnt = x => x && (x & 1) + popcnt(x >> 1);
    let a = [...Array(1 << m)].map((_,i) => popcnt(i));
    let t = 0;
    let n = (1 << m) - 1;
    for (let c = 0; c <= m; c++) {
        for (let g = 0; g <= c; g++) {
            let s = new Set;
            for (let i = 1 << m; i--; ) {
                for (let j = 1 << (m - 1); j--; ) {
                    if (a[i ^ j + j] != c) continue;
                    for (let k = 1 << m - 1; k--; ) {
                        if (a[i ^ k] != g) continue;
                        let f = j << m | k;
                        let h = 0;
                        for (l = m - 1; --l; ) h = h * (m + 1) + a[i ^ f >> l & n];
                        s.add(h);
                    }
                }
            }
            t += s.size * (g < c ? 2 : 1);
        }
    }
    return t;
};
let d = Date.now(); for (let m = 1; m < 11; m++) console.log(m, findHamming(m), Date.now() - d);

Explanation: Calculating n=10 is difficult because there are over two billion pairs and over 26 billion potential sequences. In order to speed things up I split the calculation up into 121 bins. Because the sequences are invariant under bitwise complement, I can assume without loss of generality that the middle bit of T is zero. This means that I can determine the first and last elements of the sequence independently from the the top n-1 and bottom n-1 bits of T. Each bin corresponds to a different pair of first and last elements; I then loop through all the possible sets of top and bottom bits that correspond to each bin and calculate the remaining elements of the sequence, finally counting the unique sequences for each bin. It then remains to total all 121 bins. Originally taking 45 hours, this now completed in a little under three and a half minutes on my AMD FX-8120. Edit: Thanks to @ChristianSievers for a 50% speedup. Full output:

1 2 0
2 9 1
3 48 1
4 297 2
5 2040 7
6 15425 45
7 125232 391
8 1070553 1844
9 9530752 15364
10 86526701 153699

C++, score 10 11

This is a translation of @Neil's answer into C++, with some simple parallelization. n=9 completes in 0.4 seconds, n=10 in 4.5 seconds, and n=11 in approximately 1 minute on my 2015 Macbook Pro. Also, thanks to @ChristianSievers. Due to his comments on @Neil's answer, I noticed some additional symmetries. From the original 121 buckets (for n=10), to 66 buckets when accounting for reversal, I've gotten down to just 21 buckets.

#include <iostream>
#include <cstdint>
#include <unordered_set>
#include <thread>
#include <future>
#include <vector>

using namespace std;

constexpr uint32_t popcnt( uint32_t v ) {
    uint32_t c = v - ( ( v >> 1 ) & 0x55555555 );
    c = ( ( c >> 2 ) & 0x33333333 ) + ( c & 0x33333333 );
    c = ( ( c >> 4 ) + c ) & 0x0F0F0F0F;
    c = ( ( c >> 8 ) + c ) & 0x00FF00FF;
    c = ( ( c >> 16 ) + c ) & 0x0000FFFF;
    return c;
}

template<uint32_t N>
struct A {
    constexpr A() : arr() {
        for( auto i = 0; i != N; ++i ) {
            arr[i] = popcnt( i );
        }
    }
    uint8_t arr[N];
};

uint32_t n = ( 1 << M ) - 1;
constexpr auto a = A < 1 << M > ();

uint32_t work( uint32_t c, uint32_t g, uint32_t mult ) {
    unordered_set<uint64_t> s;
    // Empirically derived "optimal" value
    s.reserve( static_cast<uint32_t>( pow( 5, M ) ) );

    for( int i = ( 1 << M ) - 1; i >= 0; i-- ) {
        for( uint32_t j = 1 << ( M - 1 ); j--; ) {
            if( a.arr[i ^ j + j] != c ) {
                continue;
            }

            for( uint32_t k = 1 << ( M - 1 ); k--; ) {
                if( a.arr[i ^ k] != g ) {
                    continue;
                }

                uint64_t f = j << M | k;
                uint64_t h = 0;

                for( uint32_t l = M - 1; --l; ) {
                    h = h * ( M + 1 ) + a.arr[i ^ ( f >> l & n )];
                }

                s.insert( h );
            }
        }
    }

    return s.size() * mult;

}

int main() {
    auto t1 = std::chrono::high_resolution_clock::now();

    if( M == 1 ) {
        auto t2 = std::chrono::high_resolution_clock::now();
        auto seconds = chrono::duration_cast<chrono::milliseconds>( t2 - t1 ).count() / 1000.0;
        cout << M << ": " << 2 << ", " << seconds << endl;
        return 0;
    }

    uint64_t t = 0;
    vector<future<uint32_t>> my_vector;

    if( ( M & 1 ) == 0 ) {
        for( uint32_t c = 0; c <= M / 2; ++c ) {
            for( uint32_t g = c; g <= M / 2; ++g ) {
                uint32_t mult = 8;

                if( c == M / 2 && g == M / 2 ) {
                    mult = 1;
                } else if( g == c || g == M / 2 ) {
                    mult = 4;
                }

                my_vector.push_back( async( work, c, g, mult ) );
            }

        }

        for( auto && f : my_vector ) {
            t += f.get();
        }

    } else {
        for( uint32_t c = 0; c <= ( M - 1 ) / 2; ++c ) {
            for( uint32_t g = c; g <= M - c; ++g ) {
                uint32_t mult = 4;

                if( g == c || g + c == M ) {
                    mult = 2;
                }

                my_vector.push_back( async( work, c, g, mult ) );
            }

        }

        for( auto && f : my_vector ) {
            t += f.get();
        }

    }

    auto t2 = std::chrono::high_resolution_clock::now();
    auto seconds = chrono::duration_cast<chrono::milliseconds>( t2 - t1 ).count() / 1000.0;
    cout << M << ": " << t << ", " << seconds << endl;
    return 0;
}

Use the following script to execute the code:

#!/usr/bin/env bash

for i in {1..10}
do
    clang++ -std=c++14 -march=native -mtune=native -Ofast -fno-exceptions -DM=$i hamming3.cpp -o hamming
    ./hamming
done

The output was as follows: (The format is M: result, seconds)

1: 2, 0
2: 9, 0
3: 48, 0
4: 297, 0
5: 2040, 0
6: 15425, 0.001
7: 125232, 0.004
8: 1070553, 0.029
9: 9530752, 0.419
10: 86526701, 4.459
11: 800164636, 58.865

n=12 took 42 minutes to calculate on a single thread, and gave a result of 7368225813.