Generating all permutations efficiently

Here's a NumPy solution that builds the permutations of size m by modifying the permutations of size m-1 (see more explanation further down):

def permutations(n):
    a = np.zeros((np.math.factorial(n), n), np.uint8)
    f = 1
    for m in range(2, n+1):
        b = a[:f, n-m+1:]      # the block of permutations of range(m-1)
        for i in range(1, m):
            a[i*f:(i+1)*f, n-m] = i
            a[i*f:(i+1)*f, n-m+1:] = b + (b >= i)
        b += 1
        f *= m
    return a

Demo:

>>> permutations(3)
array([[0, 1, 2],
       [0, 2, 1],
       [1, 0, 2],
       [1, 2, 0],
       [2, 0, 1],
       [2, 1, 0]], dtype=uint8)

For n=10, the itertools solution takes 5.5 seconds for me while this NumPy solution takes 0.2 seconds.

How it proceeds: It starts with a zero-array of the goal size, which already contains the permutations for range(1) at the top-right (I "dotted out" the other parts of the array):

[[. . 0]
 [. . .]
 [. . .]
 [. . .]
 [. . .]
 [. . .]]

Then it turns that into the permutations of range(2):

[[. 0 1]
 [. 1 0]
 [. . .]
 [. . .]
 [. . .]
 [. . .]]

And then into the permutations of range(3):

[[0 1 2]
 [0 2 1]
 [1 0 2]
 [1 2 0]
 [2 0 1]
 [2 1 0]]

It does so by filling the next-left column and by copying/modifying the previous block of permutations downwards.


Update: faster version

This solution is about 5x faster than my original answer, due to avoiding unnecessary computation. This approach builds an array by expanding from one corner, using the same basic idea explained in superb rain's answer, but is much faster since it avoids unnecessary work.

def faster_permutations(n):
    # empty() is fast because it does not initialize the values of the array
    # order='F' uses Fortran ordering, which makes accessing elements in the same column fast
    perms = np.empty((np.math.factorial(n), n), dtype=np.uint8, order='F')
    perms[0, 0] = 0

    rows_to_copy = 1
    for i in range(1, n):
        perms[:rows_to_copy, i] = i
        for j in range(1, i + 1):
            start_row = rows_to_copy * j
            end_row = rows_to_copy * (j + 1)
            splitter = i - j
            perms[start_row: end_row, splitter] = i
            perms[start_row: end_row, :splitter] = perms[:rows_to_copy, :splitter]  # left side
            perms[start_row: end_row, splitter + 1:i + 1] = perms[:rows_to_copy, splitter:i]  # right side

        rows_to_copy *= i + 1

    return perms

Timings on my machine with n=11:

faster_permutations():                          0.12 seconds 
permutations() [superb rain's approach]:        1.44 seconds
permutations() with memory order optimization:  0.62 seconds

Original Answer:

Based on superb rain's answer, this is a faster version with more efficient memory access patterns:

def fast_permutations(n):
    a = np.zeros((n, np.math.factorial(n)), np.uint8)
    f = 1
    for m in range(2, n + 1):
        b = a[n - m + 1:, :f]  # the block of permutations of range(m-1)
        for i in range(1, m):
            a[n - m, i * f:(i + 1) * f] = i
            a[n - m + 1:, i * f:(i + 1) * f] = b + (b >= i)
        b += 1
        f *= m
    return a.T

This is essentially the transpose of superb rain's version. It's more efficient because the memory locations accessed are closer together.

On my machine, it's about 2x as fast as the original version (0.05 seconds vs 0.12 seconds for n=10).