Enumerate valid Brainf**k programs

Python 3, 115 bytes (previously 443, 158, 155, 154, 134, 131, 128, 124, 117, 116)

c=d=C=D=0
for e in input():v='[<>,.-+]'.find(e);d=d*8+v;c+=c<0<6<v;c-=d>1>v;C,D=(c,C+1,d,D)[v>6::2]
print(-~D*8**C)

Several bytes saved thanks to Sp3000 and Mitch Schwartz :D

How this works:

This maps the valid BF programs one-to-one onto all BF programs, valid or invalid, that don't start with a [. The new program is then simply read as an octal number.

Here is the mapping formula:

  1. Separate a BF program into 3 parts. The first part is the longest prefix consisting of only [ characters. The third part is the longest suffix consisting of only ] characters. The second part is everything in between.
  2. Discard the first part. It can be recomputed later.
  3. Remove all ] brackets in the third part that match [ brackets in the second part. These can also be recomputed later.
  4. Concatenate the second and third parts.
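
The four steps above can be sketched in ungolfed Python 3 roughly as follows. This is an illustrative reading of the description, not the golfed submission itself: the names `rank` and `DIGITS` are made up here, the digit order is taken from the `'[<>,.-+]'` string in the golfed code, and the final `+ 1` is assumed to correspond to its `-~D` (so that the empty program gets number 1).

```python
DIGITS = '[<>,.-+]'  # octal digit values 0..7, same order as the golfed code


def rank(program):
    """Map a valid BF program to a positive integer (illustrative sketch)."""
    # Steps 1 and 2: drop the leading '[' run, then split off the trailing ']' run.
    rest = program.lstrip('[')
    middle = rest.rstrip(']')
    tail = rest[len(middle):]

    # Step 3: count the '[' in the middle part that are still open at its end.
    # A ']' seen while nothing is open closes a discarded leading '[' instead.
    open_in_middle = 0
    for ch in middle:
        if ch == '[':
            open_in_middle += 1
        elif ch == ']' and open_in_middle:
            open_in_middle -= 1

    # Those brackets are closed by the first ']' characters of the tail; drop them.
    kept_tail = tail[open_in_middle:]

    # Step 4: concatenate and read the result as an octal number.
    value = 0
    for ch in middle + kept_tail:
        value = value * 8 + DIGITS.index(ch)
    return value + 1  # assumption: mirrors -~D in the golfed program
```

With this sketch, `rank("[<]")` gives 16 and `rank("<[]")` gives 9, in line with the reference list of the first 20 programs.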

If you don't understand this explanation, you can find an extended explanation in chat starting here.

For reference, here are the first 20 programs:

1 : 
2 : <
3 : >
4 : ,
5 : .
6 : -
7 : +
8 : []
9 : <[]
10 : <<
11 : <>
12 : <,
13 : <.
14 : <-
15 : <+
16 : [<]
17 : >[]
18 : ><
19 : >>
20 : >,

Here are the first 1000 programs: http://pastebin.com/qykBWhmD
Here is the program I used to generate them: http://ideone.com/e8oTVl

Here is Hello, World!:

>>> ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
457711481836430915510337664562435564418569135809989841510260388418118348571803953323858180392373

Python 2, 157 bytes

def f(s,o=0,d=0,D={}):T=s,o,d;x=D[T]=D[T]if T in D else~o and 0**o+sum(f(s[1:],cmp(c,"[")%-3-~o,d or cmp(c,s[0]))for c in"+,-.<>[]")if s else~d<0==o;return+x

Still looks pretty golfable, but I'm posting this for now. It uses recursion with a bit of caching. Annoyingly, D.get doesn't short circuit for the caching, so I can't save 9 bytes that way...

The mapping prioritises length first, then lexicographical order over the ordering "][><.-,+" (see output examples below). The main idea is to compare prefixes.
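
To make the ordering concrete, here is a brute-force Python 3 sketch that lists valid programs by length first and then lexicographically over "][><.-,+". It is only meant for checking small cases, not as the answer's method; the names `ORDER`, `is_valid`, `programs_in_order` and `first_20` are introduced here for illustration.

```python
from itertools import count, islice, product

ORDER = "][><.-,+"  # characters listed from lowest to highest rank


def is_valid(program):
    """Return True if the brackets in the program are balanced."""
    depth = 0
    for ch in program:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def programs_in_order():
    """Yield valid programs by increasing length, then in ORDER-lexicographic order."""
    for length in count():
        # product() iterates in lexicographic order of its input sequence.
        for chars in product(ORDER, repeat=length):
            program = ''.join(chars)
            if is_valid(program):
                yield program


first_20 = list(islice(programs_in_order(), 20))
```

The position of a program in this enumeration, counted from 1, is the number the answer assigns to it. For example, `first_20[7]` is `'[]'`, the 8th program.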

The variable o keeps track of the number of [ brackets still open for the current prefix, while the variable d takes one of three values indicating:

  • d = 1: The current prefix is lexicographically earlier than s. Add all programs with this prefix and length <= len(s).
  • d = -1: The current prefix is lexicographically later than s. Add all programs with this prefix and length < len(s).
  • d = 0: The current prefix is a prefix of s, so we might change d to 1 or -1 later.

For example, if we have s = "[-]" and our current prefix is p = "+", since p is later than s lexicographically we know only to add the programs starting with p which are strictly shorter than s.

To give a more detailed example, suppose we have an input program s = "-[]". The first recursive expansion does this:

  (o == 0)               # Adds a program shorter than s if it's valid
                         # For the first expansion, this is 1 for the empty program
+ f(s[1:], o=-1, d=1)    # ']', o goes down by one due to closing bracket
+ f(s[1:], o=1, d=1)     # '[', o goes up by one due to opening bracket
+ f(s[1:], o=0, d=1)     # '>'
+ f(s[1:], o=0, d=1)     # '<'
+ f(s[1:], o=0, d=1)     # '.', d is set to 1 for this and the previous branches
                         # since they are lexicographically earlier than s's first char
+ f(s[1:], o=0, d=0)     # '-', d is still 0 since this is equal to s's first char
+ f(s[1:], o=0, d=-1)    # ',', d is set to -1 for this and the later branches
                         # since they are lexicographically later than s's first char
+ f(s[1:], o=0, d=-1)    # '+'

Note how we don't actually use the prefixes in the recursion: everything we care about is captured by the variables d and o and the shrinking input program s. You'll notice a lot of repetition above. This is where caching comes in, allowing us to process 100-char programs well within the time limit.

When s is empty, we look at (d>=0 and o==0), which decides whether to return 1 (count this program because it's lexicographically early/equal and the program is valid), or 0 (don't count this program).

Any situation with o < 0 immediately returns 0, since any program with this prefix has more ]s than [s, and is thus invalid.
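
For readers who prefer something less golfed, the recursion described above can be sketched in Python 3 as follows. This is a hedged re-expression rather than the submitted code: it tracks a position in s instead of slicing, uses `functools.lru_cache` in place of the dictionary D, and the names `rank`, `count`, `pos` and `open_count` are chosen here for illustration.

```python
from functools import lru_cache

ORDER = "][><.-,+"  # earlier in this string means lexicographically earlier


def rank(s):
    """Count valid programs that come at or before s (length first, then ORDER)."""

    @lru_cache(maxsize=None)
    def count(pos, open_count, d):
        if open_count < 0:
            return 0  # more ']' than '[' so far: no valid program has this prefix
        if pos == len(s):
            # Same length as s: count only if valid and not later than s.
            return int(d >= 0 and open_count == 0)
        # The current prefix is itself a program shorter than s; count it if valid.
        total = int(open_count == 0)
        for c in ORDER:
            new_open = open_count + (c == '[') - (c == ']')
            if d != 0:
                new_d = d  # the comparison with s is already decided
            else:
                diff = ORDER.index(s[pos]) - ORDER.index(c)
                new_d = (diff > 0) - (diff < 0)  # 1 if c is earlier, -1 if later
            total += count(pos + 1, new_open, new_d)
        return total

    return count(0, 0, 0)
```

For instance, `rank("[]")` evaluates to 8 and `rank("<+")` to 20, which agrees with the output table in this answer.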


The first 20 outputs are:

 1
> 2
< 3
. 4
- 5
, 6
+ 7
[] 8
>> 9
>< 10
>. 11
>- 12
>, 13
>+ 14
<> 15
<< 16
<. 17
<- 18
<, 19
<+ 20

Using the same Hello World example as @TheNumberOne's answer:

>>> f("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")
3465145076881283052460228065290888888678172704871007535700516169748342312215139431629577335423L

Python 2, 505 (not golfed)

I enjoyed developing this approach, but I may not bother golfing it because it isn't competitive compared with other approaches. I'm posting it for diversity's sake and possible aesthetic interest. It involves recursion and a bit of math.

F={0:1}

def f(n):
    if n not in F:
        F[n]=6*f(n-1) + sum(f(i)*f(n-2-i) for i in range(n-1))

    return F[n]

def h(x):
    if x=='': return 0

    if len(x)==1: return '+-<>,.'.find(x)

    if x[0]!='[':
        return h(x[0]) * f(len(x)-1) + h(x[1:])

    d=i=1
    while d:
        if x[i]==']': d-=1
        elif x[i]=='[': d+=1
        i+=1

    a=i-2
    b=len(x)-i

    return 6*f(a+b+1) + sum(f(i)*f(a+b-i) for i in range(a)) + h(x[1:i-1]) * f(b) + h(x[i:])

def g(x):
    return sum(f(i) for i in range(len(x))) + h(x) + 1

print g(raw_input())

The function f(n) counts the number of valid brainfuck programs of length n. h(x) maps programs of length n to [0..f(n)-1], and g(x) is the bijective ranking function in question.

The main idea is that a non-empty program can either start with [ or with one of the 6 non-[] characters. In the former case, we can iterate over the possible locations of the matching ] and recurse on the enclosed part and on the tail (where tail means the substring following the ]). In the latter case, we can recurse on the tail (where tail means drop the first character). This reasoning can be used both for counting and for computing rank.
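
As a sanity check on the counting half of this argument, the recurrence can be compared with a brute-force count for small n. The sketch below is Python 3 (the answer's code is Python 2); `brute_force_count` and `is_valid` are helper names introduced here, and `f` restates the recurrence from the code above using `functools.lru_cache` instead of the dictionary F.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def f(n):
    """Number of valid programs of length n, via the recurrence."""
    if n == 0:
        return 1
    # Either start with one of 6 non-bracket characters, or with '[' whose
    # matching ']' encloses i characters and leaves n-2-i for the tail.
    return 6 * f(n - 1) + sum(f(i) * f(n - 2 - i) for i in range(n - 1))


def is_valid(program):
    """Return True if the brackets are balanced."""
    depth = 0
    for ch in program:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def brute_force_count(n):
    """Count valid programs of length n by trying all 8**n strings."""
    return sum(is_valid(p) for p in product('+-<>,.[]', repeat=n))


counts = [f(n) for n in range(5)]
```

Here `counts` comes out as `[1, 6, 37, 234, 1514]`, and `brute_force_count(n)` agrees with `f(n)` for each of these n.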

Shorter programs will always have lower rank than longer programs, and the bracket pattern is a secondary determining factor. The non-[] characters are sorted according to "+-<>,." (which is arbitrary).

For example with n=4 we have these cases:

zxxx
[]xx
[x]x
[xx]

where z stands for non-[] character and x stands for any character, under the restriction that the ] has to match the initial [. Programs are ranked according to that order, and recursively on the x subsections, with the left section prioritised over the right section in the latter cases. The rank calculation is similar to mixed-radix numeral systems, and f is important for computing the current "radix".
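
The sizes of these four cases for n=4 can be worked out from f directly, which shows where the "radix" values come from. This is a small Python 3 illustration under the same recurrence; the dictionary `cases` and the pattern strings are just labels made up for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def f(n):
    """Number of valid programs of length n (same recurrence as in the answer)."""
    if n == 0:
        return 1
    return 6 * f(n - 1) + sum(f(i) * f(n - 2 - i) for i in range(n - 1))


n = 4
# Case 'zxxx': 6 choices for z, then any valid program of length n-1.
cases = {'zxxx': 6 * f(n - 1)}
# Cases starting with '[': i characters enclosed, n-2-i characters in the tail.
for i in range(n - 1):
    pattern = '[' + 'x' * i + ']' + 'x' * (n - 2 - i)
    cases[pattern] = f(i) * f(n - 2 - i)
```

This gives 1404 programs of the form zxxx, 37 of the form []xx, 36 of the form [x]x and 37 of the form [xx], which sum to f(4) = 1514. Within a case such as [x]x, the rank of the enclosed part is multiplied by the number of possible tails, much like a digit being multiplied by the radix of the positions to its right.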