Is it possible to enumerate computer programs?

Yes, there are ways that this is possible. A few months ago I made a little experimental project to do something like it, which works reasonably well considering what it's doing. You give it a type and some tests to pass, and it searches for a suitable function that passes the tests. I never put in the work to make it mature, but you can see that it does in fact eventually generate functions that pass the tests in the case of the examples. Requiring the type was an essential component for the practicality of this search -- it drastically reduced the number of programs that needed to be tried.

It is important to have a firm grasp of the theory here before making assertions about what is and is not possible -- there are many who will jump to the halting problem and tell you that it's impossible, when the truth is quite a bit more nuanced than that.

  • You can easily generate all syntactically valid programs in a given language. Naively, you can generate all strings and filter out the ones that parse/typecheck; but there are better ways.
  • If the language is not turing complete -- e.g. the simply-typed lambda calculus, or even something very powerful like Agda, you can enumerate all programs that generate 42, and given any program you can decide "this generates 42" or "this does not generate 42". (The language I used in my experimental project is in this class). The tension here is that in any such language, there will be some computable functions which you cannot write.
  • Even if the language is turing complete, you can still enumerate all programs which eventually generate 42 (do this by running them all in parallel, and reporting success as they finish).

What you cannot do to a turing complete language is decide that a program will definitely not ever generate 42 -- it could run forever trying, and you won't be able to tell whether it will eventually succeed until it does. There are some programs which you can tell will infinite loop, though, just not all of them. So if you have a table of programs, you might expect your enumerator program in the case of a non-turing complete language to be like this:

Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6  |  P7  | ...
42?     | No   | No   | No   | Yes  | No   | No   | No   | ...

Whereas in a turing complete language, your output (at a given time) would be like this:

Program |  P1  |  P2  |  P3  |  P4  |  P5  |  P6    |  P7  | ...
42?     | No   | No   | Loop | Yes  | No   | Dunno  | No   | ...

And later that Dunno might turn into a Yes or a No, or it might stay dunno forever -- you just dunno.

This is all very theoretical, and actually generating programs in a turing complete language to search for ones that do a certain thing is pretty hard and takes a long time. Certainly you don't want to do it one by one, because the space is so big; you probably want to use a genetic search algorithm or something.

Another subtle point in the theory: talking about programs which "generate 42" is missing some important detail. Usually when we talk about generating programs, we want it to generate a certain function, not just output something specific. And this is when things you might want to do get more impossible. If you have an infinite space of possible inputs -- say, inputting a single number, then

  • You can't in general tell whether a program computes a given function. You can't check every input manually because infinity is too many to check. You can't search for proofs that your function does the right thing, because for any computable function f, for any axiom system A, there are programs that compute f such that A has no proof that they compute f.
  • You can't tell whether a program is going to give any kind of answer for every possible input. It might work for the first 400,000,000 of them but infinite loop on the 400,000,001st.
  • Similarly, you can't tell whether two programs compute the same function (i.e. same inputs -> same outputs).

There you have it, a daily dose of decidability theory phenomenology.

If you are interested in doing it for real, look into genetic algorithms, and put timeouts on the functions you try and/or use a decidable (non-turing-complete) language.


It is certainly possible to enumerate all programs in a given language that are syntactically valid (and in a statically typed language even only those that type check): You could simply enumerate all strings like you proposed, try to feed each one into a parser for the language and then reject those that can't be parsed. So if that's your definition of a valid program, then yes, it's possible.

It is however not true that your program would necessarily output a function that returns 42 eventually - even if you replace string with a function that only returns syntactically valid programs. If a returned function contains an infinite loop, it will run forever and thus your program will never get to try another function. Thus you might never reach a function that returns 42.

To avoid this issue, you might say that the string(n) function should only produce code that is syntactically valid and does not contain an infinite loop (while still being able to return all such functions). That, however, is not possible because that would entail deciding the Halting Problem (which is, of course, undecidable).