Is publishing runnable code instead of pseudo code shunned?

There are cases where real code is preferable, and cases where pseudocode is preferable. You shouldn't rely on a simple iron rule, but rather on judgement of what is appropriate to the situation.

Some things to consider:

Programming languages come and go. In the 60s, Fortran was considered a really nice and readable programming language, much easier to read than assembly. But if you'd written an article using Fortran code samples instead of pseudocode, it would be harder for us to read now. Right now, Python looks pretty good to us, but will it still look that good in the future? Suppose I handed you a piece of Python code containing the following line:

a = 3 / 2

What is the value of a? Is it 1 or 1.5? It depends, because Python 2 and Python 3 handle integer division differently: Python 2 gives 1, Python 3 gives 1.5. I've more or less gone native in Python 3 by now, so I actually had to look up which of the division operations differ between Python 2 and 3 and which don't. That just goes to show that using real code in a paper may reduce its shelf life.
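For reference, here is a short sketch of how the division operators behave. It assumes a Python 3 interpreter; the Python 2 results are mentioned only in the comments and are not executed.

```python
# Python 3 semantics; Python 2 behaviour is noted in comments only.
a = 3 / 2     # true division: 1.5 in Python 3 (Python 2 gave 1 for two ints)
b = 3 // 2    # floor division: 1 in both Python 2 and Python 3
c = 3.0 / 2   # a float operand gives 1.5 in both versions
d = -3 // 2   # floor division rounds toward negative infinity: -2

print(a, b, c, d)  # prints: 1.5 1 1.5 -2
```

A reader who only sees `3 / 2` on paper cannot tell which of these results the author had in mind without knowing the language version.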

Pseudocode lets you abstract stuff away. In pseudocode you can just state something like:

WHILE stopping criterion not reached DO
    (stuff)

And then later on in your paper you can argue about different possible stopping criteria for your algorithm. You could do that with actual Python code too, but the result would be basically that you're twisting your Python code to do what pseudocode does by nature.
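To make the "twisting" concrete, here is a minimal sketch of what leaving the stopping criterion open might look like in Python. The function names (`iterate`, `newton_sqrt2`, `after_five_steps`, `residual_small`) and the square-root example are illustrative assumptions, not part of any particular paper.

```python
from typing import Callable


def iterate(state: float,
            step: Callable[[float], float],
            stop: Callable[[float, int], bool]) -> float:
    """Apply `step` repeatedly until `stop(state, iteration)` returns True."""
    iteration = 0
    while not stop(state, iteration):
        state = step(state)
        iteration += 1
    return state


# Hypothetical "(stuff)": one Newton step towards sqrt(2).
def newton_sqrt2(x: float) -> float:
    return 0.5 * (x + 2.0 / x)


# Two interchangeable stopping criteria that a paper might discuss.
def after_five_steps(x: float, iteration: int) -> bool:
    return iteration >= 5


def residual_small(x: float, iteration: int) -> bool:
    return abs(x * x - 2.0) < 1e-12


print(iterate(1.0, newton_sqrt2, after_five_steps))
print(iterate(1.0, newton_sqrt2, residual_small))
```

The callback plumbing is exactly the kind of machinery that the one-line pseudocode version gets for free.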

You can be pretty standardized in pseudocode. Just use the various algorithm typesetting options for LaTeX.

You can use mathematical notation in pseudocode. Using mathematical set notation is a lot more universal than relying on all of your readers understanding Python set operations. Consider:

a = set([1, 2, 3])
b = set([1, 2])
c = 1 if b.issubset(a) else 0

versus

A ← {1, 2, 3}
B ← {1, 2}
C ← 1 if B ⊆ A
    0 otherwise

Someone not familiar with Python looking at the first example will be wondering: is [1, 2, 3] a ... maybe a list? Such a reader might then reason that a is a set containing the list [1, 2, 3] and b a set containing the list [1, 2]; those are different elements, so b can't be a subset of a. That conclusion is wrong, but the notation invites it.
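For comparison, current Python offers set literals and a subset operator that sit closer to the mathematical notation, although a reader still has to know that `<=` means "is a subset of" here. A minimal sketch, assuming Python 3:

```python
a = {1, 2, 3}             # set literal, no intermediate list involved
b = {1, 2}
c = 1 if b <= a else 0    # on sets, `<=` tests "is a subset of"

print(c)                        # prints: 1
print(set([1, 2, 3]) == a)      # prints: True (both spellings build the same set)
print(b.issubset(a) == (b <= a))  # prints: True
```

Even in this form, the meaning of the braces and of `<=` is a Python convention that the reader must already know, which is the point made above.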

Algorithms vs. implementations. Suppose in 2019 you write an interesting algorithm in Python 3 using some state-of-the-art libraries. In 2025 I come up with an alternative algorithm for the same problem in Go and want to compare performance to prove that mine is better. To get a fair comparison, I'm going to have to implement my algorithm in Python or yours in Go. Suppose by then nobody uses Python for high-performance stuff anymore because Go does it better. (It might, I dunno.) Now I have to go research the six-year-old libraries you used to find out exactly which functions you used and which Go functions are equivalent to them. That's very hard. So quite likely, the Go implementation I make of your algorithm won't be all that good. And big surprise! My algorithm benchmarks better than yours!

If instead you'd used an implementation-independent description of your algorithm, things might turn out better for your publication.

So the two big disadvantages of using real code are: it limits the shelf life of your publication, and you reach a smaller audience.

So when should you use real code?

When the topic of your paper is not the algorithm, but the implementation or the programming language. Maybe you're trying to show that Python is a really good language in which to solve problem X because with libraries Y and Z you get an easy and efficient implementation.
It's absolutely encouraged to also publish your real code as an appendix or, better yet, in a repository where people can download your code and suggest improvements. A nontrivial algorithm is probably too big to copy by typing by hand or even copy-pasting out of a paper anyway. As soon as you start getting into something like a new Deep Learning algorithm, you're probably looking at multiple files or even nested packages.

In my research, I often write algorithms that contain statements like:

Find a dominant subspace of a given Hermitian matrix A with relative accuracy ε.
Find a nonnegative solution of this system of equations / inequalities.
Sort these eigenvalues from large to small in modulus, discard small ones and reshuffle the eigenvectors accordingly.

These instructions are perfectly simple and clear to anyone doing Numerical Linear Algebra, regardless of their favourite programming language. I see no reason to use a particular programming language in the paper and risk capping my readership. I believe that natural language is faster to read and understand. In particular, it allows me to talk clearly about the purpose of each step, rather than about how to achieve it. There is often more than one way to, e.g., "solve a linear system", and pseudo-code allows me to distinguish between "solve it somehow" and "solve it using this particular algorithm". Hence, I use pseudo-code, which lets me express nuances like this better.

I always provide actual code alongside the paper in an open repository, which readers can clone and explore, saving them the bother of retyping the code from the PDF or printed version of the paper.
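To illustrate how much a single such instruction leaves open, here is one possible Python reading of "sort these eigenvalues from large to small in modulus, discard small ones and reshuffle the eigenvectors accordingly". It is only a sketch: the function name `sort_and_truncate`, the relative cutoff `tol`, and the storage of eigenvectors as a list parallel to the eigenvalues are all assumptions that the natural-language statement deliberately does not fix.

```python
def sort_and_truncate(eigenvalues, eigenvectors, tol=1e-10):
    """Order eigenpairs by decreasing modulus and drop the small ones.

    `eigenvectors[i]` is assumed to belong to `eigenvalues[i]`.
    `tol` is a hypothetical cutoff relative to the largest modulus.
    """
    if not eigenvalues:
        return [], []
    # Keep each eigenvalue paired with its vector while sorting.
    pairs = sorted(zip(eigenvalues, eigenvectors),
                   key=lambda pair: abs(pair[0]), reverse=True)
    largest = abs(pairs[0][0])
    kept = [(lam, vec) for lam, vec in pairs if abs(lam) > tol * largest]
    return [lam for lam, _ in kept], [vec for _, vec in kept]


values = [0.5, -3.0, 1e-14, 2.0]
vectors = ["v0", "v1", "v2", "v3"]  # placeholder labels standing in for vectors
print(sort_and_truncate(values, vectors))
# prints: ([-3.0, 2.0, 0.5], ['v1', 'v3', 'v0'])
```

Each of those choices (relative versus absolute cutoff, tie-breaking, data layout) is something the one-sentence instruction can leave to the implementer.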

Pseudo-code is forever; real languages change all the time.

If you'd published a paper with an algorithm in Python in the Python 2 days, there is a significant possibility that the "executable" code you wrote then would no longer run correctly under the latest release. Even in less dramatic cases, the advance of new libraries and algorithms is likely to leave readers confused by your archaic choices. Imagine if papers from 40 years ago had used the languages of the day; would you understand the subtleties of some Fortran or Pascal code? Programmers then made the same claims for understandability that you make about Python.

So pseudo-code is better because it will express the key ideas just as clearly in a hundred years.

Pseudo-code expresses the intent of code better than real languages

In order to write working code I must, nearly always, carry out a number of steps (setting things up, formatting things or whatever) that are needed to get the program to work but are not needed to understand the algorithm. These steps can be ignored or simplified in pseudo-code in order to communicate the important information.

What's more, in real languages you must make decisions about how data is stored, which algorithm is adopted for sorting, etc. that are incidental to the algorithm discussed in the paper. By including these incidental choices in your description of the algorithm, you guide implementers towards making choices that may not be optimal, either because better choices are now available or because the choices you made are unsuited to their target environment.
