Knowing when the code is good enough, for research

As the

code I write is almost always only used by myself

you only have yourself to please. So the only sensible answer is to balance things to minimize the frustration to yourself. Don't do things you see as frustratingly slowing you down until you are more frustrated by the problems caused by not doing them.

However, the more strategic question is whether you want to continue writing code that will only be used and seen by yourself. If you want to advance your career through collaboration with peers, or by building a group of researchers to work on and with your codes, then it will be better to do more earlier.

You then know you've found a good balance when the collaborators you want to work with still want to work with you after they've spent some time with your code.


You said:

The code I write is almost always only used by myself

but I find this remark a little short-sighted.

In the moment, code may be used almost exclusively by the author/student-researcher, but eventually that student will graduate, and there may be some follow-on research to do. Woe to the next graduate student who inherits that mess, or the summer intern who gets hired to clean it up!

There are plenty of good reasons to invest a little time in writing better-engineered code:

  • You (and others who may join the research team) will be able to further build upon a well-designed architecture
  • What seems “self-documenting” now may be harder-to-follow than you think six months from now
  • A lot of code gets ultimately gets used for longer than the original author speculated
  • The more complex your code becomes, the harder it will be to debug later

Sure, there’s a tradeoff between immediacy of results and building for the future. But I’d cautioned against getting lulled into a false sense of security and churning out code as fast as you can with little regard to future maintainability and extensibility.


As you would with any other document, write for your audience. That is, write your code to make it help, as best as possible (yet still at reasonable cost), those who are reading it. Consider:

  1. Will anybody other than you be reading the code, now or in the future? If it's a one-off script that you will quickly discard you need not worry much about readability. If nobody else is likely ever to look at it, you need to think about how likely your future self is to remember what was going on at the time and understand the conventions you used.

  2. If others are likely to read your code, how technically adept with that particular language and libraries, and in general, are they likely to be? Can you use standard conventions used by programmers experienced in that language? Do you need write things in a way that experienced developers who don't know that particular language can quickly understand what you're doing? Do you need to avoid standard things (e.g., list comprehensions) that inexperienced developers might not understand well?

  3. How well will the other developers understand the problem domain? Do you need to provide a detailed explanation of what you're doing and why you're doing it, or do you expect that when they look at a line they'll, e.g., immediately see that it's a form of Black Scholes equation and understand how and why it's being used there?

Keep in mind that making things more descriptive and adding more comments does not necessarily make your code more readable; for some audiences that may even make it less readable. Just as mathematicians can more quickly comprehend a dense equation than a long description of the equation written in "plain English," experienced developers familiar with a domain will take considerably longer to read verbose code written for a beginner than they will tight code written for an expert that communicates exactly what is needed and no more. As an extreme example, read this:

def double_a_number(number_to_be_doubled):
    ''' Given a number, this will return twice that number.
        Parameters:
            number_to_be_doubled: the number to be doubled
        Return Value:
            The doubled number.
    '''
    return number_to_be_doubled * 2

and then tell me if you think anybody would find it easier to understand than:

def double(x): return 2*x

Also be careful not to fall into the classic trap affecting inexperienced (and, sadly, even many experienced) developers of putting slavish devotion to rules ahead of the readability that those rules were supposed to promote. "Inconsistent" variable names and the like are not necessarily a problem (except for whomever is writing your lint configuration files) if your focus is on communicating with other developers rather than satisfying arbitrary rules.

And my last bit of personal advice, since you mention docstrings, always remember that a comment is only there because you couldn't write clear enough code. If you feel a function needs a docstring, look first to see if you can't improve the function name, change positional parameters to named parameters, or use any other techniques to make the code state more clearly what it's about. (And if someone's telling you to put in docstrings anyway, refer them to my example above.)

Tags:

Code