Granular versus terse coding

My path to preferring granularity

This is probably more of an extended comment, and a complementary answer to the excellent one by Anton. What I want to say is that for a long time, I had been thinking exactly along Mr. Wizard's lines. Mathematica makes it so easy to glue transformations together (and keep them readable and understandable!) that there is a great temptation to always code like that. Going to extreme granularity may seem odd, and actually wrong.

What changed my mind almost a decade ago was a tiny book by Roger Sessions called Reusable Data Structures for C - in particular, his treatment of linked lists, although everything else he did carried the same style. I was amazed by the level of granularity he advocated. By then, I had produced and / or studied several other implementations of the same things, and was sure one couldn't do better / easier. Well, I was wrong.

What I did realize by that time was that once you've written some code, you can search for repeated patterns and try to factor them out - and as long as you do that reasonably well, you follow the DRY principle, avoid code duplication, and everything is fine. What I didn't realize was that often, when you go to even more granular code, dissecting pieces that you may initially have considered inseparable, you suddenly see that your code has a hidden inner structure which can be expressed even more economically with those smaller blocks. This is what Sessions repeatedly and profoundly demonstrated throughout his book, and it was an important lesson for me.

Since then, I started actively looking for smaller bricks in my code (in a number of languages: while I mostly answer Mathematica questions, I have also written reasonably large volumes of production code in Java, C, JavaScript, and Python), and more often than not, I was finding them. And in almost all cases, going more granular was advantageous, particularly in the long term, and particularly when the code you write is only a small part of a much larger code base.

My reasons for preferring granularity

Now, why is that? Why do I think that granular code is very often the superior approach? There are a few reasons; here are some that come to mind.

Conceptual advantages

  • It helps to conceptually divide code into pieces which make sense by themselves, and which I view as parts deserving their own mental image / name.

    More granular functions, when the split of a larger chunk of code is done correctly, represent inner "degrees of freedom" in your code. They expose the ideas behind the code, and the core elements which combine to give you a solution, more clearly.

    Sure, you can see that in a single chunk of code too, but less explicitly. In that case, you have to understand the entire code to see what the input of each block is supposed to be and how it is supposed to work. Sometimes that's OK, but in general it is an additional mental burden. With separate functions, their signatures and names (if chosen well) help you with that.

  • It helps to separate abstraction levels. Code combined from granular pieces reads like DSL code, and allows me to grasp the semantics of what is being done more easily.

    To clarify this point, I should add that when your problem is part of a larger code base, you often don't recall it (taken separately) as clearly as when it is a stand-alone problem - simply because most such functions solve problems which only make sense in a larger context. Smaller granular functions make it easier for me to reconstruct that context locally, without reading all the big code again.

  • It is often more extensible

    This is so because I can frequently add more functionality by overloading some of the granular functions. Such extension points are simply not visible / not easily possible in the terse / monolithic approach (a sketch is shown after this list).

  • It often allows one to reveal certain (hidden) inner structure, cross-cutting concerns, and new generalization points, and this leads to significant code simplifications.

    This is particularly so when we talk not about a single function, but about several functions forming a larger block of code. It frequently happens that when you split one of the functions into pieces, you notice that other functions can reuse those components. This sometimes allows one to discover a new cross-cutting concern in the code, one which was previously hidden. Once it is discovered, one can make an effort to factor it out of the rest of the code and make it fully orthogonal. Again, this is not something observable at the level of a single function.

  • It allows you to easily create many more combinations

    This way you can get solutions to similar (perhaps somewhat different) problems without having to dissect your entire code and rewrite it all. For example, if I had to change the specific way the random walk in that example was set up, I would only have to change one tiny function - which I can do without thinking about the rest; see the sketch below.
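
To make the last two points concrete, here is a minimal Wolfram Language sketch. All the names (randomStep, walk, endPoint) are hypothetical, invented for this illustration - this is not the code from the answer referenced above:

    ClearAll[randomStep, walk, endPoint];

    (* one tiny function encapsulates how a single step is generated *)
    randomStep[dim_Integer] := RandomVariate[NormalDistribution[], dim];

    (* the rest of the code is built from that brick and never
       mentions the step distribution *)
    walk[steps_Integer, dim_Integer] :=
        Accumulate[Table[randomStep[dim], {steps}]];
    endPoint[steps_Integer, dim_Integer] := Last[walk[steps, dim]];

    (* switching from a Gaussian walk to a lattice walk means redefining
       only the one tiny function; walk and endPoint are reused unchanged *)
    randomStep[dim_Integer] :=
        RandomChoice[Join[#, -#] &[IdentityMatrix[dim]]];

The split itself is what creates the extension points: an alternative step rule is one new definition, not a rewrite of the whole solution.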

Practical advantages

  • It is easier to understand / recall after a while

    Granular code, at least for me, is easier to understand when you come back to it after a while, having forgotten its details. I may not remember exactly what the idea behind the solution was (well-chosen names help here), or which data structures were involved in each transformation (signatures help here). It also helps when you read someone else's code. Again, this is particularly true for larger code bases.

  • More granular functions are easier to test in isolation.

    You can surely do that with parts of a single function too, but it is not as straightforward. This is particularly true if your functions live in a package and are part of a larger code base.

  • I can better protect such code from regression bugs

    Here I mean the bugs coming from changes not propagated properly through the entire code (such as changes to the types / number of arguments of some functions), since I can insert argument checks and post-conditions more easily. When some wrong / incomplete change is made, the code breaks in a controlled, predictable, and easy-to-understand fashion (illustrated in the first sketch after this list). In many ways this approach complements unit tests: the code basically tests itself.

  • It makes debugging much simpler. This is true because:

    • Functions can throw inner exceptions with detailed information about where the error occurred (see also the previous point)
    • I can access them more easily in running code, even when they are in packages.

      This is actually often a big deal, since it is one thing to run and test a tiny function, even a private one, and quite another to deal with a large and convoluted function. When you work on running code and have no direct access to the source (so that you could easily reload an isolated function), the smaller the function you want to test, the easier it is.

  • It makes creating workarounds, patches, and interactions with other code much easier. I have experienced this myself a lot.

    • Making patches and workarounds.

      It often happens that you don't have access to the source and have to change the behavior of some block of functionality at runtime. Being able to simply overload or Block a small function is so much better than having to overload or redefine huge pieces of code, without even knowing what you may break by doing so.

    • Integrating your functionality with code that does not have a public extension API

      The other, similar, issue is when you want to interact with some code (for example, make some of its functions work with your data types and be overloaded on them). It is good if that other code has an API designed for extensions. But if not, you may, for example, use UpValues to overload some of those functions. And there, having such granular functions as hooks really saves the day (see the second sketch after this list). In such moments, you feel really grateful to the other person for having written their code in a granular fashion. This has happened to me more than once.
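
As an illustration of the argument checks, post-conditions, and isolated testing mentioned above, here is a minimal hedged sketch (normalize is a hypothetical function, not from any package):

    ClearAll[normalize];
    normalize::zero = "Cannot normalize a zero vector.";

    (* the pattern in the signature rejects wrong argument types at the
       call site; a failed match leaves the call unevaluated, which is a
       loud and local signal of an unpropagated change elsewhere *)
    normalize[v : {__?NumericQ}] :=
        With[{n = Norm[v]},
            If[n == 0,
                Message[normalize::zero]; $Failed,  (* controlled failure *)
                v/n
            ]
        ];

    normalize[{1, "a", 3}]   (* stays unevaluated: wrong argument type *)

    (* small functions are also trivially testable in isolation *)
    VerificationTest[normalize[{3, 4}], {3/5, 4/5}]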
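And here are sketches of the two patching techniques from the last point, reusing the hypothetical randomStep / walk functions from the earlier sketch:

    (* temporarily change the behavior of code you cannot edit: Block
       substitutes the small function only for the duration of the call *)
    Block[{randomStep = Function[dim, ConstantArray[0, dim]]},
        walk[5, 2]]   (* a walk that never moves; everything else intact *)

    (* let someone else's granular function accept your own data type:
       an UpValue attached to your symbol overloads their function
       without touching their code *)
    ClearAll[myLattice];
    myLattice /: randomStep[myLattice[dim_Integer]] := randomStep[dim];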

Implications for larger programs

There surely isn't a single "right" way to structure code. And you may notice that in most of the answers I post here on M SE, I do not follow the granularity principle to the extreme. One important thing to realize is that the working mode in which one solves a very particular problem is very different from the working mode in which one constructs, extends, and / or maintains larger code bases.

The very ability to glue things together insanely fast works against you in the long term, if your code is large. This is a road to writing so-called write-only code, and for software development that is a road to hell. Perl is notorious for this - which is one reason why lots of people switched from Perl to Python despite the unquestionable power of Perl. Mathematica is similar, because it shares with Perl the property that there are typically a large number of ways to solve any given problem.

Put another way, the Mathematica language is very reusable, but that doesn't mean it is very easy to create reusable code with it. It is easy to create code that solves any particular problem fast, but that's not the same thing. I view smaller granularity as an idiomatic (in Mathematica) way to improve reusability. What I wanted to stress is that reusability comes from the right separation of concerns, from factoring out the different pieces. This is obvious for larger volumes of code, but I think it is no less true for smaller functions.

When we typically solve some problem in Mathematica, we don't have reusability in mind all that much, since our context is usually confined to that particular problem. In such a case, reusability is a foreign concept and gets in the way. My guess is that you like terse code because it brings you to the solution most economically. But when / if you want to solve many similar problems most economically, you will notice that if you list all your solutions and compare them, your terse code will contain repeated pieces which are, however, wired into the particular solutions, and there won't be an easy way to avoid that redundancy unless you start making your code more granular.

My conclusions

So, this really boils down to a simple question: do you need to solve one very specific problem, or do you want to construct a set of bricks with which to solve many similar problems? It is somewhat of an art to decide this in each particular case, and it cannot be decided without a bigger context / picture in mind. If you are sure that you just need to solve a particular problem, then going to extreme granularity is probably overkill. If you anticipate many similar problems, then granularity offers advantages.

It so happens that large code bases frequently automate a lot of similar things, rather than solve a single large problem. This is true even for programs like compilers, which do solve a single large problem, but in reality consist of lots of sub-problems that reuse the same core set of data structures. So, I was advocating granularity particularly in the context of the development of large programs - and I would agree that when solving some very specific particular problem, making the code too granular may result in too much mental overhead. Of course, that also greatly depends on personal habits - mine have been heavily influenced in recent years by dealing with larger chunks of code.


This is a nicely formulated and stimulating question. The answer, of course, is "Mu". The comment by István Zachar is along those lines... (Here is the official "Mu" link.)

Usually, when we cannot answer a question in the terms in which it is asked, we extend the paradigm. (E.g. coming up with complex numbers. Or getting enlightened. Or bringing in straw men.) So, what follows is my view of how the perspectives opposed in the question can be reconciled. For simplicity of explanation, I both over-simplify and over-exaggerate.

The names "Leonid" and "Mr. Wizard" are used to designate code-producing agents as described in the question. These names should not be taken as literal references to the programming activities of the corresponding persons.

Diagnosing the dichotomy

  1. In Object-Oriented Programming (OOP) languages, the key concept is reuse. We do not want a domino effect when changing parts of the code because of a new understanding of the problem domain the code operates in. That is why we encapsulate behavior in objects, etc. We can say that modeling in OOP uses an entity-centric language: the entities are objects, and they exchange messages. Leonid's breakdown of functions can be seen to come from similar considerations (with a structured-programming flavor).

  2. In Functional Programming (FP) languages, we have the ability -- and we are enticed -- to refine our code iteratively until we derive what is essentially a new language that fits the problem domain of our code. FP provides modeling with a verb-centric language. The valency of the verbs can be used to make new verbs, etc. A fundamental understanding of the problem domain drives the derivation of new verbs. Mr. Wizard's coding style comes from that perspective.

  3. A person will prefer one or the other approach based on their native language, their scientific, engineering, or mathematical background, and their audience. For example, chemists would prefer OOP (and flow charts) because they study processes of matter changing states. A mathematician might select different styles when communicating through code with (i) other mathematicians in academia, or (ii) software engineers in large companies.

  4. The two approaches should not just be contrasted but seen as dual. For example, consider a graph G in which the nodes are entities and the edges are messages exchanged by the entities. This graph corresponds to OOP. Now consider the line graph L(G), in which the edges of G become nodes and the nodes of G become edges. That graph L(G) corresponds to FP.

  5. Leonid does FP using L(G), i.e. by seeing functions as objects. Mr. Wizard does FP using G, but seeing edge paths rather than vertices and the links between them.
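
This duality can be illustrated directly, since Mathematica has a built-in LineGraph. A small toy example (not anyone's actual modeling code):

    g = Graph[{1 <-> 2, 2 <-> 3, 2 <-> 4}];
    lg = LineGraph[g];   (* each edge of g becomes a vertex of lg *)
    {EdgeCount[g], VertexCount[lg]}
    (* {3, 3} -- the roles of entities and messages have been swapped *)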

Proposed cure

One way to address the opposing forces of the two perspectives is to separate the reuse concern from the concern of brevity and simplicity of implementation.

One way to do this is to have a Domain Specific Language (DSL) that is easy to use and understand -- say, a DSL close to the natural language of the intended users. The DSL would bring what Leonid wants; the interpreter of the DSL would bring the kind of code Mr. Wizard wants. (And if done right, Mr. Wizard might enjoy reading it.)
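
As a toy sketch of this separation (all names are hypothetical, and this is far simpler than the parser-based approach demonstrated below): the user writes commands close to natural language, while the terse rule-based machinery stays under the hood:

    ClearAll[interpret];

    (* the user-facing DSL: commands that read close to natural language *)
    interpret["turn left"]  := "L";
    interpret["turn right"] := "R";
    interpret["go forward"] := "F";

    (* the hidden, terse machinery: split a command sequence and map *)
    interpret[program_String] /; StringContainsQ[program, ";"] :=
        interpret /@ StringTrim /@ StringSplit[program, ";"];

    interpret["go forward; turn left; go forward"]
    (* {"F", "L", "F"} *)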

This can be facilitated with different tools. Below I demonstrate the use of Functional Parsers and the OOP design pattern Interpreter.

The code in this answer of mine illustrates fairly well "a DSL close to the natural language of the intended users" for programming tasks.

Example

Below is (another) simple example that shows a separated, reusable formulation in Leonid's manner, and coding in Mr. Wizard's manner. (Also see this answer of mine about DSL design and application in Mathematica.)

  • The coding style of Leonid is used to come up with the EBNF grammar.

  • The coding style of Mr. Wizard is used both in the formulation of the EBNF grammar rules and under the hood of the parser-generation implementation. For the former, see the functions applied at the end of some of the grammar rules, after the characters "<@". For the latter, see the article "Functional Parsers" by Fokker -- it is exactly in the style Mr. Wizard likes. (My implementation, FunctionalParsers.m, is messier, since it was written for an older Mathematica version.)

(Image: the EBNF grammar for the example and the parser generated from it.)

To be clear, this example shows that the two approaches can both be used by a higher abstraction layer. Mr. Wizard's approach can be tucked into an interpreter, while Leonid's approach would have users program with clear natural-language commands that can be decomposed into / within context-free grammars. Mr. Wizard's approach facilitates the parsing and interpretation of those grammars, but is otherwise not visible to the users.

Conclusion

From the above it is clear that, with the right attitude, automated tools, and smart computers, Mr. Wizard's perspective becomes obsolete to humans, while Leonid's perspective endures - it bubbles up to the end users.

(Sorry if this conclusion seems too pessimistic. In my defense, we increasingly live in a world that is all about outsourcing, offshoring, automation, and "the robots are coming"...)
