Mathematica style guide?

I think this is a very relevant question: it seems to be an agreed standard that having a coding style guide is a very good (indispensable?) thing for every project where several people write code. It also seems to be agreed that having a style guide at all matters more than what exactly it looks like. I am also convinced that, especially for Mathematica, many details should be handled differently for different kinds of projects and teams.

Thus, instead of giving just another example of a style convention, I think it makes more sense to write up a list of things that such a guideline could or should address. Filling these entries with content (or dropping some) would then be a second step, and probably every team or project will want its own details. I would prefer not to fill in specific suggestions for each entry here (too much danger of disagreement). If people think it makes sense to work on a "Mathematica Stack Exchange users" suggestion, the other wiki answer from Szabolcs could be used for that. Of course, such a list will never be complete, and for some entries it is open to debate whether they are relevant at all. I made this list a community wiki and invite everyone to contribute. My suggestion is not to delete entries that you think are irrelevant, but only to add pro/con arguments for them.

Use of Tools

It might make sense to set requirements about which tools to use or not to use: there are plenty of ways to write, develop, document and test Mathematica code. It is certainly good to have a convention about that. Possible decisions include:

  • use of front end, Workbench, text editors, other IDEs (e.g. the Mathematica plugin for IntelliJ IDEA) for code development
  • use of internal or external tools to write/run tests
  • use of version control system and which
  • use of external tools for e.g. documentation

Of course, not all of these are independent: notebooks are known not to work well with version control systems, so using the latter might influence the decision about whether to use the front end (or, more precisely, notebook files for code) or not.

File/Code Organisation

Use of File Formats

  • use of notebooks or packages for source code
  • use of notebooks or other formats for documentation
  • file formats for data that is relevant for the project (e.g. csv vs. excel)

Organisation of Project/Source-Code Directory

  • define directory layout and which content should go where
  • modularisation of code:
    • how much content per file, e.g. one function/symbol definition per file?
    • how many lines are typically acceptable per function, per file,...
    • under which conditions are exceptions from the above acceptable?
    • use of extra directories vs. just extra package files for subpackages
    • use and naming of public/private contexts for subpackages
  • use of Protect and other Attributes for symbols.
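
To make the points about public/private contexts and Protect concrete, here is a minimal sketch of one common package layout. The package name MyProject`Utilities`, the file name and the symbols SquareAll and square are all hypothetical, and a style guide might well settle on a different structure.

```mathematica
(* Hypothetical package file: MyProject/Utilities.m *)
BeginPackage["MyProject`Utilities`"]

(* Allow the file to be reloaded after the symbol has been protected *)
Unprotect[SquareAll];

(* Mentioning a symbol here creates it in the public context *)
SquareAll::usage = "SquareAll[list] squares every element of list.";

Begin["`Private`"]

(* Private helper, created in MyProject`Utilities`Private` *)
square[x_] := x^2

SquareAll[list_List] := square /@ list

End[]

(* Whether to protect public symbols is one decision a guide could make *)
Protect[SquareAll];

EndPackage[]
```

With this layout, only SquareAll is visible after loading the package, while square stays in the private context.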

Naming Conventions

Directory/File Names

  • require restrictions so that package files can be loaded with Needs
  • uppercase/camelcase/... conventions for directories and filenames
  • use of "-","_", " ",... in (non-package) filenames
  • use of file extensions, upper-/lower-case

Symbol Names

  • upper vs. lower CamelCase, allow/suggest just lower case
  • allow non-ASCII characters in symbol names or not? If yes, restrict to a subset, e.g. Greek letters?
  • make naming depend on symbol purpose and content? If yes:
    • use verbs for symbols used as functions, nouns for symbols used as variables
    • use of singular vs. plural for lists (number[[idx]] vs. numbers[[idx]]), or other conventions such as numberArray[[idx]]
    • conventions for e.g. variables used as loop counters, flags, ...
    • use of Mathematica-like xxxQ functions vs. isXxx as used in many other languages
    • use a leading $ to indicate use of a global variable.
    • all-uppercase names for constants (widely used in other languages, but does anyone use that in Mathematica?)
    • allow single letter symbol names or not
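
As an illustration of a few of these naming choices, here is a small sketch. All names are hypothetical and only show what the alternatives look like side by side; they are not a recommendation.

```mathematica
(* Mathematica-like predicate name ending in Q *)
evenLengthQ[list_List] := EvenQ[Length[list]]

(* The same predicate, named as in many other languages *)
isEvenLength[list_List] := EvenQ[Length[list]]

(* A leading $ marking a global setting *)
$defaultTolerance = 10^-6;

(* Plural name for a list, singular for one element *)
numbers = {1, 2, 3};
number = numbers[[2]];

{evenLengthQ[numbers], isEvenLength[numbers], number}
(* {False, False, 2} *)
```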

Option Names

All of the conventions made for symbol names need to be decided here as well, not necessarily with the same outcome. Additionally:

  • use of strings vs. symbols for option names
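
The difference between string and symbol option names can be seen in a short sketch. The function processData and the option "Verbose" are made up for illustration; Method is an existing System` symbol that is reused here as an option name.

```mathematica
(* Hypothetical function with one symbol-named and one string-named option *)
Options[processData] = {Method -> Automatic, "Verbose" -> False};

processData[data_List, OptionsPattern[]] :=
  {OptionValue[Method], OptionValue["Verbose"], Length[data]}

processData[{1, 2, 3}, "Verbose" -> True]
(* {Automatic, True, 3} *)
```

One argument sometimes made for string names is that they do not create new symbols in any context, whereas a new symbol-named option has to live in some context and can shadow, or be shadowed by, other symbols of the same name.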

Documentation

  • prefer inline documentation with (* ... *) comments or extra text cells/lines before/after relevant (function) definitions
  • require usage messages, probably at least stubs for auto completion
  • have more detailed explanation in extra files (e.g. mathematical background, preliminary experiments etc.)

Code Layout

Use of Shortcuts, Parentheses and Such

Mathematica code could theoretically be written entirely in FullForm, and a team with a strong Lisp background might actually prefer that. But the language is full of shortcuts, and many of them help make code more readable. With exaggerated use of shortcuts, however, Mathematica code can look like entries in a Perl one-liner contest, which would make good comic-strip curse strings. It certainly makes sense to give some guidelines about the use of such shortcuts:

  • avoid or prefer shortcuts in general?
  • white- and blacklists for shortcuts
  • define conditions under which shortcuts are to be used (e.g. I often use /@ when the resulting expression fits on one line and no additional parentheses are required, but otherwise I prefer an explicit Map with my standard convention for indenting and line breaks).
  • it often makes sense to write parentheses even when they are not strictly necessary, so it might be relevant to define when parentheses are allowed/required/forbidden or to be replaced by code which doesn't need them (e.g. ()& vs. Function[]).
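
A small sketch of what the trade-off looks like in practice: the same computation written with explicit heads and with increasing use of shortcuts, followed by a case where parentheses change the meaning. The variable names are arbitrary.

```mathematica
list = {1, 2, 3};

(* The same computation, from fully explicit to heavily abbreviated *)
Map[Function[x, x^2], list]   (* explicit heads *)
Map[#^2 &, list]              (* shorthand for Function only *)
#^2 & /@ list                 (* shorthand for both Map and Function *)
(#^2 &) /@ list               (* optional parentheses, same meaning *)
(* each returns {1, 4, 9} *)

(* Here the parentheses are not optional: & binds more weakly than ->, *)
(* so the first line is a function returning a rule, the second a rule *)
(* whose right-hand side is a function                                 *)
key -> #^2 &
key -> (#^2 &)
```

A guideline could, for example, require parentheses around every pure function that appears on the right-hand side of a rule.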

Line Breaks and Indenting

  • where to put line breaks
    • for function definitions put linebreak after := or not
    • extra linebreak before closing ] and } or not
  • where to put spaces, where not
    • after , in list of arguments
    • around operators like +, -, =
  • use standard form cells with automatic indentation or input form cells / pure text with manual indentation
  • how much indentation
  • use tabs or spaces for indentation
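
To show what such layout decisions amount to, here is the same hypothetical definition in two layouts. Neither is put forward as the right one; the names clipA and clipB exist only so that both variants can sit side by side.

```mathematica
(* Variant A: line break after :=, body on one indented line *)
clipA[x_, lo_, hi_] :=
  Which[x < lo, lo, x > hi, hi, True, x]

(* Variant B: no break after :=, one case per line, *)
(* closing bracket on its own line                  *)
clipB[x_, lo_, hi_] := Which[
  x < lo, lo,
  x > hi, hi,
  True, x
]

{clipA[5, 0, 3], clipB[5, 0, 3]}
(* {3, 3} *)
```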

Constructs Preference/Shunning

Mathematica is a very "rich" language, and there are literally hundreds of ways to achieve the same thing. It might make sense to require certain standard solutions, or to prefer certain constructs, so that team members can more easily understand each other's code, e.g.:

  • looping constructs: e.g. favour Do vs. For, favour non-indexing constructs like Map and Scan vs. their indexing counterparts Table / Do
  • preferences of "paradigms", e.g. pattern matching vs. functional vs. procedural styles, e.g.: Replace[result, $Failed :> (Message[...]; Throw[...])] vs. showMessageIfFailed[result]; vs. If[result === $Failed, Message[...]]
  • use of pure functions (many of them nested are hard to read/understand)
  • f=Function[x,x^2] vs. f=#^2& vs. f[x]:=x^2
  • restrict use of symbols to those available to certain Mathematica versions.
  • object/data representation: Association, Dataset, list of rules (and again: symbol or string keys?), matrix/list with positional meaning, custom head denoting an object, ManagedLibraryExpression
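
The following sketch puts a few of these alternatives next to each other: the three ways of defining a squaring function mentioned above, and three ways of looping over a list. The names f1, f2, f3 and items are arbitrary.

```mathematica
(* Three ways to define "square" *)
f1 = Function[x, x^2];
f2 = #^2 &;
f3[x_] := x^2

{f1[3], f2[3], f3[3]}
(* {9, 9, 9} *)

(* Printing each element: indexing loops versus a non-indexing construct *)
items = {"a", "b", "c"};

For[i = 1, i <= Length[items], i++, Print[items[[i]]]]  (* i stays set globally afterwards *)
Do[Print[items[[i]]], {i, Length[items]}]               (* i is localised by Do *)
Scan[Print, items]                                      (* no index at all *)
```

All three loops print the same three lines; they differ in how much bookkeeping the reader has to follow.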

Everyone will have their own preferences about coding style. This is especially true for Mathematica, as most work done in this language is interactive, and until recently there was relatively little open collaboration between people that could have led to the development of standards. The existence of this site (Mathematica.SE) has helped make great progress in this area.

Let's try to collect a few guidelines which are already commonly followed in the online Mathematica community.

Naming things

  • Symbol names may contain only letters (including letter-like forms such as Greek letters), digits (not as the first character) and $. In particular, underscores are not allowed. This naturally leads to using camel case for names.

  • When developing packages meant to be used by others, use fully spelt out, descriptive names for public symbols.

  • When doing interactive work, use only names starting with lowercase, e.g. findAllRoots.

  • When writing packages, use only capitalized names for public symbols, e.g. FindAllRoots. However, for symbols private to the package use lowercase.

  • Start the names of constants or flags with a $ sign. This is typically used for global variables that control in some way how the system works, e.g. $MaxExtraPrecision.

When there's more than one way to write something

  • Instead of f[g[h[x]]], write f@g@h[x] for readability. Instead of f[{a,b,c}], write f@{a,b,c}.

  • If you have a purely stylistic choice between = and :=: use = for variable definitions and := for function definitions.

  • When evaluation has side effects, prefer DownValues to OwnValues. I.e., use randomNumber[] := RandomReal[] instead of randomNumber := RandomReal[].

  • For procedural loops, prefer Do over For
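
A brief sketch of the = versus := and DownValues versus OwnValues points, with made-up names. It only illustrates the behaviour the guidelines above rely on.

```mathematica
(* = evaluates the right-hand side once, at definition time *)
fixedNumber = RandomReal[];

(* := re-evaluates the right-hand side on every use *)
freshNumber := RandomReal[]      (* OwnValue: looks like a plain variable *)
randomNumber[] := RandomReal[]   (* DownValue: the brackets signal that something is evaluated *)

fixedNumber === fixedNumber
(* True, because fixedNumber holds one stored value *)

(* Prefix form is only a different way of writing nested brackets *)
f@g@h[x] === f[g[h[x]]]
(* True *)
```

The bracketed form randomNumber[] makes it visible at the call site that each use may give a different result, which is the reason for preferring it when evaluation has side effects.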


Let me try with a few simple ("obvious"?) style guidelines I try to follow:

  • Use meaningful names that are spelled out or that use widely-adopted abbreviations from the field of application.
  • Begin names with lower-case letters (except when they're going into a Package for others' use) and then use camelCasing.
  • Avoid nesting functions too deeply with use of brackets; instead, try to use the built-in special input forms, e.g., /@ for Map, @@ for Apply, prefix form with @, and (when it's semantically appropriate) postfix form with //.
  • Intermix text cells containing documentation with input cells containing code.
  • Define ::usage strings for functions to be used by others, or for functions whose syntax you may readily forget.
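
Two of these points, usage strings and special input forms, can be sketched in a few lines. The function squareAll is hypothetical.

```mathematica
(* A usage message for a helper whose syntax is easy to forget *)
squareAll::usage = "squareAll[list] squares every element of list.";
squareAll[list_List] := #^2 & /@ list

squareAll::usage   (* returns the usage string defined above *)

(* Nested brackets versus special input forms *)
Total[Map[Sqrt, {1, 4, 9}]]
Total[Sqrt /@ {1, 4, 9}]
Sqrt /@ {1, 4, 9} // Total
(* each returns 6 *)
```

Whether the postfix form in the last line reads better than the nested one is exactly the kind of judgement the guideline about semantic appropriateness refers to.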