Tips for golfing in R

Some tips:

  1. In R, it's recommended to use <- over =. For golfing, the opposite holds since = is shorter...
  2. If you call a function more than once, it is often beneficial to define a short alias for it:

    as.numeric(x)+as.numeric(y)
    
    a=as.numeric;a(x)+a(y)
    
  3. Partial matching can be your friend, especially when a function returns a list of which you need only one item. Compare rle(x)$lengths to rle(x)$l.

  4. Many challenges require you to read input. scan is often a good fit for this (the user ends the input by entering an empty line).

    scan()    # reads numbers into a vector
    scan(,'') # reads strings into a vector
    
  5. Coercion can be useful. t=1 is much shorter than t=TRUE. Alternatively, switch can save you precious characters as well, but you'll want to use 1,2 rather than 0,1.

    if(length(x)) {} # TRUE if length != 0
    sum(x<3)         # Adds up the TRUEs (counts how many are TRUE)
    
  6. If a function computes something complicated and you need various other types of calculations based on the same core value, it is often beneficial to either: a) break it up into smaller functions, b) return all the results you need as a list, or c) have it return different types of values depending on an argument to the function.

  7. As in any language, know it well: R has thousands of functions, and there are probably some that can solve the problem in very few characters. The trick is to know which ones!
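
A few of the tips above (aliasing, partial matching, and coercion) can be checked in a short session. This is only a sketch: the vector x and the alias name a are made up for illustration, and <- is used for readability rather than byte count.

```r
x <- c(1, 1, 2, 2, 2, 3)   # illustrative input, not from the tips above

# Tip 3: `$` matches list names partially, so $l finds $lengths.
full  <- rle(x)$lengths
short <- rle(x)$l
same  <- identical(full, short)   # TRUE

# Tip 5: logicals coerce to 0/1 in arithmetic, so sum() counts matches.
below3 <- sum(x < 3)              # 5

# Tip 2: alias a long function name that is used more than once.
a <- as.numeric
total <- a("2") + a("3")          # 5
```

Partial matching only works when the abbreviation is unambiguous among the list's names, so it is worth checking names() on the result first.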

Some obscure but useful functions:

sequence
diff
rle
embed
gl # Like rep(seq(),each=...) but returns a factor
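
For readers who have not met these functions, here is a minimal sketch of what each returns on small inputs. The inputs are arbitrary and chosen only to keep the outputs short.

```r
s <- sequence(c(3, 2))           # 1 2 3 1 2 (concatenates 1:3 and 1:2)
d <- diff(c(1, 4, 9, 16))        # 3 5 7 (successive differences)
r <- rle(c("a", "a", "b"))       # run lengths 2 1, values "a" "b"
e <- embed(1:4, 2)               # 3x2 matrix with rows (2,1), (3,2), (4,3)
g <- gl(2, 3)                    # factor with values 1 1 1 2 2 2
```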

Some built-in data sets and symbols:

letters     # 'a','b','c'...
LETTERS     # 'A','B','C'...
month.abb   # 'Jan','Feb'...
month.name  # 'January','February'...
T           # TRUE
F           # FALSE
pi          # 3.14...

  1. Instead of importing a package with library, grab the variable from the package using ::. Compare the following:

    library(splancs);inout(...)
    splancs::inout(...)
    

    Of course, this only saves bytes if a single function from the package is used.

  2. This is trivial, but here is a rule of thumb for when to use @Tommy's trick of aliasing a function: if your function name has a length of m and is used n times, then alias only if m*n > m+n+3 (because defining the alias costs m+3, and you still spend 1 every time the alias is used). An example:

    nrow(a)+nrow(b)     # 4*2 < 4+3+2
    n=nrow;n(a)+n(b)
    length(a)+length(b) # 6*2 > 6+3+2
    l=length;l(a)+l(b)
    
  3. Coercion as side-effect of functions:

    • instead of using as.integer, character strings can be coerced to integer using the : operator:

      as.integer("19")
      ("19":1)[1] # Shorter version using forced coercion.
      
    • integer, numeric, etc. can be similarly coerced to character using paste instead of as.character:

      as.character(19)
      paste(19) # Shorter version using forced coercion.
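
The aliasing rule and the coercion tricks above are easy to verify by counting characters with nchar. A minimal sketch; the variable names are made up for illustration.

```r
# Aliasing rule: alias only if m*n > m+n+3.
plain_nrow   <- nchar("nrow(a)+nrow(b)")       # 15
alias_nrow   <- nchar("n=nrow;n(a)+n(b)")      # 16, so the alias loses a byte
plain_length <- nchar("length(a)+length(b)")   # 19
alias_length <- nchar("l=length;l(a)+l(b)")    # 18, so the alias saves a byte

# Coercion as a side effect of : and paste.
int_from_string <- ("19":1)[1]   # 19
string_from_num <- paste(19)     # "19"
```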
      

Some very specific golfing tips:

  • if you need to extract the length of a vector, sum(x|1) is shorter than length(x) as long as x is numeric, integer, complex or logical.

  • if you need to extract the last element of a vector, it may be cheaper (if possible) to initialise the vector backwards using rev() and then call x[1] rather than x[length(x)] (or, using the above tip, x[sum(x|1)], or tail(x,1); thanks Giuseppe!). A slight variation on this (where the second-last element was desired) can be seen here. Even if you can't initialise the vector backwards, rev(x)[1] is still shorter than x[sum(x|1)] (and it works for character vectors too). Sometimes you don't even need rev, for example using n:1 instead of 1:n.

  • (As seen here). If you want to coerce a data frame to a matrix, don't use as.matrix(x). Take the transpose of the transpose, t(t(x)).

  • if is a formal function. For example, "if"(x<y,2,3) is shorter than if(x<y)2 else 3 (though of course, 3-(x<y) is shorter than either). This only saves characters if you don't need an extra pair of braces to formulate it this way, which you often do.

  • For testing non-equality of numeric objects, if(x-y) is shorter than if(x!=y). Any nonzero numeric is regarded as TRUE. If you are testing equality, say, if(x==y)a else b then try if(x-y)b else a instead. Also see the previous point.

  • The function el is useful when you need to extract an item from a list. The most common example is probably strsplit: el(strsplit(x,"")) is one fewer byte than strsplit(x,"")[[1]].

  • (As used here) Vector extension can save you characters: if vector v has length n you can assign into v[n+1] without error. For example, if you wanted to print the first ten factorials you could do: v=1;for(i in 2:10)v[i]=v[i-1]*i rather than v=1:10;for(...) (though as always, there is another, better, way: cumprod(1:10)).

  • Sometimes, for text-based challenges (particularly 2-D ones), it's easier to plot the text rather than cat it. The argument pch= to plot controls which characters are plotted. This can be shortened to pc= (which will also give a warning) to save a byte. Example here.

  • To take the floor of a number, don't use floor(x). Use x%/%1 instead.

  • To test if the elements of a numeric or integer vector are all equal, you can often use sd rather than something verbose such as all.equal. If all the elements are the same, their standard deviation is zero (FALSE) else the standard deviation is positive (TRUE). Example here.

  • Some functions which you would expect to require integer input actually don't. For example, seq(3.5) will return 1 2 3 (the same is true for the : operator). This can avoid calls to floor and sometimes means you can use / instead of %/%.

  • The most common function for text output is cat. But if you need to use print for some reason, you might be able to save a character by using show instead (which in most circumstances just calls print anyway, though you forgo any extra arguments like digits).

  • don't forget about complex numbers! The functions to operate on them (Re, Im, Mod, Arg) have quite short names which can occasionally be useful, and complex numbers as a concept can sometimes yield simple solutions to some calculations.

  • for functions with very long names (>13–15 characters), you can use get to get at the function. For example, in R 3.4.4 with no packages loaded other than the default, get(ls(9)[501]) is more economical than getDLLRegisteredRoutines. This can also get around source code restrictions such as this answer. Note that using this trick makes your code R-version-dependent (and perhaps platform dependent), so make sure you include the version in your header so it can be reproduced if necessary.
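
Several of the shortcuts in this list can be compared against their longer equivalents directly. The following is a sketch on made-up inputs, written with <- and spaces for readability rather than for byte count.

```r
x <- c(4, 8, 15, 16)   # illustrative numeric vector

len    <- sum(x | 1)                  # 4, same as length(x)
last   <- rev(x)[1]                   # 16, same as x[length(x)]
pick   <- "if"(x[1] < x[2], 2, 3)     # 2, same as if(x[1]<x[2])2 else 3
floor7 <- 7.9 %/% 1                   # 7, same as floor(7.9)
idx    <- seq(3.5)                    # 1 2 3, no floor() needed

# sd() is zero (treated as FALSE) when all elements are equal.
verdict <- if (sd(c(2, 2, 2))) "differ" else "equal"   # "equal"

# Vector extension: assigning to v[i] grows v one element at a time.
v <- 1
for (i in 2:10) v[i] <- v[i - 1] * i
matches_cumprod <- all(v == cumprod(1:10))   # TRUE
```

Note that sum(x|1) relies on x being numeric, integer, complex or logical, as stated above; for character vectors rev(x)[1] is the safer choice.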
