What's the difference between substitute and quote in R

Here's an example that may help you to easily see the difference between quote() and substitute(), in one of the settings (processing function arguments) where substitute() is most commonly used:

f <- function(argX) {
   list(quote(argX), 
        substitute(argX), 
        argX)
}
    
suppliedArgX <- 100
f(argX = suppliedArgX)
# [[1]]
# argX
# 
# [[2]]
# suppliedArgX
# 
# [[3]]
# [1] 100

R evaluates function arguments lazily, so what a variable name actually refers to is a little less clear than in other languages. Packages like dplyr rely on this, which is why you can write, for instance:

summarise(mtcars, total_cyl = sum(cyl))

We can ask what each of these tokens means: summarise and sum are defined functions, mtcars is a defined data frame, total_cyl is a keyword argument for the function summarise. But what is cyl?

> cyl
Error: object 'cyl' not found

It isn't anything! Well, not yet. R doesn't evaluate it right away, but treats it as an expression to be evaluated later in an environment other than the global environment your command line is working in, specifically one where the columns of mtcars are defined. Somewhere in the guts of dplyr, something like this is happening:

> substitute(cyl, mtcars)
[1] 6 6 4 6 8 ...

Suddenly cyl means something. That's what substitute is for.
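
As a rough sketch of that idea, a function can capture its argument with substitute() and then evaluate the captured expression with the data frame's columns in scope. The helper with_columns below is hypothetical and written only for illustration; it is not how dplyr is actually implemented.

```r
# Hypothetical helper, for illustration only; dplyr's internals differ.
with_columns <- function(data, expr) {
  captured <- substitute(expr)          # the unevaluated call, e.g. sum(cyl)
  eval(captured, data, parent.frame())  # look names up in data first, then in the caller
}

total_cyl <- with_columns(mtcars, sum(cyl))
print(total_cyl)
```

Here cyl is never looked up in the global environment: substitute() hands back the call sum(cyl) unevaluated, and eval() finds cyl among the columns of mtcars.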

So what is quote for? Well, sometimes you want your lazily-evaluated expression to be represented somewhere else before it's evaluated, i.e. you want to display the actual code you're writing without any (or with only some) values substituted. The docs you quoted explain this is common for "informative labels for data sets and plots".

So, for example, you could create a quoted expression, then print the unevaluated expression in your chart to show how the value was calculated, and also evaluate that same expression to get the value.

expr <- quote(x + y)
print(expr) # x + y
eval(expr, list(x = 1, y = 2)) # 3

Note that substitute can do this expression trick too, while giving you the option to substitute values into only part of the expression. In that sense its features are a superset of quote's.

expr <- substitute(x + y, list(x = 1))
print(expr) # 1 + y
eval(expr, list(y = 2)) # 3

Maybe this section of the documentation will help somewhat:

Substitution takes place by examining each component of the parse tree as follows: If it is not a bound symbol in env, it is unchanged. If it is a promise object, i.e., a formal argument to a function or explicitly created using delayedAssign(), the expression slot of the promise replaces the symbol. If it is an ordinary variable, its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged.

Note the final bit, and consider this example:

e <- new.env()
assign(x = "a",value = 1,envir = e)
> substitute(a,env = e)
[1] 1

Compare that with:

> quote(a)
a

So there are two basic situations in which the substitution will occur: when we're using it on an argument of a function, and when env is some environment other than .GlobalEnv. That's why your particular example was confusing.
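
The cases described in the documentation can be put side by side in a small sketch. The names show_arg, top_level, from_promise and from_env are made up for this illustration, and .GlobalEnv is passed explicitly so the first call behaves the same wherever the snippet is run.

```r
x <- 1

# env is .GlobalEnv: the symbol is left unchanged, even though x has a value.
top_level <- substitute(x, .GlobalEnv)

# Situation 1: a function argument is a promise, so its expression is returned.
show_arg <- function(arg) substitute(arg)
from_promise <- show_arg(x + 1)

# Situation 2: an ordinary variable in a non-global environment is replaced
# by its value; b is not bound in e, so it stays a symbol.
e <- new.env()
assign("a", 1, envir = e)
from_env <- substitute(a + b, e)

print(top_level)     # x
print(from_promise)  # x + 1
print(from_env)      # 1 + b
```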

For another comparison with quote, consider modifying the myplot function from the examples section of the documentation to be:

myplot <- function(x, y)
    plot(x, y, xlab = deparse(quote(x)),
             ylab = deparse(quote(y)))

and you'll see that quote really doesn't do any substitution: the axis labels are always the literal names x and y, whatever expressions you pass in.
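
To see the same contrast without drawing a plot, here is a minimal sketch with two hypothetical helpers, labels_quote and labels_subst, that return only the label strings. The arguments are never evaluated, so t does not have to be a numeric vector.

```r
# Hypothetical helpers that build axis labels the two different ways.
labels_quote <- function(x, y) c(deparse(quote(x)), deparse(quote(y)))
labels_subst <- function(x, y) c(deparse(substitute(x)), deparse(substitute(y)))

q_labels <- labels_quote(sin(t), cos(t))
s_labels <- labels_subst(sin(t), cos(t))

print(q_labels)  # "x" "y"
print(s_labels)  # "sin(t)" "cos(t)"
```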
