What is the difference between `.` and `.data`?
The `.` variable comes from `magrittr` and is related to pipes. It means "the value being piped into this expression". Normally with pipes, the value from the previous expression becomes argument 1 of the next expression, but `.` gives you a way to use it in some other argument.
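Here is a minimal sketch of both behaviours. It assumes the `magrittr` package is installed, and the variable names are only illustrative:

```r
library(magrittr)

# Default: the piped value becomes the first argument,
# so this is sqrt(c(1, 4, 9)).
roots <- c(1, 4, 9) %>% sqrt()

# With `.`, the piped value goes into another argument instead.
# Because `.` appears as a direct argument, magrittr does NOT also
# insert it as the first argument: this is paste("a", "b").
greeting <- "b" %>% paste("a", .)

# A common real use: a data argument that is not the first one.
fit <- mtcars %>% lm(mpg ~ wt, data = .)
```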
The `.data` object is special to `dplyr` (though it is implemented in the `rlang` package). It does not have any useful value itself, but when evaluated in the `dplyr` "tidy eval" framework, it acts in many ways as though it were the value of the data frame/tibble. You use it when there's ambiguity: if you have a variable with the same name, `foo`, as a data frame column, then `.data$foo` says it is the column you want (and it will give an error if the column is not found, unlike `data$foo` on a plain data frame `data`, which will give `NULL`). Alternatively, you could use `.env$foo` to say: ignore the column and take the variable from the calling environment.
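A small sketch of that disambiguation, assuming dplyr 1.x. The data frame `df` and the variable `foo` are made up for illustration, and `tryCatch()` only turns the error into a string so the outcome can be checked:

```r
library(dplyr)

df  <- data.frame(foo = 1:3)
foo <- 100  # same name as the column, but lives in the calling environment

out <- df %>%
  mutate(
    from_column = .data$foo,  # the column: 1, 2, 3
    from_env    = .env$foo    # the variable: 100, recycled to 3 rows
  )

# A missing column via plain `$` on a data.frame silently gives NULL ...
plain_missing <- df$bar

# ... whereas `.data$bar` raises an error inside the data mask.
pronoun_missing <- tryCatch(
  df %>% mutate(x = .data$bar),
  error = function(e) "error"
)
```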
Both `.data` and `.env` are specific to `dplyr` functions and others using the same special evaluation scheme, whereas `.` is a regular variable and can be used in any function.
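To illustrate that the pronouns belong to the evaluation scheme rather than to any particular verb, here is a hedged sketch that calls `rlang::eval_tidy()` directly, i.e. the `rlang` data-mask machinery that `dplyr` builds on. It assumes a recent `rlang`; `x` and `mask_data` are hypothetical names:

```r
library(rlang)

x         <- 100          # variable in the calling environment
mask_data <- list(x = 1)  # a "column" with the same name in the data mask

# .data$x picks the masked value; .env$x picks the environment value.
both <- eval_tidy(quote(.data$x + .env$x), data = mask_data)

# A bare `x` is ambiguous; the data mask wins.
bare <- eval_tidy(quote(x), data = mask_data)
```

Outside such a call (for example in plain `sum(.data$x)` at the top level), no data mask exists, which is why the pronouns only make sense inside tidy-eval functions.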
Edited to add: You asked why `names(.data)` didn't work. If @r2evans's excellent answer isn't enough, here's a different take on it. I suspect the issue is that `names()` isn't a `dplyr` function, even though `names.rlang_fake_data_pronoun` is a method in `rlang`. So the expression `names(.data)` is evaluated using regular evaluation instead of tidy evaluation, and the method has no idea which data frame to look in, because in that context there isn't one.
Up front, I think `.data`'s intent is a little confusing until one also considers its sibling pronoun, `.env`.
The `.` is something that `magrittr::%>%` sets up and uses; since `dplyr` re-exports `%>%`, it's available. And whenever you reference it, it is a real object, so `nrow(.)` and the like all work as expected. It does reflect the data up to this point in the pipeline.
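For example (a sketch assuming dplyr 1.x; `dims` is just a name for the result), base functions applied to `.` inside a verb see the piped-in data frame, not the columns being created:

```r
library(dplyr)

# `.` is bound by %>% to the piped-in mtcars (32 rows, 11 columns).
# ncol(.) stays 11 even though mutate() is adding columns, because
# `.` is the input to this step, not the data being built.
dims <- mtcars %>%
  mutate(n_rows = nrow(.), n_cols = ncol(.))
```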
`.data`, on the other hand, is defined within `rlang` for the purpose of disambiguating symbol resolution. Along with `.env`, it allows you to be perfectly clear about where you want a particular symbol resolved (when ambiguity is expected). From `?.data`, I think this is a clarifying contrast:
```r
library(dplyr)

disp <- 10
mtcars %>% mutate(disp = .data$disp * .env$disp)
mtcars %>% mutate(disp = disp * disp)
```
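Checking the two results makes the contrast concrete. This is a sketch assuming dplyr 1.x; `scaled` and `squared` are just names for the two outputs:

```r
library(dplyr)

disp <- 10

# .data$disp is the column; .env$disp is the variable defined above (10).
scaled <- mtcars %>% mutate(disp = .data$disp * .env$disp)

# Bare `disp` resolves to the column both times, so this squares it.
squared <- mtcars %>% mutate(disp = disp * disp)
```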
However, as stated in the help pages, `.data` (and `.env`) is just a "pronoun" (we have verbs, so now we have pronouns too), so it is just a pointer telling the tidy internals where the symbol should be resolved. It's just a hint of sorts.
So your statement that `.data` "just mean[s] 'our result up to this point in the pipeline'" is not correct: `.` represents the data up to this point, whereas `.data` is just a declarative hint to the internals.
Consider another way of thinking about `.data`: let's say we have two functions that completely disambiguate the environment a symbol is referenced against:

- `get_internally`: this symbol must always reference a column name; it will not reach out to the enclosing environment if the column does not exist; and
- `get_externally`: this symbol must always reference a variable/object in the enclosing environment; it will never match a column.
In that case, translating the above examples, one might use:

```r
# get_internally() and get_externally() are hypothetical, for illustration only
disp <- 10
mtcars %>% mutate(disp = get_internally(disp) * get_externally(disp))
```
Seen this way, it is more obvious that `get_internally` is not a frame, so you can't call `names(get_internally)` and expect it to do something meaningful (other than return `NULL`). It'd be like
So don't think of `.data` as an object; think of it as a mechanism to disambiguate the environment of the symbol. I think the `$` it uses is both terse/easy-to-use and absolutely misleading: it is not an environment-like object, even if it is being treated as such.
BTW: one can write an S3 method for `$` that makes any classed object look like a frame/environment:
```r
`$.quux` <- function(x, nm) paste0("hello, ", nm, "!")
obj <- structure(0, class = "quux")
obj$r2evans
# [1] "hello, r2evans!"
names(obj)
# NULL
```
(The presence of a `$` accessor does not always mean the object is a frame/env.)
On a theoretical level:

- `.` is the magrittr pronoun. It represents the entire input (often a data frame when used with dplyr) that is piped in with `%>%`.
- `.data` is the tidy eval pronoun. Technically it is not a data frame at all; it is an evaluation environment.
On a practical level:

`.` will never be modified by dplyr: it remains constant until the next piped expression is reached. On the other hand, `.data` is always up to date. That means you can refer to previously created variables:
```r
library(dplyr)

mtcars %>% mutate(
  cyl2 = cyl + 1,
  am3 = .data[["cyl2"]] + 10
)
```
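A sketch of the contrast with `.`, assuming dplyr 1.x: the same reference through `.` fails, because `.` is still the unmodified input and has no `cyl2` column. The names `updated` and `dot_version` are illustrative, and `tryCatch()` only captures the error:

```r
library(dplyr)

# `.data` sees cyl2, created earlier in the same mutate() call.
updated <- mtcars %>%
  mutate(
    cyl2 = cyl + 1,
    am3  = .data[["cyl2"]] + 10
  )

# `.` is the original mtcars: .[["cyl2"]] is NULL, NULL + 10 has
# length 0, and mutate() rejects a size-0 result for 32 rows.
dot_version <- tryCatch(
  mtcars %>% mutate(cyl2 = cyl + 1, am3 = .[["cyl2"]] + 10),
  error = function(e) "error"
)
```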
And you can also refer to column slices in the case of a grouped data frame:
```r
mtcars %>%
  group_by(cyl) %>%
  mutate(cyl2 = .data[["cyl"]] + 1)
```
If you use `.[["cyl"]]` instead, the column is extracted from the entire data frame rather than from the current group, and you will get an error because its size (32) is not the same as the group slice size. Tricky!
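Both versions side by side, as a sketch assuming dplyr 1.x (`ok` and `bad` are illustrative names; `tryCatch()` just captures the error so it can be checked):

```r
library(dplyr)

# Per group, .data[["cyl"]] is that group's slice of the column.
ok <- mtcars %>%
  group_by(cyl) %>%
  mutate(cyl2 = .data[["cyl"]] + 1)

# `.` is the whole grouped data frame, so .[["cyl"]] has 32 values,
# while the groups have 11, 7 and 14 rows: mutate() errors.
bad <- tryCatch(
  mtcars %>% group_by(cyl) %>% mutate(cyl2 = .[["cyl"]] + 1),
  error = function(e) "error"
)
```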