In this post I will elaborate a bit on how to pass variables to your own functions and make them work with the tidyverse family of packages.
If you have been using R during the past few years, you have probably at least heard of the popular tidyverse family, including packages such as dplyr, tidyr and also ggplot2.
One feature they share is that they accept bare variable names to identify columns in data frames, like so:
library("tidyverse")
data("Orange")
Orange %>%
  group_by(Tree) %>%
  summarise(mean = mean(circumference))
## # A tibble: 5 x 2
## Tree mean
## <ord> <dbl>
## 1 3 94
## 2 1 99.6
## 3 5 111.
## 4 2 135.
## 5 4 139.
The above works well enough for interactive use, but what happens if we want to do this inside of a function, so we don’t have to write repetitive code?
group_mean <- function(data, group, meancol) {
  data %>%
    group_by(group) %>%
    summarise(mean = mean(meancol))
}
group_mean(Orange, Tree, circumference)
## Error: Column `group` is unknown
This throws an error, so maybe try with quotes?
group_mean(Orange, "Tree", "circumference")
## Error: Column `group` is unknown
This does not work either. If you want to use tidyverse functions, or other functions that rely on so-called tidy evaluation, inside your own functions, you need to make use of quasiquotation. This requires some extra work, which is essentially the trade-off for the convenient interactive use the tidyverse provides.
Depending on the age of the code and whether you want to pass bare names or strings to your function, there are several ways to handle the situation.
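To make the term a bit more concrete, here is a minimal sketch of quasiquotation on its own, outside of any dplyr verb. It uses expr() and !! from rlang; the names col, unquoted and quoted are only illustrative and not part of any package.

```r
library(rlang)

# A symbol standing in for a column name; quote() is base R.
col <- quote(circumference)

# expr() quotes its argument. With `!!`, the value of `col` is
# inserted into the quoted expression before it is returned.
unquoted <- expr(mean(!!col))

# Without `!!`, the literal name `col` stays in the expression.
quoted <- expr(mean(col))

print(unquoted)
print(quoted)
```

Printing the two objects shows that the first expression refers to circumference, while the second still refers to the literal name col. The second case mirrors what happened with group in the failing function above.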
Since rlang 0.4.0, the easiest way to handle tidyverse functions inside your own functions is the {{ (curly-curly) operator.
Here you pass bare names into a function and surround them with double curly braces wherever they are handed on to a tidyverse function:
group_mean_cc <- function(data, group, meancol) {
  data %>%
    group_by({{ group }}) %>%
    summarise(mean = mean({{ meancol }}))
}
group_mean_cc(Orange, Tree, circumference)
## # A tibble: 5 x 2
## Tree mean
## <ord> <dbl>
## 1 3 94
## 2 1 99.6
## 3 5 111.
## 4 2 135.
## 5 4 139.
The operator takes care of all the handling of the variable names for you, so it is pretty straightforward to use.
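Because the column names are now ordinary function arguments, the same function can be reused with other data. The following sketch repeats the definition so that it runs on its own and applies it to the built-in mtcars data set, which is my choice for illustration and not part of the original example.

```r
library(dplyr)

# Same definition as above, repeated so the block is self-contained.
group_mean_cc <- function(data, group, meancol) {
  data %>%
    group_by({{ group }}) %>%
    summarise(mean = mean({{ meancol }}))
}

# Mean miles per gallon for each number of cylinders.
mpg_by_cyl <- group_mean_cc(mtcars, cyl, mpg)
print(mpg_by_cyl)
```

Note that nothing in the function body refers to Orange, Tree or circumference anymore; the curly-curly operator forwards whatever bare names the caller supplies.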
Version 0.4.0 of the rlang package is somewhat recent, so in existing code and examples online you will likely see the older two-step variant, which uses the !! (bang-bang) operator in front of a quosure.
First, you capture the argument as a quosure with enquo and then unquote it with double exclamation marks:
group_mean_q <- function(data, group, meancol) {
  group_q <- enquo(group)
  meancol_q <- enquo(meancol)
  data %>%
    group_by(!!group_q) %>%
    summarise(mean = mean(!!meancol_q))
}
group_mean_q(Orange, Tree, circumference)
## # A tibble: 5 x 2
## Tree mean
## <ord> <dbl>
## 1 3 94
## 2 1 99.6
## 3 5 111.
## 4 2 135.
## 5 4 139.
This leads to the same result, but is more verbose.
You can also put the enquo call directly into the tidyverse function call, leading to a construct like !!enquo(group). But for functions that are a bit more complex this might make the code harder to understand; generally, I prefer assigning the quosures to their own variables.
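For comparison, the inline variant could look like the following sketch. The function name group_mean_inline is made up for this illustration.

```r
library(dplyr)
library(rlang)

# Capture and unquote in a single step instead of storing the
# quosures in intermediate variables first.
group_mean_inline <- function(data, group, meancol) {
  data %>%
    group_by(!!enquo(group)) %>%
    summarise(mean = mean(!!enquo(meancol)))
}

res_inline <- group_mean_inline(Orange, Tree, circumference)
print(res_inline)
```

This saves two lines here, but with more arguments or longer pipelines the repeated !!enquo() calls tend to clutter the code.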
From time to time you may also want to pass strings into a function.
Passing strings to tidyverse functions also requires two steps.
First, you transform the string to a symbol using sym and then you unquote it where needed with !!:
group_mean_s <- function(data, group, meancol) {
  group_s <- sym(group)
  meancol_s <- sym(meancol)
  data %>%
    group_by(!!group_s) %>%
    summarise(mean = mean(!!meancol_s))
}
group_mean_s(Orange, "Tree", "circumference")
## # A tibble: 5 x 2
## Tree mean
## <ord> <dbl>
## 1 3 94
## 2 1 99.6
## 3 5 111.
## 4 2 135.
## 5 4 139.
Here, strings were passed to the function and again we got the desired result.
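One situation where the string variant is handy is when the column names are themselves stored in a character vector. The sketch below repeats the function so that it runs on its own and loops over two columns of Orange; the names cols and results are only illustrative.

```r
library(dplyr)
library(rlang)

# Same definition as above, repeated so the block is self-contained.
group_mean_s <- function(data, group, meancol) {
  group_s <- sym(group)
  meancol_s <- sym(meancol)
  data %>%
    group_by(!!group_s) %>%
    summarise(mean = mean(!!meancol_s))
}

# Column names held as data rather than typed as bare names.
cols <- c("age", "circumference")

# One summary table per column, named after the column.
results <- lapply(cols, function(col) group_mean_s(Orange, "Tree", col))
names(results) <- cols

print(results$age)
```

With bare names this kind of loop would not work directly, because the loop variable col would be passed on instead of the column it names.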
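To sum up: for bare names, the {{ operator is the easiest option. To pass strings, use sym and !!. A lot of existing code uses enquo and !!, so you should at least be aware of it. The rlang package provides a lot of useful tools and you should have a look at it.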