Statistics, Science, Random Ramblings

A blog about data and other interesting things

Tidy evaluation in your own functions

Posted on Aug 30, 2019

In this post I will elaborate a bit on how to pass variables to your own functions and make them work with the tidyverse family of packages.

If you have been using R during the past few years you have probably at least heard of the popular tidyverse family, including packages such as dplyr, tidyr and also ggplot2.

One feature they share is that they accept bare variable names to identify columns in data frames, like so:

library("tidyverse")
data("Orange")
Orange %>% 
    group_by(Tree) %>% 
    summarise(mean = mean(circumference))
## # A tibble: 5 x 2
##   Tree   mean
##   <ord> <dbl>
## 1 3      94  
## 2 1      99.6
## 3 5     111. 
## 4 2     135. 
## 5 4     139.

Generating Errors

The above works well enough for interactive use, but what happens if we want to do this inside of a function, so we don’t have to write repetitive code?

group_mean <- function(data, group, meancol) {
    data %>% 
        group_by(group) %>% 
        summarise(mean = mean(meancol))
}

group_mean(Orange, Tree, circumference)
## Error: Column `group` is unknown

This throws an error, so maybe try with quotes?

group_mean(Orange, "Tree", "circumference")
## Error: Column `group` is unknown

This does not work either. In both cases group_by looks for a column that is literally called group, instead of the column we passed in. If you want to use tidyverse functions, or other functions that rely on so-called tidy evaluation, inside your own functions, you will need quasiquotation. This requires some extra work, which is essentially the trade-off for the convenient interactive use the tidyverse provides.
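To make "quoting" a bit more concrete, here is a minimal sketch. The helper show_arg is a made-up name used only for illustration; enquo, quo_is_symbol and as_label are rlang functions. The point is that the argument is captured as typed rather than evaluated, which is why no error occurs even though no object called circumference exists:

```r
library(rlang)

# Hypothetical helper, for illustration only: it returns what the caller
# typed as `x` (a quosure) instead of evaluating it.
show_arg <- function(x) {
    enquo(x)
}

# `circumference` is not defined anywhere, but nothing is evaluated here.
captured <- show_arg(circumference)

quo_is_symbol(captured)
## [1] TRUE
as_label(captured)
## [1] "circumference"
```

Unquoting is the second half: the captured name is handed back to a tidyverse function, which then looks it up in the data frame.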

Depending on the age of the code and whether you want to pass bare names or strings to your function, there are several ways to handle the situation.

Curly, Curly

Since rlang 0.4.0, the easiest way to use tidyverse functions inside your own functions is the {{ (curly, curly) operator. You pass bare names into your function and wrap them in double curly braces wherever they are handed on to a tidyverse function:

group_mean_cc <- function(data, group, meancol) {
    data %>% 
        group_by({{group}}) %>% 
        summarise(mean = mean({{meancol}}))
}

group_mean_cc(Orange, Tree, circumference)
## # A tibble: 5 x 2
##   Tree   mean
##   <ord> <dbl>
## 1 3      94  
## 2 1      99.6
## 3 5     111. 
## 4 2     135. 
## 5 4     139.

The operator takes care of all the handling of the variable names for you, so it is pretty straightforward to use.
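The same pattern should carry over to other dplyr verbs and to whole expressions, not just single column names. The following is a sketch under that assumption; filtered_group_mean and its condition argument are hypothetical names that do not appear elsewhere in this post:

```r
library(dplyr)
data("Orange")

# Hypothetical helper: keeps the rows matching `condition`, then averages
# `meancol` within each level of `group`.
filtered_group_mean <- function(data, condition, group, meancol) {
    data %>%
        filter({{ condition }}) %>%
        group_by({{ group }}) %>%
        summarise(mean = mean({{ meancol }}))
}

# `age > 1000` is passed as a bare expression and evaluated inside `Orange`.
result_cc <- filtered_group_mean(Orange, age > 1000, Tree, circumference)
result_cc
```

The result still has one row per tree, but the means are now computed only from the measurements taken after day 1000.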

Bang, Bang

Because rlang 0.4.0 is fairly recent, you will likely still see the older two-step variant, which uses the !! (bang, bang) operator in front of a quosure, in existing code and in examples online. First, you turn your variable name into a quosure with enquo, and then you unquote it with double exclamation marks:

group_mean_q <- function(data, group, meancol) {
    group_q <- enquo(group)
    meancol_q <- enquo(meancol)
    
    data %>% 
        group_by(!!group_q) %>% 
        summarise(mean = mean(!!meancol_q))
}

group_mean_q(Orange, Tree, circumference)
## # A tibble: 5 x 2
##   Tree   mean
##   <ord> <dbl>
## 1 3      94  
## 2 1      99.6
## 3 5     111. 
## 4 2     135. 
## 5 4     139.

This leads to the same result, but is more verbose. You can also put the enquo call directly inside the tidyverse function call, leading to a construct like !!enquo(group). For more complex functions this can make the code harder to understand, so I generally prefer assigning the quosures to their own variables.
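For comparison, the inline variant could look like the sketch below. The function name group_mean_inline is just an illustrative choice; the body is the two-step version with the intermediate variables removed:

```r
library(dplyr)
library(rlang)
data("Orange")

# Inline variant: quote with enquo() and unquote with !! in a single spot,
# without storing the quosures in their own variables.
group_mean_inline <- function(data, group, meancol) {
    data %>%
        group_by(!!enquo(group)) %>%
        summarise(mean = mean(!!enquo(meancol)))
}

result_inline <- group_mean_inline(Orange, Tree, circumference)
result_inline
```

This is the spelled-out form of what {{ does in a single step.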

Strings

From time to time you will also want to pass strings into a function. Passing strings on to tidyverse functions also requires two steps. First you transform the string into a symbol using sym, and then you unquote it where needed with !!:

group_mean_s <- function(data, group, meancol) {
    group_s <- sym(group)
    meancol_s <- sym(meancol)
    
    data %>% 
        group_by(!!group_s) %>% 
        summarise(mean = mean(!!meancol_s))
}

group_mean_s(Orange, "Tree", "circumference")
## # A tibble: 5 x 2
##   Tree   mean
##   <ord> <dbl>
## 1 3      94  
## 2 1      99.6
## 3 5     111. 
## 4 2     135. 
## 5 4     139.

Here, strings were passed to the function and again we got the desired result.
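If you need to group by several columns given as a character vector, one option is rlang's syms together with the splice operator !!!. The sketch below assumes that setup; group_mean_ss and its groups argument are hypothetical names:

```r
library(dplyr)
library(rlang)
data("Orange")

# Hypothetical variant: `groups` is a character vector of column names.
group_mean_ss <- function(data, groups, meancol) {
    group_syms <- syms(groups)   # list of symbols, one per string
    meancol_s <- sym(meancol)

    data %>%
        group_by(!!!group_syms) %>%   # splice all symbols into group_by()
        summarise(mean = mean(!!meancol_s))
}

result_ss <- group_mean_ss(Orange, c("Tree", "age"), "circumference")
result_ss
```

Since every tree was measured once at each of the seven ages, this returns 35 rows, one per combination of Tree and age.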

Concluding remarks

  • When passing names to tidyverse functions use bare names and {{.
  • When you have to pass strings, use sym and !!.
  • There is probably a lot of code out there using enquo and !!, so you should at least be aware of it.
  • Generally, if you work a lot with tidyverse functions, the rlang package provides many useful tools and is worth a look.
  • You should also give the tidy evaluation manual a read, as it covers pretty much everything there is to know regarding tidy evaluation.