# Statistics, Science, Random Ramblings

## A blog mostly about data and R

Something I came across today was that it is not directly possible to use `tidyr::spread()` on multiple columns.

In short, you need to `gather` everything together first, before you can then spread. Let’s do a quick walk-through to clarify:

``````library("tidyverse")
data(mtcars)
``````##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1``````

For this example we assume that we want to group the data by `cyl` and `gear` and then calculate the mean and standard deviation within the group.

``````mt_summary <- mtcars %>%
group_by(cyl, gear) %>%
summarise(mean_hp = mean(hp), sd_hp = sd(hp))
mt_summary``````
``````## # A tibble: 8 x 4
## # Groups:   cyl [3]
##     cyl  gear mean_hp sd_hp
##   <dbl> <dbl>   <dbl> <dbl>
## 1     4     3     97  NA
## 2     4     4     76  20.1
## 3     4     5    102  15.6
## 4     6     3    108.  3.54
## 5     6     4    116.  7.51
## 6     6     5    175  NA
## 7     8     3    194. 33.4
## 8     8     5    300. 50.2``````

Now let’s also assume that we want to transform the data that we have a column for mean sd for each level of gear. This sounds like a job for `tidyr::spread()`, but as mentioned it can only handle a single key/value pair, so we could spread either mean or sd.

Fortunately there is a workaround. First we need to gather the mean and sd columns:

``````mt_transformed <- gather(mt_summary, var, value, -c(cyl, gear))
mt_transformed``````
``````## # A tibble: 16 x 4
## # Groups:   cyl [3]
##      cyl  gear var      value
##    <dbl> <dbl> <chr>    <dbl>
##  1     4     3 mean_hp  97
##  2     4     4 mean_hp  76
##  3     4     5 mean_hp 102
##  4     6     3 mean_hp 108.
##  5     6     4 mean_hp 116.
##  6     6     5 mean_hp 175
##  7     8     3 mean_hp 194.
##  8     8     5 mean_hp 300.
##  9     4     3 sd_hp    NA
## 10     4     4 sd_hp    20.1
## 11     4     5 sd_hp    15.6
## 12     6     3 sd_hp     3.54
## 13     6     4 sd_hp     7.51
## 14     6     5 sd_hp    NA
## 15     8     3 sd_hp    33.4
## 16     8     5 sd_hp    50.2``````

Now we need to create our column names for each group:

``````mt_transformed <- unite(mt_transformed, tempcol, var, gear)
mt_transformed``````
``````## # A tibble: 16 x 3
## # Groups:   cyl [3]
##      cyl tempcol    value
##    <dbl> <chr>      <dbl>
##  1     4 mean_hp_3  97
##  2     4 mean_hp_4  76
##  3     4 mean_hp_5 102
##  4     6 mean_hp_3 108.
##  5     6 mean_hp_4 116.
##  6     6 mean_hp_5 175
##  7     8 mean_hp_3 194.
##  8     8 mean_hp_5 300.
##  9     4 sd_hp_3    NA
## 10     4 sd_hp_4    20.1
## 11     4 sd_hp_5    15.6
## 12     6 sd_hp_3     3.54
## 13     6 sd_hp_4     7.51
## 14     6 sd_hp_5    NA
## 15     8 sd_hp_3    33.4
## 16     8 sd_hp_5    50.2``````

And from here it is pretty straightforward:

``````mt_transformed <- spread(mt_transformed, tempcol, value)
mt_transformed``````
``````## # A tibble: 3 x 7
## # Groups:   cyl [3]
##     cyl mean_hp_3 mean_hp_4 mean_hp_5 sd_hp_3 sd_hp_4 sd_hp_5
##   <dbl>     <dbl>     <dbl>     <dbl>   <dbl>   <dbl>   <dbl>
## 1     4       97        76       102    NA      20.1     15.6
## 2     6      108.      116.      175     3.54    7.51    NA
## 3     8      194.       NA       300.   33.4    NA       50.2``````

Of course this can be chained using pipes, but to illustrate this workaround I did this step by step.

Future versions of tidyr will include `pivot` functions which appear to be more flexible than the `gather` and `spread` pair that is currently available.

## Concluding remarks

• You can not use `tidyr::spread` on multiple columns.
• However, using a workaround you can still achieve the same goal.
• Future versions of tidyr will include more flexible pivot functions.