Recently, I did a lot of work with data stored in lists. Lists in R are pretty useful because they let you store data of different types in a single object, with pretty much no limits on what kinds of data you put in there. In my case it was mostly large data frames resulting from a series of processing steps on external data. Ultimately, the data needed for specific analyses was put into data frames, but on the way to building those data frames I noticed that I applied the same set of patterns to lists over and over again, for example adding the names of the list items as a column to the data frames, or binding some of those data frames together.
Thus, I decided to write a package to make my life a bit easier.
Is it the first package for doing things with lists? Probably not, and there is,
for example, some overlap with purrr;
but it was fun to write and also a nice learning experience in building a
package that others could use as well.
This included putting some effort into writing documentation and not making
too many assumptions around the use cases.
The result is the listr package, which at the moment is only available
from my GitLab page as version 0.0.1.
As the version number suggests there are still rough edges and most
likely bugs, but the core functionality is there.
I aim to submit the package to CRAN at some point in the future when it is
more polished.
During the last two months I used the available version in the real world and found it quite useful, so I am confident that I will continue to expand the package in the future.
listr
library("listr")
data("penguins", package = "palmerpenguins")
p <- split(penguins, penguins$island)
One of the main ideas of listr is using tidyselect for interacting
with lists, meaning you can apply all those nifty patterns you like
from using tidyverse functions.
p |>
  list_rename(i1 = Biscoe, i2 = Dream) |>
  list_select(starts_with("i"))
## $i1
## # A tibble: 168 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 37.7 18.7 180 3600
## 3 Adelie Biscoe 35.9 19.2 189 3800
## 4 Adelie Biscoe 38.2 18.1 185 3950
## 5 Adelie Biscoe 38.8 17.2 180 3800
## 6 Adelie Biscoe 35.3 18.9 187 3800
## 7 Adelie Biscoe 40.6 18.6 183 3550
## 8 Adelie Biscoe 40.5 17.9 187 3200
## 9 Adelie Biscoe 37.9 18.6 172 3150
## 10 Adelie Biscoe 40.5 18.9 180 3950
## # … with 158 more rows, and 2 more variables: sex <fct>, year <int>
##
## $i2
## # A tibble: 124 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Dream 39.5 16.7 178 3250
## 2 Adelie Dream 37.2 18.1 178 3900
## 3 Adelie Dream 39.5 17.8 188 3300
## 4 Adelie Dream 40.9 18.9 184 3900
## 5 Adelie Dream 36.4 17 195 3325
## 6 Adelie Dream 39.2 21.1 196 4150
## 7 Adelie Dream 38.8 20 190 3950
## 8 Adelie Dream 42.2 18.5 180 3550
## 9 Adelie Dream 37.6 19.3 181 3300
## 10 Adelie Dream 39.8 19.1 184 4650
## # … with 114 more rows, and 2 more variables: sex <fct>, year <int>
That is of course a toy example, but it should demonstrate the use well enough.
The other main idea behind the package is that it is pipe-friendly. Pipes have become quite popular in R, so it was not a hard decision to make every function in the package work with pipes by having it take the data to work on as its first argument.
p |>
  list_insert("penguins are cool", 2, name = "random_text") |>
  list_select(1, 2)
## $Biscoe
## # A tibble: 168 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 37.7 18.7 180 3600
## 3 Adelie Biscoe 35.9 19.2 189 3800
## 4 Adelie Biscoe 38.2 18.1 185 3950
## 5 Adelie Biscoe 38.8 17.2 180 3800
## 6 Adelie Biscoe 35.3 18.9 187 3800
## 7 Adelie Biscoe 40.6 18.6 183 3550
## 8 Adelie Biscoe 40.5 17.9 187 3200
## 9 Adelie Biscoe 37.9 18.6 172 3150
## 10 Adelie Biscoe 40.5 18.9 180 3950
## # … with 158 more rows, and 2 more variables: sex <fct>, year <int>
##
## $random_text
## [1] "penguins are cool"
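To illustrate the data-first convention, here is a minimal sketch in base R. The helper list_keep_named() is hypothetical and not part of listr; it only shows why taking the list as the first argument is enough for a function to compose with the native pipe (which needs R 4.1 or later). The built-in iris data stands in for the penguins.

```r
# Hypothetical helper (not a listr function): keep the list elements whose
# names match a regular expression. The list is the first argument, so the
# function can sit anywhere in a pipe.
list_keep_named <- function(x, pattern) {
  x[grepl(pattern, names(x))]
}

# A named list of data frames, one per species.
parts <- split(iris, iris$Species)

# The pipe passes `parts` as the first argument of each call.
kept <- parts |>
  list_keep_named("^v") |>
  names()

print(kept)
```

Because the list always comes first, no placeholder or anonymous function is needed in the pipe.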
I probably need to find a good dataset that I can bundle with the package to build better examples.
The package contains wrappers around do.call("rbind", ...) and
do.call("cbind", ...), which I use very often with lists.
p |>
  list_bind(1, 2, name = "biscoe_and_dream")
## $biscoe_and_dream
## # A tibble: 292 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## * <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 37.7 18.7 180 3600
## 3 Adelie Biscoe 35.9 19.2 189 3800
## 4 Adelie Biscoe 38.2 18.1 185 3950
## 5 Adelie Biscoe 38.8 17.2 180 3800
## 6 Adelie Biscoe 35.3 18.9 187 3800
## 7 Adelie Biscoe 40.6 18.6 183 3550
## 8 Adelie Biscoe 40.5 17.9 187 3200
## 9 Adelie Biscoe 37.9 18.6 172 3150
## 10 Adelie Biscoe 40.5 18.9 180 3950
## # … with 282 more rows, and 2 more variables: sex <fct>, year <int>
##
## $Torgersen
## # A tibble: 52 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 42 more rows, and 2 more variables: sex <fct>, year <int>
The default is to bind rows and to keep the combined result in the list, in place of the elements that were bound together. I often found myself applying a pattern like the following:
p |>
  list_bind(everything()) |>
  list_extract(1)
## # A tibble: 344 × 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## * <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Biscoe 37.8 18.3 174 3400
## 2 Adelie Biscoe 37.7 18.7 180 3600
## 3 Adelie Biscoe 35.9 19.2 189 3800
## 4 Adelie Biscoe 38.2 18.1 185 3950
## 5 Adelie Biscoe 38.8 17.2 180 3800
## 6 Adelie Biscoe 35.3 18.9 187 3800
## 7 Adelie Biscoe 40.6 18.6 183 3550
## 8 Adelie Biscoe 40.5 17.9 187 3200
## 9 Adelie Biscoe 37.9 18.6 172 3150
## 10 Adelie Biscoe 40.5 18.9 180 3950
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
This pattern should probably get its own wrapper function in a future version of
listr.
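For comparison, the underlying idea can be sketched in base R without listr. This is not how listr implements list_bind(); it is only an illustration, using the built-in iris data and a made-up column name (source), of two patterns mentioned at the start: adding each element's name as a column and then row-binding everything into one data frame.

```r
# A named list of data frames, one per species.
parts <- split(iris, iris$Species)

# Add the element name as a column. `source` is an arbitrary column name
# chosen for this sketch.
labelled <- Map(
  function(df, nm) {
    df$source <- nm
    df
  },
  parts,
  names(parts)
)

# Row-bind all elements into a single data frame.
combined <- do.call("rbind", labelled)

# rbind() on a named list builds row names such as "setosa.1"; drop them.
rownames(combined) <- NULL

print(dim(combined))
print(unique(combined$source))
```

Swapping "rbind" for "cbind" in the do.call() line binds columns instead, which only makes sense when all elements have the same number of rows.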
The next version will probably focus on some primarily cosmetic changes.
One thing is the error messages from tidyselect, which as of now refer
to columns when, for example, an element in the list does not exist.
Most likely I will also introduce a class for nicer and more compact printing
of lists, similar to how tbl_df objects look nicer than raw data frames.
In the meantime, feel free to try the package with
devtools::install_gitlab("choh/listr").