Statistics, Science, Random Ramblings

A blog mostly about data and R

Check if a package is installed in R

Posted at — Oct 23, 2019

Recently, I asked myself whether there is a really good way to check whether a specific package is installed in R when you are not building a package yourself. For packages this is easy: you list your dependencies in the DESCRIPTION file and you are good to go. But what do you do with regular data analysis projects that are not necessarily packaged? It turns out there are a few options, but they all have their ups and downs.

library()

The most basic way to load a package is by calling library("package"). This also verifies that the package is actually available, because the call fails if it is not.

library("ggplot2")  # loads the package
library("ggplot3")  # fails 
## Error in library("ggplot3"): there is no package called 'ggplot3'

In the latter case an error is thrown, so you need to install the package to run the code. However, the main disadvantage of using this method to ensure that a package is installed at the beginning of an analysis is that it attaches all of the package's exported objects to the search path. This clutters the search path and can lead to name conflicts, where a function from one package masks a function of the same name from another (thus, you might want to use :: to call functions explicitly).
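As a small illustration of the :: operator, the following sketch calls a function without attaching its package. It uses the tools package only because it ships with R but is not attached by default; the file name is made up for the example, and the same pattern applies to any installed package.

```r
# Call file_ext() from the tools package without calling library("tools").
# Using :: loads the package's namespace if needed but does not attach it,
# so nothing is added to the search path.
ext <- tools::file_ext("analysis.R")  # "analysis.R" is a made-up file name
ext
## [1] "R"
```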

require()

The main difference between library and require is that the latter does not throw an error if a package is not available. Instead, it issues a warning and returns FALSE; if the package could be loaded, it returns TRUE.

is_ggplot2_available <- require("ggplot2")
is_ggplot2_available
## [1] TRUE
is_ggplot3_available <- require("ggplot3")
## Loading required package: ggplot3
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'ggplot3'
is_ggplot3_available
## [1] FALSE

This gives you a bit more control than library, but manually checking whether TRUE or FALSE was returned gets tedious if you want to ensure that many packages are available. The downsides are otherwise the same as for library, since every package that is found is also attached.
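One way to reduce the manual checking is to loop over a vector of package names. The following is only a sketch: the package list is an assumption for illustration, with stats and utils chosen because they ship with R and ggplot3 standing in for a package that is not installed. Note that every package that is found still gets attached.

```r
# Packages to check; "ggplot3" is the made-up name used throughout this post.
pkgs <- c("stats", "utils", "ggplot3")

# require() needs character.only = TRUE when the name is stored in a variable.
# suppressWarnings() hides the "there is no package called ..." warning.
available <- vapply(
  pkgs,
  function(p) suppressWarnings(require(p, character.only = TRUE, quietly = TRUE)),
  logical(1)
)

# Names of the packages that could not be loaded
missing_pkgs <- names(available)[!available]
missing_pkgs
## [1] "ggplot3"
```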

.packages(all.available = TRUE)

In theory this seems like a nice option, as this returns a vector with names of installed packages.

pkg <- .packages(all.available = TRUE)
pkg[1:10]
##  [1] "abind"      "assertthat" "aws"        "awsMethods" "bayesplot" 
##  [6] "bibtex"     "bit"        "bit64"      "bitops"     "blogdown"

However, the help for this function mentions:

.packages(all.available = TRUE) is not a way to find out if a small number of packages are available for use: not only is it expensive when thousands of packages are installed, it is an incomplete test.

It further tells you to use require to check whether a package is installed, but I tend to disagree with that advice: require always loads and attaches a package if it is available, and there is no way to prevent that.

installed.packages()

Calling installed.packages() returns a detailed character matrix with one row per installed package, containing not only names but also licences, versions, dependencies and more.

p <- installed.packages()
colnames(p)
##  [1] "Package"               "LibPath"              
##  [3] "Version"               "Priority"             
##  [5] "Depends"               "Imports"              
##  [7] "LinkingTo"             "Suggests"             
##  [9] "Enhances"              "License"              
## [11] "License_is_FOSS"       "License_restricts_use"
## [13] "OS_type"               "MD5sum"               
## [15] "NeedsCompilation"      "Built"

The major downside when you only want to find out whether a single package is installed is that installed.packages() collects information on every installed package, which is rather slow.

Furthermore, the documentation mentions:

It will be slow when thousands of packages are installed, so do not use it to find out if a named package is installed (use find.package or system.file) nor to find out if a package is usable (call requireNamespace or require and check the return value) nor to find details of a small number of packages (use packageDescription).

So, this probably is not the right way to go either.
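The documentation quoted above also mentions requireNamespace, which is not covered elsewhere in this post. As a hedged sketch of what that could look like: requireNamespace returns TRUE or FALSE like require, but it only loads the namespace and does not attach the package to the search path. The tools package is used here because it ships with R.

```r
# requireNamespace() loads the namespace (it does not attach the package)
# and returns TRUE or FALSE. quietly = TRUE suppresses the message that is
# otherwise shown for a missing package.
has_tools <- requireNamespace("tools", quietly = TRUE)
has_ggplot3 <- requireNamespace("ggplot3", quietly = TRUE)  # made-up package
has_tools
## [1] TRUE
has_ggplot3
## [1] FALSE
```

Keep in mind that the namespace is still loaded when the package is available, so this does not avoid loading altogether.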

find.package()

The find.package function returns the path to a package’s installation or gives an error if it is not found.

gg2 <- try(find.package("ggplot2"), silent = TRUE)
gg3 <- try(find.package("ggplot3"), silent = TRUE)
gg2 # this was run on macOS, so your path will differ
## [1] "/Users/christian/Library/R/3.6/library/ggplot2"
gg3
## [1] "Error in find.package(\"ggplot3\") : there is no package called 'ggplot3'\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <packageNotFoundError in find.package("ggplot3"): there is no package called 'ggplot3'>

For find.package the documentation states:

find.package is not usually the right tool to find out if a package is available for use: the only way to do that is to use require to try to load it. It need not be installed for the correct platform, it might have a version requirement not met by the running version of R, there might be dependencies which are not available, ….
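If you want to avoid wrapping the call in try, find.package also has a quiet argument. The sketch below assumes that an empty result is all you need to detect a missing package; the caveats from the documentation about whether a found package can actually be loaded still apply.

```r
# With quiet = TRUE, find.package() returns character(0) for a missing
# package instead of signalling an error, so try() is not needed.
path_tools <- find.package("tools", quiet = TRUE)   # ships with R
path_gg3 <- find.package("ggplot3", quiet = TRUE)   # made-up package
length(path_tools) > 0
## [1] TRUE
length(path_gg3) > 0
## [1] FALSE
```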

system.file()

Using system.file is very similar to find.package: if a package is available, the return value is the same. If a package is not available, however, it returns an empty string instead of throwing an error.

gg2 <- system.file(package = "ggplot2")
gg3 <- system.file(package = "ggplot3")
gg2
## [1] "/Users/christian/Library/R/3.6/library/ggplot2"
gg3
## [1] ""

This allows you to check whether a package is installed using nzchar:

ifelse(nzchar(gg3), "ggplot3 is available", "ggplot3 is not available")
## [1] "ggplot3 is not available"

Here, the documentation does not discourage us from using it to check for the existence of the package, but the same problems as described for find.package should still apply. If a package is found, you still cannot be entirely sure that it can be loaded.
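Since system.file takes a single package name, checking several packages needs a small loop. This is a minimal sketch; the package list is an assumption for illustration, with ggplot3 again standing in for a missing package.

```r
# Packages to check; none of them is loaded or attached by this check.
pkgs <- c("stats", "utils", "ggplot3")

# system.file() returns "" for a package that cannot be found,
# so nzchar() turns the result into TRUE (found) or FALSE (not found).
installed <- vapply(
  pkgs,
  function(p) nzchar(system.file(package = p)),
  logical(1)
)

# Names of the packages that were not found
names(installed)[!installed]
## [1] "ggplot3"
```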

The pacman package

The pacman package provides a function to automatically install a package if it is not available locally. In addition to CRAN it also tries to install from Bioconductor.

Obviously this will fail if a package is not available from these sources (like our ggplot3 example here), but there is only so much you can do in that case.

library("pacman")
p_load("ggplot2") # loads the package
p_load("ggplot3") # tries to install but fails, as there is no ggplot3

The obvious downside is that you introduce an external dependency.

Concluding remarks

As we can see, there is no single best way to check whether an R package is available. Depending on the situation, you might want to choose one of the approaches above.

If you only need a few packages, loading them all with library might be fine. Giving a cut-off value for how few qualify as few is difficult, as you might also want to consider whether a package exports just two functions or is something extensive like Hmisc or dplyr.

If you need to use many packages, a mixed approach might work well: attach the essential packages using library, check whether the rest are available, and then call their functions using the :: operator. For checking whether something is available I would probably go with the system.file and nzchar approach, knowing very well that this does not ensure that there are no dependency issues or that the package can be loaded.

If you do not want to load any packages you might want to go with the system.file and nzchar approach, knowing that it is not without issues.

While at the moment there does not seem to be one approach that just checks for a package’s availability without loading its namespace, at least you can set up a first line of defence that should prevent many issues caused by packages that are not installed. Failing 30 minutes into some analysis is not exactly desirable, and it is worse than having an imperfect check in place to avoid such issues.
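Such a first line of defence could be sketched as a small guard at the top of an analysis script. The helper name check_packages is hypothetical, not part of base R or any package, and the check relies on the system.file and nzchar approach with all of its limitations.

```r
# Hypothetical helper: stop right away if any required package is not found.
# It only checks that the package directory exists, not that it can be loaded.
check_packages <- function(pkgs) {
  found <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  if (!all(found)) {
    stop(
      "Missing packages: ", paste(pkgs[!found], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Passes silently, because both packages ship with R
check_packages(c("stats", "utils"))

# Would stop the script with an error naming the missing package:
# check_packages(c("stats", "ggplot3"))
```

Putting the call at the very top of a script means a missing package is reported before any long-running computation starts.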