Recently, I asked myself whether there is a really good way to check whether
some specific package is installed in R when you are not building a package yourself.
For packages this is easy as you put your dependencies in the DESCRIPTION
file and then you are good to go.
But what do you do with regular data analysis projects that are not
necessarily packaged?
It turns out there are a few options, but they all have their ups and downs.
library()
The most basic way to load a package is by calling library("package").
This also ensures that a package is actually available as this will fail
when a package is not available.
library("ggplot2") # loads the package
library("ggplot3") # fails
## Error in library("ggplot3"): there is no package called 'ggplot3'
In the latter case an error is thrown, so you need to install the package
to run the code.
However, the main disadvantage of using this method to ensure that a package
is installed at the beginning of an analysis is that it attaches everything
the package exports to the search path, so functions with the same name in
different packages can mask each other (thus, you might want to use :: to
avoid errors).
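As a small illustration of the masking problem and of the :: operator, the following sketch defines a local function called filter, which hides stats::filter in the same way an attached package exporting a filter function would. The local function and its return value are made up for the example.

```r
# A made-up local definition that masks stats::filter, much as an
# attached package with its own filter() would.
filter <- function(...) "masked version"

# The unqualified call now reaches the local definition.
masked_result <- filter(1:5)

# The qualified call still reaches the function from the stats package.
explicit_result <- stats::filter(1:5, rep(1 / 3, 3))

masked_result
explicit_result
```

Qualifying the call with the package name removes the ambiguity, at the cost of some extra typing.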
require()
The main difference between library and require is that the latter
does not give an error if a package is not available; instead it issues
a warning and returns a logical value.
is_ggplot2_available <- require("ggplot2")
is_ggplot2_available
## [1] TRUE
is_ggplot3_available <- require("ggplot3")
## Loading required package: ggplot3
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'ggplot3'
is_ggplot3_available
## [1] FALSE
This might give you a bit more control than library, but manually
checking whether TRUE or FALSE was returned might be a bit tedious if you
want to ensure that many packages are available.
The downsides are the same as for library, but you need to do more manual
checking.
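The manual checking can be reduced somewhat by looping over a vector of package names. The following is only a sketch: the package list is made up, and ggplot3 stands in for a package that is assumed not to be installed.

```r
# Made-up list of packages; ggplot3 is assumed not to exist.
pkgs <- c("stats", "utils", "ggplot3")

# require() attaches each package it finds and returns TRUE or FALSE.
# character.only = TRUE is needed because the name is held in a variable.
loaded <- vapply(
  pkgs,
  function(pkg) {
    suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
  },
  logical(1)
)

# Names of the packages that could not be loaded.
missing_pkgs <- names(loaded)[!loaded]
missing_pkgs
```

Note that every package that is found is still attached, so the downsides described for library remain.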
.packages(all.available = TRUE)
In theory this seems like a nice option, as this returns a vector with names of installed packages.
pkg <- .packages(all.available = TRUE)
pkg[1:10]
## [1] "abind" "assertthat" "aws" "awsMethods" "bayesplot"
## [6] "bibtex" "bit" "bit64" "bitops" "blogdown"
However, the help for this function mentions:
.packages(all.available = TRUE) is not a way to find out if a small number of packages are available for use: not only is it expensive when thousands of packages are installed, it is an incomplete test.
It further tells you to use require to check whether a package is installed,
but I tend to disagree with that statement, as there is no way to not load
the namespace of a package if it is available when using require.
installed.packages()
Calling installed.packages() returns a detailed character matrix about
installed packages, containing not only names but also licences, versions,
dependencies and more.
p <- installed.packages()
colnames(p)
## [1] "Package" "LibPath"
## [3] "Version" "Priority"
## [5] "Depends" "Imports"
## [7] "LinkingTo" "Suggests"
## [9] "Enhances" "License"
## [11] "License_is_FOSS" "License_restricts_use"
## [13] "OS_type" "MD5sum"
## [15] "NeedsCompilation" "Built"
The major downside when trying to find whether a package is installed is that returning the information from installed.packages() is rather slow.
Furthermore, the documentation mentions:
It will be slow when thousands of packages are installed, so do not use it to find out if a named package is installed (use find.package or system.file) nor to find out if a package is usable (call requireNamespace or require and check the return value) nor to find details of a small number of packages (use packageDescription).
So, this probably is not the right way to go either.
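The documentation quoted above points to requireNamespace for checking whether a package is usable. A minimal sketch of that suggestion, again with a made-up package list, could look like this. Unlike require, it does not attach the package to the search path, although it still loads the namespace of every package it finds.

```r
# Made-up list of packages; ggplot3 is assumed not to be installed.
pkgs <- c("stats", "utils", "ggplot3")

# requireNamespace() returns TRUE or FALSE and loads, but does not
# attach, the namespace of each package that is found.
usable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
usable
```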
find.package()
The find.package function returns the path to a package's installation or
gives an error if it is not found.
gg2 <- try(find.package("ggplot2"), silent = TRUE)
gg3 <- try(find.package("ggplot3"), silent = TRUE)
gg2 # this was done on macos, so your result might vary
## [1] "/Users/christian/Library/R/3.6/library/ggplot2"
gg3
## [1] "Error in find.package(\"ggplot3\") : there is no package called 'ggplot3'\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <packageNotFoundError in find.package("ggplot3"): there is no package called 'ggplot3'>
For find.package the documentation states:
find.package is not usually the right tool to find out if a package is available for use: the only way to do that is to use require to try to load it. It need not be installed for the correct platform, it might have a version requirement not met by the running version of R, there might be dependencies which are not available, ….
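If you only want a TRUE or FALSE answer, the try() call can be avoided with the quiet argument of find.package, which returns an empty character vector instead of raising an error. The helper name is_installed below is made up for this sketch, and the caveats from the documentation still apply.

```r
# Hypothetical helper: TRUE if find.package() locates the package.
# With quiet = TRUE a missing package yields character(0), not an error.
is_installed <- function(pkg) {
  length(find.package(pkg, quiet = TRUE)) > 0
}

is_installed("stats")
is_installed("ggplot3")
```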
system.file()
Using system.file is really similar to find.package, and in case a package
is available the return value will be the same.
However, in case a package is not available it will return an empty string
instead of throwing an error.
gg2 <- system.file(package = "ggplot2")
gg3 <- system.file(package = "ggplot3")
gg2
## [1] "/Users/christian/Library/R/3.6/library/ggplot2"
gg3
## [1] ""
This allows you to check whether a package is installed using nzchar:
ifelse(nzchar(gg3), "ggplot3 is available", "ggplot3 is not available")
## [1] "ggplot3 is not available"
Here, the documentation does not discourage us from using it to check for the existence of the package, but the same problems as given for find.package should still apply. If a package is found, you still cannot be entirely sure that it can be loaded.
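To check several packages at once and stop early, the system.file and nzchar combination can be wrapped in a small function. This is a sketch rather than a complete solution: the name check_installed and the wording of the error message are made up, and it only tests whether a package is found, not whether it can be loaded.

```r
# Hypothetical helper: stops with one error that lists every package
# system.file() cannot find, and returns TRUE invisibly otherwise.
check_installed <- function(pkgs) {
  found <- vapply(
    pkgs,
    function(pkg) nzchar(system.file(package = pkg)),
    logical(1)
  )
  missing_pkgs <- pkgs[!found]
  if (length(missing_pkgs) > 0) {
    stop(
      "Missing packages: ", paste(missing_pkgs, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Passes silently because both packages ship with R.
check_installed(c("stats", "utils"))
```

Calling such a function at the top of a script reports all missing packages in one go without attaching any of them.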
pacman package
The pacman package provides a function to automatically install a package
if it is not available locally. In addition to CRAN it also tries to
install from Bioconductor.
Obviously this will fail if a package is not available from these sources (like our ggplot3 example here), but there is only so much you can do in that case.
library("pacman")
p_load("ggplot2") # loads the package
p_load("ggplot3") # tries to install but fails, as there is no ggplot3
The obvious downside is that you introduce an external dependency.
As we can see there is no one best way to check whether an R package is available, but depending on the situation you might want to choose one of the many approaches.
If you only need to load a few packages, using library to load them all
might be fine. Giving a cut-off value for how few qualify as few is
difficult, as you also might want to consider whether this is a package
exporting just two functions or something extensive like Hmisc or dplyr.
If you need to use many packages, a mixed approach between calling the
essential things using library and checking whether the rest is available
and then using the :: operator might be a good approach. For checking
whether something is available I would probably go with the system.file
and nzchar approach, knowing very well that this does not ensure that there
are no dependency issues or that the package can be loaded.
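A rough sketch of such a mixed approach is shown below. The split into an essential and an optional package is invented for illustration: stats plays the essential package and tools, which ships with R, plays the optional one.

```r
# Essential package: attach it (stats is used as a stand-in here).
library("stats")

# Optional package: only check that it is installed.
has_tools <- nzchar(system.file(package = "tools"))

files <- c("a.csv", "b.txt")
if (has_tools) {
  # Use :: so the optional package is never attached.
  extensions <- tools::file_ext(files)
} else {
  # Fallback that needs no extra package.
  extensions <- sub(".*\\.", "", files)
}
extensions
```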
If you do not want to load any packages you might want to go with the
system.file and nzchar approach, knowing that it is not without issues.
While at the moment there does not seem to be one approach that just checks for a package's availability without loading its namespace, at least you can set up a first line of defence that should help avoid many issues where packages are not installed. Failing 30 minutes into an analysis is worse than having an imperfect check in place to avoid such issues.