Comparing ‘groundhog’ with ‘Posit’s Package Manager’
last update: Apr 7th, 2024
tldr; compared with Posit’s Package Manager, groundhog requires far fewer re-installations of packages, fewer of those installations need to come from source (the slower and less reliable route), and only groundhog ensures the same package version is installed on Mac and Windows versions of R for a given requested date.
The company behind RStudio, “Posit”, offers a Package Manager which, like groundhog, seeks to deliver package version control for R. See: https://p3m.dev/client/#/repos/cran/setup
Their approach is to make archived copies of CRAN (& Bioconductor), and then allow users to install packages from those archived copies. For example, to have packages version-controlled to 2023-01-01, one would install them from Posit’s copy of CRAN made on that day.
Their website recommends that users set such a date through RStudio options, but this is a bad idea for researchers who want their scripts to be reproducible, because that date would not be included in the R scripts they share. Nobody can see your RStudio options.
The better solution is to indicate the archive date within the script (which they recommend for non-users of RStudio), like this:
options(repos = c(CRAN = "https://p3m.dev/cran/2023-01-01"))
After running that line, any install.packages() call would install packages as available on that archived copy of CRAN.
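As a minimal sketch of keeping the date inside the script, the snippet below builds the snapshot URL from a date string and sets it as the CRAN repository. The helper name snapshot_repo is hypothetical (it is not part of any package), and the URL pattern is simply the one used in the line above.

```r
# Hypothetical helper: build the dated snapshot URL from a date string.
snapshot_repo <- function(date) {
  date <- as.Date(date)  # stops with an error if the string is not a date
  paste0("https://p3m.dev/cran/", format(date, "%Y-%m-%d"))
}

# Point CRAN at the snapshot for the chosen date, inside the script itself.
options(repos = c(CRAN = snapshot_repo("2023-01-01")))

# Confirm which repository install.packages() would now use.
print(getOption("repos")[["CRAN"]])

# install.packages("rio")  # needs internet access, so left commented out
```

Because the date sits in the script, anyone who receives the script sees which snapshot was intended, which is not the case when the date is set through RStudio options.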
The advantages of the Posit Package Manager solution include that (1) it is run by Posit, a huge player in the R market, (2) it works not just with CRAN but also with Bioconductor (but, unlike groundhog, not with GitHub or GitLab packages), and (3) it installs binaries for Linux machines, while groundhog only offers source for Linux.
The disadvantages are the following:
- Dependency versions. The Posit approach only achieves version-control for the requested package, not its dependencies (if there is some version of a dependency already installed, that version will be used, not the version available on the date in the repository option). Groundhog, in contrast, checks the version of all dependencies. This shortcoming with the Posit solution can be avoided by setting a custom directory for packages to be installed for each project, always starting with a clean slate (0 packages installed), but that’s cumbersome, inefficient, and difficult to convey to readers of the R script (who would need to do the same).
- Re-installation. The Posit approach only modifies the installation of packages, not their loading (not the library() command). This means that to be sure a script is version-controlled, that is, to be sure that library('rio') loaded the intended version, one has to reinstall the rio package (!). For example, if you were using 2023-06-01 as the date for packages for a while, then switched to a new script with date 2023-09-01, and then went back to the older script, you would have to reinstall all packages (or more likely, forget, and use the wrong version). With groundhog, in contrast, groundhog.library(pkg,date) will verify when loading that the desired version is loaded. Moreover, alternative versions are saved side by side, so you never really have to re-install the same version. As with the first disadvantage, this can be avoided by having a custom package folder for each script or project you work on.
- Windows vs Mac. A CRAN archive date is not enough to ensure the same version of the pkg is available. On any given date, there may be a binary version for Windows and a different one for Mac (this happens because binaries for Mac and Windows are built by different people on their own schedule, sometimes weeks apart). Groundhog uses the date to find the version (the most recent version available), and then looks for the binary for that version whenever it was available, so the same script delivers the same version on Mac and Windows. If the Mac binary was available 2 weeks later, it gets it from 2 weeks later, instead of installing the older version of the package (or from source).
- From source. The Posit approach will end up leading users to have to install from source (the slower and more buggy installation process, especially on a Mac) because there is often a delay between a package version being available and a binary of it being available. Groundhog will deliver a binary for the requested version if it was ever offered.
- GitHub: Only groundhog does version control with GitHub and GitLab packages
- R version: Only groundhog guides users to the version of R they should use for the date of interest
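The folder-per-project workaround mentioned in the first two disadvantages could look roughly like the sketch below, shown next to the groundhog call it is meant to imitate. The folder location (a temporary directory here) and the variable names are assumptions for illustration only.

```r
snapshot_date <- "2023-06-01"

# Posit workaround: a separate, initially empty library folder for this date.
# tempdir() is used only so the sketch runs anywhere; a real project would
# pick a permanent folder.
lib_dir <- file.path(tempdir(), paste0("r-lib-", snapshot_date))
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Put the new folder first and drop the usual user library from the search
# path. R still keeps its own system and site libraries in the path.
.libPaths(lib_dir)
options(repos = c(CRAN = paste0("https://p3m.dev/cran/", snapshot_date)))

# The calls below need internet access, so they are left commented out.
# install.packages("rio")   # installs into lib_dir, the first library path
# library("rio")            # loads the copy found in lib_dir

# groundhog equivalent: the date is part of the call that loads the package.
# library("groundhog")
# groundhog.library("rio", snapshot_date)
```

Note that packages already present in R's system or site library would still be visible with this setup, so it only approximates a clean slate, and every reader of the script would need to repeat the same steps.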
Bottom line
The Posit solution is pretty good. If it had existed when I wrote groundhog, I probably wouldn’t have written it. But it seems to offer a worse solution than groundhog, one that is slower, less transparent, requires frequent (almost constant) re-installation of packages, and is more prone to human error. As with renv, the Posit solution may be superior for in-house corporate environments, but it is inferior for an environment involving open collaboration among academics seeking at least medium-term reproducibility for individual stand-alone scripts.