Comparing 'groundhog' with 'renv'
last update: March 30th, 2022
tl;dr: academics should probably use groundhog; corporate data scientists should arguably use renv.
Probably the best-known solution for version-control in R is the package renv, a revamped version of the older packrat package. Here we focus on the relative advantages and disadvantages of renv vs groundhog.
- renv is integrated with RStudio
- renv is developed by professional software engineers (rather than academic researchers)
- renv works with more repositories than groundhog does (in addition to groundhog's CRAN, GitHub and GitLab, renv also works with Bitbucket and Bioconductor). However, adding Bioconductor to the set of repositories groundhog works with is at the top of the to-do list.
- groundhog allows sharing individual .R scripts that are reproducible; with renv you have to work within a project structure.
- groundhog does not require additional attachment files to make R code reproducible; with renv you also have to share a 'lockfile' (a text file listing required package versions) and possibly several additional configuration files.
- groundhog has essentially no learning curve: you simply rely on groundhog.library() instead of library(). Nothing else changes. renv, in contrast, requires several additional setup and ongoing steps to attain version-control. See the 'workflow' that is recommended for its use. For instance, users need to actively save the packages they have loaded into their environment using a dedicated renv command.
- groundhog does not require reproducers of code to know how to use it, or that groundhog even exists. A user can run a script that relies on groundhog by simply executing the code, and the necessary packages will be installed and loaded, or a warning/error message will indicate this goal was not achieved. For renv, in contrast, just executing the code in an R script will not load the intended versions of packages, and no warning or error message will be generated if this occurs.
- groundhog can be used retroactively, taking scripts that were created without groundhog and no longer run, and making them run again by loading the package versions in use when they were originally written. A DataColada blog post provides an example. This is not possible with renv.
- groundhog gives real-time feedback as each package is loaded, verifying that version-control was successful (that the expected version of the loaded package, and of all its dependencies, was in fact loaded). renv does not produce such feedback. For example, renv's instructions page includes this statement:
"[renv's] approach is not 100% reliable in detecting the packages required by your project. If you find that renv’s dependency discovery is failing to discover one or more packages used in your project, one escape hatch is to include a file called _dependencies.R with code of the form" (see original document with quoted text in the web-archive; emphasis added here)
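To make the contrast above concrete, here is a minimal sketch of a stand-alone script that relies on groundhog. The package name ('dplyr') and the date are illustrative assumptions, not taken from this page. The renv calls in the trailing comments reflect our reading of renv's documented workflow and are included only for comparison.

```r
# Hypothetical stand-alone script; 'dplyr' and the date are illustrative choices.
# install.packages("groundhog")  # one-time setup, if groundhog is not yet installed
library(groundhog)

# Load 'dplyr' as it was available on CRAN on the given date, together with
# date-matched versions of its dependencies. groundhog reports whether the
# requested versions were in fact loaded.
groundhog.library("dplyr", "2022-03-01")

# The rest of the script uses the package as usual.
summary_by_cyl <- summarise(group_by(mtcars, cyl), mean_mpg = mean(mpg))
print(summary_by_cyl)

# For comparison, renv is project-based; calls like these are run at the
# console from inside the project rather than placed in the script itself:
#   renv::init()      # set up a project-local library
#   renv::snapshot()  # record the package versions in use in the lockfile
#   renv::restore()   # reinstall the recorded versions on another machine
```

The only change relative to an ordinary script is the loading line, so the file can be shared on its own. With the renv approach, the lockfile and project files have to travel with the script.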
Whether the advantages or disadvantages of groundhog are more important depends on the use case.
In a corporate environment, with a close-knit network of developers who already collaborate (e.g., through GitHub) and who share an understanding of technology in general and version-control solutions in particular, renv is an apt solution. Its professional software architecture and integration with RStudio project management are likely to compensate for its many ease-of-use shortcomings.
In an academic context, where individual researchers wish to share almost exclusively stand-alone R scripts, where files will be downloaded and re-run by strangers (students, reviewers, colleagues, paper readers, book buyers, etc.), where scripts will be made available as individual files on open repositories like ResearchBox, Dataverse and the OSF, and where code files need to be reproducible years after they were written, by people who will not communicate with the original author and who will not invest in learning whatever version-control solution that author used years ago, groundhog seems like the right, and only, solution.