Using groundhog with foreach loops
The package foreach allows running loops in parallel, leveraging the multiple cores of a computer for faster processing. When used with a parallel backend such as doParallel, foreach does this by running multiple instances/environments of R in the background. This matters for reproducibility because those background environments load packages from the default R library, not from groundhog's. This can be fixed with 3 lines of code inside the loop.
Specifically, one includes a groundhog.library() call inside the loop. To avoid reloading the package in every iteration, an if statement runs the call only if groundhog has not yet been loaded, so it runs once per parallel environment created.
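To see why the extra step is needed, the sketch below starts one background R process and checks what it has attached. It uses only the parallel package that ships with R, not foreach or groundhog, and the choice of stats4 as the extra package is arbitrary and purely illustrative. In a default setup, the worker should not report stats4 as attached even though the main session does.

```r
library(parallel)   # ships with R

library(stats4)     # attach an extra package in the main session only
main_has_it <- "stats4" %in% .packages()

cl <- makeCluster(1)   # start one background R process
# Ask the worker which packages it has attached and which libraries it searches
worker_has_it    <- clusterEvalQ(cl, "stats4" %in% .packages())[[1]]
worker_lib_paths <- clusterEvalQ(cl, .libPaths())[[1]]
stopCluster(cl)

print(main_has_it)       # attached in the main session
print(worker_has_it)     # the worker started fresh, without it
print(worker_lib_paths)  # library folders the worker loads packages from
```

Whatever the main session attached, each worker starts as a fresh R session, which is why the loading has to happen inside the loop.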
Below is an example of a loop that runs a function from the package pwr. Simply replace pwr with the package(s) needed by the functions inside your loop.
#Use groundhog to load the packages used in the parallel loop
library('groundhog')
pkgs <- c('foreach','doParallel','parallel')
groundhog.library(pkgs, '2022-03-01')

#Register a parallel backend; without one, %dopar% runs sequentially with a warning
#(2 workers is an arbitrary choice for this example)
cl <- makeCluster(2)
registerDoParallel(cl)

#Create the loop with 'foreach'
sample.sizes <- foreach(simk=1:10, .combine='c') %dopar% {
  #KEY LINES: if 'groundhog' has not yet been loaded in the new environment, load it and 'pwr'
  if (!'groundhog' %in% .packages()) {       #if groundhog is not loaded in background environment
    library('groundhog')                     #load it
    groundhog.library('pwr', '2022-03-01')   #and use it to load 'pwr'; replace pwr with pkg(s) you need
  }
  #Now comes the code for the loop itself, with the function that needs 'pwr'
  powerk <- runif(1, min=.50, max=.90)
  pwr.t.test(d=.5, power=powerk)$n           #per-group sample size for the drawn power level
}

#Shut down the background R processes
stopCluster(cl)
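
The load-once-per-environment behavior of the if statement can be illustrated without groundhog. The sketch below is a stand-in built on assumptions: it uses parallel::clusterApply instead of foreach, and library(stats4) in place of groundhog.library(), with stats4 chosen arbitrarily. Each task reports whether it had to load the package, so the total number of loads can be compared with the number of workers.

```r
library(parallel)   # ships with R

n_workers <- 2      # arbitrary number of background R processes
cl <- makeCluster(n_workers)

# Each task returns TRUE if it had to load the package,
# FALSE if the package was already attached on that worker
had_to_load <- clusterApply(cl, 1:6, function(i) {
  if (!"stats4" %in% .packages()) {
    library(stats4)   # stand-in for the groundhog.library() call
    return(TRUE)
  }
  FALSE
})
stopCluster(cl)

n_loads <- sum(unlist(had_to_load))
print(n_loads)      # number of tasks that actually loaded the package
```

With six tasks spread over two workers, only the first task on each worker should need to load the package; later tasks on the same worker find it already attached and skip the call.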