General questions

GQ1: Tell me in 10 lines how to use this package.

GA1: Get the dependency graph of several R packages on CRAN or Github at a specific snapshot date(time)

graph <- resolve(c("crsh/papaja", "rio"), snapshot_date = "2019-07-21")

Dockerize the dependency graph to a directory

dockerize(graph, output_dir = "rangtest")

You can build the Docker image with either the R package stevedore or the Docker CLI client. We use the CLI client.

docker build -t rangimg ./rangtest ## might need sudo

Launch the container with the built image

docker run --rm --name "rangcontainer" -ti rangimg

And the tenth line is not needed.

GQ2: For running resolve(), how do I know which packages are used in a project?

GA2: rang >= 0.2 supports scanning of a directory for R packages (the current working directory by default). snapshot_date is inferred from the latest modification date of all files.

resolve()

A better strategy, however, is to do the scanning first and then manually review which packages are from non-CRAN sources.

pkgs <- as_pkgrefs(".")

GQ3: Why is the R script generated by dockerize() and export_rang() so strange/unidiomatic/inefficient/did you guys read fortunes::fortune("answer is parse")?

GA3: It is because we optimize the R code in rang.R for backward compatibility. We need to make sure that the code runs well in vanilla R environments since 1.3.1.

GQ4: Why doesn’t rang support reconstructing computational environments with R < ~~2.1.0~~ 1.3.1 yet?

GA4: It is because installing source packages from within R was introduced in R 2.1.0. Before that, one needed to install source packages with R CMD INSTALL. Support for the R 1.x series has since been added and is available in rang >= 0.2. But R versions older than 1.3.1 are still not supported because we haven’t found an effective way to automatically compile R < 1.3.1.

GQ5: Does rang.R (generated by export_rang() or dockerize()) run on non-Linux OSes?

GA5: Theoretically speaking, yes. But it is strongly discouraged. If the system requirements are fulfilled, rang.R should probably run fine on OS X if the R packages do not contain compiled code. If they do, C and Fortran compilers are needed. See this entry in R Mac OS X FAQ. On Windows, installing Github packages requires a properly set up PATH and tar. Similarly, R packages with compiled code require C / Fortran compilers. See this entry in R for Windows FAQ.

GQ6: What are the caveats of using rang?

GA6: Many

- rang does not support reconstructing computational environments with R < 1.3.1 (i.e. snapshot_date < “2001-08-31 14:58”) yet.
- dockerize() can only generate Debian/Ubuntu-based Docker images; this also means that packages depending on features specific to non-Linux systems (e.g. WinBUGS) do not work.
- dockerize(cache = TRUE) does not cache ~~R source code (yet) and~~ (available in rang >= 0.2.1) System Requirements (in deb packages).
- query_sysreqs() (as well as resolve(query_sysreqs = TRUE)) queries for System Requirements based on the latest version of the packages on CRAN / Github. Therefore:
  - Removed CRAN packages are assumed to have no System Requirements.
  - R packages whose System Requirements changed between snapshot_date and the date of running resolve() might produce incorrect System Requirements.
- A result from resolve() in the following cases must be dockerized with caching (i.e. dockerize(cache = TRUE)):
  - R version < 3.1 and at least one Github package, because the outdated version of Debian cannot communicate with the Github API.
  - R version < 3.3 and at least one Bioconductor package, for the same reason.
  - At least one local package.
  - R version < 2.1.
- R packages on Github, CRAN, and Bioconductor might not be available in the near future (Github: likely; CRAN and Bioconductor: very unlikely). But one can cache the packages (dockerize(cache = TRUE)).
- The Rocker project and its host Docker Hub might not be available in the near future (unlikely).
- Ubuntu / Debian archives (for System Requirements) might not be available in the future (super unlikely).

GQ7: rang depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claim that your package supports R >= 1.3.1?

GA7: To clarify, it is true that resolve() and dockerize() depend on many factors, including a modern version of R. But the reconstruction process (when R packages are cached) depends only on the availability of Docker images from Docker Hub, the availability of R source code on CRAN (R < 3.1.0), and deb packages from Ubuntu and Debian in the future. If you don’t believe in all of these, see also: DQ4.

GQ8: What are the data sources of resolve()?

GA8: Several

Dependencies / R version / System Requirements of CRAN packages: r-hub APIs (pkgsearch, r-versions, sysreqs)
Github: Github API
Dependencies of Bioconductor packages: Bioconductor

GQ9: I am not convinced by this package. What are the alternatives?

GA9: If you don’t consider the Dockerization part of rang, the date-based pinning of R packages can be done by:

Using Posit Public Package Manager

library(pak)
options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21"))
pkg_install("rio")
pkg_install("crsh/papaja")

Using groundhog

library(groundhog)
pkgs <- c("rio","crsh/papaja")
groundhog.library(pkgs, "2019-07-21")

If you don’t consider the date-based pinning of R packages, the Dockerization can be done by:

Using containerit [not on CRAN]

library(containerit)
## combine with Package Manager to pin packages by date
install.packages("rio")
remotes::install_github("crsh/papaja")
library(rio)
library(papaja)
print(containerit::dockerfile(from = utils::sessionInfo()))

Using dockerfiler

library(dockerfiler)
my_dock <- Dockerfile$new()
## combine with Package Manager to pin packages by date
my_dock$RUN(r(install.packages(c("remotes", "rio"))))
my_dock$RUN(r(remotes::install_github("crsh/papaja")))
my_dock

GQ10: I want to know more about this package.

GA10: Good. Read our preprint.

Docker questions

DQ1: Is Docker an overkill to simply ensure that a few lines of R code are reproducible?

DA1: It might be the case for recent R code, e.g. R >= 3.0 (or snapshot_date > “2013-04-03 09:10”). But we position rang as an archaeological tool to run really old R code (snapshot_date >= “2005-04-19 09:01”, but see GQ4). For this, Docker is essential because R in the 2.x/1.x series might not be installable anymore in a non-virtualized environment.

According to The Turing Way, a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by dockerize(), especially when materials_dir and cache were used, can be directly shared as a research compendium.

DQ2: How do I access bash instead of R?

DA2: By default, containers launched with the images generated by rang start R. One can override this by launching the container with an alternative entry point.

Suppose an image was built as per GA1.

docker run --rm --name "rangcontainer" --entrypoint bash -ti rangimg

DQ3: How do I copy files from and to a launched container?

DA3: Again, suppose an image was built as per GA1 and launched as below.

docker run --rm --name "rangcontainer" -ti rangimg

# probably you need to run this from another terminal
docker cp rangcontainer:/rang.R rang2.R
docker cp rang2.R rangcontainer:/rang2.R

We want to emphasize that launching a container with --name is useful, because otherwise the name of the container is randomly generated. Also remember that a relaunched container goes back to the initial state: any file generated inside the container previously will be removed. So use docker cp to copy out any artifact you want to preserve.
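The docker cp step above can be wrapped in a small guard that refuses to copy when the named container is not running. This is only a minimal sketch: `copy_from_container` is a hypothetical helper name (not part of rang or Docker), and it assumes the container was started with --name as in DA3.

```shell
# Hypothetical helper (not part of rang or Docker): copy one file out of a
# named container, but only if a container with exactly that name is running.
copy_from_container() {
    name="$1"   # container name given to `docker run --name`
    src="$2"    # path inside the container
    dest="$3"   # path on the host

    # `docker ps` lists running containers; the name filter also matches
    # substrings, so `grep -qxF` enforces an exact match.
    if ! docker ps --filter "name=${name}" --format '{{.Names}}' | grep -qxF "$name"; then
        echo "container '${name}' is not running" >&2
        return 1
    fi
    docker cp "${name}:${src}" "$dest"
}

# Usage, from a second terminal while the container is still running:
#   copy_from_container rangcontainer /rang.R rang2.R
```

The exact-match check matters because a typo in the container name would otherwise only surface as an error from docker cp itself.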

DQ4: How do I back up an image?

DA4: If you don’t believe Docker Hub / Debian archives / Ubuntu archives would be available forever, you may back up the generated image.

docker save rangimg | gzip > rangimg.tar.gz

You can also share the gzipped backup tarball (usually < 1 GB, depending on the size of materials_dir, thus shareable on Zenodo).

To restore the backup image:

docker load < rangimg.tar.gz

And launch a container the same way

docker run --rm --name "rangcontainer" -ti rangimg
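Before uploading the tarball, it may be worth checking that the archive is intact and recording a checksum that others can verify after download. The following is a minimal sketch, assuming sha256sum from GNU coreutils is available; `check_backup` is a hypothetical helper name and the .sha256 suffix is just a convention chosen here.

```shell
# Hypothetical helper: test a gzipped backup and record its SHA-256 checksum.
check_backup() {
    tarball="$1"
    # `gzip -t` tests the integrity of the compressed file without extracting it
    gzip -t "$tarball" || return 1
    # Write "<hash>  <file>" next to the tarball
    sha256sum "$tarball" > "${tarball}.sha256"
}

# Usage (run from the directory containing the tarball):
#   check_backup rangimg.tar.gz
#   sha256sum -c rangimg.tar.gz.sha256   # verify later, e.g. after download
```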

Apptainer/Singularity questions

AQ1: I am on HPC and I don’t have Docker there. Can I use Apptainer/Singularity instead of Docker?

AA1: Docker may require root privileges and is not usually available on HPC. You might have Singularity or Apptainer instead. Apptainer/Singularity do not require root to run images (only to build them). You can build images on your own Linux PC (or in VirtualBox on Windows or macOS), on a virtual private server, or for free in the cloud.

You have two options:

You can prepare (using dockerize()) and build a Docker image and convert it to an Apptainer/Singularity image. See Apptainer/Singularity documentation for that.
You can use the apptainerize() function just like you would use dockerize().

apptainerize(graph, output_dir = "rangtest")

Afterwards you build an image:

cd rangtest
apptainer build container.sif container.def
# sudo singularity build container.sif container.def # same as above

And run the container:

apptainer exec container.sif R
# singularity exec container.sif R # same as above

To stop the container when you are done with it, just quit R.

apptainer and singularity shell commands are interchangeable, at least for now. See Apptainer Singularity compatibility for details.

The apptainerize()/singularize() functions work exactly like dockerize(), except that you cannot cache the Linux distribution rootfs.
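For the first option (converting an existing Docker image), one commonly used route is to export the image with docker save and build from the resulting archive. The sketch below assumes the image is tagged rangimg as in GA1 and that the installed Apptainer/Singularity version accepts the docker-archive: build source; `docker_to_sif` is a hypothetical helper name. Check the Apptainer/Singularity documentation for your version before relying on it.

```shell
# Hypothetical helper: convert a locally built Docker image to a SIF file.
docker_to_sif() {
    image="$1"              # Docker image name, e.g. rangimg
    sif="$2"                # output file, e.g. container.sif
    tarball="${image}.tar"  # intermediate archive written to the current directory

    # Export the image to a tar archive, then build the SIF image from it.
    docker save -o "$tarball" "$image" &&
        apptainer build "$sif" "docker-archive:${tarball}"
}

# Usage:
#   docker_to_sif rangimg container.sif
#   apptainer exec container.sif R
```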

AQ2: What if I want to run RStudio IDE in a container instead of just CLI R?

AA2: To run RStudio IDE in an Apptainer/Singularity container, some writeable folders and a config file have to be created locally:

mkdir -p run var-lib-rstudio-server .rstudio
printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > database.conf

After that, you can run the container (do not run it as the root user, otherwise you will not be able to log in to RStudio IDE).

Start instance (on the default RStudio port 8787):

apptainer instance start \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
    container.sif \
    rangtest

Now open a browser and go to localhost:8787. The default username is your local username and the default password is ‘set_your_password’ (if you are using a container generated by rang).

List running instances:

apptainer instance list

Stop instance:

apptainer instance stop rangtest

Start instance with custom port (e.g. 8080) and password:

apptainer instance start \
    --env RPORT=8080 \
    --env PASSWORD='set_your_password' \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
    container.sif \
    rangtest

Run container with custom rserver command line:

apptainer exec \
    --env PASSWORD='set_your_password' \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
    container.sif \
    /usr/lib/rstudio-server/bin/rserver \
    --auth-none=0 --auth-pam-helper-path=pam-helper \
    --server-user=$(whoami) --www-port=8787

If you run the container using the apptainer exec command, you will have to stop the server manually, either by killing the rserver process or by pressing Cmd/Ctrl+C in the terminal running the container.
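The commands above repeat the same long --bind value three times. One way to keep it in a single place is a small shell function that prints it. This is only a sketch: `rstudio_binds` is a hypothetical name, and the host:container pairs are exactly those used above.

```shell
# Hypothetical helper: print the comma-separated --bind value used above,
# with one host:container pair per line for readability.
rstudio_binds() {
    # printf repeats the '%s,' format for every argument;
    # sed then drops the trailing comma.
    printf '%s,' \
        "run:/run" \
        "var-lib-rstudio-server:/var/lib/rstudio-server" \
        "database.conf:/etc/rstudio/database.conf" \
        ".rstudio:/home/rstudio/.rstudio/" | sed 's/,$//'
}

# Usage:
#   apptainer instance start --bind "$(rstudio_binds)" container.sif rangtest
```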