GQ1: Tell me in 10 lines how to use this package.
GA1: Get the dependency graph of several R packages on CRAN or Github at a specific snapshot date(time)
Dockerize the dependency graph to a directory
You can build the Docker image either with the R package stevedore or with the Docker CLI client. We use the CLI client.
Launch the container with the built image
And the tenth line is not needed.
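The build and launch steps could be sketched with the Docker CLI as below. This is a minimal sketch, assuming the dependency graph was dockerized to a directory named rangtest. The image name rangimg, the container name rangcontainer, and the helper function names are illustrative assumptions, not part of rang.

```shell
# Hypothetical helpers; image/container names and the rangtest directory are assumptions.

# Build an image from the directory written by dockerize().
rang_build() {
    docker build -t "${1:-rangimg}" "${2:-./rangtest}"
}

# Launch an interactive container; --rm removes the container on exit.
rang_launch() {
    docker run --rm --name "${2:-rangcontainer}" -ti "${1:-rangimg}"
}

# Usage: rang_build && rang_launch
```

Naming the container with --name makes it easier to refer to it later, e.g. with docker cp.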
GQ2: For running resolve(), how do I know which packages are used in a project?
GA2: rang >= 0.2 supports scanning of a directory for R packages (the current working directory by default). snapshot_date is inferred from the latest modification date of all files.
A better strategy, however, is to do the scanning first and then manually review which packages are from non-CRAN sources.
GQ3: Why is the R script generated by dockerize() and export_rang() so strange/unidiomatic/inefficient/did you guys read fortunes::fortune("answer is parse")?
GA3: It is because we optimize the R code in rang.R for backward compatibility. We need to make sure that the code runs well in vanilla R environments since 1.3.1.
GQ4: Why doesn’t rang support reconstructing computational environments with R < 1.3.1 yet?
GA4: Installing source packages from within R was introduced in R 2.1.0. Before that, one needed to install source packages with R CMD INSTALL. Support for the R 1.x series is available in rang >= 0.2. But R versions older than 1.3.1 are still not supported because we haven’t found an effective way to automatically compile R < 1.3.1.
GQ5: Does rang.R (generated by export_rang() or dockerize()) run on non-Linux OSes?
GA5: Theoretically speaking, yes, but it is strongly not recommended. If the system requirements are fulfilled, rang.R should probably run fine on OS X if the R packages do not contain compiled code. C and Fortran compilers are needed if they do. See the relevant entry in the R Mac OS X FAQ. On Windows, installing Github packages requires a properly set up PATH and tar. Similarly, R packages with compiled code require C / Fortran compilers. See the relevant entry in the R for Windows FAQ.
GQ6: What are the caveats of using rang?
GA6: Many
rang does not support reconstructing computational environments with R < 1.3.1 (i.e. snapshot_date < “2001-08-31 14:58”) yet.
dockerize() can only generate Debian/Ubuntu-based Docker images; it also means that packages depending on non-Linux specific features (e.g. WinBUGS) do not work.
dockerize(cache = TRUE) does not cache deb packages.
query_sysreqs() (as well as resolve(query_sysreqs = TRUE)) queries for System Requirements based on the latest version of the packages on CRAN / Github. Therefore:
A gap between snapshot_date and the date of running resolve() might produce incorrect System Requirements.
The output of resolve() must in some cases be dockerized with caching (i.e. dockerize(cache = TRUE)).
GQ7: rang depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claim that your package supports R >= 1.3.1?
GA7: To clarify, it is true that resolve() and dockerize() depend on many factors, including a modern version of R. But the reconstruction process (if with caching of R packages) depends only on the availability of Docker images from Docker Hub, availability of R source code on CRAN (R < 3.1.0), and deb packages from Ubuntu and Debian in the future. If you don’t believe in all of these, see also: DQ4.
GQ8: What are the data sources of resolve()?
GA8: Several
GQ9: I am not convinced by this package. What are the alternatives?
GA9: If you don’t consider the Dockerization part of rang, the date-based pinning of R packages can be done by:
library(pak)
options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21"))
pkg_install("rio")
pkg_install("crsh/papaja")
If you don’t consider the date-based pinning of R packages, the Dockerization can be done by:
library(containerit)
## combine with Package Manager to pin packages by date
install.packages("rio")
remotes::install_github("crsh/papaja")
library(rio)
library(papaja)
print(containerit::dockerfile(from = utils::sessionInfo()))
library(dockerfiler)
my_dock <- Dockerfile$new()
## combine with Package Manager to pin packages by date
my_dock$RUN(r(install.packages(c("remotes", "rio"))))
my_dock$RUN(r(remotes::install_github("crsh/papaja")))
my_dock
GQ10: I want to know more about this package.
GA10: Good. Read our preprint.
DQ1: Is Docker an overkill to simply ensure that a few lines of R code are reproducible?
DA1: It might be the case for recent R code, e.g. R >= 3.0 (or snapshot_date > “2013-04-03 09:10”). But we position rang as an archaeological tool to run really old R code (snapshot_date >= “2005-04-19 09:01”, but see GQ4). For this, Docker is essential because R in the 2.x/1.x series might not be installable anymore in a non-virtualized environment.
According to The Turing Way, a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by dockerize(), especially when materials_dir and cache were used, can be directly shared as a research compendium.
DQ2: How do I access bash instead of R?
DA2: By default, containers launched with the images generated by rang go to R. One can override this by launching the container with an alternative entry point.
Suppose an image was built as per GA1.
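Overriding the entry point could look like the sketch below, which uses the --entrypoint option of docker run. The image name rangimg, the container name rangcontainer, and the helper name are assumptions for illustration.

```shell
# Hypothetical helper; rangimg and rangcontainer are assumed names.
# --entrypoint replaces the image's default entry point (R) with bash.
rang_bash() {
    docker run --rm --name "${2:-rangcontainer}" --entrypoint bash -ti "${1:-rangimg}"
}

# Usage: rang_bash
```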
DQ3: How do I copy files from and to a launched container?
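DA3: Again, suppose an image was built as per GA1 and a container was launched from it with --name rangcontainer. Files can then be copied out of and into the container with docker cp: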
# probably you need to run this from another terminal
docker cp rangcontainer:/rang.R rang2.R
docker cp rang2.R rangcontainer:/rang2.R
We want to emphasize here that launching a container with --name is useful because the name of the container is randomly generated when --name is not used to launch it.
It is also important to remind you that a relaunched container goes back to the initial state. Any file generated inside the container previously will be removed. So use docker cp to copy out any artifact you want to preserve.
DQ4: How do I back up an image?
DA4: If you don’t believe Docker Hub / Debian archives / Ubuntu archives would be available forever, you may back up the generated image.
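A backup could be made as in the sketch below, which pipes docker save through gzip. The image name rangimg, the output file name rangimg.tar.gz, and the helper name are assumptions for illustration.

```shell
# Hypothetical helper; the image name and the output file name are assumptions.
# docker save writes the image as a tar stream to stdout; gzip compresses it.
rang_backup() {
    docker save "${1:-rangimg}" | gzip > "${2:-rangimg.tar.gz}"
}

# Usage: rang_backup rangimg rangimg.tar.gz
```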
You can also share the backed-up gzipped tarball file (usually < 1G, depending on the size of materials_dir, thus shareable on Zenodo).
To restore the backup image:
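Restoring could be sketched as below, using docker load, which reads an image tarball (gzip-compressed tarballs are accepted) from standard input. The file name rangimg.tar.gz and the helper name are assumptions for illustration.

```shell
# Hypothetical helper; the file name rangimg.tar.gz is an assumption.
# docker load reads an image tarball (optionally gzip-compressed) from stdin.
rang_restore() {
    docker load < "${1:-rangimg.tar.gz}"
}

# Usage: rang_restore rangimg.tar.gz
```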
Then launch a container the same way as in GA1.
AQ1: I am on HPC and I don’t have Docker there. Can I use Apptainer/Singularity instead of Docker?
AA1: Docker may require root privileges and is not usually available on HPC. You might have Singularity or Apptainer instead. Apptainer/Singularity do not require root to run images (only to build them). You can build images on your own Linux PC (or in VirtualBox on Windows or macOS), or on a virtual private server, or also for free in the cloud.
You have two options:
You can prepare (using dockerize()) and build a Docker image and convert it to an Apptainer/Singularity image. See Apptainer/Singularity documentation for that.
You can use the apptainerize() function just like you would use dockerize().
Afterwards you build an image:
cd rangtest
apptainer build container.sif container.def
# sudo singularity build container.sif container.def # same as above
And run the container:
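Running the image built above could look like the following sketch. It assumes the image file is named container.sif, as in the build step; the helper name is hypothetical.

```shell
# Hypothetical helper; container.sif is the image file from the build step.
# "apptainer run" executes the runscript defined in the image.
rang_apptainer_run() {
    apptainer run "${1:-container.sif}"
}

# Usage: rang_apptainer_run
```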
To stop the container when you are done with it, just quit R.
The apptainer and singularity shell commands are interchangeable, at least for now. See Apptainer Singularity compatibility for details.
The apptainerize()/singularize() functions work exactly the same as dockerize(), except you cannot cache Linux distribution rootfs.
AQ2: What if I want to run RStudio IDE in a container instead of just CLI R?
AA2: To run RStudio IDE in Apptainer/Singularity container, some writeable folders and a config file have to be created locally:
mkdir -p run var-lib-rstudio-server .rstudio
printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > database.conf
After that, you can run the container (do not run as the root user, otherwise you will not be able to log in to RStudio IDE).
Start instance (on default RSTUDIO port 8787):
apptainer instance start \
--bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
container.sif \
rangtest
Now open a browser and go to localhost:8787. The default username is your local username, and the default password is ‘set_your_password’ (if you are using a container generated by rang).
List running instances:
Stop instance:
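Listing and stopping instances could be sketched as below, using the apptainer instance subcommands. The instance name rangtest is the one given when the instance was started; the helper names are hypothetical.

```shell
# Hypothetical helpers; rangtest is the instance name used at start-up.

# Show all running instances.
rang_instances() {
    apptainer instance list
}

# Stop a named instance.
rang_stop() {
    apptainer instance stop "${1:-rangtest}"
}

# Usage: rang_instances; rang_stop rangtest
```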
Start instance with custom port (e.g. 8080) and password:
apptainer instance start \
--env RPORT=8080 \
--env PASSWORD='set_your_password' \
--bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
container.sif \
rangtest
Run container with custom rserver command line:
apptainer exec \
--env PASSWORD='set_your_password' \
--bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
container.sif \
/usr/lib/rstudio-server/bin/rserver \
--auth-none=0 --auth-pam-helper-path=pam-helper \
--server-user=$(whoami) --www-port=8787
If you run the container using the apptainer exec command, you will have to kill the rserver process manually or press Cmd/Ctrl+C in the running container to stop the server.