---
title: "FAQ"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{FAQ}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# General questions

**GQ1: Tell me in 10 lines how to use this package.**

**GA1:** Get the dependency graph of several R packages on CRAN or GitHub at a specific snapshot date(time)

```r
graph <- resolve(c("crsh/papaja", "rio"), snapshot_date = "2019-07-21")
```

Dockerize the dependency graph to a directory

```r
dockerize(graph, output_dir = "rangtest")
```

You can build the Docker image with either the R package `stevedore` or the Docker CLI client. We use the CLI client here.

```sh
docker build -t rangimg ./rangtest ## might need sudo
```

Launch a container from the built image

```sh
docker run --rm --name "rangcontainer" -ti rangimg
```

And the tenth line is not needed.

**GQ2: For running `resolve()`, how do I know which packages are used in a project?**

**GA2:** `rang` >= 0.2 supports scanning a directory for the R packages it uses (the current working directory by default). `snapshot_date` is inferred from the latest modification date of all files.

```r
resolve()
```

A better strategy, however, is to do the scanning first and then manually review which packages are from non-CRAN sources.

```r
pkgs <- as_pkgrefs(".")
```

**GQ3: Why is the R script generated by `dockerize()` and `export_rang()` so strange/unidiomatic/inefficient/did you guys read `fortunes::fortune("answer is parse")`?**

**GA3:** It is because we optimize the R code in `rang.R` for backward compatibility. We need to make sure that the code runs well in vanilla R environments from R 1.3.1 onwards.

**GQ4: Why doesn't `rang` support reconstructing computational environments with R < ~~2.1.0~~ 1.3.1 yet?**

**GA4:** ~~It is because installing source packages from within R was introduced in R 2.1.0. Before that one needed to install source packages with `R CMD INSTALL`.
But we are working on supporting R in the 1.x series.~~ Support for the R 1.x series is available in rang >= 0.2. But R versions older than 1.3.1 are still not supported because we haven't found an effective way to automatically compile R < 1.3.1.

**GQ5: Does `rang.R` (generated by `export_rang()` or `dockerize()`) run on non-Linux OSes?**

**GA5:** Theoretically speaking, yes. But it is strongly not recommended.

If the system requirements are fulfilled, `rang.R` should probably run fine on macOS if the R packages do not contain compiled code. If they do, C and Fortran compilers are needed. See [this entry](https://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html#Installation-of-source-packages) in R Mac OS X FAQ.

On Windows, installing GitHub packages requires a properly set up `PATH` and `tar`. Similarly, R packages with compiled code require C / Fortran compilers. See [this entry](https://cran.r-project.org/bin/windows/base/rw-FAQ.html#Can-I-install-packages-into-libraries-in-this-version_003f) in R for Windows FAQ.

**GQ6: What are the caveats of using rang?**

**GA6:** Many

* `rang` does not support reconstructing computational environments with R < 1.3.1 (i.e. `snapshot_date` < "2001-08-31 14:58") yet
* `dockerize()` can only generate Debian/Ubuntu-based Docker images; it also means that packages depending on features specific to non-Linux systems (e.g. WinBUGS) do not work.
* `dockerize(cache = TRUE)` does not cache ~~[R source code](https://cran.r-project.org/src/base/) (yet) and~~ (available in rang >= 0.2.1) System Requirements (in `deb` packages)
* `query_sysreqs()` (as well as `resolve(query_sysreqs = TRUE)`) queries for System Requirements based on the latest version of the packages on CRAN / GitHub.
Therefore:
    * Removed CRAN packages are assumed to have no System Requirements
    * R packages whose System Requirements changed between `snapshot_date` and the date of running `resolve()` might get incorrect System Requirements
* A result from `resolve()` in the following cases must be dockerized with caching (i.e. `dockerize(cache = TRUE)`)
    * R version < 3.1 and has at least one GitHub package. It is because the outdated version of Debian cannot communicate with the GitHub API.
    * R version < 3.3 and has at least one Bioconductor package, for the same reason.
    * Has at least one local package.
    * R version < 2.1
* R packages on GitHub, CRAN, and Bioconductor might not be available in the near future (GitHub: likely; CRAN and Bioconductor: very unlikely). But one can cache the packages (`dockerize(cache = TRUE)`).
* The Rocker project and its host Docker Hub might not be available in the near future (unlikely)
* Ubuntu / Debian archives (for System Requirements) might not be available in the future (super unlikely)

**GQ7: `rang` depends on R >= 3.5.0. Several of the dependencies depend on many modern R packages. How dare you claim that your package supports R >= 1.3.1?**

**GA7:** To clarify, it is true that `resolve()` and `dockerize()` depend on many factors, including a modern version of R. But the reconstruction process (if R packages are cached) depends only on the future availability of Docker images from Docker Hub, R source code on CRAN (R < 3.1.0), and `deb` packages from Ubuntu and Debian. If you don't believe in all of these, see also: DQ4.

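
The caching rules listed under GA6 can be written down as a small decision helper. The following is only a sketch: `version_lt` and `needs_cache` are hypothetical shell functions that are not part of `rang`, and the source labels (`cran`, `github`, `bioc`, `local`) are made up for illustration. Versions are compared on major.minor only, which is enough for the thresholds mentioned in GA6.

```sh
# Hypothetical helpers, not part of rang.

# version_lt A B: succeeds when version A is lower than version B.
# Only major.minor is compared; a missing minor part counts as 0.
version_lt() {
    a_major=${1%%.*}
    b_major=${2%%.*}
    case $1 in *.*) a_rest=${1#*.}; a_minor=${a_rest%%.*} ;; *) a_minor=0 ;; esac
    case $2 in *.*) b_rest=${2#*.}; b_minor=${b_rest%%.*} ;; *) b_minor=0 ;; esac
    [ "$a_major" -lt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -lt "$b_minor" ]; }
}

# needs_cache R_VERSION SOURCE...: succeeds when the rules in GA6 say that
# the result must be dockerized with dockerize(cache = TRUE).
needs_cache() {
    r_version=$1
    shift
    # R < 2.1 always needs caching
    version_lt "$r_version" 2.1 && return 0
    for src in "$@"; do
        case $src in
            local)  return 0 ;;                                  # any local package
            github) version_lt "$r_version" 3.1 && return 0 ;;   # GitHub with R < 3.1
            bioc)   version_lt "$r_version" 3.3 && return 0 ;;   # Bioconductor with R < 3.3
        esac
    done
    return 1
}

# Usage:
# if needs_cache 3.0.2 cran github; then echo "use dockerize(cache = TRUE)"; fi
```

Caching remains a sensible default even when this check says it is not strictly required, because of the availability caveats listed in GA6.
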
**GQ8: What are the data sources of `resolve()`?**

**GA8:** Several

* Dependencies / R version / System Requirements of CRAN packages: r-hub APIs ([pkgsearch](https://r-hub.github.io/pkgsearch/), [r-versions](https://api.r-hub.io/rversions), [sysreqs](https://sysreqs.r-hub.io/))
* GitHub: [GitHub API](https://docs.github.com/en/rest)
* Dependencies of Bioconductor packages: [Bioconductor](https://bioconductor.org/)

**GQ9: I am not convinced by this package. What are the alternatives?**

**GA9:** If you don't consider the Dockerization part of `rang`, the date-based pinning of R packages can be done by:

* Using [Posit Public Package Manager](https://packagemanager.rstudio.com/)

```r
library(pak)
options(repos = c(REPO_NAME = "https://packagemanager.rstudio.com/cran/2019-07-21"))
pkg_install("rio")
pkg_install("crsh/papaja")
```

* Using [groundhog](https://groundhogr.com/)

```r
library(groundhog)
pkgs <- c("rio", "crsh/papaja")
groundhog.library(pkgs, "2019-07-21")
```

If you don't consider the date-based pinning of R packages, the Dockerization can be done by:

* Using [containerit](https://github.com/o2r-project/containerit) [not on CRAN]

```r
library(containerit)
## combine with Package Manager to pin packages by date
install.packages("rio")
remotes::install_github("crsh/papaja")
library(rio)
library(papaja)
print(containerit::dockerfile(from = utils::sessionInfo()))
```

* Using [dockerfiler](https://CRAN.R-project.org/package=dockerfiler)

```r
library(dockerfiler)
my_dock <- Dockerfile$new()
## combine with Package Manager to pin packages by date
my_dock$RUN(r(install.packages(c("remotes", "rio"))))
my_dock$RUN(r(remotes::install_github("crsh/papaja")))
my_dock
```

**GQ10: I want to know more about this package.**

**GA10:** Good. Read our [preprint](https://arxiv.org/abs/2303.04758).

# Docker questions

**DQ1: Is Docker overkill for simply ensuring that a few lines of R code are reproducible?**

**DA1:** It might be the case for recent R code, e.g.
R >= 3.0 (or `snapshot_date` > "2013-04-03 09:10"). But we position `rang` as an archaeological tool to run really old R code (`snapshot_date` >= "2005-04-19 09:01", but see GQ4). For this, Docker is essential because R in the 2.x/1.x series might not be installable anymore in a non-virtualized environment.

According to [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/compendia.html), a research compendium that aids computational reproducibility should contain a complete description of the computational environment. The directory exported by `dockerize()`, especially when `materials_dir` and `cache` were used, can be directly shared as a research compendium.

**DQ2: How do I access bash instead of R?**

**DA2:** By default, containers launched from the images generated by `rang` start R. One can override this by launching the container with an alternative entry point. Suppose an image was built as per GA1.

```sh
docker run --rm --name "rangcontainer" --entrypoint bash -ti rangimg
```

**DQ3: How do I copy files from and to a launched container?**

**DA3:** Again, suppose an image was built as per GA1 and launched as below

```sh
docker run --rm --name "rangcontainer" -ti rangimg
```

```sh
# probably you need to run this from another terminal
docker cp rangcontainer:/rang.R rang2.R
docker cp rang2.R rangcontainer:/rang2.R
```

We want to emphasize here that launching a container with `--name` is useful because the name of the container is randomly generated when `--name` is not used to launch it.

It is also important to remind you that a relaunched container goes back to the initial state. Any file previously generated inside the container will be removed. So use `docker cp` to copy out any artifact you want to preserve.

**DQ4: How do I back up an image?**

**DA4:** If you don't believe Docker Hub / Debian archives / Ubuntu archives would be available forever, you may back up the generated image.


```sh
docker save rangimg | gzip > rangimg.tar.gz
```

You can also share the gzipped backup tarball (usually < 1 GB, depending on the size of `materials_dir`, and thus shareable on Zenodo).

To restore the backup image:

```sh
docker load < rangimg.tar.gz
```

And launch a container the same way

```sh
docker run --rm --name "rangcontainer" -ti rangimg
```

# Apptainer/Singularity questions

**AQ1: I am on HPC and I don't have Docker there. Can I use Apptainer/Singularity instead of Docker?**

**AA1:** Docker may require root privileges and is not usually available on HPC. You might have Singularity or Apptainer instead. Apptainer and Singularity do not require root to run images (only to build them). You can build images on your own Linux PC (or in VirtualBox on Windows or macOS), on a virtual private server, or [for free in the cloud](https://cloud.sylabs.io/builder).

You have two options:

1. You can prepare (using `dockerize()`) and build a Docker image and convert it to an Apptainer/Singularity image. See [Apptainer/Singularity documentation](https://apptainer.org/docs/user/latest/docker_and_oci.html) for that.
2. You can use the `apptainerize()` function just like you would use `dockerize()`.

```r
apptainerize(graph, output_dir = "rangtest")
```

Afterwards you build an image:

```sh
cd rangtest
apptainer build container.sif container.def
# sudo singularity build container.sif container.def # same as above
```

And run the container:

```sh
apptainer exec container.sif R
# singularity exec container.sif R # same as above
```

To stop the container when you are done with it, just quit R.

The `apptainer` and `singularity` shell commands are interchangeable, at least for now. See [Apptainer Singularity compatibility](https://apptainer.org/docs/user/latest/singularity_compatibility.html) for details.

The `apptainerize()`/`singularize()` functions work exactly the same as `dockerize()`, except that you cannot cache the Linux distribution rootfs.

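
Option 1 above (converting an existing Docker image) can be sketched as follows, assuming the Docker image `rangimg` from GA1 exists on a machine that has both Docker and Apptainer. The function names `run` and `docker_to_sif` and the tarball name are hypothetical and not part of `rang`. The `docker-archive:` source is covered in the Apptainer documentation linked above, so please check it against your Apptainer version. When `DRY_RUN` is set to `1`, the commands are only printed.

```sh
# Hypothetical helpers, not part of rang.

# Run a command, or only print it when DRY_RUN is set to 1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# docker_to_sif IMAGE SIF_FILE
docker_to_sif() {
    image=$1
    sif=$2
    tarball="${image}.tar"
    # Export the Docker image as a tarball ...
    run docker save -o "$tarball" "$image"
    # ... and let Apptainer build a SIF file from that archive.
    run apptainer build "$sif" "docker-archive:$tarball"
}

# Usage:
# docker_to_sif rangimg container.sif
```

The resulting `container.sif` can then be copied to the HPC system and run with `apptainer exec` as shown above.
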
**AQ2: What if I want to run RStudio IDE in a container instead of just CLI R?**

**AA2:** To run RStudio IDE in an Apptainer/Singularity container, some writeable folders and a config file have to be created locally:

```bash
mkdir -p run var-lib-rstudio-server .rstudio
printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > database.conf
```

After that, you can run the container (do not run it as the `root` user, otherwise you will not be able to log in to RStudio IDE).

Start instance (on the default RStudio port 8787):

```bash
apptainer instance start \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
    container.sif \
    rangtest
```

Now open a browser and go to localhost:8787. The default username is your local username, and the default password is 'set_your_password' (if you are using a container generated by rang).

List running instances:

```bash
apptainer instance list
```

Stop instance:

```bash
apptainer instance stop rangtest
```

Start instance with custom port (e.g. 8080) and password:

```bash
apptainer instance start \
    --env RPORT=8080 --env PASSWORD='set_your_password' \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
    container.sif \
    rangtest
```

Run container with custom `rserver` command line:

```bash
apptainer exec \
    --env PASSWORD='set_your_password' \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf,.rstudio:/home/rstudio/.rstudio/ \
    container.sif \
    /usr/lib/rstudio-server/bin/rserver \
    --auth-none=0 --auth-pam-helper-path=pam-helper \
    --server-user=$(whoami) --www-port=8787
```

If you run the container using the `apptainer exec` command, you will have to stop the server yourself, either by killing the `rserver` process manually or by pressing Cmd/Ctrl+C in the terminal where the container is running.
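
Because the same `--bind` value is repeated in every command above, it can help to keep it in one place. This is a minimal sketch: `rang_rstudio_prepare` and `rang_rstudio_binds` are hypothetical helper names that are not part of `rang`, while the folders, the `database.conf` contents, and the bind targets are the ones used above.

```sh
# Hypothetical helpers, not part of rang.

# Create the writeable folders and the config file in the current directory.
rang_rstudio_prepare() {
    mkdir -p run var-lib-rstudio-server .rstudio
    printf 'provider=sqlite\ndirectory=/var/lib/rstudio-server\n' > database.conf
}

# Print the comma-separated value for --bind used by the commands in AA2.
rang_rstudio_binds() {
    printf '%s,%s,%s,%s\n' \
        "run:/run" \
        "var-lib-rstudio-server:/var/lib/rstudio-server" \
        "database.conf:/etc/rstudio/database.conf" \
        ".rstudio:/home/rstudio/.rstudio/"
}

# Usage:
# rang_rstudio_prepare
# apptainer instance start --bind "$(rang_rstudio_binds)" container.sif rangtest
```

The bind sources are relative paths, so the instance has to be started from the directory in which `rang_rstudio_prepare` was run.
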