12 Docker Container
12.1 Introduction
This chapter explains how to develop and run the Docker container for the Poverty Calculator API on local World Bank laptops. For DEV, QA and Production deployments please see the chapter on Deploying to Azure. For the Table Maker API please refer to TBD.
12.2 Prerequisites
- Local admin account
- Docker Desktop for Windows
- WSL 2 (recommended)
- Visual Studio Code (recommended)
See below for recommendations on how to install Docker Desktop and WSL 2.
Visual Code is not strictly needed, but the VS Docker plugin is one of the best tools to interact with Docker.
12.3 Docker on World Bank laptops
12.3.1 Install Docker
- Install Docker Dekstop (v. 3.6.0 or later)
- Remember to check the box to enable WSL integration.
- Activate WSL
- Check if the WSL feature on Windows is activated (e.g. run
wsl --help
from Powershell). If it is then proceed to step 3. Note: If you get a message saying “service cannot be started” try restarting WSL like suggested below under Known problems and solutions. - Go to Control Panel -> Programs -> Turn Windows features on or off and check the box Windows Subsystem for Linux, or follow these steps if you prefer to activate WSL from the command line.
- Reboot system.
- Check if the WSL feature on Windows is activated (e.g. run
- Install WSL 2
- Download and install the Linux kernel update package.
- Set WSL 2 as your default version (run
wsl --set-default-version 2
from Powershell).
- Configure Docker access privileges
- In order to run Docker wo/ admin privileges you need to add yourself to the “docker-users” group.
- Open Computer Management. Go to Local User and Groups -> Groups -> docker-users and add your regular account (
WB/wb<UPI>
) to the list of users. - Reboot system.
- Start Docker Desktop and enjoy your hard work :-)
12.3.2 Known problems and solutions
Docker Dekstop sometimes fails with the following error.
System.InvalidOperationException:
Failed to deploy distro docker-desktop to C:\Users\<User>\AppData\Local\Docker\wsl\distro: exit code: -1
stdout: Error: 0xffffffff
The root cause of this problem is likely that the World Bank VPN connection modifies the same network configuration as the wsl
VM. Turning off your VPN when working with Docker locally is thus a good idea.
A temporary solution to the issue is to open a CMD shell in Admin mode, and then run the following code to restart LxssManager
.
> sc config LxssManager start=auto
[SC] ChangeServiceConfig SUCCESS
After this you will also need to restart Docker Desktop.
12.3.3 Tips and tricks
- Start your Docker working day by opening a CMD shell in admin-mode and run
sc config LxssManager start=auto
. Keep the shell open because you might need to re-run the command if WSL crashes. - Turn off your VPN when using Docker. This avoids the issue with WSL crashing from time to time. It still might happen though if the VPN switces on automatically or if you for other reasons switch back and forth.
- It is possible that solutions like wsl-vpnkit and vpnkit will help with the WSL issues, but these have not been tested yet. Feel free to give it a try yourself.
- If WSL causes too much pain, try switching to the legacy Hyper-V backend. (Go to Docker Desktop -> General Setting -> Use the WSL 2 based engine).
- Re-build an image from a partial cache by adding a
ARG
orENV
variable right before the layer you want to invalidate (yes, this is hacky but it works). E.g. do something like:
# Install PIP specific R packages
ENV test=test06302021
RUN Rscript -e "remotes::install_github('PIP-Technical-Team/wbpip@master', repos = '${CRAN}')"
RUN Rscript -e "remotes::install_github('PIP-Technical-Team/pipapi@master', repos = '${CRAN}')"
12.4 Create volume
The data that is used in the poverty calculations is stored outside the Docker container. We therefore need to link a data source on our local machine to the container when we run it. This could be accomplished with a bind mount, but the preferred way is to use a named volume.
You can create and populate a volume by following the steps below. First we create an empty volume, then we link this to a temporay container, before using docker cp
to copy the files to the container and volume. After the data has been copied you can discard the temporary container. The choice of Ubuntu 20.04 for the temporary container is arbitrary. You could use another image if you want.
# Create volume
docker volume create pip-vol
# Mount volume to tmp container
docker run -d --name pip-data --mount type=volume,source=pip-vol,target=/data ubuntu:20.04
# Copy data to container (and volume)
docker cp <data-folder>/. pip-data:data
# Stop and remove tmp container
docker stop pip-data
docker rm pip-data
For some purposes you will only need to create this volume once, but remember to update the contents of the volume in case the data structure changes or you want to make sure the container is running against the latest available data. See the Azure DevOps Data repo (DEV branch) for the latest updates.
The volume can be inspected with the docker inspect
and docker system
commands. If you are running Docker Desktop on Windows with WSL 2.0 then then the physical location of Docker volumes is usually found under \\wsl$\docker-desktop-data\version-pack-data\community\docker\volumes
. For more information on volumes see the Use volumes section in Docker’s reference manual.
12.5 Build image
Docker images are built using the docker build
command. Here we build the image and set the image name to pip-api
.
docker build -t pip-api .
You can also set a specific tag for the image. For example you can specify the version number or the base image used as source. How you decide to tag your images is up to you. Often a tag of latest
(the default) will suffice, but in other instances keeping track of different versions of your development image will be useful.
# Set tag when building
docker build -t pip-api:0.0.4
# Set tag after build (version number)
docker image tag pip-api:latest pip-api:0.0.4
# Set tag after build (base image)
docker image tag pip-api:latest pip-api:ubi8
Note that each layer in a Dockerfile is cached when built. These layers won’t be re-run unless the code to produce that specific layer or a previous layer in the Dockerfile changes. This is handy for fast iterations, but can also cause the image to be outdated.
If needed you can use the --no-cache
flag to force a re-build. This is useful when you want to make sure that the latest Linux updates are installed or for updating the {wbpip}
and {pipapi}
R packages. See also the Tips and Tricks section on how to do partial re-builds.
docker build --no-cache -t pip-api .
12.6 Run container
Run the container by exposing port 80 and mounting a volume with the survey and auxiliary data. The data structure in the attached volume must correspond to the sub-folder specifications in R/main.R
.
docker run --rm -d -p 80:80/tcp --mount src=pip-vol,target=/ipp,type=volume,readonly --name pip-api pip-api
You can also run the container with a bind mount if you prefer, but note that reading the files inside mount folder will be slower then with a volumne setup.
docker run --rm -d -p 80:80/tcp --mount src="<data-folder>",target=/data,type=bind --name pip-api pip-api
Navigate to http://localhost/__docs__/
to see the running API.
For details on the docker run
command and its options see the Docker Docs.
12.7 Debugging
Run container interactively:
Run the container in interactive mode, using the -it
flag, to see output messages.
$ docker run --rm -it -p 80:80/tcp --mount src=pip-vol,target=/ipp,type=volume,readonly --name pip-api pip-api
Running plumber API at http://0.0.0.0:80
Running swagger Docs at http://127.0.0.1:80/__docs__/
Inspect container:
For development purposes it can be useful to inspect the Docker container. Luckily it is very easy to enter a running container with the docker exec
command.
# Enter the container as default user
$ docker exec -it pip-api /bin/bash
plumber@bd8ea77299ca:/$
# Enter the container as root user
$ docker exec -it -u 0 pip-api /bin/bash
root@bd8ea77299ca:/$
12.8 Security
Since the PIP Techincal Team is developing both the API and the Docker container, we also have a larger responsibility for security. When deploying to Azure the image will need to go through a security scan, provided by Aquasec, that is generally quite strict. Any high level vulnerability found by the Aquasec scan will result in a failed deployment. It is not possible to leverage Aquasec on local machines, but we can get an indication of potential vulnerabilites by taking advantage of the built in security scan that comes with Docker. Additionally it is also possible to scan the contents of the R packages used in the image.
Note that neither the Snyk or Oyster scans described below are requirements from OIS. They are only included here as additional and preliminary tools.
Run image security scan:
Before deploying to Azure you can run preliminary security scans with Snyk. See Vulnerability scanning for Docker local images and Docker Security Scanning Guide 2021 for details.
docker scan --file .\Dockerfile pip-api
It is important to note that the Aquasec scan that runs on Azure is different that the Snyk scan, and can detect different vulnerabilites. Even though few or zero vulnerabilites are found by Snyk, further issues might still be detected by Aquasec.
Run security scan for R packages:
You can scan the R packages inside the container with the {oysteR}
package, which scans R projects against insecure dependencies using the OSS Index.
First log into a running container as root, and start R.
$ docker exec -it -u 0 pip-api /bin/bash
root@ab54fbf1eb36:/# R
Then install the {oyster}
package, and run an audit.
install.packages("oysteR")
library("oysteR")
<- audit_installed_r_pkgs()
audit get_vulnerabilities(audit)
Note that finding zero vulnerabilities is no guarantee against threats. For example is scanning of C++ code inside R packages not yet implemented. For more details and latest developments on the {oyster}
package see the README on Github.
12.9 Base image options
In the development process of the Dockerfile there have been used different base images.
The current image in the main branch is based on ubi8
, but other options (rocker
, centos8
) are available in seperate branches. These seperate branches will not be actively maintained, but should be kept in case there is a need to switch to another base image in the future.
The main need to switch base image is likely to stem from the Azure DevOps security scan. For example do images based on rocker
or centos8
work well locally, but neither will pass deployment. CentOS images are unfortunately not on the list of approved images by OIS, while Rocker images (which are based on Ubuntu 20.04 LTS) currently has a vulnerability (CVE-2019-18276) that is classified as a high level threat by Aqua Scan.
In fact one of the Linux dependencies for R on UBI 8, libX11
, also currently trigger a high level vulnerability (CVE-2020-14363) in Aqua Scan. Even though this issue has been solved in the offical RHEL 8 release, the issue still persist for UBI 8. It can however be solved by manually downloading a newer version of libX11 (1.6.12 or later), and removing the problematic version (1.6.8).
In case there is any need to develop with other base images in the future it is strongly recommended to start with one of the Linux distributions where RStudio provides precompiled binaries for R. This avoids the need to install R from source.
12.10 Using R on Linux
Docker images are usually based on Linux containers. Some familiarity with Linux is thus necessary in order to develop and work with Docker.
R users in particular should familiarize themselves with the difference between installing R and R packages from source vs binary, as well as potential Linux dependencies that might be nessecary to make a specific package work.
Installing R:
Installing R from source is possible on most Linux distributions, but this not always an easy endavour. It will often require you to maintain a list of the nessecary Linux dependencies for R. Installing from source is also much slower than installing from a binary version.
Another way to install R on Linux is to use the available pre-compiled binary in your Linux distro’s repository (e.g sudo apt install r-base
on Ubuntu). The problem with this approach is that the latest version of R might not yet be available, and for some Linux distro’s there will also only be a limited number of R versions available. This impedes the flexilibility needed for a production grade project.
The recommended approach is thus to rely on RStudio’s precompiled binaries. These binaries are available for the most common Linux distributions, including Ubuntu, Debian and CentOS / RHEL. Relying on these binaries will limited the number of base image options avaiable, e.g. a ligth-weight distro like Alpine is not supported, nor is the newest release of Ubuntu (Hirsute Hippo). But it will make much easier to maintain, upgrade and change the R version used in the project.
Installing R packages:
As a contrast to Windows and macOS, CRAN only supports source versions of the R packages available for Linux. This will significantly slow down the installation process of each pacakge.
Luckily RStudio provides binary versions of all packages on CRAN for a series of different Linux distro’s through their Public Pacakge Manager. It is therefore strongly recommended that you rely on RSPM as your installation repo.
An additional benefit of RSPM is that you can also select a specific date for the package versions to use.
On the RSPM website go the Setup page on the main menu and select the appropiate client OS and date. You should then see a URL similar too https://packagemanager.rstudio.com/all/__linux__/centos8/4743918
, where centos8
represents the distro and 4743918
the date.
For a more detailed introduction to the difference between binary and source packages, see Package structure and state in the book R Packages.
Installing R package dependencies:
Linux distro’s doesn’t come with the dependencies needed to work with all R packages. This includes common packages like {data.table}
and {plumber}
. It is important that such dependencies are installed prior to the installation of your R packages.
The best way to look for dependencies is to use the RSPM website and search for the specific package in question. Remeber to also select the correct Linux distribution.
12.11 Resources
For a further introduction to Docker take a look at this Data Science for Economists lecture on Docker by Grant McDermott.
See Best practices for writing Dockerfiles for advice on how to build Docker images.
Follow the Docker for Windows release notes for information on new releases and bug fixes.