DS2002 Data Science Systems

Course materials and documentation for DS2002


Containers

The goal of this activity is to familiarize you with containerization using Docker and related technologies. Containers are essential for creating reproducible environments, packaging applications with their dependencies, and deploying software consistently across different systems.

Note: Work through the examples below in your terminal (Codespace or local), experimenting with each command and its various options. If you encounter an error message, don’t be discouraged—errors are learning opportunities. Reach out to your peers or instructor for help when needed, and help each other when you can.

Setup

Option 1: If you want to use Docker containers on your own computer, follow the setup guide in ../../setup/docker.md.

Option 2: Alternatively, spin up an Ubuntu Linux EC2 instance in AWS.

  1. SSH to the Ubuntu EC2 instance (see Lab 09)
  2. Install Docker:
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh ./get-docker.sh
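
After the install script finishes, a quick check can confirm that both the docker CLI and the Docker daemon respond. The sketch below is one possible way to do that. The helper name docker_status is made up for illustration, and the sketch assumes a fresh Ubuntu install where Docker may still require sudo.

```shell
# docker_status is a hypothetical helper, not part of Docker.
# It prints one word describing the state of the local Docker install.
docker_status() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "missing"        # the docker CLI is not on PATH
  elif docker info >/dev/null 2>&1 || sudo -n docker info >/dev/null 2>&1; then
    echo "ok"             # CLI found and the daemon answered
  else
    echo "unreachable"    # CLI found, but the daemon did not answer
  fi
}

docker_status
```

If this prints ok, running sudo docker run hello-world pulls and runs Docker's small test image as a final check.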
    

In-class exercises

Pulling & Running Docker Images

To pull a container image, look up its name on Docker Hub or another registry. The pull command looks something like this:

docker pull godlovedc/lolcow

The pull command downloads the image from Docker Hub to your machine.

To run the default command of the image, execute:

docker run godlovedc/lolcow

Output:

 _____________________________________
/ Everything will be just tickety-boo \
\ today.                              /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

Running in Interactive Mode

The LolCow container told you a joke (ran a process) and then exited immediately to return to your shell.

Let’s find a new image to explore how we can use a container interactively. Go to Docker Hub and search for an Ubuntu image. Take note of the image name (ubuntu) and choose a tag. The tag is the portion after the :.
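
To make the name:tag structure concrete, here is a small sketch that splits an image reference into its two parts with plain shell parameter expansion. The function name split_image_ref is hypothetical. It assumes simple references like the ones used in this lab; a registry address with a port and no tag (for example localhost:5000/image) would be split incorrectly.

```shell
# split_image_ref is a hypothetical helper for illustration only.
# It prints "<name> <tag>" and falls back to the tag "latest",
# which is what Docker assumes when no tag is given.
split_image_ref() {
  ref="$1"
  case "$ref" in
    *:*) printf '%s %s\n' "${ref%:*}" "${ref##*:}" ;;  # text before / after the last ':'
    *)   printf '%s latest\n' "$ref" ;;                # no ':' means no explicit tag
  esac
}

split_image_ref ubuntu:24.04       # prints: ubuntu 24.04
split_image_ref godlovedc/lolcow   # prints: godlovedc/lolcow latest
```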

To work with a container interactively, add the -it flags to the docker run command (-i keeps standard input open, -t allocates a terminal). Be sure to add a shell or some other executable program after the image name and replace <tag> with the actual tag you found:

docker run -it ubuntu:<tag> /bin/bash

Note how the prompt has changed to something like this:

root@4489de2c677f:/#

You’re in a bash shell inside the container!

Now, run

cat /etc/os-release

To exit out of the interactive container shell, enter

exit

Listing Docker images

To view all images you have built or pulled to your computer, run:

docker images

The output may look like this (column names vary slightly by Docker/runtime version):

IMAGE                                ID             DISK USAGE   CONTENT SIZE   EXTRA
godlovedc/lolcow:latest              a692b57abc43        370MB          104MB    U   
jekyll/jekyll:latest                 400b8d1569f1       1.23GB          322MB        
mysql:8.0                            99d774bf02a4       1.08GB          243MB    U   

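How to read this table: IMAGE is the repository name and tag, and ID is the short image ID; either can be used to refer to the image. DISK USAGE and CONTENT SIZE report how much space the image occupies locally. In recent Docker versions, a U in the EXTRA column marks images that are in use by a container.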

View Running Containers

To see all containers running locally:

docker ps

docker ps lists only running containers, so containers that have already exited (like the ones started above) do not appear. If nothing is running, you will see only the header row:

CONTAINER ID   IMAGE            COMMAND                  CREATED          STATUS                      PORTS                                         NAMES

To see all container instances, including those that have stopped, run:

docker ps -a

Output:

CONTAINER ID   IMAGE                     COMMAND                  CREATED          STATUS                      PORTS                                         NAMES
ed9a3ade7cec   godlovedc/lolcow:latest   "/bin/sh -c 'fortune…"   3 seconds ago    Exited (0) 2 seconds ago                                                  hardcore_napier
a57e5166fda7   ubuntu:latest             "/bin/bash"              5 minutes ago    Exited (0) 5 minutes ago                                                  heuristic_pare

You can now refer to a specific container by using either the full name heuristic_pare or the first few characters of the container ID, such as a57e.
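
When the container list gets long, it can help to post-process it in a script. The sketch below filters a simplified listing down to the names of exited containers. The helper name exited_names is hypothetical; the --format placeholders .Names and .Status are part of docker ps.

```shell
# exited_names is a hypothetical helper: it reads "NAME STATUS..." lines
# on stdin and prints the names whose status starts with "Exited".
exited_names() {
  awk '$2 == "Exited" { print $1 }'
}

# Intended use (requires a running Docker daemon):
#   docker ps -a --format '{{.Names}} {{.Status}}' | exited_names
```

Docker can do this particular filtering itself with docker ps -a --filter status=exited; the awk version only illustrates how the output can be consumed by other tools.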

Inspect Properties of a Container

To inspect the metadata of a container (running or stopped), such as its IP address or volume mounts, use the inspect command. It returns a JSON document describing the container:

docker inspect a57e

Try to find the Cmd field inside the Config section. It describes the command that’s executed by default.
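
Instead of scrolling through the full JSON, a single field can be extracted with the --format option of docker inspect. This is a minimal sketch; the helper name default_cmd is made up, and a57e stands in for one of your own container IDs.

```shell
# default_cmd is a hypothetical helper. It prints only the Config.Cmd
# field of a container as JSON instead of the full inspect output.
default_cmd() {
  docker inspect --format '{{json .Config.Cmd}}' "$1"
}

# Intended use (requires Docker and an existing container):
#   default_cmd a57e
```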

File System

Each container image has its own filesystem. Let’s check this out by comparing host and container output:

pwd
docker run --rm ubuntu:latest pwd

The first command runs on the host in your active shell. If you’re in this repo’s practice directory it will show something like:

/home/mst3k/ds2002-course/practice/11-containers/

The second command runs pwd inside a temporary Ubuntu container and will show:

/

Similarly, compare the output of ls and docker run --rm ubuntu:latest ls.

Mount Storage

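To mount a directory from your local workstation into a container when launched, use the -v flag with a mapping of HOST_DIRECTORY:CONTAINER_DIRECTORY. The example below mounts the current directory (.); if your Docker version rejects a relative path, use "$(pwd)" instead: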

docker run -it -v .:/my_folder/ ubuntu:latest /bin/bash

Run ls.

bin  boot  dev  etc  home  lib  media  mnt  my_folder  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

Note the my_folder directory inside the container. Run ls my_folder and you should see the contents of your current host directory, now mounted inside /my_folder.

Now run:

echo "hello from the container" > my_folder/hello.txt

Then exit.

Through this mechanism you can dynamically bring folders and files into the container. Any files you add to my_folder are written to the host directory, so they persist after you exit the container (check for hello.txt in your current directory). Pretty cool!
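
If a container should only read your files, the mount can be made read-only by appending :ro to the mapping. The sketch below wraps this in a helper; the name mount_ro is hypothetical, and the helper resolves its argument to an absolute path so it does not depend on relative-path support in -v.

```shell
# mount_ro is a hypothetical helper. It starts an interactive Ubuntu
# container with the given host directory mounted read-only at /my_folder.
mount_ro() {
  host_dir="$(cd "$1" && pwd)" || return 1   # resolve to an absolute path
  docker run --rm -it -v "${host_dir}:/my_folder:ro" ubuntu:latest /bin/bash
}

# Intended use (requires Docker):
#   mount_ro .
```

Inside that container, writing to /my_folder (for example with the echo command above) should fail with a read-only file system error, while reading files still works.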

Stop a Running Container

To stop a container:

docker stop heuristic_pare

or

docker stop a57e

Deleting Docker images

Note: An image can only be removed once no container, running or stopped, still refers to it. Remove such containers first with docker rm, or force the removal with docker rmi -f.

To delete an image, use the rmi (remove image) command with either the image name:tag or ID.

docker rmi image_name

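To delete all images that are not used by any container:

docker image prune -a

To also remove stopped containers, unused networks, and the build cache, use docker system prune (add -a to include all unused images, not just dangling ones).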
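
Removing an image often fails because an old, stopped container still refers to it. The following sketch shows one way to clean up those containers first. The helper name remove_image is hypothetical; the ancestor filter of docker ps selects containers created from a given image.

```shell
# remove_image is a hypothetical helper. It deletes every container
# created from an image and then deletes the image itself.
remove_image() {
  image="$1"
  # IDs of all containers (running or stopped) created from this image
  ids="$(docker ps -a -q --filter "ancestor=${image}")"
  if [ -n "$ids" ]; then
    # $ids is left unquoted on purpose so each ID becomes its own argument;
    # -f also stops containers that are still running
    docker rm -f $ids
  fi
  docker rmi "$image"
}

# Intended use (requires Docker):
#   remove_image godlovedc/lolcow:latest
```

Because this force-removes containers, it is best kept for throwaway images like the ones in this lab.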

Creating Docker Containers

This directory contains a few container examples. We focus on the mechanism of the build process rather than the specific implementation details underlying each project.

Let’s try the Fortune Teller. The Dockerfile is located in fortune/Dockerfile.

FROM ubuntu:18.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        fortune fortunes-min && \
    rm -rf /var/lib/apt/lists/*

ENV PATH=/usr/games:${PATH}

ENTRYPOINT ["fortune"]

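How this Dockerfile works: FROM sets Ubuntu 18.04 as the base image. RUN installs the fortune program and a small fortune database with apt-get, then deletes the apt package lists to keep the image small. ENV adds /usr/games, where Ubuntu installs fortune, to the PATH. ENTRYPOINT makes fortune the program that runs whenever a container is started from this image.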

Let’s build it:

cd fortune
docker build -t fortune:latest .

Execute docker images to confirm the new fortune:latest image is ready.

And then run it:

docker run --rm fortune:latest

Run it a few more times for additional fortune telling.
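
Because the Dockerfile uses ENTRYPOINT, anything placed after the image name is passed to fortune as arguments, and the entrypoint itself can be swapped out for debugging. The sketch below illustrates both. The helper names fortune_args and fortune_shell are made up, and it assumes the fortune:latest image built above.

```shell
# Hypothetical helpers for the fortune:latest image built above.

# Pass extra arguments through to the ENTRYPOINT (fortune).
fortune_args() {
  docker run --rm fortune:latest "$@"
}

# Replace the ENTRYPOINT with a shell to look around inside the image.
fortune_shell() {
  docker run --rm -it --entrypoint /bin/bash fortune:latest
}

# Intended use (requires Docker and the fortune:latest image):
#   fortune_args -s     # -s asks fortune for short fortunes only
#   fortune_shell
```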

Apptainer - Containers in HPC Environments

On many clusters (including UVA’s), you cannot run the Docker daemon as an ordinary user: shared systems avoid giving everyone root-equivalent features that Docker traditionally needed. Apptainer (formerly Singularity) is a common alternative: you run container images as yourself, and you typically execute immutable .sif image files instead of talking to a long-lived daemon.

Go to your home directory. On the HPC system you also need to load the apptainer software module.

cd ~
module load apptainer

Creating an Apptainer image from a Docker image

The general form is apptainer pull <output.sif> <transport>://<image reference>. For images on Docker Hub, the transport is docker (for example docker://ubuntu:latest or docker://godlovedc/lolcow:latest).

apptainer pull lolcow-latest.sif docker://godlovedc/lolcow:latest

apptainer pull … docker://… downloads from a registry (often Docker Hub) and builds a local .sif file. This .sif file is self-contained and you can move it to other locations.

Pull a few more images (still in the directory where you want the .sif files):

apptainer pull ubuntu-latest.sif docker://ubuntu:latest
apptainer pull mysql-8.0.sif docker://mysql:8.0
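
When pulling several images, the output filenames can be derived from the image references. The sketch below follows the naming pattern used in this section (image name, hyphen, tag). The helper names sif_name and pull_all are hypothetical, and the same caveat about simple name:tag references applies.

```shell
# sif_name and pull_all are hypothetical helpers for illustration.

# Turn an image reference into a filename such as lolcow-latest.sif.
sif_name() {
  ref="$1"
  case "$ref" in
    *:*) name="${ref%:*}"; tag="${ref##*:}" ;;
    *)   name="$ref"; tag="latest" ;;
  esac
  # ${name##*/} drops a leading "owner/" part such as "godlovedc/"
  printf '%s-%s.sif\n' "${name##*/}" "$tag"
}

# Pull every reference given as an argument into the current directory.
pull_all() {
  for image in "$@"; do
    apptainer pull "$(sif_name "$image")" "docker://${image}"
  done
}

# Intended use (requires Apptainer):
#   pull_all godlovedc/lolcow:latest ubuntu:latest mysql:8.0
```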

Running the Apptainer image

apptainer run lolcow-latest.sif

apptainer run executes the image’s default runscript, which for images pulled from Docker corresponds to the container’s default entrypoint. In this case it runs the script that tells you a joke.

Alternatively you can use the apptainer exec command:

apptainer exec ubuntu-latest.sif cat /etc/os-release

When you use apptainer exec you need to specify the command to execute inside the container after the image filename, in this case cat /etc/os-release.

Interactive shell

apptainer shell ubuntu-latest.sif

Mounting storage

Similar to Docker volume mounts, Apptainer can bind host paths into the container for shell, exec, and run. Use --bind (short form -B) with host_path:container_path.

apptainer shell --bind .:/my_folder ubuntu-latest.sif

You can repeat --bind (or -B) for multiple mappings. See Apptainer bind paths in the official docs.

Advanced Concepts (Optional)

Running in Detached Mode

To run a container in detached mode, add the -d flag to the docker run command:

docker run -d --name mysql-dbhost -e MYSQL_ROOT_PASSWORD=my-secret-pw mysql:8.0

Detached mode means the container runs in the background and your terminal prompt returns immediately. Use this when you want a long-running service (such as MySQL) to stay up while you continue using the same terminal for other commands. For example, start MySQL in detached mode, then run docker ps to confirm status before connecting with a client.
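
A detached MySQL container needs some time to initialize before it accepts connections. The sketch below polls the server until it answers. The helper name wait_for_mysql is hypothetical, the default password is the one from the docker run example above, and it assumes the mysqladmin client shipped in the mysql:8.0 image. A successful ping only means the server process responds, so treat it as a rough readiness signal.

```shell
# wait_for_mysql is a hypothetical helper.
# Usage: wait_for_mysql <container> [max_tries] [root_password]
wait_for_mysql() {
  name="$1"
  tries="${2:-30}"
  pw="${3:-my-secret-pw}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # mysqladmin ping exits with 0 once the server responds
    if docker exec "$name" mysqladmin ping -uroot -p"$pw" --silent >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timeout"
  return 1
}

# Intended use (requires Docker and the mysql-dbhost container):
#   wait_for_mysql mysql-dbhost
```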

Add an Environment Variable

To inject environment variables into a container, add the -e flag with a KEY=value pair when you run the container (repeat -e for several variables):

docker run -it -e MYKEY=myvalue ubuntu:latest /bin/bash
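
To confirm that a variable actually arrived inside the container without opening an interactive shell, the container can run printenv directly. This is a small sketch with a hypothetical helper name, show_env.

```shell
# show_env is a hypothetical helper. It sets one variable in a
# throwaway Ubuntu container and prints the value seen inside it.
show_env() {
  key="$1"
  value="$2"
  docker run --rm -e "${key}=${value}" ubuntu:latest printenv "$key"
}

# Intended use (requires Docker):
#   show_env MYKEY myvalue
```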

Attach a Local Port

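To map a port of a container to a port on your workstation, use the -p flag with a mapping of HOST_PORT:CONTAINER_PORT. This allows you to view/test a service listening on that port. Container names must be unique, so if the mysql-dbhost container from the detached-mode example still exists, remove it first with docker rm -f mysql-dbhost: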

docker run -d --name mysql-dbhost -e MYSQL_ROOT_PASSWORD=my-secret-pw -p 6033:3306 mysql:8.0

Review Logs

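To view the output logs of a container, pass its ID or name (for example mysql-dbhost) to docker logs. Here 2ad2 stands in for the first characters of your container’s ID: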

docker logs 2ad2

Shell into a Running Container

Finally, to “hop” into a container that is running in detached mode, use docker exec -it with the ID or name of the running container. Be sure to add a shell or other executable after the name of the container.

docker exec -it 2ad2 /bin/bash

More Build Examples

whalesay

This is a famous demo container created by Docker to demonstrate a container image that takes input from a user. To build it, cd into its directory and run:

docker build -t whalesay .

To run it, simply append a command or quote or joke at the end of the run command:

docker run whalesay Hello everyone!

convert

This is a simple Python ETL pipeline. You can build it locally by changing into its directory and running:

docker build -t converter .

To try running it on your own, just map a directory to the /data path of the container and pass the fictional ID 0987654321 as a parameter:

docker run -v ${PWD}:/data converter -i 0987654321

Multi-Stage Builds

Multi-stage builds allow you to use multiple FROM statements in a Dockerfile, which helps create smaller final images by separating build dependencies from runtime dependencies:

# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Runtime stage
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "app.py"]

Docker Compose

Docker Compose allows you to define and run multi-container Docker applications using a YAML file. This is useful for orchestrating services that need to work together:

services:
  web:
    build: .
    ports:
      - "5000:5000"
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"

Note the build: . statement for the web service. The . refers to the current directory, which is assumed to contain a Dockerfile with the image build instructions. In contrast, the redis service uses an existing image, redis:alpine, from a public registry.

Run with: docker compose up
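
A typical Compose session goes beyond up. The sketch below strings together the usual lifecycle commands for the example file above. The helper name compose_try is made up, and it assumes you run it in the directory that contains the Compose file and the Dockerfile for the web service.

```shell
# compose_try is a hypothetical helper that walks through one
# start / inspect / stop cycle for the services defined above.
compose_try() {
  docker compose up -d        # build if needed and start all services in the background
  docker compose ps           # list the services and their current state
  docker compose logs web     # show the output of the web service
  docker compose down         # stop and remove the containers and the default network
}

# Intended use (requires Docker with the Compose plugin):
#   compose_try
```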

Dockerfile Best Practices

Container Orchestration

For production environments, consider container orchestration platforms:

Running Apptainer with GPU support

If your host has NVIDIA GPUs and drivers available, Apptainer can expose them inside the container with the --nv flag. The examples below assume a PyTorch image named pytorch-latest.sif in the current directory.

apptainer exec --nv pytorch-latest.sif python -c "import torch; print(torch.cuda.is_available())"

You can also test GPU visibility with:

apptainer exec --nv pytorch-latest.sif nvidia-smi

If GPUs are configured correctly, the first command prints True and nvidia-smi lists at least one GPU.
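
In scripts that run on both GPU and CPU-only nodes, the --nv flag can be added only when a GPU is actually visible on the host. The sketch below is one possible approach; the helper name nv_flag is hypothetical, and it relies on nvidia-smi -L, which lists the GPUs the driver can see.

```shell
# nv_flag is a hypothetical helper. It prints "--nv" when the host has a
# working nvidia-smi that can list GPUs, and prints nothing otherwise.
nv_flag() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "--nv"
  fi
}

# Intended use (requires Apptainer and pytorch-latest.sif):
#   apptainer exec $(nv_flag) pytorch-latest.sif python -c "import torch; print(torch.cuda.is_available())"
```

On a node without GPUs the command substitution expands to nothing, so the same line runs the container without GPU support.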

Making Apptainer images executable

Apptainer images include a default runscript. If you mark the .sif file as executable, you can launch it directly instead of typing apptainer run each time:

chmod +x lolcow-latest.sif
./lolcow-latest.sif

This is functionally similar to:

apptainer run lolcow-latest.sif

Resources

Container Registries