DS2002 Data Science Systems

Course materials and documentation for DS2002

View the Project on GitHub ksiller/ds2002-course

High Performance Computing

The goal of this activity is to familiarize you with high performance computing (HPC) systems and concepts. HPC is essential for running computationally intensive workloads, parallel processing, and leveraging specialized hardware for data science tasks that require significant computational resources.

If the initial examples and lab feel like a breeze, challenge yourself with activities in the Advanced Concepts section and explore the resource links at the end of this post.

Setup

To follow along, complete the HPC Setup instructions if you haven’t already. Then log into the UVA HPC cluster and create a dedicated directory for job input and output so batch jobs run from scratch instead of your home directory:

mkdir -p /scratch/$USER/ds2002-jobruns

Working directory best practices

On Afton/Rivanna, each account has two main directories: /home/$USER/ and /scratch/$USER/. Use each for the right purpose so your jobs run efficiently and your data stays manageable.

Home directory (/home/$USER/ or ~)

Scratch directory (/scratch/$USER/)

Recommendation: Run batch jobs from scratch. For this course, set a working directory and cd into it before submitting or running job scripts:

WORKDIR=/scratch/$USER/ds2002-jobruns
mkdir -p $WORKDIR
cd $WORKDIR

Here, $USER is your computing ID (e.g. khs3z), so the path becomes something like /scratch/khs3z/ds2002-jobruns.

In-class exercises

Let’s get started and submit some jobs. All examples are in practice/08-hpc.

Serial Job

Serial jobs use only a single CPU core. This is in contrast to parallel jobs which use multiple CPU cores simultaneously.

Step 1: The job script

Afton/Rivanna uses Slurm (Simple Linux Utility for Resource Management) to manage job submissions and scheduling. Jobs are scheduled by submitting a job script to the scheduler.

A job script is a specialized Bash script:

The first example is 01-hello.sh. It looks like this:

#!/bin/bash
#SBATCH --account=ds2002
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
#SBATCH --error=hello-%j.err
#SBATCH --time=00:01:00
#SBATCH --partition=standard
#SBATCH --mem=8G
#SBATCH --nodes=1                   # that's the default for standard partition
#SBATCH --ntasks-per-node=1         # that's the default for standard partition
#SBATCH --cpus-per-task=1           # that's the default for standard partition

# Execute some simple commands to print information about the job
# The output will be saved to the files specified in the #SBATCH --output and #SBATCH --error directives.
echo "Hello, ds2002!"
echo "The job is running on the $(hostname) node."
echo "The job id is $SLURM_JOB_ID"
echo "The job name is $SLURM_JOB_NAME."
echo "Available resources for this job:"
echo "  - CPU Cores (per task): $SLURM_CPUS_PER_TASK"
echo "  - Memory: $SLURM_MEM_PER_NODE"

# Redirect some output to a file
echo "Redirection of echo/print output to specific file" > hello.txt
# uncomment next line to print all environment variables related to SLURM
# printenv | grep SLURM >> hello.txt

The first part defines the allocation to use, a jobname, output and error filenames, time limit, hardware partition, needed memory, nodes, and cpu cores. For complete reference, review References > Batch directives.

Step 2: Submit the job

We assume you have the fork of the course repository in ~/ds2002-course. Run the following two commands, update the paths if needed.

cd /scratch/$USER/ds2002-jobruns
sbatch ~/ds2002-course/practice/08-hpc/01-hello.sh

Following best practice, we change into a work directory in the /scratch folder first, then we submit the job with the sbatch command. sbatch is a command provided by Slurm resource manager. It will take your job script, inspect the #SBATCH directives and add it to the job queue.

You should see a message:

Submitted batch job 10058413

Common errors:

Step 3: Check job status

Active jobs

Now that the job is in the queue, we can check its status with the squeue command. You can add a job’s ID to get information about a particular job (update the job number with your specific job ID).

squeue --job 10058413

At first you may see something like this:

JOBID      PARTITION   NAME   USER    ST    TIME  NODES   NODELIST(REASON) 
10058413   standard-+  hello  khs3z   PD    3:05    1     (none)

The ST column indicates the job status; PD stands for PENDING, meaning the job is in the queue waiting to be allocated to hardware but not running yet.

You can run squeue 10058413 --start to get an estimate when the job may run. Note that this is a best guess estimate and not 100% accurate.

Depending on how busy the cluster is, the job may quickly be placed on a node and run. The squeue ST field would change to R (RUNNING).

Output:

JOBID      PARTITION   NAME   USER    ST    TIME  NODES   NODELIST(REASON) 
10058413   standard-+  hello  khs3z    R    3:05    1     udc-an38-5

If you want to see all your active jobs run:

squeue --me

Job history

The sacct command shows your active jobs (in the queue) as well as past jobs. Run this command and compare its output to the squeue command.

sacct -S 2026-03-01 

Output:

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- --------
10057749          hello standard-+     ds2002          1  COMPLETED      0:0 
10057749.ba+      batch                ds2002          1  COMPLETED      0:0 

See References > Checking job status for a complete list of job states and their meaning.

It is important to review how long your completed jobs actually took to execute. We refer to this as the elapsed time, as opposed to the time limit specified in the job script.

sacct -S 2026-03-01 --format=jobid,jobname%25,partition,account,start,elapsed,end,state,exit

Step 4: Review output

When the job has successfully completed the 01-hello.sh script, you should have three files:

The first two are generated by Slurm and have collected all standard output or error messages produced during execution of the job script. Note that the filenames contain the job IDs, as we had specified with #SBATCH --output=hello-%j.out and #SBATCH --error=hello-%j.err. **If you are using a logger in Python, instead of print statements, they likely all end up in the .err files even for outputs from logger.info. That’s just a default, so don’t be concerned about that.

In addition, we created hello.txt to capture specific output by redirecting output of the echo command. This is simply to illustrate the flexibility you have to capture a subset of “terminal” message in specific files if needed.

Job Arrays

Job arrays run the same script many times with a different array index per run. Slurm sets SLURM_ARRAY_TASK_ID (e.g. 1, 2, 3, …) so each task can handle a different input. This is ideal for “one file per task” workflows.

Simple example: text processing with 5 files

Suppose you have five input files data-1.txtdata-5.txt and a script process-text.py that takes input and output paths.

Step 1: Create the input files

In your working directory (e.g. /scratch/$USER/ds2002-jobruns) run the provided script:

bash ~/ds2002-course/practice/08-hpc/create-text-files.sh

This creates data-1.txtdata-5.txt with different content so each file has a different word count. The job array script 02-jobarray-text.sh runs one task per file.

Step 2: Setup the job script

#!/bin/bash
#SBATCH --account=ds2002
#SBATCH --job-name=text-array
#SBATCH --output=text-array-%A_%a.out
#SBATCH --error=text-array-%A_%a.err
#SBATCH --partition=standard
#SBATCH --time=00:05:00
#SBATCH --mem=8G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --array=1-5

INPUT="data-${SLURM_ARRAY_TASK_ID}.txt"
OUTPUT="results-${SLURM_ARRAY_TASK_ID}.txt"

module load miniforge
source activate ds2002
python ~/ds2002-course/practice/08-hpc/process-text.py "$INPUT" "$OUTPUT"

Step 3: Submit the job script

In your WORKDIR, run:

sbatch ~/ds2002-course/practice/08-hpc/02-jobarray-text.sh

Step 4: Check job status

Use squeue and sacct to check on your jobs.

10112610_1   text-array          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_1.+      batch          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_2   text-array          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_2.+      batch          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_3   text-array          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_3.+      batch          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_4   text-array          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_4.+      batch          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_5   text-array          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0 
10112610_5.+      batch          1 2026-03-09T15:25:44   00:00:01  COMPLETED      0:0

Note the job ID format. The 10112610 designates the job array; the suffix (after the _) indicates the task ID.

Step 5: Review output

Slurm will schedule 5 tasks; each reads one data-<id>.txt and writes one results-<id>.txt.

Check with:

ls -ltr
-rw------- 1 khs3z users    0 Mar  9 18:52 text-array-10112610_2.err
-rw------- 1 khs3z users    0 Mar  9 18:52 text-array-10112610_3.err
-rw------- 1 khs3z users    0 Mar  9 18:52 text-array-10112610_1.err
-rw------- 1 khs3z users    0 Mar  9 18:52 text-array-10112610_5.err
-rw------- 1 khs3z users    0 Mar  9 18:52 text-array-10112610_4.err
-rw------- 1 khs3z users   52 Mar  9 18:52 text-array-10112610_1.out
-rw------- 1 khs3z users   42 Mar  9 18:52 results-1.txt
-rw------- 1 khs3z users   52 Mar  9 18:52 text-array-10112610_5.out
-rw------- 1 khs3z users   43 Mar  9 18:52 results-5.txt
-rw------- 1 khs3z users   52 Mar  9 18:52 text-array-10112610_2.out
-rw------- 1 khs3z users   43 Mar  9 18:52 results-2.txt
-rw------- 1 khs3z users   52 Mar  9 18:52 text-array-10112610_4.out
-rw------- 1 khs3z users   43 Mar  9 18:52 results-4.txt
-rw------- 1 khs3z users   52 Mar  9 18:52 text-array-10112610_3.out
-rw------- 1 khs3z users   43 Mar  9 18:52 results-3.txt

Advanced Concepts (Optional)

Single Node Multiprocessing Jobs

This section shows how to move from a purely serial Python job to a multi-process job on a single node using Python’s multiprocessing package and small tweaks to the Slurm directives in the job script.

The Python code is in serial-pi.py (single core processing) and mp-pi.py for multi-core processing. In these examples, π is estimated using the Monte Carlo method.

single core processing (serial)

The key function in serial-pi.py:

def calculate_pi(num_points):
    """
    Calculate pi using the Monte Carlo method. We use an inefficent for loop on purpose to demonstrate the serial nature of the computation.
    """
    master = random.Random()
    seed = master.randint(0, 2**32 - 1)
    random.seed(seed)
    logging.info(f"Calculating pi using {num_points} points")
    num_inside = 0
    for _ in range(num_points):
        x = random.random()
        y = random.random()
        if x**2 + y**2 <= 1:
            num_inside += 1
    return 4 * num_inside / num_points

This is a brute-force Monte Carlo approach to estimating the value of π:

Review the entire scripts, and submit a serial job like so (run from a directory on the cluster where the repo is available):

sbatch ~/ds2002-course/practice/08-hpc/03-serial-pi.sh

The script 03-serial-pi.sh sets NUM_POINTS=100000000 and OUTPUT_FILE=pi.csv internally.

As we increase the number of points (random dart throws), the estimate reflects the actual value of π more precisely. But, the calculation also takes much longer.

multi-core processing

If we have multiple CPU cores available, we can use the same Monte Carlo idea but split the work across processes where the number of processes matches the number of CPU cores.

Let’s take a look at an example, the Python script mp-pi.py which is invoked by the 04-mp-pi.sh job script.

The job script 04-mp-pi.sh is very similar to the serial version.

#!/bin/bash
#SBATCH --account=ds2002
#SBATCH --job-name=pi
#SBATCH --output=pi-%a.out
#SBATCH --error=pi-%a.err
#SBATCH --time=00:01:00
#SBATCH --partition=standard
#SBATCH --mem-per-cpu=8G
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8

NUM_POINTS=100000000
OUTPUT_FILE=pi.csv

# use $SLURM_CPUS_PER_TASK to get the number of CPU cores assigned to the job
# use $SLURM_MEM_PER_CPU to get the amount of memory assigned to each CPU core
echo "Number of CPU cores assigned to the job: $SLURM_CPUS_PER_TASK"
echo "Amount of memory assigned to each CPU core: $SLURM_MEM_PER_CPU"

# we pass the number of CPU cores assigned to the job to set up the ideal number of processes inside the python script
python ~/ds2002-course/practice/08-hpc/mp-pi.py $NUM_POINTS $OUTPUT_FILE $SLURM_CPUS_PER_TASK

Note the following #SBATCH directives:

The key adjustments in the mp-pi.py script are in the calculate_pi function and addition of a new function count_inside that is called from within calculate_pi.

def calculate_pi(num_points, num_processes):
    """
    Distribute num_points across num_processes workers; sum inside counts and compute pi.
    """
    logging.info(f"Calculating pi using {num_points} points and {num_processes} processes")
    master = random.Random()
    base_seed = master.randint(0, 2**32 - 1)
    # distribute the numpoints across the workers
    chunk_size = num_points // num_processes
    remainder = num_points % num_processes
    chunks = []
    for i in range(num_processes):
        n = chunk_size + (1 if i < remainder else 0)
        if n > 0:
            chunks.append((n, base_seed, i))
    logging.info(f"Chunks: {chunks}")

    # create a pool of workers and map the count_inside function to the chunks
    # counts is a list of results returned by each worker
    with Pool(processes=num_processes) as pool:
        counts = pool.map(count_inside, chunks)
    
    # aggregate the results
    total_inside = sum(counts)
    return 4.0 * total_inside / num_points
def count_inside(args):
    """
    Worker: generate n_points random (x,y) and return how many fall inside unit circle.
    args = (n_points, base_seed, worker_id) so each worker has distinct random number generator state.
    """
    n_points, base_seed, worker_id = args
    random.seed(base_seed + worker_id)
    num_inside = 0
    for _ in range(n_points):
        x = random.random()
        y = random.random()
        if x * x + y * y <= 1.0:
            num_inside += 1
    logging.info(f"Worker {worker_id}: {num_inside} points of {n_points} points inside are within the unit circle")
    return num_inside

In the Python script, we need to define a function that is called by each process, we call it count_inside. At its core it retains the same approach of randomly hitting the dart board. As arguments it needs the number of points to “test” (i.e. number of darts to throw at the dart board), a unique seed to initialize the random number generator, and the id of the worker process.

To submit the multiprocessing job (run from a directory on the cluster where the repo is available):

sbatch ~/ds2002-course/practice/08-hpc/04-mp-pi.sh

When the serial and multiprocessing jobs have completed, compare their runtimes:

sacct --format=jobid,jobname,alloccpus,start,elapsed,state,exit

GPU Jobs

PyTorch and the deep learning example

PyTorch is a widely used framework for deep learning. Training neural networks on large datasets is much faster on a GPU than on CPU, so HPC clusters provide GPU nodes and a way to request them in your job. This example runs the official PyTorch MNIST demo: it trains a small convolutional network on the MNIST handwritten-digit dataset and reports training progress and final test accuracy.

Job script (05-pytorch.sh)

The script requests a single GPU device on a single node and runs the example inside an Apptainer container with GPU support (more on containers in a few weeks):

Python script (pytorch-example.py)

The example (same as PyTorch examples – MNIST) trains a small CNN on MNIST for 14 epochs by default. It prints training loss per batch and, at the end of each epoch, test-set average loss and accuracy. All output goes to the job’s stdout (and thus to the .out file).

How to submit

From a directory on the cluster (e.g. /scratch/$USER/ds2002-jobruns):

sbatch ~/ds2002-course/practice/08-hpc/05-pytorch.sh

Expected results

When the job completes, check the output file (e.g. pytorch-<jobid>.out). The tail of the file will show the final test result for the last epoch, for example:

Test set: Average loss: 0.0283, Accuracy: 9912/10000 (99%)

Exact numbers may vary slightly. Training progress (epoch, batch, loss) appears above this final line.

References

Batch directives

Batch directives define hardware resources for job execution (e.g. number of CPU cores, memory, specialty hardware like GPUs, and time limit). For the Slurm scheduler, these directives start with the #SBATCH keyword.

Below are the most common directives used in the course examples. For the full list, see the Slurm sbatch documentation.

Allocation Account

#SBATCH --account=ds2002

Billing account or project to charge for the job. On Rivanna/Afton, use the allocation assigned to the course (e.g. ds2002).

To check allocations you have access to, run this command:

allocations

Output:

Account                      Balance        Reserved       Available                
-----------------          ---------       ---------       ---------                
ds2002                       1000000               0          999559 

The available column indicates how many service units are available. Service units reflect credits and are related to how much compute time you have access to. When the available balance is zero, you will not be able to submit any new jobs. In such case, reach out to your course instructor or research advisor who can request additional credits.

Job name

#SBATCH --job-name=hello

Human-readable name for the job. Shown in queue listings (squeue) and in job history.

Output and error files

#SBATCH --output=hello-%j.out
#SBATCH --error=hello-%j.err

File where standard output (stdout) and standard error (stderr) are written. %j is replaced by the job ID so each run gets unique files.

Partition

Queue (partition) to submit to. A partition defines a set of compute nodes; they are often grouped based on particular hardware specifications or job use cases.

#SBATCH --partition=standard

For this course, we’ll use:

If you use #SBATCH --partition=gpu on Afton/Rivanna, you have to combine this with a #SBATCH --gres directive to specify how many GPUs you need (and optionally which type), see GPU Jobs.

GPU resource (gres)

#SBATCH --gres=gpu:1

Requests one GPU for the job. Use gpu:2 (or more) for multi-GPU jobs. Only meaningful when using the gpu partition.

Nodes

#SBATCH --nodes=1

Number of compute nodes to allocate for the job. For most course jobs you use a single node (1). Multi-node jobs (e.g. --nodes=2) are for distributed runs across several machines and are rarely needed for the examples here. On many partitions, 1 is the default.

Tasks per node

#SBATCH --ntasks-per-node=1

Number of tasks (processes) to run per node. For a single serial or multithreaded program, use 1 task per node. Use more than one when you launch multiple separate processes (e.g. with MPI). On many partitions, 1 is the default.

CPUs per task

#SBATCH --cpus-per-task=4

Number of CPU cores per task. Useful for multi-threaded or data-loading workloads.

Memory

#SBATCH --mem=32G

Total memory (RAM) for the job on each node. Here, 32 GB. Adjust based on your script’s needs.

Alternatively, you can use #SBATCH --mem-per-cpu=9GB to specify how much memory should be allocated for each cpu core. This is the preferred way for code that involves multi-threading or multi-processing to prevent “starving” cpu cores of memory as the number of threads/processes is increased, see Single Node Multiprocessing Jobs.

The --mem and --mem-per-cpu directives are mutually exclusive; you have to choose one or the other.

Time limit

#SBATCH --time=00-00:10:00

Maximum wall-clock time (dd-HH:MM:SS). The job is terminated when the time limit is reached; 10 minutes in this example. Request only what you need so the job can start sooner.

Available partitions and limits

The qlist command shows all available partitions and how busy they are. You may not have access to all of them. The ones relevant for this course are interactive, standard, and gpu.

qlist
PARTITION                     TOTAL_CORES FREE_CORES  RUNNING_JOBS   PENDING_JOBS   TIMELIMIT    
----------------------------------------------------------------------------------------------------
interactive                   2208        710         28             1              12:00:00    
standard                      18032       10602       688            216            7-00:00:00  
dedicated                     368         96          1              3              infinite    
parallel                      17184       504         115            9              3-00:00:00  
gpu                           4104        2628        259            343            3-00:00:00  
bii                           4360        439         2              25             7-00:00:00  
bii-gpu                       608         0           1              7              3-00:00:00  
bii-largemem                  3744        336         1              1              7-00:00:00  
gpu-mig                       128         70          29             12             3-00:00:00 

The qlimits command shows the max number of cpu cores, memory, and nodes you can request in each partition.

qlimits
Queue          Maximum      Maximum          Minimum      Maximum       Maximum       Default       Maximum      Minimum     
(partition)    Submit       Cores(GPU)/User  Cores/Job    Mem/Node(MB)  Mem/Core(MB)  Mem/Core(MB)  Nodes/Job    Nodes/Job   
========================================================================================================================
standard       5000         cpu=1500                      384000+                     4000          1                        
dedicated      10000                                      768000+                     4000          UNLIMITED                
interactive    5000         cpu=24                        128000+                     4000          2                        
parallel       2000         cpu=6000                      768000                      4000          64           2           
gpu            10000        gres/gpu=32                   257000+                     4000          16                       
bii            10000        cpu=400                       384000                      4000          112                      
bii-gpu        10000                                      384000+                     4000          12                       
bii-largemem   10000                                      768000+                     4000          10                       
gpu-mig        2000         gres/gpu=28                   2000000       15000         4000          1   

Submit a Job

The sbatch command submits the job script to the scheduler. As best practice, change into a dedicated work directory before running sbatch. E.g.,

cd /scratch/$USER/ds2002-jobruns
sbatch ~/ds2002-course/practice/08-hpc/01-hello.sh

Output:

Submitted batch job 10058413

Checking Job Status

The squeue command shows you all the jobs currently in the job queue. You can add a job’s ID to get information about a particular job.

squeue --job 10058413

Output:

JOBID      PARTITION  NAME      USER    ST    TIME  NODES   NODELIST(REASON) 
10058413   gpu-a6000  pytorch-  khs3z    R    3:05    1     udc-an38-5

The ST field indicates the job’s status.

To only see all your active jobs, run:

squeue --me

Note: the squeue command only lists active jobs, either pending, running, or cancelling.

The sacct command shows your active jobs (in the queue) as well as past jobs.

sacct -S 2026-03-01 

Output:

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- --------
10057749          hello standard-+     ds2002          4  COMPLETED      0:0 
10057749.ba+      batch                ds2002          4  COMPLETED      0:0 
10058297     pytorch-e+  gpu-a6000     ds2002          4     FAILED    127:0 
10058297.ba+      batch                ds2002          4     FAILED    127:0 
10058317     pytorch-e+  gpu-a6000     ds2002          4    TIMEOUT      0:0 
10058317.ba+      batch                ds2002          4  CANCELLED     0:15 
10058413     pytorch-e+  gpu-a6000     ds2002          4  COMPLETED      0:0 
10058413.ba+      batch                ds2002          4  COMPLETED      0:0 
10079753     pytorch-e+  gpu-a6000     ds2002          4    RUNNING      0:0 
10079753.ba+      batch                ds2002          4    RUNNING      0:0

Resources