Usage
All usage of compute resources is managed through the Slurm[^1] Workload Manager.
In addition, the server is equipped with a number of modules that can be loaded through the module
command.
Modules
Modules are pre-compiled software that you can load into your shell environment.
To see the available modules, you can run the following command:
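```bash
# List all available modules
module avail
```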
And to load a module:
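```bash
# For example, the miniconda module used in the batch script below
module load miniconda/4.10.4
```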
And unload again:
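```bash
module unload miniconda/4.10.4
```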
Slurm
Slurm is a job scheduler and resource manager for the compute resources available.
Status
To see the attached resources, you can run the following command:
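```bash
# Show partitions and nodes with their current state
sinfo
```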
To see the jobs that are currently running, you can run the following command:
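```bash
squeue
```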
for a specific user:
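```bash
squeue -u <username>
```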
Submitting jobs
Batch jobs
A typical SLURM script consists of three main sections:
- SLURM directives (#SBATCH)
- Environment setup
- Execution commands
Here's a detailed example:
#!/bin/bash
#----------------------------------------
# SLURM Directives
#----------------------------------------
#SBATCH --chdir=/projects/main_compute-AUDIT/ # Working directory
#SBATCH --job-name alphafoldtestjobname # Job name
#SBATCH --mem=50G # Memory requirement
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --cpus-per-task=1 # CPU cores per task
#SBATCH --nodes=1 # Number of nodes
#SBATCH --mail-type=begin # Email at job start
#SBATCH --mail-type=end # Email at job end
#SBATCH --mail-user=abc123@ku.dk # Email address
#SBATCH --gres=gpu:1 # GPU requirement
#SBATCH --time=10:00:00 # Maximum runtime of 10 hours
#----------------------------------------
# Environment Setup
#----------------------------------------
# Load required modules
module load miniconda/4.10.4
conda activate alphafold
#----------------------------------------
# Job Execution
#----------------------------------------
# Change to working directory
cd /projects/main_compute-AUDIT/data/alphafold
# Run the main script
bash run_alphafold.sh \
-d /projects/testproject1/data/genetic_databases/ \
-o /projects/testproject1/people/btj820/ \
-m model_1 \
-f example/query.fasta \
-t 2020-05-14
Run the script with:
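```bash
# Replace <script-name>.sh with the file containing the script above
sbatch <script-name>.sh
```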
Once the job is submitted you will receive a job-id, which you can use to check the status of the job with:
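```bash
squeue -j <job-id>
```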
Also, the output of the job will be saved in a file named slurm-<job-id>.out in the working directory specified.
Get the node information:
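```bash
scontrol show node <node-name>   # e.g. scontrol show node sodasgpun01fl
```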
To stop a running job:
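```bash
scancel <job-id>
```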
Interactive jobs
To start a simple interactive shell with 2 CPU cores, 50 GB RAM, and 1 V100 GPU, you can run the following command:
Tip
If copy-pasting doesn't work for the multi-line code snippets, try switching between selecting the text manually and using the copy button in the top-right corner.
srun -w sodasgpun01fl --partition=gpuqueue \ #(1)!
--ntasks-per-node=2 \ #(2)!
--mem=50GB \ #(3)!
--gres=gpu:v100:1 \ #(4)!
--time=240 \ #(5)!
--pty /bin/bash -i #(6)!
- Standard node and partition configuration
- Number of CPU cores
- Amount of memory (RAM)
- Number of GPUs
- Maximum time to run the task in minutes
- Run the task in a pseudo-terminal. Change to ~/bin/zsh if you installed zsh and wish to use that instead.
This will start a new shell session with the allocated resources. This means that exiting the shell (e.g. when logging out of the server) will release the resources. To prevent this, you can start a persistent session with tmux.
Check that you have access to the GPU by running:
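```bash
nvidia-smi
```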
You will need to reload modules and/or activate environments in the new shell.
Jupyter Notebook
To start a Jupyter Notebook, you need to first allocate resources on the server:
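For example, reusing the interactive command from above:
```bash
srun -w sodasgpun01fl --partition=gpuqueue \
    --ntasks-per-node=2 \
    --mem=50GB \
    --gres=gpu:v100:1 \
    --time=240 \
    --pty /bin/bash -i
```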
Then, within the newly created interactive Slurm session, change to a folder containing a Python uv project and run:
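A minimal sketch, assuming the project declares jupyter as a dependency in its pyproject.toml:
```bash
# Assumes jupyter is listed in the project's pyproject.toml
# Create/update the project's virtual environment and install its dependencies
uv sync
```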
Activate the virtual environment:
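Assuming uv placed the environment in the default .venv folder:
```bash
source .venv/bin/activate
```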
Now, you can start the notebook server:
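A sketch of the notebook command; the exact flags may vary on your setup, but keep port 8800 (see the Info box below):
```bash
# --no-browser: don't try to open a browser on the server
# --ip=0.0.0.0 (assumption): listen on all interfaces so the link works from your machine
jupyter notebook --no-browser --ip=0.0.0.0 --port=8800
```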
Then copy the generated link and paste it into your local computer's browser, e.g. http://10.84.10.216:8800/?token=abcd1234...
Info
The above code works when you have entered an interactive Slurm session. Don't change the port or the URL, since they are required for access to the server.
To start a Jupyter Notebook on the head node instead, you have to specify a port when you access the server via ssh, and then also refer to that port in the jupyter notebook command.
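For example (a sketch; port 8888 and the placeholders are illustrative):
```bash
# On your local machine: connect with local port forwarding
ssh -L 8888:localhost:8888 <username>@<server-address>
# On the head node: start the notebook on that same port, then open
# http://localhost:8888 in your local browser
jupyter notebook --no-browser --port=8888
```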
VSCode
In order to make the resources from slurm available to VSCode, follow the steps above and start a jupyter session.
Then, in VSCode, when you open a notebook, press Ctrl+Shift+P and search for Notebook: Select Notebook Kernel. If a kernel is already suggested, click Select Another Kernel..., then Existing Jupyter Server..., and paste the link with the token from above into the field.
Jupyter Kernels and Virtual Environments
TBD: When do you need to do this?
To register a virtual environment with Jupyter, you can run the following command from within your environment (that is, after activating it and making sure that ipykernel is installed, e.g. with uv pip install ipykernel):
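```bash
# Register the active environment as a Jupyter kernel
python -m ipykernel install --user --name=<env-name> --display-name="<display name>"
```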
Persistent sessions
Use tmux to create and manage persistent sessions on the server.
Start a new tmux session
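```bash
tmux new -s <session-name>
```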
List tmux sessions
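```bash
tmux ls
```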
Attach tmux session
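```bash
tmux attach -t <session-name>
```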
To detach from the session (when you are inside it), leaving everything running in the background, press:
Ctrl+B D
Docker
The server is equipped with udocker, a user-space tool for running Docker containers without root privileges. It requires the anaconda3 module to be loaded first.
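For example (the image name is illustrative):
```bash
module load anaconda3
# Pull an image from Docker Hub, create a container from it and run a shell inside
udocker pull ubuntu:22.04
udocker create --name=myubuntu ubuntu:22.04
udocker run myubuntu /bin/bash
```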
Resources
- The UCPH guide to HPC systems
- Five part video series introducing Slurm
- The official Slurm cheatsheet
[^1]: Simple Linux Utility for Resource Management.