Perpetual SLURM Workflow Job
The latest version of this example can be found at: https://github.com/reproducible-reporting/stepup-queue/tree/main/docs/examples/slurm-perpetual/
For extensive workflows, it is often useful to submit the workflow itself to the queue as a job. It is generally preferred to run the workflow on a compute node of the cluster, as this allows for better resource management and prevents overloading the login node. However, most clusters impose a limit on the maximum wall time of a job, which can result in the workflow job being interrupted. This example shows how to work around this limitation by using a perpetual self-submitting job.
At the start of the job, a background process is launched that will end StepUp shortly before the wall time limit is reached, in case StepUp has not ended on its own. Just before interrupting StepUp, this background process creates a flag file, which later serves as a signal that the workflow job needs to be resubmitted. The technique can be used with any type of job and is not specific to StepUp.
Here, we use a very short runtime to quickly demonstrate StepUp Queue’s features. In practice, you can let the StepUp job run for several hours or even days at a time, and stop it about 30 minutes before the wall time limit is reached.
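The technique is easiest to see stripped down to its essentials. The following is a minimal sketch, not one of the example files: my_long_command stands for any resumable program that stops cleanly when it receives SIGTERM, and job.sh for the name under which the script is saved. The workflow.sh script below applies the same idea to StepUp, using stepup shutdown instead of a signal.

#!/usr/bin/env bash
#SBATCH --job-name=perpetual-demo
#SBATCH --time=01:00:00
# Directory holding the flag file that signals that resubmission is needed.
FLAG_DIR=$(mktemp -d)
trap 'rm -r "$FLAG_DIR"' EXIT
# Start the actual work in the background.
# It must be able to resume in a later job from where it left off.
my_long_command &
WORK_PID=$!
# Guard process: shortly before the wall time limit,
# create the flag file and stop the work.
(
    sleep 3300  # 55 minutes, i.e. 5 minutes before the 1-hour limit above.
    touch "$FLAG_DIR/resubmit"
    kill -TERM "$WORK_PID"
) &
GUARD_PID=$!
# Wait for the work to finish, either on its own or after SIGTERM.
wait "$WORK_PID"
kill "$GUARD_PID" 2> /dev/null
# Resubmit this job script if the work was interrupted by the guard.
if [ -f "$FLAG_DIR/resubmit" ]; then
    sbatch job.sh
fi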
Files
plan.py
is a Python script that defines the workflow:
#!/usr/bin/env python3
from stepup.core.api import static
from stepup.queue.api import sbatch
static("step1/", "step1/slurmjob.sh", "step2/", "step2/slurmjob.sh")
sbatch("step1/", out="../intermediate.txt")
sbatch("step2/", inp="../intermediate.txt")
step1/slurmjob.sh
is the first SLURM job:
#!/usr/bin/env bash
#SBATCH --job-name step1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:02:00
# Give the CPU a break...
sleep 30
echo Done > ../intermediate.txt
step2/slurmjob.sh
is the second SLURM job:
#!/usr/bin/env bash
#SBATCH --job-name step2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:02:00
# Give the CPU a break...
sleep 30
cat ../intermediate.txt
workflow.sh
is the SLURM job script that runs the workflow:
#!/usr/bin/env bash
#SBATCH --job-name stepup
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=stepup-%j.out
#SBATCH --time=00:01:00
# In production, --time=12:00:00 is a reasonable time limit.
echo "StepUp workflow job starts:" $(date)
# If needed, load required modules and activate a relevant virtual environment.
# For example:
# module load Python/3.12.3
# source venv/bin/activate
# Create a temporary directory to store a file that will be used as a flag
# to indicate that resubmission is needed.
STEPUP_QUEUE_FLAG_DIR=$(mktemp -d)
echo "Created temporary directory: $STEPUP_QUEUE_FLAG_DIR"
trap 'rm -rv "$STEPUP_QUEUE_FLAG_DIR"' EXIT
# Start a background process that will end StepUp near the wall time limit.
# The first shutdown waits for running steps to complete.
# The second forcefully terminates any steps that are still running.
echo "Starting background process to monitor wall time."
(
sleep 30; # In production, 39600 seconds is reasonable.
touch ${STEPUP_QUEUE_FLAG_DIR}/resubmit;
stepup shutdown;
sleep 10; # In production, 300 seconds is reasonable.
stepup shutdown
) &
BGPID=$!
trap "kill $BGPID" EXIT
# Start StepUp with 5 workers.
# This means that at most 5 jobs will be submitted concurrently.
# You can adjust the number of workers based on your needs.
# In fact, because this example is simple, a single worker would be sufficient.
# Note that the number of workers is unrelated
# to the single core used by this workflow script.
echo "Starting stepup with a maximum of 5 concurrent jobs."
stepup boot -n 5
# Use the temporary file to determine if the workflow script must be resubmitted.
echo "Checking if stepup was forcibly stopped."
if [ -f ${STEPUP_QUEUE_FLAG_DIR}/resubmit ]; then
echo "Resubmitting job script to let StepUp finalize the workflow."
sbatch workflow.sh
else
echo "Stepup was stopped gracefully."
fi
echo "StepUp workflow job ends:" $(date)