Usage

The sbatch Function

If you want to submit a job to the queue as part of a StepUp workflow, you must first prepare a directory with a job script called slurmjob.sh. This can be either a static file or the output of a previous step in the workflow. The function sbatch() will then submit the job to the queue. For simplicity, the following example assumes that the job script is static:

from stepup.core.api import static
from stepup.queue.api import sbatch

static("compute/", "compute/slurmjob.sh")
sbatch("compute/")

All arguments to the sbatch command of SLURM must be included in the slurmjob.sh script with #SBATCH directives. You can only submit one job from a given directory.
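
For example, to run two independent jobs, give each job its own directory with its own slurmjob.sh. The following sketch uses hypothetical directory names job1/ and job2/:

from stepup.core.api import static
from stepup.queue.api import sbatch

# Each job directory contains exactly one slurmjob.sh.
static("job1/", "job1/slurmjob.sh", "job2/", "job2/slurmjob.sh")
sbatch("job1/")
sbatch("job2/")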

When the workflow is executed, the sbatch step will submit the job to the queue. It will then wait for the job to complete, just like sbatch --wait. Unlike sbatch --wait, it can also wait for a previously submitted job to complete. This is useful when the workflow is killed or interrupted for some reason and later restarted.

The standard output and error of the job are written to slurmjob.out and slurmjob.err, respectively.

The current status of the job is written to (and read from) the slurmjob.log file. By default, the job is not resubmitted if slurmjob.log exists. Instead, the sbatch step waits for the existing job to complete without resubmitting it. You can remove slurmjob.log to force a resubmission, but this is obviously dangerous if the job is still running.

If the inputs of a job declared with sbatch("compute/", inp=["inp.txt"]) have changed, restarting the workflow will by default raise an exception. Ideally, you should verify that you really want to discard the old outputs and clean them up before restarting the workflow. If you feel this is overly cautious, you can set the STEPUP_QUEUE_RESUBMIT_CHANGED_INPUTS environment variable to "yes" to allow the workflow to resubmit jobs with changed inputs. Old outputs are not removed before resubmission; it is assumed that your job script performs the necessary cleanup itself.
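
For completeness, a minimal sketch of a plan that declares such an input, assuming (as in the call above) that inp paths are given relative to the job directory and that the input file itself is static:

from stepup.core.api import static
from stepup.queue.api import sbatch

# Declare the job directory, the job script and an input file.
static("compute/", "compute/slurmjob.sh", "compute/inp.txt")
# The inp argument makes the sbatch step depend on the input file.
# (Assumption: the path is interpreted relative to the job directory.)
sbatch("compute/", inp=["inp.txt"])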

Examples

  • A simple example with static and dynamically generated job scripts can be found in examples/slurm-basic/.

  • The example examples/slurm-perpetual/ shows how to run StepUp itself as a job in the queue. The job cancels and resubmits itself when it nears the wall time limit and the workflow has not yet completed.

Killing Running Jobs

If you want to interrupt the workflow and cancel all running SLURM jobs, it is not enough to simply kill or stop StepUp. You must also cancel the jobs in the SLURM queue. This can be done by running the following command from the top-level directory of the workflow:

stepup canceljobs

It is part of StepUp Queue's design not to automatically cancel jobs when the workflow is interrupted. It is quite common for a workflow to be interrupted by accident or for technical reasons. In such cases, it would be inefficient to also cancel running jobs, which may still be doing useful work. Instead, jobs continue to run and you can restart the StepUp workflow to pick up where it left off.

After cancelling jobs, it remains your responsibility to clean up files in the workflow. Removing them is not always desirable, so this is not done automatically.

Technical Details

The timestamps in the log file have a low resolution of about one minute. The job state is only checked every 30–40 seconds to avoid overloading the job scheduler. Information from slurmjob.log is reused as much as possible to avoid unnecessary scontrol calls.

The status of the job is inferred from scontrol show job, with a --cluster argument when relevant. To further minimize the number of scontrol calls in a parallel workflow, the output of scontrol is cached in ~/.cache/stepup-queue. The cached results are reused by all sbatch actions, so the number of scontrol calls is independent of the number of jobs running in parallel.

The time between two scontrol calls (per cluster) can be controlled with the STEPUP_SBATCH_CACHE_TIMEOUT environment variable, which is "30" (seconds) by default. Increase this value if you want to reduce the burden on SLURM.

The cached output of scontrol is checked at a randomized polling interval. The randomization prevents concurrent scontrol calls (for multiple clusters) from all coinciding. The polling interval can be controlled with two additional environment variables, illustrated in the sketch after this list:

  • STEPUP_SBATCH_POLLING_INTERVAL = the minimal polling interval in seconds, default is "10".
  • STEPUP_SBATCH_TIME_MARGIN = the width of the uniform distribution for the polling interval in seconds, default is "5".
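
The following is only an illustrative sketch of how such a randomized interval can be drawn, not the actual implementation; it uses the documented defaults as fallbacks:

import os
import random
import time

# Minimum interval plus a uniform extra margin, so that concurrent
# checks do not all fire at the same moment.
interval = float(os.environ.get("STEPUP_SBATCH_POLLING_INTERVAL", "10"))
margin = float(os.environ.get("STEPUP_SBATCH_TIME_MARGIN", "5"))
time.sleep(interval + random.uniform(0.0, margin))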