Welcome to StepUp Core

StepUp is a simple, powerful and universal build tool, a modern alternative to Make.

StepUp, like most build tools, schedules and executes commands in parallel. The scheduling takes into account that input files for a command must be available. Build tools also keep track of which other commands can create these files.

This is the documentation for StepUp Core, the basic framework for StepUp, without any domain-specific extensions. Domain-specific features are implemented in extension packages. Currently, there are:

  • StepUp RepRep extension for creating reproducible reports: papers, presentations, theses, etc.
  • StepUp Queue submits jobs to a SLURM scheduler.

Quick Visual Impression of StepUp

StepUp in the Wild

A real-world example of StepUp is the AutoCorrelation Integral Drill (ACID) dataset. This project uses a StepUp workflow to regenerate a 43 GB collection of 15360 synthetic time series used to benchmark algorithms that compute the autocorrelation integral. The workflows generate the dataset, package and upload it to Zenodo, and validate the results.

Why Was StepUp Created?

StepUp is a greenfield project inspired by similar tools, such as Ninja, pydoit and tup.

The defining feature of StepUp is that it treats the generation and execution of the build workflow as one and the same thing. This may sound abstract, so let’s break it down by reviewing how build tools work and have evolved over time.

Traditional build tools run programs in parallel using a fixed workflow as input: you must define this workflow in advance by writing all steps and their dependencies in a text file, such as a Makefile. In practice, these workflow definitions are rarely written manually. Instead, they are often generated by other tools, such as CMake or Automake, which handle the configuration and discovery of build steps. This separation into generation and execution simplifies the build tool, but it also prevents steps from being defined using the output of previous steps.

More modern build tools, such as Bazel, Meson and Buck2, have abandoned the traditional separation between build generator and executor. They introduce a domain-specific language (DSL), such as Starlark, to specify the build steps. These DSLs are designed to be powerful for build tasks, but are deliberately restricted in what they can do for security reasons.

For software compilation, established build tools usually make acceptable assumptions, and workarounds exist for certain exceptions (see, for example, depfile, deps, dyndep and generator rules in Ninja). In build scenarios other than software compilation, e.g., building a scientific publication from LaTeX sources and raw data, these workarounds are too limited. For example, if a LaTeX source contains \input commands with TeX files generated by a script, one cannot know in advance whether these generated TeX files will reference additional input files, e.g., figures. In such cases, it is natural to determine all inputs of a LaTeX document on the fly instead of doing so in advance. (StepUp RepRep’s predecessor, RepRepBuild, generated build instructions for Ninja and addressed this problem with an elaborate generator rule.)
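The LaTeX case can be made concrete with a small sketch. It is not how StepUp RepRep scans documents; the function name `discover_inputs` and the two regular expressions are assumptions made for illustration, and real LaTeX allows many more ways to reference files (for example, \input without a file extension). The point is only that a figure referenced by a generated TeX file cannot be found until that file exists.

```python
import re
import tempfile
from pathlib import Path

# Simplified patterns (assumption): real LaTeX syntax is far more flexible.
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")
GRAPHICS_RE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")


def discover_inputs(tex_path, found=None):
    """Collect files referenced by tex_path, following \\input recursively."""
    found = set() if found is None else found
    text = tex_path.read_text()
    found.update(GRAPHICS_RE.findall(text))
    for name in INPUT_RE.findall(text):
        if name not in found:
            found.add(name)
            child = tex_path.parent / name
            # A generated file may not exist yet; then we cannot look inside.
            if child.exists():
                discover_inputs(child, found)
    return found


with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp) / "main.tex"
    main.write_text(r"\input{table.tex}" + "\n")
    # Before the generator script has run, only table.tex is known.
    before = sorted(discover_inputs(main))
    # Simulate the script that generates table.tex, which references a figure.
    (Path(tmp) / "table.tex").write_text(
        r"\includegraphics[width=3cm]{fig.pdf}" + "\n"
    )
    after = sorted(discover_inputs(main))

print(before)  # ['table.tex']
print(after)   # ['fig.pdf', 'table.tex']
```

A build tool that needs the full input list up front has to guess at `fig.pdf`, whereas a tool that accepts new inputs during the build can simply rescan once `table.tex` has been produced.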

StepUp overcomes such difficulties by taking a different approach. The stepup command starts a background process that can receive build steps from any step in the build process, via Remote Procedure Calls (RPCs). It uses this information to extend its workflow, which is internally represented as a partial directed acyclic graph. This process is bootstrapped by an initial plan.py script that issues the first RPCs. Each build step can use intermediate results to add new information to the workflow. Steps can even be added rather late in the build, if this is necessary to correctly define them.
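The idea of a workflow that grows while it runs can be illustrated with a toy model. The sketch below does not use StepUp's API or its RPC protocol: `ToyDirector`, `submit`, `plan` and `list_sources` are hypothetical names, everything runs sequentially in one process, and dependencies are ignored. It only shows that a running step can register further steps based on what it finds.

```python
from collections import deque


class ToyDirector:
    """Hypothetical stand-in for the background process that collects steps."""

    def __init__(self):
        self.pending = deque()
        self.log = []

    def submit(self, name, action):
        """Plays the role of an RPC that adds a step to the workflow."""
        self.pending.append((name, action))

    def run(self):
        # Keep going until no step has anything left to add.
        while self.pending:
            name, action = self.pending.popleft()
            self.log.append(name)
            action(self)


def plan(director):
    """Plays the role of the initial plan.py: it only knows the first step."""
    director.submit("list-sources", list_sources)


def list_sources(director):
    # The follow-up steps depend on an intermediate result of this step.
    sources = ["a.tex", "b.tex"]
    for src in sources:
        director.submit(f"compile {src}", lambda d: None)


director = ToyDirector()
director.submit("plan", plan)
director.run()
print(director.log)
# ['plan', 'list-sources', 'compile a.tex', 'compile b.tex']
```

In this toy, the two compile steps are unknown when the run starts; they exist only because an earlier step defined them.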

The program tup deserves a special mention in this brief review. StepUp’s algorithm for rebuilding steps (in response to changed inputs) resembles that of tup. Tup traverses upwards through the dependency graph, and StepUp adopted this pattern. The “Up” part of StepUp’s name acknowledges this source of inspiration, while “Step” reflects how StepUp defines operations as individual steps.
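As a rough illustration of an upward traversal, the sketch below starts from changed files and walks towards everything that depends on them, collecting the steps met along the way. The graph, the `consumers` mapping and `affected_steps` are hypothetical, and the real algorithms in tup and StepUp involve more bookkeeping than this.

```python
from collections import deque

# Hypothetical toy graph: each node maps to the nodes that consume it.
consumers = {
    "data.csv": ["plot"],
    "plot": ["fig.pdf"],
    "fig.pdf": ["latex"],
    "paper.tex": ["latex"],
    "latex": ["paper.pdf"],
    "paper.pdf": [],
}
steps = {"plot", "latex"}  # all other nodes are files


def affected_steps(changed):
    """Return the steps reachable from the changed files, sorted by name."""
    seen = set(changed)
    queue = deque(changed)
    hit = set()
    while queue:
        node = queue.popleft()
        if node in steps:
            hit.add(node)
        for nxt in consumers.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(hit)


print(affected_steps(["paper.tex"]))  # ['latex']
print(affected_steps(["data.csv"]))   # ['latex', 'plot']
```

The work done is proportional to the part of the graph above the changed files, not to the size of the whole workflow.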

Other Noteworthy Features

  • 🐍 Python Scripting: Build scripts are written in Python (in so-called plan.py files), instead of introducing a new domain-specific language (DSL).

  • 🕸️ Partial Directed Acyclic Graph (PDAG) Execution: StepUp supports PDAG execution, similar to tup. This means it can start steps even when the full workflow is not yet known. For example, at startup, it will begin executing steps before having complete knowledge of the entire build graph.

  • 🔄 Dynamic Workflows: While a step is running, it can inform StepUp that it needs additional inputs, in which case the step will be rescheduled for later execution (after the additional inputs have become available). Similarly, a step can define additional outputs during its execution.

  • 🗓️ Advanced Scheduling: StepUp schedules steps in parallel and can prioritize steps that are on the critical path of the workflow. In addition, one can define resources and resource limits for steps, which StepUp will respect when scheduling. For instance, this can be used to limit the number of steps using a GPU that can run in parallel.

  • 🌐 Graph Visualization Webserver: The stepup browse command launches a small local web server and opens your browser to interactively explore the build graph. It serves a snapshot of the workflow database, letting you inspect every node (files and steps), their states, and the dependencies connecting them. This makes it easy to understand and debug how a workflow is wired together.

  • 👁️ Watching File Changes: StepUp always runs background processes (a director and a step executor) to execute steps, and a terminal frontend to control or interrupt the build. The director starts with a build phase to execute steps in parallel until the build is complete. When StepUp completes the build, it can switch to a watch phase to register file changes. When the user requests a rerun, it knows exactly which part of the DAG needs to be rebuilt. This allows efficient edit-build iterations to incrementally build and refine a project.

  • 🧹 Automatic Cleanup: Old outputs are automatically removed when the steps creating those files are removed from the workflow. This cleanup is only performed after a completely successful build.

  • ⏳ Progress Bar: The StepUp terminal user interface provides easy-to-follow progress information.

  • 📥 Input Declaration: A file must either be declared static (user-provided) or built (created by a step) before it can be used as input for steps. Static declarations are uncommon in other build tools and allow StepUp to correctly execute steps with partial knowledge of the workflow.

  • 🔑 File Hashing: If a step’s input files have been modified, their hashes are compared with those from the previous run before the step is re-run. This prevents unnecessary step executions in two common scenarios:

    • A file is changed and then reverted to its original state.
    • Switching between branches in Git.

  • 🌱 Environment Variable Dependencies: Steps can have environment variables as dependencies, so that they are flagged for re-execution after the variables change.

  • 🧩 Pattern Matching: Rich pattern-matching rules make it easy to multiplex a step over multiple similar inputs.

  • 📤 Optional Outputs: Steps do not need to have output files. They will be rerun if inputs have changed since the last run.
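The file hashing feature in the list above can be illustrated with a minimal sketch. It assumes SHA-256 and hashes whole files in one read; which hash function StepUp actually uses, and how it stores digests, is not stated here. The helper names `digest_of` and `needs_rerun` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_of(path):
    """Hash the file contents (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_rerun(path, recorded_digest):
    """A step is re-run only if the content differs from the recorded run."""
    return digest_of(path) != recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    src.write_text("v1\n")
    recorded = digest_of(src)  # digest stored after the previous build

    src.write_text("v2\n")  # a real edit
    changed = needs_rerun(src, recorded)

    src.write_text("v1\n")  # the edit is reverted, e.g. by switching branches
    reverted = needs_rerun(src, recorded)

print(changed, reverted)  # True False
```

After the revert, the file has been rewritten and its modification time is new, yet the digest matches the recorded one, so a hash-based check concludes that no re-run is needed.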

When to Use StepUp?

Many build and workflow tools overlap with StepUp, so whether it fits depends on what you need. StepUp sits between the two categories, and is a good fit if you want either:

  • a build tool with features more typical of workflow tools: dynamic workflows, advanced scheduling, file hashing, rich metadata, or automatic cleanup; or
  • a workflow tool with features more typical of build tools: complex dependencies, pattern matching, or incremental building.

A few use cases are out of scope by design:

  • Very short steps (a few milliseconds each). The overhead of scheduling, file hashing, and process execution adds up. Workflow tools built on long-running pilot processes or threads are faster here, but risk carrying side effects from one step to the next, whereas StepUp prioritizes robustness and correctness.
  • Huge graphs (hundreds of thousands of steps or more). StepUp’s SQLite storage overhead becomes significant. A pure in-memory graph (with specialized data structures) can handle larger workflows more efficiently, at the cost of giving up the crash recovery that SQLite provides.