First Step
The goal of the first tutorial is to introduce the basic usage of StepUp. For the sake of simplicity, you will define a minimal workflow that does very little.
Example
Example source files: docs/getting_started/first_step/
Creating a Step
Create a file plan.py with the following contents:
Make this file executable with chmod +x plan.py.
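From the line-by-line description below, plan.py consists of a shebang line, an import of run() and a single call to run(). If you prefer to create the file from Python, the following sketch writes those lines and then sets the executable bits for user, group and others, which is roughly what chmod +x plan.py does. The file in docs/getting_started/first_step/ may differ in small details such as blank lines.

```python
import stat
from pathlib import Path

# Contents of plan.py, reconstructed from the description in this tutorial.
PLAN = """\
#!/usr/bin/env python3
from stepup.core.api import run

run("echo Hello World")
"""

path = Path("plan.py")
path.write_text(PLAN)
# Add the executable bits, similar to `chmod +x plan.py`.
path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```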
- The first line is required to have the plan executed by the Python 3 interpreter.
- The second line imports the run() function from StepUp Core.
  The module stepup.core.api contains functions to communicate with the director process of StepUp to define steps and other parts of the workflow.
- The last line defines a step that writes Hello World to the standard output.
  The (first) argument of run() is a single string: the command to execute.
If you want to use shell features in the command, such as pipes or I/O redirection,
you need to set the keyword argument shell=True, for example run("echo Hello World > hello.txt", shell=True).
This comes at the extra cost of running a shell process, so it is disabled by default.
Note that StepUp does not provide any standard input. It does capture standard output and error, as shown below.
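StepUp's own process handling is not described here. As a rough standard-library analogue (an illustration only, not StepUp's implementation), the hypothetical helper run_command below splits a command string into arguments unless shell=True is given, provides no standard input and captures the output. It also shows why redirection only works when a shell is involved:

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def run_command(command: str, shell: bool = False) -> str:
    """Hypothetical helper: run one command string and return its standard output."""
    # Without a shell, the string is split into arguments the way a POSIX shell
    # would split it, but '>' or '|' are passed on as ordinary arguments.
    args = command if shell else shlex.split(command)
    result = subprocess.run(
        args,
        shell=shell,
        stdin=subprocess.DEVNULL,  # the command gets no standard input
        capture_output=True,       # standard output and error are captured
        text=True,
        check=True,                # raise an exception if the command fails
    )
    return result.stdout


print(run_command("echo Hello World"), end="")  # prints: Hello World

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello.txt"
    command = f"echo Hello World > {shlex.quote(str(target))}"
    # Without a shell, echo simply prints '>' and the path as extra words.
    literal = run_command(command)
    # With a shell, '>' redirects the output into the file.
    run_command(command, shell=True)
    written = target.read_text()
```

The extra shell process in the second call is the cost mentioned above; for a plain command, splitting the string and starting the program directly is enough.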
Running StepUp
In the same directory, run:
- The build subcommand starts the StepUp terminal user interface and the director process in the background, which will begin executing steps defined in plan.py.
- The option -j 1 limits parallel execution to a single step at a time.
You should see the following output, with colors if your virtual terminal supports them:
DIRECTOR │ Listening on /tmp/stepup-########/director (StepUp Core 3.2.3.post54)
STARTUP │ (Re)initialized boot script
PHASE │ build
START │ ./plan.py
SUCCESS │ ./plan.py
START │ echo Hello World
SUCCESS │ echo Hello World
─────────────────────────────── Standard output ────────────────────────────────
Hello World
────────────────────────────────────────────────────────────────────────────────
DIRECTOR │ Trying to delete 0 outdated output(s)
DIRECTOR │ See you!
Let’s analyze the output:
- The lines before the first START line are part of the StepUp startup sequence.
  The address /tmp/stepup-########/director is a Unix domain socket through which the director receives instructions from other processes to define the workflow. (The hash signs represent random characters.)
- The START and SUCCESS lines are shown for steps executed by StepUp:
  - The step ./plan.py is created by default and runs the script that you just created.
  - Running plan.py defines the step echo Hello World, which is executed next.
- When a step produces output, it is shown after the step has completed.
- When no more steps can be executed, StepUp checks if it can clean up outdated outputs and then exits.
Re-running StepUp
Now run StepUp again with the same command.
You will see a slightly different output:
DIRECTOR │ Listening on /tmp/stepup-########/director (StepUp Core 3.2.3.post54)
STARTUP │ Making failed steps pending
STARTUP │ Watching directories for 1 files from initial database
STARTUP │ Making steps pending that use changed environment variables
STARTUP │ Scanning initial database for changed files
STARTUP │ Scanning initial database for new nglob matches
PHASE │ build
DIRECTOR │ Trying to delete 0 outdated output(s)
DIRECTOR │ See you!
The startup sequence is now a bit longer because StepUp loads the workflow from .stepup/graph.db,
which was created in the first run.
It looks for relevant file changes and, because plan.py has not changed,
it does not rerun it.
If file time stamps have changed, it also checks whether the files have actually changed
by comparing SHA-256 hashes
of the input files, used environment variables and produced outputs.
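The two-stage check described above, a cheap time-stamp comparison followed by a content hash only when the time stamp differs, can be sketched with hashlib for a single file. The FileRecord type and the helper names are hypothetical and say nothing about how StepUp actually stores this information in .stepup/graph.db:

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    """Hypothetical record of what was known about a file after the previous run."""
    mtime_ns: int
    sha256: str


def sha256_of(path: Path) -> str:
    """Hash the file contents in chunks, so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(path: Path) -> FileRecord:
    return FileRecord(path.stat().st_mtime_ns, sha256_of(path))


def has_changed(path: Path, old: FileRecord) -> bool:
    # Cheap check first: an unchanged time stamp is taken as "no change".
    if path.stat().st_mtime_ns == old.mtime_ns:
        return False
    # The time stamp differs, so compare hashes to see if the contents really changed.
    return sha256_of(path) != old.sha256


with tempfile.TemporaryDirectory() as tmp:
    plan = Path(tmp) / "plan.py"
    plan.write_text('run("echo Hello World")\n')
    old = snapshot(plan)
    untouched = has_changed(plan, old)
    # Only bump the time stamp: same contents, so no real change.
    later = old.mtime_ns + 1_000_000_000
    os.utime(plan, ns=(later, later))
    touched = has_changed(plan, old)
    # Edit the contents and give the file yet another time stamp.
    plan.write_text('run("echo Hello StepUp")\n')
    os.utime(plan, ns=(later + 1_000_000_000, later + 1_000_000_000))
    edited = has_changed(plan, old)

print(untouched, touched, edited)  # False False True
```

Only the last case would make a tool like StepUp rerun the steps that depend on the file; merely touching a file costs one hash computation but triggers no work.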
If you manually remove .stepup/graph.db,
StepUp no longer knows which steps it has already executed, so it runs all of them again.
run() versus step()
In this first example, either of the following two lines would result in the same output:
The second form, step(), is a lower-level function that offers more detailed control but performs fewer sanity checks.
It is primarily intended for developers of StepUp extensions.
For most end users, run() is more convenient and should be preferred.
For example, when the program in a run() command is a local script of your workflow, such as ./script.py,
StepUp automatically tracks the script as a dependency of the step and reruns the step when the script changes.
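How run() recognizes a local script is not spelled out here. A minimal sketch of the idea, assuming that only the program (the first word of the command) is inspected and that a path starting with ./ marks a local script; the function name implied_inputs is hypothetical and StepUp's actual rules may differ:

```python
import shlex


def implied_inputs(command: str) -> list[str]:
    """Hypothetical: return the local script a command runs, if any."""
    program = shlex.split(command)[0]
    # A relative path such as ./script.py points to a file in the workflow,
    # whereas a bare name such as echo is looked up on PATH.
    return [program] if program.startswith("./") else []


print(implied_inputs("./script.py --verbose"))  # ['./script.py']
print(implied_inputs("echo Hello World"))       # []
```

With step(), no such inference would take place in this picture, which is one reason the lower-level function demands more care from its caller.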
Filenames with Spaces
Because a command string is split into separate arguments at whitespace, a filename that contains spaces must be quoted so that it is treated as a single argument instead of several.
The inp, out, vol and env keyword arguments are plain lists of strings,
not command lines, so the filenames in them never need quoting.
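The difference between a command string and a list of filenames can be seen with Python's shlex module, which splits a string the way a POSIX shell does. Whether StepUp uses shlex internally is an assumption here, and the filename my data.txt is just an example:

```python
import shlex

name = "my data.txt"  # example filename containing a space

# Unquoted, the filename falls apart into two arguments.
unquoted = shlex.split(f"cat {name}")
# shlex.quote adds the quotes needed to keep it together.
command = f"cat {shlex.quote(name)}"
quoted = shlex.split(command)

# In a plain list of strings, each element already is exactly one filename.
inp = [name]

print(command)   # cat 'my data.txt'
print(unquoted)  # ['cat', 'my', 'data.txt']
print(quoted)    # ['cat', 'my data.txt']
```

shlex.quote is a convenient way to build such command strings in plan.py, but a name without spaces needs no quoting at all.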
That said, spaces in filenames are best avoided altogether.
They add no value and create avoidable friction:
every command that references such a file has to quote it,
which makes the plan.py scripts harder to read and easier to get wrong.
Stick to names that combine letters, digits, hyphens and underscores,
and reserve spaces for the contents of files rather than their names.
Try the Following
- Change the arguments of the echo command in plan.py and run sb -j 1 again.
  As expected, StepUp detects the change and repeats the plan.py and echo steps.
- Normally, you would never run ./plan.py directly as a normal Python script, i.e., without running it through stepup.
  Try it anyway, just to see what happens.
  The terminal output shows the commands that would normally be sent to the StepUp director process when plan.py is executed by stepup.
  You should get the following screen output. This output contains internal details of StepUp, which can be useful for debugging purposes.