How to Create Workflows
This guide shows the different ways to create Torc workflows, from the most common approach (specification files) to more advanced ones (the CLI and the Python/Julia APIs).
Using Workflow Specification Files (Recommended)
The easiest way to create workflows is with specification files. Torc supports YAML, JSON5, and KDL formats.
Create from a YAML File
torc workflows create workflow.yaml
Create from JSON5 or KDL
torc workflows create workflow.json5
torc workflows create workflow.kdl
Torc detects the format from the file extension.
Create and Run in One Step
For quick iteration, combine creation and execution:
# Create and run locally
torc run workflow.yaml
# Create and submit to Slurm
torc submit workflow.yaml
For format syntax and examples, see the Workflow Specification Formats reference.
Using the CLI (Step by Step)
For scripted construction, or when you need fine-grained control over individual components, create workflows piece by piece using the CLI.
Step 1: Create an Empty Workflow
torc workflows new \
--name "my_workflow" \
--description "My test workflow"
Output:
Successfully created workflow:
ID: 1
Name: my_workflow
User: dthom
Description: My test workflow
Note the workflow ID (1) for subsequent commands.
Step 2: Add Resource Requirements
torc resource-requirements create \
--name "small" \
--num-cpus 1 \
--memory "1g" \
--runtime "PT10M" \
1 # workflow ID
Output:
Successfully created resource requirements:
ID: 2
Workflow ID: 1
Name: small
Step 3: Add Files (Optional)
torc files create \
--name "input_file" \
--path "/data/input.txt" \
1 # workflow ID
Step 4: Add Jobs
torc jobs create \
--name "process_data" \
--command "python process.py" \
--resource-requirements-id 2 \
--input-file-ids 1 \
1 # workflow ID
Step 5: Initialize and Run
# Initialize the workflow (resolves dependencies)
torc workflows initialize-jobs 1
# Run the workflow
torc run 1
Using the Python API
For complex programmatic workflow construction, use the Python client:
from torc import make_api
from torc.openapi_client import (
    WorkflowModel,
    JobModel,
    ResourceRequirementsModel,
)

# Connect to the server
api = make_api("http://localhost:8080/torc-service/v1")

# Create workflow
workflow = api.create_workflow(WorkflowModel(
    name="my_workflow",
    user="myuser",
    description="Programmatically created workflow",
))

# Add resource requirements
rr = api.create_resource_requirements(ResourceRequirementsModel(
    workflow_id=workflow.id,
    name="small",
    num_cpus=1,
    memory="1g",
    runtime="PT10M",
))

# Add jobs
api.create_job(JobModel(
    workflow_id=workflow.id,
    name="job1",
    command="echo 'Hello World'",
    resource_requirements_id=rr.id,
))

print(f"Created workflow {workflow.id}")
For more details, see the Map Python Functions tutorial.
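If you only need to fan one command out over a list of parameters and don't need the mapping helper, a plain loop over the create_job call shown above is enough. The sketch below reuses the api, workflow, and rr objects from the example; the parameter values and the job naming scheme are purely illustrative.

# Minimal sketch: one job per parameter, using the same calls as above.
# The threshold values and the naming scheme are illustrative, not part of Torc.
for threshold in [0.1, 0.5, 0.9]:
    api.create_job(JobModel(
        workflow_id=workflow.id,
        name=f"analyze_threshold_{threshold}",
        command=f"python analyze.py --threshold {threshold}",
        resource_requirements_id=rr.id,
    ))

After the jobs are created, initialize and run the workflow just as in Step 5 of the CLI section (torc workflows initialize-jobs followed by torc run).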
Using the Julia API
The Julia client provides similar functionality for programmatic workflow construction:
using Torc
import Torc: APIClient

# Connect to the server
api = make_api("http://localhost:8080/torc-service/v1")

# Create workflow
workflow = send_api_command(
    api,
    APIClient.create_workflow,
    APIClient.WorkflowModel(;
        name = "my_workflow",
        user = get_user(),
        description = "Programmatically created workflow",
    ),
)

# Add resource requirements
rr = send_api_command(
    api,
    APIClient.create_resource_requirements,
    APIClient.ResourceRequirementsModel(;
        workflow_id = workflow.id,
        name = "small",
        num_cpus = 1,
        memory = "1g",
        runtime = "PT10M",
    ),
)

# Add jobs
send_api_command(
    api,
    APIClient.create_job,
    APIClient.JobModel(;
        workflow_id = workflow.id,
        name = "job1",
        command = "echo 'Hello World'",
        resource_requirements_id = rr.id,
    ),
)

println("Created workflow $(workflow.id)")
The Julia client also supports map_function_to_jobs for mapping a function across parameters, similar to the Python client.
Choosing a Method
| Method | Best For |
|---|---|
| Specification files | Most workflows; declarative, version-controllable |
| CLI step-by-step | Scripted workflows, testing individual components |
| Python API | Complex dynamic workflows, integration with Python pipelines |
| Julia API | Complex dynamic workflows, integration with Julia pipelines |
Common Tasks
Validate a Workflow File Without Creating
Use --dry-run to validate a workflow specification without creating it on the server:
torc workflows create --dry-run workflow.yaml
Example output:
Workflow Validation Results
===========================
Workflow: my_workflow
Description: A sample workflow
Components to be created:
Jobs: 100 (expanded from 1 parameterized job specs)
Files: 5
User data records: 2
Resource requirements: 2
Slurm schedulers: 2
Workflow actions: 3
Submission: Ready for scheduler submission (has on_workflow_start schedule_nodes action)
Validation: PASSED
For programmatic use (e.g., in scripts or the dashboard), get JSON output:
torc -f json workflows create --dry-run workflow.yaml
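For example, a wrapper script can run the dry run, parse the JSON report, and decide whether to proceed. The sketch below assumes only that the command writes a JSON document to stdout; the exact keys in the report and the exit-code behavior on validation failure may differ, so treat them as assumptions to verify against your Torc version.

import json
import subprocess

# Run the dry-run validation and capture its JSON report.
result = subprocess.run(
    ["torc", "-f", "json", "workflows", "create", "--dry-run", "workflow.yaml"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    # Assumption: a nonzero exit code means the command itself failed.
    print("dry run failed:", result.stderr)
else:
    report = json.loads(result.stdout)
    # Pretty-print whatever was reported; inspect the keys to see which
    # fields (warnings, errors, component counts) your version provides.
    print(json.dumps(report, indent=2))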
What Validation Checks
The dry-run performs comprehensive validation:
Structural Checks:
- Valid file format (YAML, JSON5, KDL, or JSON)
- Required fields present
- Parameter expansion (shows expanded job count vs. original spec count)
Reference Validation:
- `depends_on` references existing jobs
- `depends_on_regexes` patterns are valid and match at least one job
- `resource_requirements` references exist
- `scheduler` references exist
- `input_files` and `output_files` reference defined files
- `input_user_data` and `output_user_data` reference defined user data
- All regex patterns (`*_regexes` fields) are valid
Duplicate Detection:
- Duplicate job names
- Duplicate file names
- Duplicate user data names
- Duplicate resource requirement names
- Duplicate scheduler names
Dependency Analysis:
- Circular dependency detection (reports all jobs in the cycle)
Action Validation:
- Actions reference existing jobs and schedulers
- `schedule_nodes` actions have required `scheduler` and `scheduler_type`
Scheduler Configuration:
- Slurm scheduler node requirements are valid
- Warns about heterogeneous schedulers without `jobs_sort_method` (see below)
Heterogeneous Scheduler Warning
When you have multiple Slurm schedulers with different resource profiles (memory, GPUs, walltime, partition) and jobs without explicit scheduler assignments, the validation warns about potential suboptimal job-to-node matching:
Warnings (1):
- Workflow has 3 schedulers with different memory (mem), walltime but 10 job(s)
have no explicit scheduler assignment and jobs_sort_method is not set. The
default sort method 'gpus_runtime_memory' will be used (jobs sorted by GPUs,
then runtime, then memory). If this doesn't match your workload, consider
setting jobs_sort_method explicitly to 'gpus_memory_runtime' (prioritize
memory over runtime) or 'none' (no sorting).
This warning helps you avoid situations where:
- Long-walltime nodes pull short-runtime jobs
- High-memory nodes pull low-memory jobs
- GPU nodes pull non-GPU jobs
Solutions:
- Set `jobs_sort_method` explicitly in your workflow spec
- Assign jobs to specific schedulers using the `scheduler` field on each job
- Accept the default `gpus_runtime_memory` sorting if it matches your workload
Bypassing Validation
To create a workflow despite validation warnings:
torc workflows create --skip-checks workflow.yaml
Note: --skip-checks bypasses the scheduler node validation checks (which are otherwise treated as errors), but it does not bypass all errors: missing references, circular dependencies, and similar problems will still prevent creation.
List Available Workflows
torc workflows list
Delete a Workflow
torc workflows delete <workflow_id>
View Workflow Details
torc workflows get <workflow_id>
Defining File Dependencies
Jobs often need to read input files and produce output files. Torc can automatically infer job dependencies from these file relationships using variable substitution:
files:
  - name: raw_data
    path: /data/raw.csv
  - name: processed_data
    path: /data/processed.csv

jobs:
  - name: preprocess
    command: "python preprocess.py -o ${files.output.raw_data}"
  - name: analyze
    command: "python analyze.py -i ${files.input.raw_data} -o ${files.output.processed_data}"
Key concepts:
- `${files.input.NAME}` - References a file this job reads (creates a dependency on the job that outputs it)
- `${files.output.NAME}` - References a file this job writes (satisfies dependencies for downstream jobs)
In the example above, analyze automatically depends on preprocess because it needs raw_data as input, which preprocess produces as output.
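The same file relationships can also be declared through the Python client, reusing the api and workflow objects from the Python API example above. In this sketch, FileModel, create_file, and the input_file_ids/output_file_ids job fields are assumptions inferred from the CLI shown earlier (torc files create, --input-file-ids); check the generated OpenAPI client for the exact names before relying on them.

# Sketch only: FileModel, create_file, and the *_file_ids fields are assumed
# names based on the CLI flags above; verify them against the generated client.
from torc.openapi_client import FileModel, JobModel

raw = api.create_file(FileModel(
    workflow_id=workflow.id, name="raw_data", path="/data/raw.csv",
))
processed = api.create_file(FileModel(
    workflow_id=workflow.id, name="processed_data", path="/data/processed.csv",
))

api.create_job(JobModel(
    workflow_id=workflow.id,
    name="preprocess",
    command="python preprocess.py -o /data/raw.csv",
    output_file_ids=[raw.id],
))
api.create_job(JobModel(
    workflow_id=workflow.id,
    name="analyze",
    command="python analyze.py -i /data/raw.csv -o /data/processed.csv",
    input_file_ids=[raw.id],
    output_file_ids=[processed.id],
))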
For a complete walkthrough, see Tutorial: Diamond Workflow.
Next Steps
- Tutorial: Diamond Workflow - Learn file-based dependencies with the fan-out/fan-in pattern
- Workflow Specification Formats - Detailed format reference
- Job Parameterization - Generate multiple jobs from templates
- Tutorial: Many Independent Jobs - Your first workflow