Tutorial 4: Simple Job Parameterization
This tutorial teaches you how to create parameter sweeps—generating multiple related jobs from a single job definition using Torc’s parameterization feature.
Learning Objectives
By the end of this tutorial, you will:
- Understand how parameterization expands one job definition into many jobs
- Learn the different parameter formats (lists, ranges)
- Know how to use format specifiers for consistent naming
- See how parameterization combines with file dependencies
Prerequisites
- Completed Tutorial 1: Many Independent Jobs
- Completed Tutorial 2: Diamond Workflow
- Torc server running
Why Parameterization?
Without parameterization, a 5-value hyperparameter sweep would require writing 5 separate job definitions. With parameterization, you write one definition and Torc expands it:
# Without parameterization: 5 separate definitions
jobs:
  - name: train_lr0.0001
    command: python train.py --lr=0.0001
  - name: train_lr0.0005
    command: python train.py --lr=0.0005
  # ... 3 more ...

# With parameterization: 1 definition
jobs:
  - name: train_lr{lr:.4f}
    command: python train.py --lr={lr}
    parameters:
      lr: "[0.0001,0.0005,0.001,0.005,0.01]"
Step 1: Create the Workflow Specification
Save as learning_rate_sweep.yaml:
name: lr_sweep
description: Test different learning rates

jobs:
  - name: train_lr{lr:.4f}
    command: |
      python train.py \
        --learning-rate={lr} \
        --output=/models/model_lr{lr:.4f}.pt
    resource_requirements: gpu
    output_files:
      - model_lr{lr:.4f}
    parameters:
      lr: "[0.0001,0.0005,0.001,0.005,0.01]"

  - name: evaluate_lr{lr:.4f}
    command: |
      python evaluate.py \
        --model=/models/model_lr{lr:.4f}.pt \
        --output=/results/metrics_lr{lr:.4f}.json
    resource_requirements: gpu
    input_files:
      - model_lr{lr:.4f}
    output_files:
      - metrics_lr{lr:.4f}
    parameters:
      lr: "[0.0001,0.0005,0.001,0.005,0.01]"

  - name: compare_results
    command: |
      python compare.py --input-dir=/results --output=/results/comparison.csv
    resource_requirements: minimal
    input_files:
      - metrics_lr{lr:.4f}
    parameters:
      lr: "[0.0001,0.0005,0.001,0.005,0.01]"

files:
  - name: model_lr{lr:.4f}
    path: /models/model_lr{lr:.4f}.pt
    parameters:
      lr: "[0.0001,0.0005,0.001,0.005,0.01]"

  - name: metrics_lr{lr:.4f}
    path: /results/metrics_lr{lr:.4f}.json
    parameters:
      lr: "[0.0001,0.0005,0.001,0.005,0.01]"

resource_requirements:
  - name: gpu
    num_cpus: 8
    num_gpus: 1
    memory: 16g
    runtime: PT2H
  - name: minimal
    num_cpus: 1
    memory: 2g
    runtime: PT10M
Understanding the Specification
Parameter Syntax:
- `{lr}` - Simple substitution with the parameter value
- `{lr:.4f}` - Format specifier: 4 decimal places (e.g., `0.0010`, not `0.001`)
Parameter Values:
"[0.0001,0.0005,0.001,0.005,0.01]"- A list of 5 specific values
File Parameterization:
Notice that both jobs AND files have `parameters:`. When Torc expands:
- Each `train_lr{lr:.4f}` job gets a corresponding `model_lr{lr:.4f}` file
- The file dependencies are matched by parameter value
Dependency Flow:
- `train_lr0.0001` → outputs `model_lr0.0001` → unblocks `evaluate_lr0.0001`
- `train_lr0.0005` → outputs `model_lr0.0005` → unblocks `evaluate_lr0.0005`
- (and so on for each learning rate)
- All `evaluate_*` jobs → unblock `compare_results`
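To see the expansion mechanics concretely, here is a small Python sketch (hypothetical, not Torc's actual implementation) that applies the `{lr:.4f}` templates to the five parameter values the same way the spec above does:

# Hypothetical sketch of how one templated definition expands into five
# job/file chains; Torc's real expansion logic may differ in detail.
lr_values = [0.0001, 0.0005, 0.001, 0.005, 0.01]

for lr in lr_values:
    job = "train_lr{lr:.4f}".format(lr=lr)       # e.g. train_lr0.0001
    model = "model_lr{lr:.4f}".format(lr=lr)     # matching output file
    evaluator = "evaluate_lr{lr:.4f}".format(lr=lr)
    print(f"{job} -> {model} -> {evaluator}")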
Step 2: Create and Initialize the Workflow
WORKFLOW_ID=$(torc workflows create learning_rate_sweep.yaml -f json | jq -r '.id')
echo "Created workflow: $WORKFLOW_ID"
torc workflows initialize-jobs $WORKFLOW_ID
Step 3: Verify the Expansion
# Count jobs (should be 11: 5 train + 5 evaluate + 1 compare)
torc jobs list $WORKFLOW_ID -f json | jq '.jobs | length'
List the job names:
torc jobs list $WORKFLOW_ID -f json | jq -r '.jobs[].name' | sort
Output:
compare_results
evaluate_lr0.0001
evaluate_lr0.0005
evaluate_lr0.0010
evaluate_lr0.0050
evaluate_lr0.0100
train_lr0.0001
train_lr0.0005
train_lr0.0010
train_lr0.0050
train_lr0.0100
Notice:
- One job per parameter value for `train_*` and `evaluate_*`
- Only one `compare_results` job (it has the parameter for dependencies, but doesn’t expand because its name has no `{lr}`)
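One way to convince yourself of this behavior: if a name template never references `{lr}`, every expansion yields the same string, so only one distinct job results, while its parameterized `input_files` still expand over all five values. A hypothetical Python sketch:

# Hypothetical sketch: a name without {lr} collapses to a single job,
# while the parameterized input files still expand to five entries.
lr_values = [0.0001, 0.0005, 0.001, 0.005, 0.01]

names = {"compare_results".format(lr=lr) for lr in lr_values}
inputs = ["metrics_lr{lr:.4f}".format(lr=lr) for lr in lr_values]

print(names)   # {'compare_results'} -- one job
print(inputs)  # ['metrics_lr0.0001', ..., 'metrics_lr0.0100'] -- five inputs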
Step 4: Check Dependencies
torc jobs list $WORKFLOW_ID
Expected statuses:
- All `train_*` jobs: ready (no input dependencies)
- All `evaluate_*` jobs: blocked (waiting for corresponding model file)
- `compare_results`: blocked (waiting for all metrics files)
Step 5: Run the Workflow
torc run $WORKFLOW_ID
Execution flow:
- All 5 training jobs run in parallel - they have no dependencies on each other
- Each evaluation unblocks independently - when `train_lr0.0001` finishes, `evaluate_lr0.0001` can start (it doesn’t wait for other training jobs)
- Compare runs last - only after all 5 evaluations complete
This is more efficient than a simple two-stage workflow because evaluations can start as soon as their specific training job completes.
Parameter Format Reference
List Format
Explicit list of values:
parameters:
  lr: "[0.0001,0.0005,0.001,0.005,0.01]"  # Numbers
  opt: "['adam','sgd','rmsprop']"         # Strings (note the quotes)
Range Format
For integer or float sequences:
parameters:
  i: "1:100"          # Integers 1 to 100 (inclusive)
  j: "0:100:10"       # Integers 0, 10, 20, ..., 100 (with step)
  lr: "0.0:1.0:0.1"   # Floats 0.0, 0.1, 0.2, ..., 1.0
Format Specifiers
Control how values appear in names:
| Specifier | Example Value | Result |
|---|---|---|
| `{i}` | 5 | `5` |
| `{i:03d}` | 5 | `005` |
| `{lr:.4f}` | 0.001 | `0.0010` |
| `{lr:.2e}` | 0.001 | `1.00e-03` |
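The specifier syntax and the results in this table match Python's format mini-language, so you can preview how a value will render before using it in a job name:

# Preview format specifiers with Python's str.format (same results as the table).
print("{i}".format(i=5))            # 5
print("{i:03d}".format(i=5))        # 005
print("{lr:.4f}".format(lr=0.001))  # 0.0010
print("{lr:.2e}".format(lr=0.001))  # 1.00e-03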
How Parameterization and File Dependencies Interact
When both jobs and files are parameterized with the same parameter:
jobs:
  - name: train_{i}
    output_files: [model_{i}]
    parameters:
      i: "1:3"

  - name: eval_{i}
    input_files: [model_{i}]
    parameters:
      i: "1:3"

files:
  - name: model_{i}
    path: /models/model_{i}.pt
    parameters:
      i: "1:3"
Torc creates these relationships:
- `train_1` → `model_1` → `eval_1`
- `train_2` → `model_2` → `eval_2`
- `train_3` → `model_3` → `eval_3`

Each chain is independent: `eval_2` doesn’t wait for `train_1`.
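A sketch of the matching step, assuming (as described above) that file references are resolved by exact name after both job and file templates are expanded with the same parameter values:

# Hypothetical dependency matching: expand templates, then link by file name.
values = [1, 2, 3]
producers = {f"model_{i}": f"train_{i}" for i in values}  # file -> producing job

for i in values:
    file, consumer = f"model_{i}", f"eval_{i}"
    print(f"{producers[file]} -> {file} -> {consumer}")
# train_1 -> model_1 -> eval_1  (each chain independent of the others)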
Parameter Modes: Product vs Zip
By default, multiple parameters create a Cartesian product (all combinations). For paired parameters, use parameter_mode: zip:
jobs:
  # Default (product): 3 × 3 = 9 jobs
  - name: train_{dataset}_{model}
    command: python train.py --dataset={dataset} --model={model}
    parameters:
      dataset: "['cifar10', 'mnist', 'imagenet']"
      model: "['resnet', 'vgg', 'transformer']"

  # Zip mode: 3 paired jobs (cifar10+resnet, mnist+vgg, imagenet+transformer)
  - name: paired_{dataset}_{model}
    command: python train.py --dataset={dataset} --model={model}
    parameters:
      dataset: "['cifar10', 'mnist', 'imagenet']"
      model: "['resnet', 'vgg', 'transformer']"
    parameter_mode: zip
Use zip mode when parameters have a 1:1 correspondence (e.g., input/output file pairs, pre-determined configurations).
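The two modes behave like Python's itertools.product and zip, which gives a quick way to predict how many jobs a definition will expand into:

from itertools import product

datasets = ["cifar10", "mnist", "imagenet"]
models = ["resnet", "vgg", "transformer"]

# Product mode: every combination of values -> 3 x 3 = 9 jobs
print(len(list(product(datasets, models))))  # 9

# Zip mode: pairwise 1:1 correspondence -> 3 jobs
print(list(zip(datasets, models)))
# [('cifar10', 'resnet'), ('mnist', 'vgg'), ('imagenet', 'transformer')]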
See Parameterization Reference for details.
What You Learned
In this tutorial, you learned:
- ✅ How to use `parameters:` to expand one job definition into many
- ✅ List format (`"[a,b,c]"`) and range format (`"1:100"`)
- ✅ Format specifiers (`{i:03d}`, `{lr:.4f}`) for consistent naming
- ✅ How parameterized files create one-to-one dependencies
- ✅ The efficiency of parameter-matched dependencies (each chain runs independently)
- ✅ The difference between product (default) and zip parameter modes
Next Steps
- Tutorial 5: Advanced Parameterization - Multi-dimensional grid searches
- Multi-Stage Workflows with Barriers - Scale to thousands of parameterized jobs