Workflow Specification Formats

Torc supports three workflow specification formats: YAML, JSON5, and KDL. All three cover the same core features with different syntaxes; the one exception is parameter expansion, which KDL does not support.

Format Overview

Feature	YAML	JSON5	KDL
Parameter Expansion	✓	✓	✗
Comments	✓	✓	✓
Trailing Commas	✗	✓	N/A
Human-Readable	✓✓✓	✓✓	✓✓✓
Programmatic Generation	✓✓	✓✓✓	✓
Industry Standard	✓✓✓	✓✓	✓
Jobs, Files, Resources	✓	✓	✓
User Data	✓	✓	✓
Workflow Actions	✓	✓	✓
Resource Monitoring	✓	✓	✓
Slurm Schedulers	✓	✓	✓

YAML Format

Best for: Most workflows, especially those using parameter expansion.

File Extension: .yaml or .yml

Example:

name: data_processing_workflow
user: datauser
description: Multi-stage data processing pipeline

# File definitions
files:
  - name: raw_data
    path: /data/input/raw_data.csv
  - name: processed_data
    path: /data/output/processed_data.csv

# Resource requirements
resource_requirements:
  - name: small_job
    num_cpus: 2
    num_gpus: 0
    num_nodes: 1
    memory: 4g
    runtime: PT30M

# Jobs
jobs:
  - name: download_data
    command: wget https://example.com/data.csv -O ${files.output.raw_data}
    resource_requirements: small_job

  - name: process_data
    command: python process.py ${files.input.raw_data} -o ${files.output.processed_data}
    resource_requirements: small_job
    depends_on:
      - download_data

# Workflow actions
actions:
  - trigger_type: on_workflow_start
    action_type: run_commands
    commands:
      - mkdir -p /data/input /data/output
      - echo "Workflow started"

Advantages:

Most widely used configuration format
Excellent for complex workflows with many jobs
Full parameter expansion support
Clean, readable syntax without brackets

Disadvantages:

Indentation-sensitive
No trailing commas allowed
Can be verbose for deeply nested structures

JSON5 Format

Best for: Programmatic workflow generation and JSON compatibility.

File Extension: .json5

Example:

{
  name: "data_processing_workflow",
  user: "datauser",
  description: "Multi-stage data processing pipeline",

  // File definitions
  files: [
    {name: "raw_data", path: "/data/input/raw_data.csv"},
    {name: "processed_data", path: "/data/output/processed_data.csv"},
  ],

  // Resource requirements
  resource_requirements: [
    {
      name: "small_job",
      num_cpus: 2,
      num_gpus: 0,
      num_nodes: 1,
      memory: "4g",
      runtime: "PT30M",
    },
  ],

  // Jobs
  jobs: [
    {
      name: "download_data",
      command: "wget https://example.com/data.csv -O ${files.output.raw_data}",
      resource_requirements: "small_job",
    },
    {
      name: "process_data",
      command: "python process.py ${files.input.raw_data} -o ${files.output.processed_data}",
      resource_requirements: "small_job",
      depends_on: ["download_data"],
    },
  ],

  // Workflow actions
  actions: [
    {
      trigger_type: "on_workflow_start",
      action_type: "run_commands",
      commands: [
        "mkdir -p /data/input /data/output",
        "echo 'Workflow started'",
      ],
    },
  ],
}

Advantages:

JSON-compatible (easy programmatic manipulation)
Supports comments and trailing commas
Full parameter expansion support
Familiar to JavaScript/JSON users

Disadvantages:

More verbose than YAML
Requires quotes around all string values
More brackets and commas than YAML
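As a rough illustration of programmatic generation, the sketch below builds a workflow specification as a plain Python dictionary and serializes it with the standard json module. Every valid JSON document is also valid JSON5, so the output can be saved with a .json5 extension. The helper name build_workflow and the per-dataset job layout are assumptions made for this example; the field names are taken from the example above.

```python
import json


def build_workflow(dataset_names):
    """Build a workflow spec as a plain dict, one job per dataset name.

    Hypothetical helper for illustration; not part of Torc.
    """
    jobs = [
        {
            "name": f"process_{name}",
            "command": f"python process.py --id {name}",
            "resource_requirements": "small_job",
        }
        for name in dataset_names
    ]
    return {
        "name": "data_processing_workflow",
        "user": "datauser",
        "resource_requirements": [
            {
                "name": "small_job",
                "num_cpus": 2,
                "num_gpus": 0,
                "num_nodes": 1,
                "memory": "4g",
                "runtime": "PT30M",
            }
        ],
        "jobs": jobs,
    }


spec = build_workflow(["a", "b", "c"])
# Plain JSON output; JSON5 parsers accept it unchanged.
text = json.dumps(spec, indent=2)
print(text)
```

Writing text to a file such as my_workflow.json5 (a placeholder name) should then give a specification that can be passed to the usual Torc commands.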

KDL Format

Best for: Simple to moderate workflows with clean, modern syntax.

File Extension: .kdl

Example:

name "data_processing_workflow"
user "datauser"
description "Multi-stage data processing pipeline"

// File definitions
file "raw_data" path="/data/input/raw_data.csv"
file "processed_data" path="/data/output/processed_data.csv"

// Resource requirements
resource_requirements "small_job" {
    num_cpus 2
    num_gpus 0
    num_nodes 1
    memory "4g"
    runtime "PT30M"
}

// Jobs
job "download_data" {
    command "wget https://example.com/data.csv -O ${files.output.raw_data}"
    resource_requirements "small_job"
}

job "process_data" {
    command "python process.py ${files.input.raw_data} -o ${files.output.processed_data}"
    resource_requirements "small_job"
    depends_on_job "download_data"
}

// Workflow actions
action {
    trigger_type "on_workflow_start"
    action_type "run_commands"
    command "mkdir -p /data/input /data/output"
    command "echo 'Workflow started'"
}

Advantages:

Clean, minimal syntax
No indentation requirements
Modern configuration language
Supports all core Torc features

Disadvantages:

No parameter expansion support
Less familiar to most users
Boolean values use special syntax (#true, #false)

KDL-Specific Syntax Notes

Boolean values: Use #true and #false (not true or false)

resource_monitor {
    enabled #true
    generate_plots #false
}

Repeated child nodes: Use multiple statements

action {
    command "echo 'First command'"
    command "echo 'Second command'"
}

User data: Requires child nodes for properties

user_data "metadata" {
    is_ephemeral #true
    data "{\"key\": \"value\"}"
}
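The escaped JSON string in the user_data example is tedious to write by hand. One possible way to produce it, sketched here in Python, is to serialize the payload to JSON and then quote that text a second time. The helper name kdl_string is hypothetical, and the sketch assumes the payload is ordinary JSON-serializable data, so that only double quotes and backslashes need escaping.

```python
import json


def kdl_string(payload):
    """Serialize payload to JSON, then wrap it as a double-quoted string.

    Hypothetical helper for illustration; not part of Torc.
    """
    # First pass: compact JSON text, ASCII-only, no raw control characters.
    inner = json.dumps(payload)
    # Second pass: quote the text, escaping embedded quotes and backslashes.
    return json.dumps(inner)


print(f"data {kdl_string({'key': 'value'})}")
# prints: data "{\"key\": \"value\"}"
```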

Common Features Across All Formats

Variable Substitution

All formats support the same variable substitution syntax:

${files.input.NAME} - Input file path
${files.output.NAME} - Output file path
${user_data.input.NAME} - Input user data
${user_data.output.NAME} - Output user data
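To make the four reference forms concrete, here is a minimal Python sketch that finds them in a command string. It only illustrates the syntax and is not how Torc itself resolves references. The restriction of NAME to letters, digits, and underscores is an assumption made for this example.

```python
import re

# Matches ${files.input.NAME}, ${files.output.NAME},
# ${user_data.input.NAME}, and ${user_data.output.NAME}.
# Assumption: NAME consists of letters, digits, and underscores.
REFERENCE = re.compile(r"\$\{(files|user_data)\.(input|output)\.([A-Za-z0-9_]+)\}")


def find_references(command):
    """Return (kind, direction, name) tuples in order of appearance."""
    return REFERENCE.findall(command)


command = "python process.py ${files.input.raw_data} -o ${files.output.processed_data}"
for kind, direction, name in find_references(command):
    print(f"{kind:9} {direction:6} {name}")
```

A check like this can help catch a misspelled file or user data name before a workflow is created.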

Supported Fields

All formats support:

Workflow metadata: name, user, description
Jobs: name, command, dependencies, resource requirements
Files: name, path, modification time
User data: name, data (JSON), ephemeral flag
Resource requirements: CPUs, GPUs, memory, runtime
Slurm schedulers: account, partition, walltime, etc.
Workflow actions: triggers, action types, commands
Resource monitoring: enabled, granularity, sampling interval

Parameter Expansion (YAML/JSON5 Only)

YAML and JSON5 support parameter expansion to generate many jobs from concise specifications:

jobs:
  - name: "process_{dataset_id}"
    command: "python process.py --id {dataset_id}"
    parameters:
      dataset_id: "1:100"  # Creates 100 jobs

KDL does not support parameter expansion. For parameterized workflows, use YAML or JSON5.
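The idea behind the expansion can be sketched in a few lines of Python. This is an illustration only: it assumes "1:100" denotes an inclusive integer range (consistent with the "Creates 100 jobs" comment above) and handles a single parameter. Torc's actual expansion rules may differ or support more forms.

```python
def expand_range(spec):
    """Expand "start:stop" into an inclusive list of integers (assumed semantics)."""
    start, stop = (int(part) for part in spec.split(":"))
    return list(range(start, stop + 1))


def expand_job(template, parameters):
    """Yield one job dict per parameter value; this sketch handles one parameter."""
    (param, spec), = parameters.items()
    for value in expand_range(spec):
        yield {key: text.format(**{param: value}) for key, text in template.items()}


template = {
    "name": "process_{dataset_id}",
    "command": "python process.py --id {dataset_id}",
}
jobs = list(expand_job(template, {"dataset_id": "1:100"}))
print(len(jobs), jobs[0]["name"], jobs[-1]["name"])
# prints: 100 process_1 process_100
```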

Examples Directory

The Torc repository includes comprehensive examples in all three formats:

examples/
├── yaml/     # All workflows (15 examples)
├── json/     # All workflows (15 examples)
└── kdl/      # Non-parameterized workflows (9 examples)

To choose a format, compare the same workflow across the format directories. See the examples directory for the complete collection.

Creating Workflows

All formats use the same command:

torc workflows create examples/yaml/sample_workflow.yaml
torc workflows create examples/json/sample_workflow.json5
torc workflows create examples/kdl/sample_workflow.kdl

Or use the quick execution commands:

# Create and run locally
torc run examples/yaml/sample_workflow.yaml

# Create and submit to scheduler
torc submit examples/yaml/workflow_actions_data_pipeline.yaml
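Because the same command accepts every format, a wrapper script only needs to check the file extension before calling Torc. The sketch below is a hypothetical pre-flight helper, not part of Torc. It uses the extensions listed in the format sections of this page and builds the torc workflows create command line without running it. Whether Torc itself selects its parser by extension is not stated here.

```python
from pathlib import Path

# Extensions listed in the format sections of this page.
FORMATS = {".yaml": "YAML", ".yml": "YAML", ".json5": "JSON5", ".kdl": "KDL"}


def spec_format(path):
    """Return the format name for a spec file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported workflow spec extension: {suffix!r}")
    return FORMATS[suffix]


def create_command(path):
    """Build the argument list for `torc workflows create` after checking the extension."""
    spec_format(path)  # raises early on an unknown extension
    return ["torc", "workflows", "create", str(path)]


print(spec_format("examples/kdl/sample_workflow.kdl"))
print(create_command("examples/kdl/sample_workflow.kdl"))
```

The returned list can be passed to subprocess.run on a machine where torc is installed.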

Recommendations

Start with YAML if you’re unsure: it is the most widely used of the three formats and includes full parameter expansion.

Switch to JSON5 if you need to programmatically generate workflows or prefer JSON syntax.

Try KDL if you prefer minimal syntax and don’t need parameter expansion.

All three formats are fully supported and maintained. Choose based on your workflow complexity and personal preference.
