Technical reference for Torc’s resource monitoring system.
The `resource_monitor` section in workflow specifications accepts the following fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `true` | Enable or disable monitoring |
| `granularity` | string | `"summary"` | `"summary"` or `"time_series"` |
| `sample_interval_seconds` | integer | `5` | Seconds between resource samples |
| `generate_plots` | boolean | `false` | Reserved for future use |
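A minimal example enabling time-series monitoring, sketched as YAML. This is an assumption for illustration; adapt the surrounding structure to however your Torc workflow specifications are actually written:

```yaml
# Hypothetical workflow spec fragment; only the resource_monitor
# section is shown.
resource_monitor:
  enabled: true
  granularity: "time_series"   # or "summary" (the default)
  sample_interval_seconds: 5
  generate_plots: false        # reserved for future use
```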
Summary mode (`"summary"`):

- Stores only peak and average values per job
- Metrics stored in the main database results table
- Minimal storage overhead
Time series mode (`"time_series"`):

- Stores samples at regular intervals
- Creates a separate SQLite database per workflow run
- Database location: `<output_dir>/resource_utilization/resource_metrics_<hostname>_<workflow_id>_<run_id>.db`
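To work with a per-run database directly, you can locate it by pattern. A sketch in Python; the directory layout follows the path above, and the helper name is illustrative:

```python
import glob
import os


def find_metrics_dbs(output_dir: str) -> list[str]:
    """Return paths of per-run resource metrics databases under output_dir.

    The hostname, workflow ID, and run ID embedded in the file name are
    matched with a single wildcard.
    """
    pattern = os.path.join(
        output_dir, "resource_utilization", "resource_metrics_*.db"
    )
    return sorted(glob.glob(pattern))
```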
| Job Duration | Recommended Interval |
|---|---|
| < 1 hour | 1-2 seconds |
| 1-4 hours | 5 seconds (default) |
| > 4 hours | 10-30 seconds |
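These recommendations trade resolution against storage: the number of time-series rows a job writes is roughly its duration divided by the sample interval. A back-of-the-envelope sketch (actual on-disk size also depends on SQLite overhead):

```python
def estimated_samples(duration_hours: float, interval_seconds: int) -> int:
    """Approximate number of time-series rows one job will write."""
    return int(duration_hours * 3600 // interval_seconds)


# A 4-hour job at the default 5-second interval:
print(estimated_samples(4, 5))  # 2880 rows for this job
```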
Per-sample metrics table:

| Column | Type | Description |
|---|---|---|
| `id` | INTEGER | Primary key |
| `job_id` | INTEGER | Torc job ID |
| `timestamp` | REAL | Unix timestamp |
| `cpu_percent` | REAL | CPU utilization percentage |
| `memory_bytes` | INTEGER | Memory usage in bytes |
| `num_processes` | INTEGER | Process count including children |
Job name lookup table:

| Column | Type | Description |
|---|---|---|
| `job_id` | INTEGER | Primary key, Torc job ID |
| `job_name` | TEXT | Human-readable job name |
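The two tables above can be queried with any SQLite client. A sketch using Python's `sqlite3`: the table names `samples` and `job_names` are assumptions for illustration (this reference does not name them), and the inserted rows are example data:

```python
import sqlite3

# In-memory database mirroring the two tables described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE samples (
        id INTEGER PRIMARY KEY,
        job_id INTEGER,
        timestamp REAL,
        cpu_percent REAL,
        memory_bytes INTEGER,
        num_processes INTEGER
    );
    CREATE TABLE job_names (
        job_id INTEGER PRIMARY KEY,
        job_name TEXT
    );
""")
conn.execute("INSERT INTO job_names VALUES (15, 'train_model')")
conn.executemany(
    "INSERT INTO samples (job_id, timestamp, cpu_percent, memory_bytes, "
    "num_processes) VALUES (?, ?, ?, ?, ?)",
    [(15, 1700000000.0, 80.0, 4 * 1024**3, 3),
     (15, 1700000005.0, 95.0, 6 * 1024**3, 4)],
)

# Peak memory per job, joined to the human-readable name.
row = conn.execute("""
    SELECT n.job_name, MAX(s.memory_bytes) / 1024.0 / 1024 / 1024 AS peak_gb
    FROM samples s JOIN job_names n USING (job_id)
    GROUP BY s.job_id
""").fetchone()
print(row)  # ('train_model', 6.0)
```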
When using summary mode, the following fields are added to job results:
| Field | Type | Description |
|---|---|---|
| `peak_cpu_percent` | float | Maximum CPU percentage observed |
| `avg_cpu_percent` | float | Average CPU percentage |
| `peak_memory_gb` | float | Maximum memory in GB |
| `avg_memory_gb` | float | Average memory in GB |
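Conceptually, these fields are simple reductions over the raw samples. A sketch; whether "GB" here means 10^9 bytes or GiB (2^30) is an assumption, and this example uses GiB:

```python
def summarize(samples: list[dict]) -> dict:
    """Reduce raw samples to summary-mode fields.

    Each sample is assumed to carry `cpu_percent` and `memory_bytes`,
    matching the time-series columns described earlier.
    """
    gib = 1024 ** 3  # assumption: "GB" is treated as GiB here
    cpu = [s["cpu_percent"] for s in samples]
    mem = [s["memory_bytes"] for s in samples]
    return {
        "peak_cpu_percent": max(cpu),
        "avg_cpu_percent": sum(cpu) / len(cpu),
        "peak_memory_gb": max(mem) / gib,
        "avg_memory_gb": sum(mem) / gib / len(mem),
    }
```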
When using `--format json`:

```json
{
  "workflow_id": 123,
  "run_id": null,
  "total_results": 10,
  "over_utilization_count": 3,
  "violations": [
    {
      "job_id": 15,
      "job_name": "train_model",
      "resource_type": "Memory",
      "specified": "8.00 GB",
      "peak_used": "10.50 GB",
      "over_utilization": "+31.3%"
    }
  ]
}
```
Top-level fields:

| Field | Description |
|---|---|
| `workflow_id` | Workflow being analyzed |
| `run_id` | Specific run ID if provided, otherwise `null` for the latest run |
| `total_results` | Total number of completed jobs analyzed |
| `over_utilization_count` | Number of violations found |
| `violations` | Array of violation details |
Violation entry fields:

| Field | Description |
|---|---|
| `job_id` | Job ID with the violation |
| `job_name` | Human-readable job name |
| `resource_type` | `"Memory"`, `"CPU"`, or `"Runtime"` |
| `specified` | Resource requirement from the workflow spec |
| `peak_used` | Actual peak usage observed |
| `over_utilization` | Percentage over/under the specification |
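The `over_utilization` value is derivable from `specified` and `peak_used`: the difference as a percentage of the specification. A sketch; the real CLI's exact rounding and formatting are assumptions:

```python
def over_utilization(specified_gb: float, peak_gb: float) -> str:
    """Percentage over (or under) the specified requirement,
    formatted in the same signed style as the JSON output."""
    pct = (peak_gb - specified_gb) / specified_gb * 100
    return f"{pct:+.1f}%"


print(over_utilization(8.0, 10.0))  # +25.0%
```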
| File | Description |
|---|---|
| `resource_plot_job_<id>.html` | Per-job timeline with CPU, memory, and process count |
| `resource_plot_cpu_all_jobs.html` | CPU comparison across all jobs |
| `resource_plot_memory_all_jobs.html` | Memory comparison across all jobs |
| `resource_plot_summary.html` | Bar chart dashboard of peak vs. average usage |
All plots are self-contained HTML files using Plotly.js, with:

- Interactive hover tooltips
- Zoom and pan controls
- Legend toggling
- Export options (PNG, SVG)
| Metric | Unit | Description |
|---|---|---|
| CPU percentage | % | Total CPU utilization across all cores |
| Memory usage | bytes | Resident memory consumption |
| Process count | count | Number of processes in the job’s process tree |
The monitoring system automatically tracks child processes spawned by jobs. When a job creates worker processes (e.g., Python multiprocessing), all descendants are included in the aggregated metrics.
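The real monitor discovers descendants through native OS APIs (via the `sysinfo` crate); the underlying idea is a walk over the parent/child relation. A toy sketch with hypothetical PID data:

```python
def process_tree(parent_of: dict[int, int], root: int) -> set[int]:
    """All PIDs in root's process tree, given a child -> parent map.

    Repeatedly adds any process whose parent is already in the tree,
    until no more are found.
    """
    tree = {root}
    changed = True
    while changed:
        changed = False
        for pid, ppid in parent_of.items():
            if ppid in tree and pid not in tree:
                tree.add(pid)
                changed = True
    return tree


# Job process 100 spawned 101, which spawned worker 102; 200 is unrelated.
parent_of = {101: 100, 102: 101, 200: 1}
print(sorted(process_tree(parent_of, 100)))  # [100, 101, 102]
```

Per-process metrics (CPU, memory, process count) are then summed across this set to produce the aggregated job-level values.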
- A single background monitoring thread, regardless of job count
- Typical overhead: <1% CPU even with 1-second sampling
- Uses native OS APIs via the `sysinfo` crate
- Non-blocking async design