Creating a Custom HPC Profile

This tutorial walks you through creating a custom HPC profile for a cluster that Torc doesn’t have built-in support for.

Before You Start

Request Built-in Support First!

If your HPC system is widely used, consider requesting that Torc developers add it as a built-in profile. This benefits everyone using that system.

Open an issue at github.com/NREL/torc/issues with:

  • Your HPC system name and organization
  • Partition names and their resource limits (CPUs, memory, walltime, GPUs)
  • How to detect the system (environment variable or hostname pattern)
  • Any special requirements (minimum nodes, exclusive partitions, etc.)

Built-in profiles are maintained by the Torc team and stay up-to-date as systems change.

When to Create a Custom Profile

Create a custom profile when:

  • Your HPC isn’t supported and you need to use it immediately
  • You have a private or internal cluster
  • You want to test profile configurations before submitting upstream

Step 1: Gather Partition Information

First, collect information about your HPC’s partitions. On most Slurm systems:

# List all partitions (summary view)
sinfo -s

# Get detailed partition info:
#   %P = partition name, %c = CPUs per node, %m = memory per node (MB),
#   %l = time limit, %G = generic resources (e.g., GPUs)
sinfo -o "%P %c %m %l %G"
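
If the sinfo columns are hard to interpret, scontrol prints a more verbose per-partition view (look for MaxTime, TotalCPUs, and Gres in its output):

# Verbose details for every partition
scontrol show partition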

For this tutorial, let’s say your cluster “ResearchCluster” has these partitions:

╭───────────┬───────────┬─────────┬──────────────┬─────────╮
│ Partition │ CPUs/Node │ Memory  │ Max Walltime │ GPUs    │
├───────────┼───────────┼─────────┼──────────────┼─────────┤
│ batch     │ 48        │ 192 GB  │ 72 hours     │ -       │
│ short     │ 48        │ 192 GB  │ 4 hours      │ -       │
│ gpu       │ 32        │ 256 GB  │ 48 hours     │ 4x A100 │
│ himem     │ 48        │ 1024 GB │ 48 hours     │ -       │
╰───────────┴───────────┴─────────┴──────────────┴─────────╯

Step 2: Identify Detection Method

Determine how Torc can detect when you’re on this system. Common methods:

Environment variable (most common):

echo $CLUSTER_NAME    # e.g., "research"
echo $SLURM_CLUSTER   # e.g., "researchcluster"

Hostname pattern:

hostname              # e.g., "login01.research.edu"

For this tutorial, we’ll use the environment variable CLUSTER_NAME=research.
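
A quick shell check confirms the detection value is present before you write any config (this assumes the same CLUSTER_NAME variable used above):

# Should print "detected" on a ResearchCluster login node
[ "$CLUSTER_NAME" = "research" ] && echo "detected" || echo "not detected"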

Step 3: Create the Configuration File

Create or edit your Torc configuration file:

# Linux
mkdir -p ~/.config/torc
nano ~/.config/torc/config.toml

# macOS
mkdir -p ~/Library/Application\ Support/torc
nano ~/Library/Application\ Support/torc/config.toml

Add your custom profile:

# Custom HPC Profile for ResearchCluster
[client.hpc.custom_profiles.research]
display_name = "Research Cluster"
description = "University Research HPC System"
detect_env_var = "CLUSTER_NAME=research"
default_account = "my_project"

# Batch partition - general purpose
[[client.hpc.custom_profiles.research.partitions]]
name = "batch"
cpus_per_node = 48
memory_mb = 192000        # 192 GB in MB
max_walltime_secs = 259200  # 72 hours in seconds
shared = false

# Short partition - quick jobs
[[client.hpc.custom_profiles.research.partitions]]
name = "short"
cpus_per_node = 48
memory_mb = 192000
max_walltime_secs = 14400   # 4 hours
shared = true               # Allows sharing nodes

# GPU partition
[[client.hpc.custom_profiles.research.partitions]]
name = "gpu"
cpus_per_node = 32
memory_mb = 256000          # 256 GB
max_walltime_secs = 172800  # 48 hours
gpus_per_node = 4
gpu_type = "A100"
shared = false

# High memory partition
[[client.hpc.custom_profiles.research.partitions]]
name = "himem"
cpus_per_node = 48
memory_mb = 1048576         # 1024 GB (1 TB)
max_walltime_secs = 172800  # 48 hours
shared = false
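
The config expects memory in MB and walltime in seconds, so it is easy to slip a unit. A few lines of shell arithmetic reproduce the values used above (here 1 GB is treated as 1000 MB, matching the 192 GB partitions):

echo $((192 * 1000))   # 192 GB   -> 192000 MB
echo $((72 * 3600))    # 72 hours -> 259200 seconds
echo $((4 * 3600))     # 4 hours  -> 14400 seconds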

Step 4: Verify the Profile

Check that Torc recognizes your profile:

# List all profiles
torc hpc list

You should see your custom profile:

Known HPC profiles:

╭──────────┬──────────────────┬────────────┬──────────╮
│ Name     │ Display Name     │ Partitions │ Detected │
├──────────┼──────────────────┼────────────┼──────────┤
│ kestrel  │ NREL Kestrel     │ 15         │          │
│ research │ Research Cluster │ 4          │ ✓        │
╰──────────┴──────────────────┴────────────┴──────────╯

View the partitions:

torc hpc partitions research
Partitions for research:

╭─────────┬───────────┬───────────┬─────────────┬──────────╮
│ Name    │ CPUs/Node │ Mem/Node  │ Max Walltime│ GPUs     │
├─────────┼───────────┼───────────┼─────────────┼──────────┤
│ batch   │ 48        │ 192 GB    │ 72h         │ -        │
│ short   │ 48        │ 192 GB    │ 4h          │ -        │
│ gpu     │ 32        │ 256 GB    │ 48h         │ 4 (A100) │
│ himem   │ 48        │ 1024 GB   │ 48h         │ -        │
╰─────────┴───────────┴───────────┴─────────────┴──────────╯

Step 5: Test Partition Matching

Verify that Torc correctly matches resource requirements to partitions:

# Should match 'short' partition
torc hpc match research --cpus 8 --memory 16g --walltime 2h

# Should match 'gpu' partition
torc hpc match research --cpus 16 --memory 64g --walltime 8h --gpus 2

# Should match 'himem' partition
torc hpc match research --cpus 24 --memory 512g --walltime 24h
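
It can also be worth confirming that an impossible request matches nothing; 100 hours exceeds every walltime limit in this profile, so no partition should satisfy it (see the "No Partition Found for Job" troubleshooting entry below for that failure mode):

# Should match no partition (100h exceeds the 72h maximum)
torc hpc match research --cpus 8 --memory 16g --walltime 100h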

Step 6: Test Scheduler Generation

Create a test workflow to verify scheduler generation (the runtime values below are ISO 8601 durations, e.g. PT2H is 2 hours):

# test_workflow.yaml
name: profile_test
description: Test custom HPC profile

resource_requirements:
  - name: standard
    num_cpus: 16
    memory: 64g
    runtime: PT2H

  - name: gpu_compute
    num_cpus: 16
    num_gpus: 2
    memory: 128g
    runtime: PT8H

jobs:
  - name: preprocess
    command: echo "preprocessing"
    resource_requirements: standard

  - name: train
    command: echo "training"
    resource_requirements: gpu_compute
    depends_on: [preprocess]

Generate schedulers:

torc slurm generate --account my_project --profile research test_workflow.yaml

You should see the generated workflow with appropriate schedulers for each partition.

Step 7: Use Your Profile

Now you can submit workflows using your custom profile:

# Auto-detect the profile (if on the cluster)
torc submit-slurm --account my_project workflow.yaml

# Or explicitly specify the profile
torc submit-slurm --account my_project --hpc-profile research workflow.yaml

Advanced Configuration

Hostname-Based Detection

If your cluster doesn’t set a unique environment variable, use hostname detection:

[client.hpc.custom_profiles.research]
display_name = "Research Cluster"
detect_hostname = ".*\\.research\\.edu"  # Regex pattern
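
Note that TOML basic strings require the backslash itself to be escaped (\\.), while the underlying regex contains only one. You can sanity-check the pattern from a login node with grep, using the single-backslash form:

# Prints the hostname only if the regex matches
hostname | grep -E '.*\.research\.edu'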

Minimum Node Requirements

Some partitions require a minimum number of nodes:

[[client.hpc.custom_profiles.research.partitions]]
name = "large_scale"
cpus_per_node = 128
memory_mb = 512000
max_walltime_secs = 172800
min_nodes = 16  # Must request at least 16 nodes

Explicit Request Partitions

Some partitions shouldn’t be auto-selected:

[[client.hpc.custom_profiles.research.partitions]]
name = "priority"
cpus_per_node = 48
memory_mb = 192000
max_walltime_secs = 86400
requires_explicit_request = true  # Only used when explicitly requested

Troubleshooting

Profile Not Detected

If torc hpc detect doesn’t find your profile:

  1. Check the environment variable or hostname:

    echo $CLUSTER_NAME
    hostname
    
  2. Verify the detection pattern in your config matches exactly

  3. Test with explicit profile specification:

    torc hpc show research
    

No Partition Found for Job

If torc slurm generate can’t find a matching partition:

  1. Check if any partition satisfies all requirements:

    torc hpc match research --cpus 32 --memory 128g --walltime 8h
    
  2. Verify memory is specified in MB in the config (not GB)

  3. Verify walltime is in seconds (not hours)
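
One quick way to eyeball both units at once is to parse the config and print each partition's values. This sketch assumes Python 3.11+ (for the standard-library tomllib) and the Linux config path:

python3 - <<'EOF'
import pathlib, tomllib

# Load the same config file Torc reads on Linux
path = pathlib.Path.home() / ".config/torc/config.toml"
cfg = tomllib.loads(path.read_text())
for part in cfg["client"]["hpc"]["custom_profiles"]["research"]["partitions"]:
    print(f'{part["name"]}: {part["memory_mb"]} MB, {part["max_walltime_secs"]} s')
EOF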

Configuration File Location

Torc looks for config files in these locations:

  • Linux: ~/.config/torc/config.toml
  • macOS: ~/Library/Application Support/torc/config.toml
  • Windows: %APPDATA%\torc\config.toml

You can also use the TORC_CONFIG environment variable to specify a custom path.
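
For example (the path here is just an illustration):

# Point Torc at a config file outside the default locations
export TORC_CONFIG=/shared/tools/torc/config.toml
torc hpc list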

Contributing Your Profile

If your HPC is used by others, please contribute it upstream:

  1. Fork the Torc repository
  2. Add your profile to src/client/hpc_profiles.rs
  3. Add tests for your profile
  4. Submit a pull request

Or simply open an issue with your partition information and we’ll add it for you.

See Also