Configure a workflow¶
Here are the recommended procedures to configure a workflow:

- Workflow specification (JSON)
- Python API
- Julia API
Configure a workflow specification¶
Dump the workflow template to a JSON file, or dump the example specification instead; you might prefer the example because it includes object definitions, such as jobs and files. Alternatively, copy, paste, and modify this example workflow file.

```console
$ torc workflows template > workflow.json
$ torc workflows example > example.json
```
Note
The output of these commands is JSON. You can name the file with a .json5 extension and use JSON5 syntax if you prefer.
Customize the parameters in the file in an editor.
Refer to Workflow Specification for more configuration options.
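For illustration only, a customized file might look roughly like the sketch below. The user and name fields match the examples later on this page; the jobs entries (with name and command fields) are assumptions made here for illustration, so verify the exact schema against the dumped template and the Workflow Specification page.

```json5
// Hypothetical sketch of a customized workflow.json5; the job fields shown
// here are assumptions, so check the dumped template for the real schema.
{
  user: "user",
  name: "my_workflow",
  jobs: [
    {name: "preprocess", command: "python preprocess.py"},
    {name: "run", command: "python run.py"},
  ],
}
```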
Create a workflow in the database.
```console
$ torc workflows create-from-json-file workflow.json
2023-07-31 16:48:32,982 - INFO [torc.cli.workflows workflows.py:234] : Created a workflow from workflow.json5 with key=14022560
```
OpenAPI Clients¶
Note
Using an OpenAPI client is recommended if your workflow has more than 10,000 jobs and is required if the total size of the workflow exceeds 500 MiB.
Configure with the Python API¶
You can build a workflow through the torc Python API. Refer to this example Python script and the Python Client API Reference.

Note that if you don't have a CLI executable for your jobs and instead want torc to map a list of input parameters across workers, you can call torc.api.map_function_to_jobs(). Refer to the tutorial Map a Python function to compute nodes for more information.
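As a rough, non-authoritative sketch of what building a small workflow programmatically can look like (this is not the linked example script): make_api and WorkflowModel appear elsewhere on this page, and add_workflow mirrors the APIClient.add_workflow call in the Julia examples below, but JobModel and add_job are assumptions here, so confirm the exact names in the Python Client API Reference.

```python
# Hedged sketch of building a workflow with the torc Python API.
# JobModel and add_job are assumptions; verify them against the API reference.
from torc import make_api
from torc.openapi_client import JobModel, WorkflowModel

api = make_api("http://localhost:8529/_db/test-workflows/torc-service")

# Create the workflow in the database so it receives a key.
workflow = api.add_workflow(WorkflowModel(user="user", name="my_workflow"))

# Add a job that runs a CLI command on a compute node.
api.add_job(workflow.key, JobModel(name="preprocess", command="python preprocess.py"))
```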
Configure with the Julia API¶
You can build a workflow through the torc Julia API. Refer to this example Julia script.

Note that if you don't have a CLI executable for your jobs and instead want torc to map a list of input parameters across workers, you can call Torc.map_function_to_jobs(). Refer to the tutorial Map a Julia function to compute nodes for more information.
Startup and completion scripts¶
Torc provides the ability to run startup and completion scripts for workflows. This can be useful if you want to perform actions before the workflow starts or after it completes but do not want to define them as jobs in the workflow, where wiring up the dependencies can be onerous.
Torc also provides the ability to run a script on each compute node before it starts running jobs.
- workflow_startup_script: A script that runs on the computer that initializes the workflow graph (torc workflows start). In an HPC environment, this is typically the login node; if so, take care not to consume large amounts of CPU and memory.
- workflow_completion_script: A script that runs on the compute node that completes the last job.
- worker_startup_script: A script that runs on each compute node before it starts running jobs.
Here is how to configure each of these:
user: "user",
name: "my_workflow",
config: {
workflow_startup_script: "bash workflow_startup.sh",
workflow_completion_script: "bash workflow_completion.sh",
worker_startup_script: "bash worker_startup.sh",
}
Python:

```python
from torc import make_api
from torc.openapi_client import WorkflowModel

api = make_api("http://localhost:8529/_db/test-workflows/torc-service")
# Create the workflow in the database so it receives a key
# (mirrors APIClient.add_workflow in the Julia example below).
workflow = api.add_workflow(WorkflowModel(user="user", name="my_workflow"))
config = api.get_workflow_config(workflow.key)
config.workflow_startup_script = "bash workflow_startup.sh"
config.workflow_completion_script = "bash workflow_completion.sh"
config.worker_startup_script = "bash worker_startup.sh"
api.modify_workflow_config(workflow.key, config)
```
Julia:

```julia
using Torc
import Torc: APIClient

api = make_api("http://localhost:8529/_db/test-workflows/torc-service")
workflow = send_api_command(
    api,
    APIClient.add_workflow,
    APIClient.WorkflowModel(user = "user", name = "my_workflow")
)
config = send_api_command(api, APIClient.get_workflows_key_config, workflow._key)
config.workflow_startup_script = "bash workflow_startup.sh"
config.workflow_completion_script = "bash workflow_completion.sh"
config.worker_startup_script = "bash worker_startup.sh"
send_api_command(api, APIClient.put_workflows_key_config, workflow._key, config)
```
Compute node configuration options¶
Refer to Advanced Configuration Options for how to customize the behavior of the torc worker application on compute nodes. Here are some example settings:
user: "user",
name: "my_workflow",
config: {
compute_node_resource_stats: {
cpu: true,
disk: false,
memory: true,
network: false,
process: true,
monitor_type: "periodic",
make_plots: true,
interval: 10
},
compute_node_ignore_workflow_completion: false,
}
Python:

```python
from torc import make_api
from torc.openapi_client import ComputeNodeResourceStatsModel, WorkflowModel

api = make_api("http://localhost:8529/_db/test-workflows/torc-service")
# Create the workflow in the database so it receives a key
# (mirrors APIClient.add_workflow in the Julia example below).
workflow = api.add_workflow(WorkflowModel(user="user", name="my_workflow"))
config = api.get_workflow_config(workflow.key)
config.compute_node_resource_stats = ComputeNodeResourceStatsModel(
    cpu=True,
    memory=True,
    process=True,
    interval=10,
    monitor_type="aggregation",
)
config.compute_node_ignore_workflow_completion = False
api.modify_workflow_config(workflow.key, config)
```
Julia:

```julia
using Torc
import Torc: APIClient

api = make_api("http://localhost:8529/_db/test-workflows/torc-service")
workflow = send_api_command(
    api,
    APIClient.add_workflow,
    APIClient.WorkflowModel(user = "user", name = "my_workflow")
)
config = send_api_command(api, APIClient.get_workflows_key_config, workflow._key)
config.compute_node_resource_stats = APIClient.ComputeNodeResourceStatsModel(
    cpu=true,
    memory=true,
    process=true,
    interval=10,
    monitor_type="aggregation",
)
config.compute_node_ignore_workflow_completion = false
send_api_command(api, APIClient.put_workflows_key_config, workflow._key, config)
```