Query with SQL¶
Users may prefer to use SQL to query results rather than the torc CLI or API, and so torc provides a way to export the ArangoDB collections to SQLite. This page describes how to use that feature.
Export to SQLite¶
By default this command will export all worklows stored in the database. You can pass your desired workflow keys as positional arguments to limit the exported tables.
$ torc export sqlite
2023-04-28 15:00:31,205 - INFO [torc.cli.export export.py:116] : Exported database to workflow.sqlite for all workflows
Note
There is a significant difference between the ArangoDB contents and the SQLite file. The
ArangoDB stores one set of collections for each workflow. The SQLite file stores each type of
collection in the same table. For example, all jobs for all workflows are stored in the jobs
table.
SQL queries¶
The sqlite3
CLI tool provides a convenient interface to query the result file. If it is not
already installed in your environment, you should be able to install it with your package
management tool.
Warning
NREL’s Eagle HPC environment has an old version of the tool that does not support
sqlite3 -table
. Use sqlite3 -header -column
instead.
View the tables¶
$ sqlite3 -table workflow.sqlite
sqlite> .tables
blocks needs slurm_schedulers
consumes produces user_data
events requires workflow_configs
files resource_requirements workflow_statuses
jobs scheduled_bys workflows
View the jobs for one workflow¶
sqlite> SELECT key, name, status FROM jobs WHERE workflow_key = 98078061;
+----------+-------------+--------+
| key | name | status |
+----------+-------------+--------+
| 98078218 | preprocess | done |
| 98078274 | work2 | done |
| 98078244 | work1 | done |
| 98078304 | postprocess | done |
+----------+-------------+--------+
Join jobs with results¶
sqlite> SELECT jobs.name, results.return_code, results.status, results.exec_time_minutes
FROM jobs
JOIN results
ON jobs.key = results.job_key
WHERE jobs.workflow_key = 98078061;
+-------------+-------------+--------+---------------------+
| name | return_code | status | exec_time_minutes |
+-------------+-------------+--------+---------------------+
| preprocess | 0 | done | 0.00443786382675171 |
| work2 | 0 | done | 0.0314231514930725 |
| work1 | 0 | done | 0.0294709006945292 |
| postprocess | 0 | done | 0.0451397975285848 |
+-------------+-------------+--------+---------------------+
Join jobs with process stats¶
sqlite> SELECT jobs.name, s.run_id, s.max_cpu_percent, s.max_rss
FROM jobs
JOIN job_process_stats AS s
ON jobs.key == s.job_key
WHERE jobs.workflow_key = 98081576;
+-------------+--------+-----------------+-------------+
| name | run_id | max_cpu_percent | max_rss |
+-------------+--------+-----------------+-------------+
| preprocess | 1 | 82.4 | 433516544.0 |
| work1 | 1 | 73.5 | 72708096.0 |
| work2 | 1 | 78.7 | 167821312.0 |
| postprocess | 1 | 90.5 | 389586944.0 |
+-------------+--------+-----------------+-------------+