(query-with-sql)=

# Query with SQL

Users may prefer to use SQL to query results rather than the torc CLI or API, and so torc provides
a way to export the ArangoDB collections to SQLite. This page describes how to use that feature.

## Export to SQLite

By default this command will export all worklows stored in the database. You can pass your desired
workflow keys as positional arguments to limit the exported tables.

```console
$ torc export sqlite
2023-04-28 15:00:31,205 - INFO [torc.cli.export export.py:116] : Exported database to workflow.sqlite for all workflows
```

```{eval-rst}
.. note:: There is a significant difference between the ArangoDB contents and the SQLite file. The
   ArangoDB stores one set of collections for each workflow. The SQLite file stores each type of
   collection in the same table. For example, all jobs for all workflows are stored in the ``jobs``
   table.
```

## SQL queries

The `sqlite3` CLI tool provides a convenient interface to query the result file. If it is not
already installed in your environment, you should be able to install it with your package
management tool.

```{eval-rst}
.. warning:: NREL's Eagle HPC environment has an old version of the tool that does not support
   ``sqlite3 -table``. Use ``sqlite3 -header -column`` instead.
```

### View the tables

```console
$ sqlite3 -table workflow.sqlite
sqlite> .tables
blocks                 needs                  slurm_schedulers
consumes               produces               user_data
events                 requires               workflow_configs
files                  resource_requirements  workflow_statuses
jobs                   scheduled_bys          workflows
```

### View the jobs for one workflow

```console
sqlite> SELECT key, name, status FROM jobs WHERE workflow_key = 98078061;
+----------+-------------+--------+
|   key    |    name     | status |
+----------+-------------+--------+
| 98078218 | preprocess  | done   |
| 98078274 | work2       | done   |
| 98078244 | work1       | done   |
| 98078304 | postprocess | done   |
+----------+-------------+--------+
```

### Join jobs with results

```console
sqlite> SELECT jobs.name, results.return_code, results.status, results.exec_time_minutes
    FROM jobs
    JOIN results
    ON jobs.key = results.job_key
    WHERE jobs.workflow_key = 98078061;
+-------------+-------------+--------+---------------------+
|    name     | return_code | status |  exec_time_minutes  |
+-------------+-------------+--------+---------------------+
| preprocess  | 0           | done   | 0.00443786382675171 |
| work2       | 0           | done   | 0.0314231514930725  |
| work1       | 0           | done   | 0.0294709006945292  |
| postprocess | 0           | done   | 0.0451397975285848  |
+-------------+-------------+--------+---------------------+
```

### Join jobs with process stats

```console
sqlite> SELECT jobs.name, s.run_id, s.max_cpu_percent, s.max_rss
    FROM jobs
    JOIN job_process_stats AS s
    ON jobs.key == s.job_key
    WHERE jobs.workflow_key = 98081576;
+-------------+--------+-----------------+-------------+
|    name     | run_id | max_cpu_percent |   max_rss   |
+-------------+--------+-----------------+-------------+
| preprocess  | 1      | 82.4            | 433516544.0 |
| work1       | 1      | 73.5            | 72708096.0  |
| work2       | 1      | 78.7            | 167821312.0 |
| postprocess | 1      | 90.5            | 389586944.0 |
+-------------+--------+-----------------+-------------+
```