Join Collections

There are several scenarios where you will want to see the contents of two collections joined by a common key. One example is viewing job and result collections together. Torc often stores these types of relationships with edges in the graph database.

The torc HTTP API provides commands to join these collections.

Torc join CLI command

The torc CLI toolkit provides the easiest way to join these collections. Look at the help of this command.

$ torc collections join --help
Usage: torc collections join [OPTIONS] {compute-node-executed-jobs|compute-
                             node-utilization|job-blocks|job-needs-file|job-
                             produces-file|job-requirements|job-results|job-
                             schedulers|job-process-utilization|job-stores-
                             data}

  Perform a join of collections from a pre-set configuration.

  Examples:
  1. Show jobs and results in a table.
     $ torc collections join job-results
  2. Show jobs and results in JSON format.
     $ torc -F JSON collections join job-results

Options:
  -l, --limit INTEGER  Limit the output to this number of jobs.
  -s, --skip INTEGER   Skip this number of jobs.
  --help               Show this message and exit

Take one example:

$ torc collections join job-results

+----------------------------------------------------------------------------------------------------------------------------+
|                            jobs with edge='returned' direction='outbound' in workflow 95639437                             |
+-------+-----------+-----------+-----------+----------------+----------------------+----------------------------+-----------+
| index | from__key | from_name | to_run_id | to_return_code | to_exec_time_minutes |     to_completion_time     | to_status |
+-------+-----------+-----------+-----------+----------------+----------------------+----------------------------+-----------+
|   0   |  95639561 |   small   |     1     |       0        |  1.0095648964246113  | 2023-04-16T18:29:02.972248 |    done   |
|   1   |  95639573 |   medium  |     1     |       0        |  1.0064559698104858  | 2023-04-16T18:29:03.004850 |    done   |
|   2   |  95639585 |   large   |     1     |       0        |  1.0041922012964883  | 2023-04-16T18:29:03.032915 |    done   |
+-------+-----------+-----------+-----------+----------------+----------------------+----------------------------+-----------+

Refer to the help for all possibilities.

Note

Setting the output format to JSON with torc -F JSON may be helpful for this command.

Torc join-by-edge CLI command

The above CLI command actually invokes a much more flexible CLI command:

$ torc collections join-by-edge --help
Usage: torc collections join-by-edge [OPTIONS] COLLECTION EDGE

  Join a collection with one or more other collections connected by an edge.

Options:
  --outbound / --inbound   Inbound or outbound edge.  [default: outbound]
  -l, --limit INTEGER      Limit the output to this number of jobs.
  -s, --skip INTEGER       Skip this number of jobs.
  -x, --exclude-from TEXT  Exclude this base column name on the from side.
                           Accepts multiple
  -y, --exclude-to TEXT    Exclude this base column name on the to side.
                           Accepts multiple
  --help                   Show this message and exit.

You can use this command to view any collection + edge in either direction as well as limit the display to custom columns.

HTTP API

The format of the HTTP commands is:

  • GET /workflows/:key/join_by_inbound_edge/:collection/:edge

  • GET /workflows/:key/join_by_outbound_edge/:collection/:edge

Example:

$ curl --silent -X GET http://localhost:8529/_db/workflows/torc-service/workflows/95612117/join_by_outbound_edge/jobs/returned | jq .

{
  "items": [
    {
      "from": {
        "_key": "95612239",
        "_id": "jobs__95612117/95612239",
        "_rev": "_f2v-wWS---",
        "name": "small",
        "command": "python tests/scripts/resource_consumption.py -i 1 -c small",
        "cancel_on_blocking_job_failure": true,
        "supports_termination": false,
        "status": "done"
      },
      "to": {
        "_key": "95612607",
        "_id": "results__95612117/95612607",
        "_rev": "_f2Vuclq---",
        "job_key": "95612239",
        "job_name": "small",
        "run_id": 1,
        "return_code": 0,
        "exec_time_minutes": 1.0095913807551067,
        "completion_time": "2023-04-15T11:22:24.711032",
        "status": "done"
      }
    },
  ]
}