reV script

Execute the script step from a config file.

This command runs one or more terminal commands/scripts as part of a pipeline step.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reV script [OPTIONS]

Options

-c, --config_file <config_file>

Required Path to the script configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "cmd": "[REQUIRED]"
}

execution_control:
  option: local
  allocation: '[REQUIRED IF ON HPC]'
  walltime: '[REQUIRED IF ON HPC]'
  qos: normal
  memory: null
  queue: null
  feature: null
  conda_env: null
  module: null
  sh_script: null
  num_test_nodes: null
log_directory: ./logs
log_level: INFO
cmd: '[REQUIRED]'

log_directory = "./logs"
log_level = "INFO"
cmd = "[REQUIRED]"

[execution_control]
option = "local"
allocation = "[REQUIRED IF ON HPC]"
walltime = "[REQUIRED IF ON HPC]"
qos = "normal"

Parameters

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:: ({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if desired hardware is not listed above. If “local”, no other HPC-specific keys in are required in execution_control (they are ignored if provided).
allocation:: (str) HPC project (allocation) handle.
walltime:: (int) Node walltime request in hours.
qos:: (str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".
memory:: (int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
queue:: (str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.
feature:: (str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.
conda_env:: (str, optional) Name of conda environment to activate. By default, None, which does not load any environments.
module:: (str, optional) Module to load. By default, None, which does not load any modules.
sh_script:: (str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.
num_test_nodes:: (str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without sumbitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

cmdstr | list

A single command represented as a string or a list of command strings to execute on a node. If the input is a list, each command string in the list will be executed on a separate node. For example, to run a python script, simply specify

"cmd": "python my_script.py"

This will run the python file “my_script.py” (in the project directory) on a single node.

Important

It is inefficient to run scripts that only use a single processor on HPC nodes for extended periods of time. Always make sure your long-running scripts use Python’s multiprocessing library wherever possible to make the most use of shared HPC resources.

To run multiple commands in parallel, supply them as a list:

"cmd": [
    "python /path/to/my_script/py -a -out out_file.txt",
    "wget https://website.org/latest.zip"
]

This input will run two commands (a python script with the specified arguments and a wget command to download a file from the web), each on their own node and in parallel as part of this pipeline step. Note that commands are always executed from the project directory.

Note that you may remove any keys with a null value if you do not intend to update them yourself.