setbacks
Command Line Interface
setbacks [OPTIONS] COMMAND [ARGS]...
Options
- -v, --verbose
Flag to turn on debug logging. Default is not verbose.
batch
Execute an analysis pipeline over a parametric set of inputs.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks batch [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the batch configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "logging": {
        "log_file": null,
        "log_level": "INFO"
    },
    "pipeline_config": "[REQUIRED]",
    "sets": [
        {
            "args": "[REQUIRED]",
            "files": "[REQUIRED]",
            "set_tag": "set1"
        },
        {
            "args": "[REQUIRED]",
            "files": "[REQUIRED]",
            "set_tag": "set2"
        }
    ]
}
logging: log_file: null log_level: INFO pipeline_config: '[REQUIRED]' sets: - args: '[REQUIRED]' files: '[REQUIRED]' set_tag: set1 - args: '[REQUIRED]' files: '[REQUIRED]' set_tag: set2
pipeline_config = "[REQUIRED]" [[sets]] args = "[REQUIRED]" files = "[REQUIRED]" set_tag = "set1" [[sets]] args = "[REQUIRED]" files = "[REQUIRED]" set_tag = "set2" [logging] log_level = "INFO"
Parameters
- logging : dict, optional
Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the batch command. Note that each pipeline job submitted via batch has its own logging key that will initialize pipeline step logging, so this input is only needed if you want logging information about the batching portion of the execution.
- pipeline_config : str
Path to the pipeline configuration defining the commands to run for every parametric set.
- sets : list of dicts
A list of dictionaries, where each dictionary defines a “set” of parametric runs. Each dictionary should have the following keys:
- args : dict
A dictionary defining the arguments across all input configuration files to parameterize. Each argument to be parametrized should be a key in this dictionary, and the value should be a list of the parameter values to run for this argument (single-item lists are allowed and can be used to vary a parameter value across sets).
"args": { "input_constant_1": [ 18.02, 19.04 ], "path_to_a_file": [ "/first/path.h5", "/second/path.h5", "/third/path.h5" ] }
args: input_constant_1: - 18.02 - 19.04 path_to_a_file: - /first/path.h5 - /second/path.h5 - /third/path.h5
[args] input_constant_1 = [ 18.02, 19.04,] path_to_a_file = [ "/first/path.h5", "/second/path.h5", "/third/path.h5",]
This example would run a total of six pipelines, one with each of the following arg combinations:
input_constant_1=18.02, path_to_a_file="/first/path.h5"
input_constant_1=18.02, path_to_a_file="/second/path.h5"
input_constant_1=18.02, path_to_a_file="/third/path.h5"
input_constant_1=19.04, path_to_a_file="/first/path.h5"
input_constant_1=19.04, path_to_a_file="/second/path.h5"
input_constant_1=19.04, path_to_a_file="/third/path.h5"
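The combinations are simply the Cartesian product of the value lists. As a rough illustration (not the actual batch implementation), you can preview them in Python:

from itertools import product

# Parametric values from the example above
args = {
    "input_constant_1": [18.02, 19.04],
    "path_to_a_file": ["/first/path.h5", "/second/path.h5", "/third/path.h5"],
}

# One pipeline run per combination of one value from each list: 2 * 3 = 6
for combo in product(*args.values()):
    print(dict(zip(args.keys(), combo)))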
Remember that the keys in the args dictionary should be part of (at least) one of your other configuration files.
- files : list
A list of paths to the configuration files that contain the arguments to be updated for every parametric run. Arguments can be spread out over multiple files. For example:
"files": [ "./config_run.yaml", "./config_analyze.json" ]
files: - ./config_run.yaml - ./config_analyze.json
files = [ "./config_run.yaml", "./config_analyze.json",]
- set_tag : str, optional
Optional string defining a set tag that will prefix each job tag for this set. This tag does not need to include an underscore, as that is provided during concatenation.
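Putting these inputs together, a single entry in sets (reusing the file names from the examples above, which are hypothetical) could look like:

{
    "args": {
        "input_constant_1": [18.02, 19.04],
        "path_to_a_file": ["/first/path.h5", "/second/path.h5", "/third/path.h5"]
    },
    "files": ["./config_run.yaml", "./config_analyze.json"],
    "set_tag": "set1"
}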
- --dry
Flag to do a dry run (make batch dirs and update files without running the pipeline).
- --cancel
Flag to cancel all jobs associated with the batch_jobs.csv file in the current batch config directory.
- --delete
Flag to delete all batch job sub-directories associated with the batch_jobs.csv file in the current batch config directory.
- --monitor-background
Flag to monitor all batch pipelines continuously in the background. Note that stdout/stderr will not be captured, but you can set a pipeline "log_file" to capture logs.
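For example, a typical workflow (config file name is hypothetical) is to verify the generated directories with a dry run before submitting for real:

setbacks batch -c config_batch.json --dry
setbacks batch -c config_batch.json
setbacks batch -c config_batch.json --cancel

where the last call cancels the submitted jobs if something looks wrong.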
compute
Execute the compute step from a config file.
Setbacks can be computed for a specific turbine (hub height and rotor diameter) or more generally using a base setback distance.
Setbacks can be computed either locally (on a per-county basis with given distances/multipliers) or everywhere under a generic setback multiplier assumption applied to either the turbine tip-height or the base setback distance. These two methods can also be applied simultaneously: local setbacks are computed where given (via the regulations file input) and a generic multiplier is applied to the turbine tip-height or the base setback distance everywhere else.
Partial inclusions can be computed instead of boolean exclusions, both of which can be fed directly into reV.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks compute [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the compute configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null,
        "max_workers": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "excl_fpath": "[REQUIRED]",
    "hub_height": null,
    "rotor_diameter": null,
    "base_setback_dist": null,
    "regulations_fpath": null,
    "weights_calculation_upscale_factor": null,
    "replace": false,
    "hsds": false,
    "out_layers": null,
    "feature_specs": null,
    "features": "[REQUIRED]",
    "generic_setback_multiplier": null
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null num_test_nodes: null max_workers: null log_directory: ./logs log_level: INFO excl_fpath: '[REQUIRED]' hub_height: null rotor_diameter: null base_setback_dist: null regulations_fpath: null weights_calculation_upscale_factor: null replace: false hsds: false out_layers: null feature_specs: null features: '[REQUIRED]' generic_setback_multiplier: null
log_directory = "./logs" log_level = "INFO" excl_fpath = "[REQUIRED]" replace = false hsds = false features = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal"
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- max_workers:
(int, optional) Number of workers to use for the setback exclusion computation. If this value is 1, the computation runs in serial. If this value is > 1, the computation runs in parallel with that many workers. If None, the computation runs in parallel on all available cores. By default, None.
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
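For example, a minimal execution_control block for an HPC run (the allocation name and worker count are hypothetical) might look like:

"execution_control": {
    "option": "kestrel",
    "allocation": "my_hpc_allocation",
    "walltime": 4,
    "max_workers": 36
}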
- log_directory : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- excl_fpath : str
Path to the HDF5 file containing the county FIPS layer (should be called cnty_fips) used to match local regulations in regulations_fpath to counties on the grid. No data will be written to this file unless explicitly requested via the out_layers input.
- hub_height : int | float, optional
Turbine hub height (m), used along with the rotor diameter to compute the blade tip-height, which is used as the base setback distance for generic/local regulations. If this input is specified, rotor_diameter must also be given and base_setback_dist must be set to None, otherwise an error is thrown. The base setback distance is scaled by generic/local multipliers (provided either via the regulations_fpath csv, or the generic_setback_multiplier input, or both) before setbacks are computed. By default, None.
- rotor_diameter : int | float, optional
Turbine rotor diameter (m), used along with the hub height to compute the blade tip-height, which is used as the base setback distance for generic/local regulations. If this input is specified, hub_height must also be given and base_setback_dist must be set to None, otherwise an error is thrown. The base setback distance is scaled by generic/local multipliers (provided either via the regulations_fpath csv, or the generic_setback_multiplier input, or both) before setbacks are computed. By default, None.
- base_setback_dist : int | float, optional
Base setback distance (m). This value is used as the base setback distance for generic/local regulations. If this input is specified, both hub_height and rotor_diameter must be set to None, otherwise an error is thrown. The base setback distance is scaled by generic/local multipliers (provided either via the regulations_fpath csv, or the generic_setback_multiplier input, or both) before setbacks are computed. By default, None.
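For example, to drive the setbacks from turbine dimensions rather than a fixed base distance, you would set (values are illustrative):

"hub_height": 100,
"rotor_diameter": 120,
"base_setback_dist": null

in which case the base setback distance is the blade tip-height (hub height plus half the rotor diameter, i.e. 160 m here) before any generic/local multipliers are applied.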
- regulations_fpath : str, optional
Path to a regulations .csv or .gpkg file. At a minimum, this file must contain the following columns:
Feature Type: Contains labels for the type of setback that each row represents. This should be a "feature_type" label that can be found in the SETBACK_SPECS dictionary (e.g. "structures", "roads", "water", etc.), unless you have created your own setback calculator using setbacks_calculator(), in which case this label can match the feature_type input you used for that function call.
Feature Subtype: Contains labels for feature subtypes. The feature subtypes are only used for down-selecting the local regulations that should be applied for a particular feature, so often you can leave this blank or set it to None. If you do specify this value, it should be a "feature_subtypes_to_exclude" label that can be found in the SETBACK_SPECS dictionary, unless you have created your own setback calculator using setbacks_calculator(), in which case this label can match the feature_subtypes_to_exclude input you used for that function call.
Value Type: Specifies whether the value is a multiplier or a static height. See SetbackRegulations (if using only the base_setback_dist input) or WindSetbackRegulations (if using the hub_height + rotor_diameter input) for more info.
Value: Numeric value of the setback or multiplier.
FIPS: Specifies a unique 5-digit code for each county (this can be an integer; no leading zeros required). This is used alongside the cnty_fips layer in the excl_fpath file to match the county regulations to the county's spatial extent.
This option overrides the generic_setback_multiplier input, but only for counties that are listed in the input CSV file. This means both regulations_fpath and generic_setback_multiplier can be specified simultaneously in order to compute setbacks driven by local ordinance where given plus a generic multiplier applied everywhere else. By default, None, which does not compute any local setbacks.
- weights_calculation_upscale_factor : int, optional
Optional input to specify partial setback calculations. If this value is an int > 1, the output will be a layer with inclusion weight values (floats ranging from 0 to 1). Note that this is backwards w.r.t. the typical output of exclusion integer values (1 for excluded, 0 otherwise). Values <= 1 will still return a standard exclusion mask. For example, a cell that was previously excluded with a boolean mask (value of 1) may instead be converted to an inclusion weight value of 0.75, meaning that 75% of the area corresponding to that point should be included (i.e. the exclusion feature only intersected a small portion, 25%, of the cell). This percentage inclusion value is calculated by upscaling the output array using this input value, rasterizing the exclusion features onto it, and counting the number of resulting sub-cells excluded by the feature. For example, setting the value to 3 would split each output cell into nine sub-cells (3 divisions in each dimension). After the feature is rasterized on this high-resolution sub-grid, the area of the non-excluded sub-cells is totaled and divided by the area of the original cell to obtain the final inclusion percentage. Therefore, a larger upscale factor results in more accurate percentage values. If None (or a value <= 1), this process is skipped and the output is a boolean exclusion mask. By default, None.
- replace : bool, optional
Flag to replace the output GeoTIFF if it already exists. By default, False.
- hsds : bool, optional
Boolean flag to use h5pyd to handle HDF5 "files" hosted on AWS behind HSDS. By default, False.
- out_layers : dict, optional
Dictionary mapping the input feature file names (with extension) to names of layers under which exclusions should be saved in the excl_fpath HDF5 file. If None or an empty dictionary, no layers are saved to the HDF5 file. By default, None.
- feature_specs : dict, optional
Optional dictionary specifying new feature setback calculators or updates to existing ones. The keys of this dictionary should be names of the features for which a specification is being provided. If the name is already a key in SETBACK_SPECS, the corresponding specifications will be updated for that feature. Otherwise, the name will represent a new feature type, which can be used as a key in the features input. The values of the feature-type keys should be dictionaries, where the keys are parameters of the setbacks_calculator() function. Required parameters in that function are required keys of these dictionaries. Values should be the updated value. For example, the input
feature_specs: {
    "water": {
        "num_features_per_worker": 500
    },
    "oil_and_gas_pipelines": {
        "feature_type": "oil and gas",
        "feature_filter_type": "clip"
    }
}
would update the existing "water" setbacks calculator to compute 500 features per worker at a time and create a new "oil_and_gas_pipelines" feature that looks for the string "oil and gas" in the regulations file and clips the feature to a county before calculating a setback. Note that even though "oil_and_gas_pipelines" is not a default feature supported by reVX, you can now use it in the features input. This can also be helpful if you need to compute the same type of setback for multiple different input datasets. For example, the input
feature_specs: {
    "water-nwi": {
        "feature_type": "water",
        "buffer_type": "default",
        "feature_filter_type": "clip",
        "num_features_per_worker": 700
    },
    "water-nhd": {
        "feature_type": "water",
        "buffer_type": "default",
        "feature_filter_type": "clip",
        "num_features_per_worker": 10_000
    }
}
would allow you to set up your features input like so:
features: {
    "water-nwi": "/path/to/nwi/*.gpkg",
    "water-nhd": "/path/to/nhd/*.gpkg"
}
By default, None, which does not add any new setback calculators (the default ones defined in SETBACK_SPECS are still available).
- features : dict
Dictionary specifying which features/data to process. The keys of this dictionary must be a key from the SETBACK_SPECS dictionary or the feature_specs input dictionary specifying the feature type to run setbacks for. The value of each key must be a path or a list of paths to calculate that particular setback for. The path(s) can contain unix-style file-pattern matching syntax to point to multiple files. The paths may be specified relative to the config file. For example:
features: {
    "parcel": "../relative/path/to/parcel_colorado.gpkg",
    "road": [
        "/full/path/to/road/data/*.gpkg",
        "../../relative/path/to/data_i[l,n].gpkg"
    ]
}
With this input, parcel setbacks would be computed for the data in ../relative/path/to/parcel_colorado.gpkg, and road setbacks would be calculated for all GeoPackage data files in /full/path/to/road/data/ and for the files ../../relative/path/to/data_il.gpkg and ../../relative/path/to/data_in.gpkg.
- generic_setback_multiplier : int | float | str, optional
Optional setback multiplier to use where local regulations are not supplied. This multiplier will be applied to the base_setback_dist (or the turbine tip-height) to calculate the setback. If supplied along with regulations_fpath, this input will be used to apply a setback to all counties not listed in the regulations file. This input can also be a path to a config file containing feature types as keys and feature-specific generic multipliers as values. For example:
{
    "parcel": 1.1,
    "road": 2,
    "structure": 3.5
}
If specified this way, every key in the features input must also be given in the generic multipliers config. If None, no generic setback computation is performed. By default, None.
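As a concrete (hypothetical) pairing of the two inputs, the following computes ordinance-driven setbacks for every county listed in the CSV and applies a 1.1x multiplier to the tip-height (or base setback distance) everywhere else:

"regulations_fpath": "./wind_regulations.csv",
"generic_setback_multiplier": 1.1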
Note that you may remove any keys with a null value if you do not intend to update them yourself.
merge
Execute the merge step from a config file.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks merge [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the merge configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "are_partial_inclusions": null,
    "purge_chunks": false,
    "merge_file_pattern": "PIPELINE"
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null num_test_nodes: null log_directory: ./logs log_level: INFO are_partial_inclusions: null purge_chunks: false merge_file_pattern: PIPELINE
log_directory = "./logs" log_level = "INFO" purge_chunks = false merge_file_pattern = "PIPELINE" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal"
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- are_partial_inclusions : bool, optional
Flag indicating whether the inputs are partial inclusion values or boolean exclusions. If None, this is inferred automatically from the input file's GeoTIFF profile (dtype != uint8). By default, None.
- purge_chunks : bool, optional
Flag indicating whether individual "chunk" files should be deleted after a successful merge (True), or if they should be stored in a "chunk_files" directory (False). By default, False.
- merge_file_pattern : str | list | dict, optional
Unix-style /filepath/pattern*.h5 representing the files to be merged into a single output GeoTIFF file. If no output file path is specified (i.e. this input is a single pattern or a list of patterns), the output file path will be inferred from the pattern itself (specifically, the wildcard will be removed and the result will be the output file path). If a list of patterns is provided, each pattern will be merged into a separate output file. To specify the name of the output file(s), set this input to a dictionary whose keys are paths to the output file (relative paths are allowed) and whose values are patterns representing the input files that should be merged into the output TIFF. If running a merge job as part of a pipeline, this input can be set to "PIPELINE", which will parse the output of the previous step (compute) and generate the input file pattern and output file name automatically. By default, "PIPELINE".
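For example, to control the output file name explicitly, the dictionary form could look like the following (file names and extensions are hypothetical):

"merge_file_pattern": {
    "./setbacks_roads.tif": "./chunk_files/setbacks_roads*.tif"
}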
Note that you may remove any keys with a null value if you do not intend to update them yourself.
pipeline
Execute multiple steps in an analysis pipeline.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks pipeline [OPTIONS]
Options
- -c, --config_file <config_file>
Path to the pipeline configuration file. This argument can be left out, but one and only one file with "pipeline" in the name should exist in the directory and contain the config information. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "pipeline": [
        {
            "compute": "./config_compute.json"
        },
        {
            "merge": "./config_merge.json"
        },
        {
            "script": "./config_script.json"
        }
    ],
    "logging": {
        "log_file": null,
        "log_level": "INFO"
    }
}
pipeline: - compute: ./config_compute.json - merge: ./config_merge.json - script: ./config_script.json logging: log_file: null log_level: INFO
[[pipeline]] compute = "./config_compute.json" [[pipeline]] merge = "./config_merge.json" [[pipeline]] script = "./config_script.json" [logging] log_level = "INFO"
Parameters
- pipeline : list of dicts
A list of dictionaries, where each dictionary represents one step in the pipeline. Each dictionary should have one of two configurations:
A single key-value pair, where the key is the name of the CLI command to run, and the value is the path to a config file containing the configuration for that command
Exactly two key-value pairs, where one of the keys is "command", with a value that points to the name of the command to execute, while the second key is a unique, user-defined name for the pipeline step, with a value that points to the path of a config file containing the configuration for the command specified by the other key. This configuration allows users to specify duplicate commands as part of their pipeline execution, as shown in the example below.
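For example, to run the compute step twice against different feature sets before merging, the pipeline could be written as (step names and file paths are hypothetical):

"pipeline": [
    {"command": "compute", "compute_roads": "./config_compute_roads.json"},
    {"command": "compute", "compute_water": "./config_compute_water.json"},
    {"merge": "./config_merge.json"}
]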
- logging : dict, optional
Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the submission portion of the pipeline. Note, however, that each step (command) will also record the submission step log output to a common “project” log file, so it’s only ever necessary to use this input if you want a different (lower) level of verbosity than the log_level specified in the config for the step of the pipeline being executed.
- --cancel
Flag to cancel all jobs associated with a given pipeline.
- --monitor
Flag to monitor pipeline jobs continuously. Default is not to monitor (kick off jobs and exit).
- -r, --recursive
Flag to recursively submit pipelines, starting from the current directory and checking every sub-directory therein. The -c option will be completely ignored if you use this option. Instead, the code will check every sub-directory for exactly one file with the word pipeline in it. If found, that file is assumed to be the pipeline config and is used to kick off the pipeline. In any other case, the directory is skipped.
- --background
Flag to monitor pipeline jobs continuously in the background. Note that the stdout/stderr will not be captured, but you can set a pipeline ‘log_file’ to capture logs.
reset-status
Reset the pipeline/job status (progress) for a given directory (defaults to ./). Multiple directories can be supplied to reset the status of each.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks reset-status [DIRECTORY]...
Options
- -f, --force
Force pipeline status reset even if jobs are queued/running
- -a, --after-step <after_step>
Reset pipeline starting after the given pipeline step. The status of this step will remain unaffected, but the status of steps following it will be reset completely.
Arguments
- DIRECTORY
Optional argument(s)
script
Execute the script step from a config file.
This command runs one or more terminal commands/scripts as part of a pipeline step.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks script [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the script configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "cmd": "[REQUIRED]"
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null num_test_nodes: null log_directory: ./logs log_level: INFO cmd: '[REQUIRED]'
log_directory = "./logs" log_level = "INFO" cmd = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal"
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- cmd : str | list
A single command represented as a string or a list of command strings to execute on a node. If the input is a list, each command string in the list will be executed on a separate node. For example, to run a python script, simply specify
"cmd": "python my_script.py"
This will run the python file “my_script.py” (in the project directory) on a single node.
Important
It is inefficient to run scripts that only use a single processor on HPC nodes for extended periods of time. Always make sure your long-running scripts use Python’s multiprocessing library wherever possible to make the most use of shared HPC resources.
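As a minimal illustration of that advice (a generic sketch, not part of the setbacks package), a script can fan work out across cores with Python's multiprocessing library:

from multiprocessing import Pool

def process(path):
    # Placeholder for the real per-file work
    return path.upper()

if __name__ == "__main__":
    paths = ["a.gpkg", "b.gpkg", "c.gpkg"]
    with Pool() as pool:  # uses all available cores by default
        results = pool.map(process, paths)
    print(results)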
To run multiple commands in parallel, supply them as a list:
"cmd": [ "python /path/to/my_script/py -a -out out_file.txt", "wget https://website.org/latest.zip" ]
This input will run two commands (a python script with the specified arguments and a
wget
command to download a file from the web), each on their own node and in parallel as part of this pipeline step. Note that commands are always executed from the project directory.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
status
Display the status of a project FOLDER.
By default, the status of the current working directory is displayed.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks status [OPTIONS] [FOLDER]
Options
- -ps, --pipe_steps <pipe_steps>
Filter status for the given pipeline step(s). Multiple steps can be specified by repeating this option (e.g. -ps step1 -ps step2 ...). By default, the status of all pipeline steps is displayed.
- -s, --status <status>
Filter jobs for the requested status(es). Allowed options (case-insensitive) include:
Failed: failure, fail, failed, f
Running: running, run, r
Submitted: submitted, submit, sb, pending, pend, p
Success: successful, success, s
Not submitted: unsubmitted, unsubmit, u, not_submitted, ns
Multiple status keys can be specified by repeating this option (e.g. -s status1 -s status2 ...). By default, all status values are displayed.
- -i, --include <include>
Extra status keys to include in the print output for each job. Multiple status keys can be specified by repeating this option (e.g. -i key1 -i key2 ...). By default, no extra keys are displayed.
- -r, --recursive
Option to perform a recursive search of directories (starting with the input directory). The status of every nested directory is reported.
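For example, to show only failed compute jobs in the current directory:

setbacks status -ps compute -s failed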
Arguments
- FOLDER
Optional argument
template-configs
Generate template config files for requested COMMANDS. If no COMMANDS are given, config files for the entire pipeline are generated.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks template-configs [COMMANDS]...
Options
- -t, --type <type>
Configuration file type to generate. Allowed options (case-insensitive):
json5, json, toml, yaml, yml.
- Default: 'json'
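For example, to generate YAML template configs for just the compute and merge steps:

setbacks template-configs -t yaml compute merge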
Arguments
- COMMANDS
Optional argument(s)