setbacks
Command Line Interface
setbacks [OPTIONS] COMMAND [ARGS]...
Options
- -v, --verbose
Flag to turn on debug logging. Default is not verbose.
batch
Execute an analysis pipeline over a parametric set of inputs.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks batch [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the batch configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "logging": {
        "log_file": null,
        "log_level": "INFO"
    },
    "pipeline_config": "[REQUIRED]",
    "sets": [
        {
            "args": "[REQUIRED]",
            "files": "[REQUIRED]",
            "set_tag": "set1"
        },
        {
            "args": "[REQUIRED]",
            "files": "[REQUIRED]",
            "set_tag": "set2"
        }
    ]
}
logging: log_file: null log_level: INFO pipeline_config: '[REQUIRED]' sets: - args: '[REQUIRED]' files: '[REQUIRED]' set_tag: set1 - args: '[REQUIRED]' files: '[REQUIRED]' set_tag: set2
pipeline_config = "[REQUIRED]" [[sets]] args = "[REQUIRED]" files = "[REQUIRED]" set_tag = "set1" [[sets]] args = "[REQUIRED]" files = "[REQUIRED]" set_tag = "set2" [logging] log_level = "INFO"
Parameters
- logging : dict, optional
Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the batch command. Note that each pipeline job submitted via batch has its own logging key that will initialize pipeline step logging, so this input is only needed if you want logging information about the batching portion of the execution.
- pipeline_config : str
Path to the pipeline configuration defining the commands to run for every parametric set.
- sets : list of dicts
A list of dictionaries, where each dictionary defines a “set” of parametric runs. Each dictionary should have the following keys:
- args : dict
A dictionary defining the arguments across all input configuration files to parameterize. Each argument to be parametrized should be a key in this dictionary, and the value should be a list of the parameter values to run for this argument (single-item lists are allowed and can be used to vary a parameter value across sets).
"args": { "input_constant_1": [ 18.02, 19.04 ], "path_to_a_file": [ "/first/path.h5", "/second/path.h5", "/third/path.h5" ] }
args: input_constant_1: - 18.02 - 19.04 path_to_a_file: - /first/path.h5 - /second/path.h5 - /third/path.h5
[args] input_constant_1 = [ 18.02, 19.04,] path_to_a_file = [ "/first/path.h5", "/second/path.h5", "/third/path.h5",]
This example would run a total of six pipelines, one with each of the following arg combinations:
input_constant_1=18.02, path_to_a_file="/first/path.h5"
input_constant_1=18.02, path_to_a_file="/second/path.h5"
input_constant_1=18.02, path_to_a_file="/third/path.h5"
input_constant_1=19.04, path_to_a_file="/first/path.h5"
input_constant_1=19.04, path_to_a_file="/second/path.h5"
input_constant_1=19.04, path_to_a_file="/third/path.h5"
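The combinations are simply the Cartesian product of the value lists. As a rough illustration (not the actual batch implementation), you can preview them in Python:

from itertools import product

# Parametric values from the example above
args = {
    "input_constant_1": [18.02, 19.04],
    "path_to_a_file": ["/first/path.h5", "/second/path.h5", "/third/path.h5"],
}

# One pipeline run per combination of one value from each list: 2 * 3 = 6
for combo in product(*args.values()):
    print(dict(zip(args.keys(), combo)))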
Remember that the keys in the args dictionary should be part of (at least) one of your other configuration files.
- files : list
A list of paths to the configuration files that contain the arguments to be updated for every parametric run. Arguments can be spread out over multiple files. For example:
"files": [ "./config_run.yaml", "./config_analyze.json" ]
files: - ./config_run.yaml - ./config_analyze.json
files = [ "./config_run.yaml", "./config_analyze.json",]
- set_tag : str, optional
Optional string defining a set tag that will prefix each job tag for this set. This tag does not need to include an underscore, as that is provided during concatenation.
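Putting these inputs together, a single entry in sets (reusing the file names from the examples above, which are hypothetical) could look like:

{
    "args": {
        "input_constant_1": [18.02, 19.04],
        "path_to_a_file": ["/first/path.h5", "/second/path.h5", "/third/path.h5"]
    },
    "files": ["./config_run.yaml", "./config_analyze.json"],
    "set_tag": "set1"
}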
- --dry
Flag to do a dry run (make batch dirs and update files without running the pipeline).
- --cancel
Flag to cancel all jobs associated with the batch_jobs.csv file in the current batch config directory.
- --delete
Flag to delete all batch job sub-directories associated with the batch_jobs.csv file in the current batch config directory.
- --monitor-background
Flag to monitor all batch pipelines continuously in the background. Note that stdout/stderr will not be captured, but you can set a pipeline "log_file" to capture logs.
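For example, a typical workflow (config file name is hypothetical) is to verify the generated directories with a dry run before submitting for real:

setbacks batch -c config_batch.json --dry
setbacks batch -c config_batch.json
setbacks batch -c config_batch.json --cancel

where the last call cancels the submitted jobs if something looks wrong.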
compute
Execute the compute step from a config file.
Setbacks can be computed for a specific turbine (hub height and rotor diameter) or more generally using a base setback distance.
Setbacks can be computed either locally (on a per-county basis with given distances/multipliers) or everywhere under a generic setback multiplier assumption applied to either the turbine tip-height or the base setback distance. These two methods can also be applied simultaneously: local setbacks are computed where given (via the regulations file input) and a generic multiplier is applied to the turbine tip-height or the base setback distance everywhere else.
Partial inclusions can be computed instead of boolean exclusions, both of which can be fed directly into reV.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks compute [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the compute configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null,
        "max_workers": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "excl_fpath": "[REQUIRED]",
    "hub_height": null,
    "rotor_diameter": null,
    "base_setback_dist": null,
    "regulations_fpath": null,
    "weights_calculation_upscale_factor": null,
    "replace": false,
    "hsds": false,
    "out_layers": null,
    "feature_specs": null,
    "features": "[REQUIRED]",
    "generic_setback_multiplier": null
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null num_test_nodes: null max_workers: null log_directory: ./logs log_level: INFO excl_fpath: '[REQUIRED]' hub_height: null rotor_diameter: null base_setback_dist: null regulations_fpath: null weights_calculation_upscale_factor: null replace: false hsds: false out_layers: null feature_specs: null features: '[REQUIRED]' generic_setback_multiplier: null
log_directory = "./logs" log_level = "INFO" excl_fpath = "[REQUIRED]" replace = false hsds = false features = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal"
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- max_workers:
(int, optional) Number of workers to use for the setback exclusion computation. If this value is 1, the computation runs in serial. If this value is > 1, the computation runs in parallel with that many workers. If None, the computation runs in parallel on all available cores. By default, None.
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
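For example, a minimal execution_control block for an HPC run (the allocation name and worker count are hypothetical) might look like:

"execution_control": {
    "option": "kestrel",
    "allocation": "my_hpc_allocation",
    "walltime": 4,
    "max_workers": 36
}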
- log_directory : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- excl_fpath : str
Path to the HDF5 file containing the county FIPS layer (should be called cnty_fips) used to match local regulations in regulations_fpath to counties on the grid. No data will be written to this file unless explicitly requested via the out_layers input.
- hub_height : int | float, optional
Turbine hub height (m), used along with the rotor diameter to compute the blade tip-height, which is used as the base setback distance for generic/local regulations. If this input is specified, rotor_diameter must also be given and base_setback_dist must be set to None, otherwise an error is thrown. The base setback distance is scaled by generic/local multipliers (provided either via the regulations_fpath csv, or the generic_setback_multiplier input, or both) before setbacks are computed. By default, None.
- rotor_diameter : int | float, optional
Turbine rotor diameter (m), used along with the hub height to compute the blade tip-height, which is used as the base setback distance for generic/local regulations. If this input is specified, hub_height must also be given and base_setback_dist must be set to None, otherwise an error is thrown. The base setback distance is scaled by generic/local multipliers (provided either via the regulations_fpath csv, or the generic_setback_multiplier input, or both) before setbacks are computed. By default, None.
- base_setback_dist : int | float, optional
Base setback distance (m). This value is used as the base setback distance for generic/local regulations. If this input is specified, both hub_height and rotor_diameter must be set to None, otherwise an error is thrown. The base setback distance is scaled by generic/local multipliers (provided either via the regulations_fpath csv, or the generic_setback_multiplier input, or both) before setbacks are computed. By default, None.
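For example, to drive the setbacks from turbine dimensions rather than a fixed base distance, you would set (values are illustrative):

"hub_height": 100,
"rotor_diameter": 120,
"base_setback_dist": null

in which case the base setback distance is the blade tip-height (hub height plus half the rotor diameter, i.e. 160 m here) before any generic/local multipliers are applied.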
- regulations_fpath : str, optional
Path to a regulations .csv or .gpkg file. At a minimum, this file must contain the following columns:
Feature Type: Contains labels for the type of setback that each row represents. This should be a "feature_type" label that can be found in the SETBACK_SPECS dictionary (e.g. "structures", "roads", "water", etc.), unless you have created your own setback calculator using setbacks_calculator(), in which case this label can match the feature_type input you used for that function call.
Feature Subtype: Contains labels for feature subtypes. The feature subtypes are only used for down-selecting the local regulations that should be applied for a particular feature, so often you can leave this blank or set it to None. If you do specify this value, it should be a "feature_subtypes_to_exclude" label that can be found in the SETBACK_SPECS dictionary, unless you have created your own setback calculator using setbacks_calculator(), in which case this label can match the feature_subtypes_to_exclude input you used for that function call.
Value Type: Specifies whether the value is a multiplier or a static height. See SetbackRegulations (if using only the base_setback_dist input) or WindSetbackRegulations (if using the hub_height + rotor_diameter input) for more info.
Value: Numeric value of the setback or multiplier.
FIPS: Specifies a unique 5-digit code for each county (this can be an integer; no leading zeros required). This is used alongside the cnty_fips layer in the excl_fpath file to match the county regulations to the county's spatial extent.
This option overrides the generic_setback_multiplier input, but only for counties that are listed in the input CSV file. This means both regulations_fpath and generic_setback_multiplier can be specified simultaneously in order to compute setbacks driven by local ordinance where given plus a generic multiplier applied everywhere else. By default, None, which does not compute any local setbacks.
- weights_calculation_upscale_factor : int, optional
Optional input to specify partial setback calculations. If this value is an int > 1, the output will be a layer with inclusion weight values (floats ranging from 0 to 1). Note that this is backwards w.r.t. the typical output of exclusion integer values (1 for excluded, 0 otherwise). Values <= 1 will still return a standard exclusion mask. For example, a cell that was previously excluded with a boolean mask (value of 1) may instead be converted to an inclusion weight value of 0.75, meaning that 75% of the area corresponding to that point should be included (i.e. the exclusion feature only intersected a small portion, 25%, of the cell). This percentage inclusion value is calculated by upscaling the output array using this input value, rasterizing the exclusion features onto it, and counting the number of resulting sub-cells excluded by the feature. For example, setting the value to 3 would split each output cell into nine sub-cells (3 divisions in each dimension). After the feature is rasterized on this high-resolution sub-grid, the area of the non-excluded sub-cells is totaled and divided by the area of the original cell to obtain the final inclusion percentage. Therefore, a larger upscale factor results in more accurate percentage values. If None (or a value <= 1), this process is skipped and the output is a boolean exclusion mask. By default, None.
- replace : bool, optional
Flag to replace the output GeoTIFF if it already exists. By default, False.
- hsds : bool, optional
Boolean flag to use h5pyd to handle HDF5 "files" hosted on AWS behind HSDS. By default, False.
- out_layers : dict, optional
Dictionary mapping the input feature file names (with extension) to names of layers under which exclusions should be saved in the excl_fpath HDF5 file. If None or an empty dictionary, no layers are saved to the HDF5 file. By default, None.
- feature_specs : dict, optional
Optional dictionary specifying new feature setback calculators or updates to existing ones. The keys of this dictionary should be names of the features for which a specification is being provided. If the name is already a key in SETBACK_SPECS, the corresponding specifications will be updated for that feature. Otherwise, the name will represent a new feature type, which can be used as a key in the features input. The values of the feature-type keys should be dictionaries, where the keys are parameters of the setbacks_calculator() function. Required parameters in that function are required keys of these dictionaries. Values should be the updated value. For example, the input
feature_specs: {
    "water": {
        "num_features_per_worker": 500
    },
    "oil_and_gas_pipelines": {
        "feature_type": "oil and gas",
        "feature_filter_type": "clip"
    }
}
would update the existing "water" setbacks calculator to compute 500 features per worker at a time and create a new "oil_and_gas_pipelines" feature that looks for the string "oil and gas" in the regulations file and clips the feature to a county before calculating a setback. Note that even though "oil_and_gas_pipelines" is not a default feature supported by reVX, you can now use it in the features input. This can also be helpful if you need to compute the same type of setback for multiple different input datasets. For example, the input
feature_specs: {
    "water-nwi": {
        "feature_type": "water",
        "buffer_type": "default",
        "feature_filter_type": "clip",
        "num_features_per_worker": 700
    },
    "water-nhd": {
        "feature_type": "water",
        "buffer_type": "default",
        "feature_filter_type": "clip",
        "num_features_per_worker": 10_000
    }
}
would allow you to set up your features input like so:
features: {
    "water-nwi": "/path/to/nwi/*.gpkg",
    "water-nhd": "/path/to/nhd/*.gpkg"
}
By default, None, which does not add any new setback calculators (the default ones defined in SETBACK_SPECS are still available).
- features : dict
Dictionary specifying which features/data to process. The keys of this dictionary must be a key from the SETBACK_SPECS dictionary or the feature_specs input dictionary specifying the feature type to run setbacks for. The value of each key must be a path or a list of paths to calculate that particular setback for. The path(s) can contain unix-style file-pattern matching syntax to point to multiple files. The paths may be specified relative to the config file. For example:
features: {
    "parcel": "../relative/path/to/parcel_colorado.gpkg",
    "road": [
        "/full/path/to/road/data/*.gpkg",
        "../../relative/path/to/data_i[l,n].gpkg"
    ]
}
With this input, parcel setbacks would be computed for the data in ../relative/path/to/parcel_colorado.gpkg, and road setbacks would be calculated for all GeoPackage data files in /full/path/to/road/data/ and for the files ../../relative/path/to/data_il.gpkg and ../../relative/path/to/data_in.gpkg.
- generic_setback_multiplier : int | float | str, optional
Optional setback multiplier to use where local regulations are not supplied. This multiplier will be applied to the base_setback_dist (or the turbine tip-height) to calculate the setback. If supplied along with regulations_fpath, this input will be used to apply a setback to all counties not listed in the regulations file. This input can also be a path to a config file containing feature types as keys and feature-specific generic multipliers as values. For example:
{
    "parcel": 1.1,
    "road": 2,
    "structure": 3.5
}
If specified this way, every key in the features input must also be given in the generic multipliers config. If None, no generic setback computation is performed. By default, None.
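As a concrete (hypothetical) pairing of the two inputs, the following computes ordinance-driven setbacks for every county listed in the CSV and applies a 1.1x multiplier to the tip-height (or base setback distance) everywhere else:

"regulations_fpath": "./wind_regulations.csv",
"generic_setback_multiplier": 1.1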
Note that you may remove any keys with a null value if you do not intend to update them yourself.
merge
Execute the merge step from a config file.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks merge [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the merge configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "are_partial_inclusions": null,
    "purge_chunks": false,
    "merge_file_pattern": "PIPELINE"
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null num_test_nodes: null log_directory: ./logs log_level: INFO are_partial_inclusions: null purge_chunks: false merge_file_pattern: PIPELINE
log_directory = "./logs" log_level = "INFO" purge_chunks = false merge_file_pattern = "PIPELINE" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal"
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- are_partial_inclusions : bool, optional
Flag indicating whether the inputs are partial inclusion values or boolean exclusions. If None, this is inferred automatically from the input file's GeoTIFF profile (dtype != uint8). By default, None.
- purge_chunks : bool, optional
Flag indicating whether individual "chunk" files should be deleted after a successful merge (True), or if they should be stored in a "chunk_files" directory (False). By default, False.
- merge_file_pattern : str | list | dict, optional
Unix-style /filepath/pattern*.h5 representing the files to be merged into a single output GeoTIFF file. If no output file path is specified (i.e. this input is a single pattern or a list of patterns), the output file path will be inferred from the pattern itself (specifically, the wildcard will be removed and the result will be the output file path). If a list of patterns is provided, each pattern will be merged into a separate output file. To specify the name of the output file(s), set this input to a dictionary whose keys are paths to the output file (relative paths are allowed) and whose values are patterns representing the input files that should be merged into the output TIFF. If running a merge job as part of a pipeline, this input can be set to "PIPELINE", which will parse the output of the previous step (compute) and generate the input file pattern and output file name automatically. By default, "PIPELINE".
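For example, to control the output file name explicitly, the dictionary form could look like the following (file names and extensions are hypothetical):

"merge_file_pattern": {
    "./setbacks_roads.tif": "./chunk_files/setbacks_roads*.tif"
}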
Note that you may remove any keys with a null value if you do not intend to update them yourself.
pipeline
Execute multiple steps in an analysis pipeline.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks pipeline [OPTIONS]
Options
- -c, --config_file <config_file>
Path to the pipeline configuration file. This argument can be left out, but one and only one file with "pipeline" in the name should exist in the directory and contain the config information. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "pipeline": [
        {
            "compute": "./config_compute.json"
        },
        {
            "merge": "./config_merge.json"
        },
        {
            "script": "./config_script.json"
        }
    ],
    "logging": {
        "log_file": null,
        "log_level": "INFO"
    }
}
pipeline: - compute: ./config_compute.json - merge: ./config_merge.json - script: ./config_script.json logging: log_file: null log_level: INFO
[[pipeline]] compute = "./config_compute.json" [[pipeline]] merge = "./config_merge.json" [[pipeline]] script = "./config_script.json" [logging] log_level = "INFO"
Parameters
- pipeline : list of dicts
A list of dictionaries, where each dictionary represents one step in the pipeline. Each dictionary should have one of two configurations:
A single key-value pair, where the key is the name of the CLI command to run, and the value is the path to a config file containing the configuration for that command
Exactly two key-value pairs, where one of the keys is "command", with a value that points to the name of the command to execute, while the second key is a unique, user-defined name for the pipeline step, with a value that points to the path of a config file containing the configuration for the command specified by the other key. This configuration allows users to specify duplicate commands as part of their pipeline execution, as shown in the example below.
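For example, to run the compute step twice against different feature sets before merging, the pipeline could be written as (step names and file paths are hypothetical):

"pipeline": [
    {"command": "compute", "compute_roads": "./config_compute_roads.json"},
    {"command": "compute", "compute_water": "./config_compute_water.json"},
    {"merge": "./config_merge.json"}
]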
- logging : dict, optional
Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the submission portion of the pipeline. Note, however, that each step (command) will also record the submission step log output to a common “project” log file, so it’s only ever necessary to use this input if you want a different (lower) level of verbosity than the log_level specified in the config for the step of the pipeline being executed.
- --cancel
Flag to cancel all jobs associated with a given pipeline.
- --monitor
Flag to monitor pipeline jobs continuously. Default is not to monitor (kick off jobs and exit).
- -r, --recursive
Flag to recursively submit pipelines, starting from the current directory and checking every sub-directory therein. The -c option will be completely ignored if you use this option. Instead, the code will check every sub-directory for exactly one file with the word pipeline in it. If found, that file is assumed to be the pipeline config and is used to kick off the pipeline. In any other case, the directory is skipped.
- --background
Flag to monitor pipeline jobs continuously in the background. Note that the stdout/stderr will not be captured, but you can set a pipeline ‘log_file’ to capture logs.
reset-status
Reset the pipeline/job status (progress) for a given directory (defaults to ./). Multiple directories can be supplied to reset the status of each.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks reset-status [DIRECTORY]...
Options
- -f, --force
Force pipeline status reset even if jobs are queued/running
- -a, --after-step <after_step>
Reset pipeline starting after the given pipeline step. The status of this step will remain unaffected, but the status of steps following it will be reset completely.
Arguments
- DIRECTORY
Optional argument(s)
script
Execute the script step from a config file.
This command runs one or more terminal commands/scripts as part of a pipeline step.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks script [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the script configuration file. Below is a sample template config (JSON shown first; equivalent YAML and TOML templates follow):
{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "cmd": "[REQUIRED]"
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null num_test_nodes: null log_directory: ./logs log_level: INFO cmd: '[REQUIRED]'
log_directory = "./logs" log_level = "INFO" cmd = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal"
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({'local', 'kestrel', 'eagle', 'awspc', 'slurm', 'peregrine'}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The "slurm" option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If "local", no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {'standby', 'normal', 'high'}. Note that 'high' priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler's default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- queue:
(str, optional; PBS ONLY) HPC queue to submit the job to. Examples include: 'debug', 'short', 'batch', 'batch-h', 'long', etc. By default, None, which uses "test_queue".
- feature:
(str, optional) Additional flags for the SLURM job (e.g. "-p debug"). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of the conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before the command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to the directory where logs should be written. The path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of the desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- cmd : str | list
A single command represented as a string or a list of command strings to execute on a node. If the input is a list, each command string in the list will be executed on a separate node. For example, to run a python script, simply specify
"cmd": "python my_script.py"
This will run the python file “my_script.py” (in the project directory) on a single node.
Important
It is inefficient to run scripts that only use a single processor on HPC nodes for extended periods of time. Always make sure your long-running scripts use Python’s multiprocessing library wherever possible to make the most use of shared HPC resources.
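As a minimal illustration of that advice (a generic sketch, not part of the setbacks package), a script can fan work out across cores with Python's multiprocessing library:

from multiprocessing import Pool

def process(path):
    # Placeholder for the real per-file work
    return path.upper()

if __name__ == "__main__":
    paths = ["a.gpkg", "b.gpkg", "c.gpkg"]
    with Pool() as pool:  # uses all available cores by default
        results = pool.map(process, paths)
    print(results)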
To run multiple commands in parallel, supply them as a list:
"cmd": [ "python /path/to/my_script/py -a -out out_file.txt", "wget https://website.org/latest.zip" ]
This input will run two commands (a python script with the specified arguments and a
wget
command to download a file from the web), each on their own node and in parallel as part of this pipeline step. Note that commands are always executed from the project directory.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
status
Display the status of a project FOLDER.
By default, the status of the current working directory is displayed.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
setbacks status [OPTIONS] [FOLDER]
Options
- -ps, --pipe_steps <pipe_steps>
Filter status for the given pipeline step(s). Multiple steps can be specified by repeating this option (e.g. -ps step1 -ps step2 ...). By default, the status of all pipeline steps is displayed.
- -s, --status <status>
Filter jobs for the requested status(es). Allowed options (case-insensitive) include:
Failed: failure, fail, failed, f
Running: running, run, r
Submitted: submitted, submit, sb, pending, pend, p
Success: successful, success, s
Not submitted: unsubmitted, unsubmit, u, not_submitted, ns
Multiple status keys can be specified by repeating this option (e.g. -s status1 -s status2 ...). By default, all status values are displayed.
- -i, --include <include>
Extra status keys to include in the print output for each job. Multiple status keys can be specified by repeating this option (e.g. -i key1 -i key2 ...). By default, no extra keys are displayed.
- -r, --recursive
Option to perform a recursive search of directories (starting with the input directory). The status of every nested directory is reported.
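For example, to show only failed compute jobs in the current directory:

setbacks status -ps compute -s failed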
Arguments
- FOLDER
Optional argument
template-configs
Generate template config files for requested COMMANDS. If no COMMANDS are given, config files for the entire pipeline are generated.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
setbacks template-configs [COMMANDS]...
Options
- -t, --type <type>
Configuration file type to generate. Allowed options (case-insensitive):
json5, json, toml, yaml, yml.
- Default: 'json'
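For example, to generate YAML template configs for just the compute and merge steps:

setbacks template-configs -t yaml compute merge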
Arguments
- COMMANDS
Optional argument(s)