reVRt#
reVRt Command Line Interface.
Typically, a good place to start is to set up a reVRt job with a pipeline config that points to several reVRt modules that you want to run in serial.
To begin, you can generate some template configuration files using:
$ reVRt template-configs
By default, this generates template JSON configuration files, though you
can request JSON5, YAML, or TOML configuration files instead. You can run
$ reVRt template-configs --help
on the command line to see all available
options for the template-configs
command. Once the template configuration
files have been generated, you can fill them out by referring to the
module CLI documentation (if available) or the help pages of the module CLIs
for more details on the config options for each CLI command:
$ reVRt --help
$ reVRt layers-to-file --help
$ reVRt layers-from-file --help
$ reVRt build-routing-layers --help
$ reVRt route-characterization --help
$ reVRt script --help
After appropriately filling out the configuration files for each module you want to run, you can call the reVRt pipeline CLI using:
$ reVRt pipeline -c config_pipeline.json
This command will run each pipeline step in sequence.
Note
You will need to re-submit the pipeline command above after each completed pipeline step.
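If you would rather not re-submit manually, the pipeline command also accepts a --monitor flag (documented in the pipeline section below) that monitors the pipeline jobs continuously, which typically avoids having to re-submit after each step:
$ reVRt pipeline -c config_pipeline.json --monitor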
To check the status of the pipeline, you can run:
$ reVRt status
This will print a report to the command line detailing the progress of the
current pipeline. See $ reVRt status --help
for all status command
options.
If you need to parameterize the pipeline execution, you can use the batch
command. For details on setting up a batch config file, see the documentation
or run:
$ reVRt batch --help
on the command line. Once you set up a batch config file, you can execute it using:
$ reVRt batch -c config_batch.json
For more information on getting started, see the How to Run a Model Powered by GAPs guide.
The general structure of the reVRt CLI is given below.
reVRt [OPTIONS] COMMAND [ARGS]...
Options
- -v, --verbose#
Flag to turn on debug logging. Default is not verbose.
- --version#
Show the version and exit.
batch#
Execute an analysis pipeline over a parametric set of inputs.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
reVRt batch [OPTIONS]
Options
- -c, --config_file <config_file>#
Required. Path to the batch configuration file. Below is a sample template config (JSON):
{
  "logging": {
    "log_file": null,
    "log_level": "INFO"
  },
  "pipeline_config": "[REQUIRED]",
  "sets": [
    {
      "args": "[REQUIRED]",
      "files": "[REQUIRED]",
      "set_tag": "set1"
    },
    {
      "args": "[REQUIRED]",
      "files": "[REQUIRED]",
      "set_tag": "set2"
    }
  ]
}
The same template in YAML:
logging:
  log_file: null
  log_level: INFO
pipeline_config: '[REQUIRED]'
sets:
- args: '[REQUIRED]'
  files: '[REQUIRED]'
  set_tag: set1
- args: '[REQUIRED]'
  files: '[REQUIRED]'
  set_tag: set2
And in TOML:
pipeline_config = "[REQUIRED]"

[[sets]]
args = "[REQUIRED]"
files = "[REQUIRED]"
set_tag = "set1"

[[sets]]
args = "[REQUIRED]"
files = "[REQUIRED]"
set_tag = "set2"

[logging]
log_level = "INFO"
Parameters#
- logging : dict, optional
Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the batch command. Note that each pipeline job submitted via batch has its own logging key that will initialize pipeline step logging. Therefore, this input is only necessary if you want logging information about the batching portion of the execution.
- pipeline_config : str
Path to the pipeline configuration defining the commands to run for every parametric set.
- sets : list of dicts
A list of dictionaries, where each dictionary defines a "set" of parametric runs. Each dictionary should have the following keys:
- args : dict
A dictionary defining the arguments across all input configuration files to parameterize. Each argument to be parametrized should be a key in this dictionary, and the value should be a list of the parameter values to run for this argument (single-item lists are allowed and can be used to vary a parameter value across sets). For example (JSON):
"args": {
  "input_constant_1": [18.02, 19.04],
  "path_to_a_file": [
    "/first/path.h5",
    "/second/path.h5",
    "/third/path.h5"
  ]
}
In YAML:
args:
  input_constant_1:
  - 18.02
  - 19.04
  path_to_a_file:
  - /first/path.h5
  - /second/path.h5
  - /third/path.h5
In TOML:
[args]
input_constant_1 = [18.02, 19.04]
path_to_a_file = ["/first/path.h5", "/second/path.h5", "/third/path.h5"]
This example would run a total of six pipelines, one with each of the following arg combinations:
input_constant_1=18.02, path_to_a_file="/first/path.h5"
input_constant_1=18.02, path_to_a_file="/second/path.h5"
input_constant_1=18.02, path_to_a_file="/third/path.h5"
input_constant_1=19.04, path_to_a_file="/first/path.h5"
input_constant_1=19.04, path_to_a_file="/second/path.h5"
input_constant_1=19.04, path_to_a_file="/third/path.h5"
Remember that the keys in the args dictionary should be part of (at least) one of your other configuration files.
- files : list
A list of paths to the configuration files that contain the arguments to be updated for every parametric run. Arguments can be spread out over multiple files. For example (JSON):
"files": [
  "./config_run.yaml",
  "./config_analyze.json"
]
In YAML:
files:
- ./config_run.yaml
- ./config_analyze.json
In TOML:
files = ["./config_run.yaml", "./config_analyze.json"]
- set_tag : str, optional
Optional string defining a set tag that will prefix each job tag for this set. This tag does not need to include an underscore, as that is provided during concatenation.
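Putting these pieces together, a minimal filled-out batch config might look like the following sketch (the file names and argument name are hypothetical):
{
  "pipeline_config": "./config_pipeline.json",
  "sets": [
    {
      "args": {
        "input_constant_1": [18.02, 19.04]
      },
      "files": ["./config_run.json"],
      "set_tag": "set1"
    }
  ]
}
This would create and run two pipelines, one for each value of input_constant_1, with every job tag prefixed by "set1".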
- --dry#
Flag to do a dry run (make batch dirs and update files without running the pipeline).
- --cancel#
Flag to cancel all jobs associated with the batch_jobs.csv file in the current batch config directory.
- --delete#
Flag to delete all batch job sub-directories associated with the batch_jobs.csv file in the current batch config directory.
- --monitor-background#
Flag to monitor all batch pipelines continuously in the background. Note that the stdout/stderr will not be captured, but you can set a pipeline "log_file" to capture logs.
build-routing-layers#
Execute the build-routing-layers
step from a config file.
You can re-run this function on an existing file to add new layers without overwriting existing layers or needing to change your original config.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
reVRt build-routing-layers [OPTIONS]
Options
- -c, --config_file <config_file>#
Required. Path to the build-routing-layers configuration file. Below is a sample template config (JSON; YAML and TOML renderings follow):
{
  "execution_control": {
    "option": "local",
    "allocation": "[REQUIRED IF ON HPC]",
    "walltime": "[REQUIRED IF ON HPC]",
    "qos": "normal",
    "memory": null,
    "queue": null,
    "feature": null,
    "conda_env": null,
    "module": null,
    "sh_script": null,
    "keep_sh": false,
    "num_test_nodes": null,
    "max_workers": 1,
    "memory_limit_per_worker": "auto"
  },
  "log_directory": "./logs",
  "log_level": "INFO",
  "routing_file": "[REQUIRED]",
  "template_file": null,
  "input_layer_dir": ".",
  "output_tiff_dir": ".",
  "masks_dir": ".",
  "layers": null,
  "dry_costs": null,
  "merge_friction_and_barriers": null,
  "create_kwargs": null
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null keep_sh: false num_test_nodes: null max_workers: 1 memory_limit_per_worker: auto log_directory: ./logs log_level: INFO routing_file: '[REQUIRED]' template_file: null input_layer_dir: . output_tiff_dir: . masks_dir: . layers: null dry_costs: null merge_friction_and_barriers: null create_kwargs: null
log_directory = "./logs" log_level = "INFO" routing_file = "[REQUIRED]" input_layer_dir = "." output_tiff_dir = "." masks_dir = "." [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal" keep_sh = false max_workers = 1 memory_limit_per_worker = "auto"
Parameters#
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default,
"normal"
.- memory:
(int, optional) Node memory max limit (in GB). By default,
None
, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set toNone
) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).- max_workers:
(int, optional) Number of parallel workers to use for file creation. If
None
or >1, processing is performed in parallel using Dask. By default,1
.- memory_limit_per_worker:
(str, float, int, or None, default=”auto”) Sets the memory limit per worker. This only applies if
max_workers != 1
. IfNone
or0
, no limit is applied. If"auto"
, the total system memory is split evenly between the workers. If a float, that fraction of the system memory is used per worker. If a string giving a number of bytes (like “1GiB”), that amount is used per worker. If an int, that number of bytes is used per worker. By default,"auto"
- queue:
(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default,
None
, which uses “test_queue”.- feature:
(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default,
None
, which does not specify any additional flags.- conda_env:
(str, optional) Name of conda environment to activate. By default,
None
, which does not load any environments.- module:
(str, optional) Module to load. By default,
None
, which does not load any modules.- sh_script:
(str, optional) Extra shell script to run before command call. By default,
None
, which does not run any scripts.- keep_sh:
(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default,
False
, which purges the submission scripts after each job is submitted.- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default,
None
, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
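For instance, a sketch of an execution_control block for an HPC run might look like this (the allocation name is hypothetical):
{
  "execution_control": {
    "option": "kestrel",
    "allocation": "your_hpc_allocation",
    "walltime": 4,
    "qos": "normal",
    "max_workers": 1,
    "memory_limit_per_worker": "auto"
  }
}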
- log_directory : str
Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- routing_file : path-like
Path to GeoTIFF/Zarr file to store cost layers in. If the file does not exist, it will be created based on the template_file input.
- template_file : path-like, optional
Path to template GeoTIFF (*.tif or *.tiff) or Zarr (*.zarr) file containing the profile and transform to be used for the layered costs file. If None, then the routing_file is assumed to exist on disk already. By default, None.
- input_layer_dir : path-like, optional
Directory to search for input layers in, if not found in the current directory. By default, '.'.
- output_tiff_dir : path-like, optional
Directory where cost layers should be saved as GeoTIFF. By default, ".".
- masks_dir : path-like, optional
Directory for storing/finding mask GeoTIFFs (wet, dry, landfall, wet+, dry+). By default, ".".
- layers : list of LayerConfig, optional
Configuration for layers to be built and added to the file. At least one of layers, dry_costs, or merge_friction_and_barriers must be defined. By default, None.
- dry_costs : DryCosts, optional
Configuration for dry cost layers to be built and added to the file. At least one of layers, dry_costs, or merge_friction_and_barriers must be defined. By default, None.
- merge_friction_and_barriers : MergeFrictionBarriers, optional
Configuration for merging friction and barriers and adding them to the layered costs file. At least one of layers, dry_costs, or merge_friction_and_barriers must be defined. By default, None.
- create_kwargs : dict, optional
Additional keyword arguments to pass to LayeredFile.create_new() when creating a new layered file. Do not include template_file; it will be ignored. By default, None.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
layers-from-file#
Execute the layers-from-file
step from a config file.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
reVRt layers-from-file [OPTIONS]
Options
- -c, --config_file <config_file>#
Required. Path to the layers-from-file configuration file. Below is a sample template config (JSON; YAML and TOML renderings follow):
{
  "execution_control": {
    "option": "local",
    "allocation": "[REQUIRED IF ON HPC]",
    "walltime": "[REQUIRED IF ON HPC]",
    "qos": "normal",
    "memory": null,
    "queue": null,
    "feature": null,
    "conda_env": null,
    "module": null,
    "sh_script": null,
    "keep_sh": false,
    "num_test_nodes": null
  },
  "log_directory": "./logs",
  "log_level": "INFO",
  "fp": "[REQUIRED]",
  "layers": null,
  "profile_kwargs": null,
  "out_layer_dir": null
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null keep_sh: false num_test_nodes: null log_directory: ./logs log_level: INFO fp: '[REQUIRED]' layers: null profile_kwargs: null out_layer_dir: null
log_directory = "./logs" log_level = "INFO" fp = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal" keep_sh = false
Parameters#
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default,
"normal"
.- memory:
(int, optional) Node memory max limit (in GB). By default,
None
, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set toNone
) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).- queue:
(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default,
None
, which uses “test_queue”.- feature:
(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default,
None
, which does not specify any additional flags.- conda_env:
(str, optional) Name of conda environment to activate. By default,
None
, which does not load any environments.- module:
(str, optional) Module to load. By default,
None
, which does not load any modules.- sh_script:
(str, optional) Extra shell script to run before command call. By default,
None
, which does not run any scripts.- keep_sh:
(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default,
False
, which purges the submission scripts after each job is submitted.- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default,
None
, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- fp : path-like
Path to layered file on disk.
- layers : list, optional
List of layer names to extract. Layer names must match layers in the fp, otherwise an error will be raised. If None, extracts all layers from the LayeredFile. By default, None.
- profile_kwargs : dict, optional
Additional keyword arguments to pass into writing each raster. The following attributes are ignored (they are set using properties of the source LayeredFile): nodata, transform, crs, count, width, height. By default, None.
- out_layer_dir : path-like, optional
Path to output directory into which layers should be saved as GeoTIFFs. This directory will be created if it does not already exist. If not provided, the config directory is used as the output directory. By default, None.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
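As a concrete sketch, a layers-from-file config that extracts two layers might look like the following (the file path and layer names are hypothetical):
{
  "execution_control": {
    "option": "local"
  },
  "fp": "./routing_layers.zarr",
  "layers": ["friction", "barriers"],
  "out_layer_dir": "./layer_tiffs"
}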
layers-to-file#
Execute the layers-to-file
step from a config file.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
reVRt layers-to-file [OPTIONS]
Options
- -c, --config_file <config_file>#
Required. Path to the layers-to-file configuration file. Below is a sample template config (JSON; YAML and TOML renderings follow):
{
  "execution_control": {
    "option": "local",
    "allocation": "[REQUIRED IF ON HPC]",
    "walltime": "[REQUIRED IF ON HPC]",
    "qos": "normal",
    "memory": null,
    "queue": null,
    "feature": null,
    "conda_env": null,
    "module": null,
    "sh_script": null,
    "keep_sh": false,
    "num_test_nodes": null
  },
  "log_directory": "./logs",
  "log_level": "INFO",
  "fp": "[REQUIRED]",
  "layers": "[REQUIRED]",
  "check_tiff": true,
  "descriptions": null,
  "overwrite": false,
  "nodata": null
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null keep_sh: false num_test_nodes: null log_directory: ./logs log_level: INFO fp: '[REQUIRED]' layers: '[REQUIRED]' check_tiff: true descriptions: null overwrite: false nodata: null
log_directory = "./logs" log_level = "INFO" fp = "[REQUIRED]" layers = "[REQUIRED]" check_tiff = true overwrite = false [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal" keep_sh = false
Parameters#
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default,
"normal"
.- memory:
(int, optional) Node memory max limit (in GB). By default,
None
, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set toNone
) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).- queue:
(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default,
None
, which uses “test_queue”.- feature:
(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default,
None
, which does not specify any additional flags.- conda_env:
(str, optional) Name of conda environment to activate. By default,
None
, which does not load any environments.- module:
(str, optional) Module to load. By default,
None
, which does not load any modules.- sh_script:
(str, optional) Extra shell script to run before command call. By default,
None
, which does not run any scripts.- keep_sh:
(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default,
False
, which purges the submission scripts after each job is submitted.- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default,
None
, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- fp : path-like
Path to layered file on disk.
- layers : list | dict
Dictionary mapping layer names to GeoTIFF filepaths. Each GeoTIFF will be loaded into the LayeredFile under the layer name. If a list of GeoTIFF filepaths is provided, the file name stems are used as the layer names. (See the example config at the end of this section.)
- check_tiff : bool, optional
Flag to check the tiff profile and coordinates against the layered file profile and coordinates. By default, True.
- overwrite : bool, default=False
Option to overwrite layer data if the layer already exists in the LayeredFile.
Important
When overwriting data, the encoding (and therefore things like data type, nodata value, etc.) is not allowed to change. If you need to overwrite an existing layer with a new type of data, manually remove it from the file first.
By default, False.
- nodata : int | float, optional
Optional nodata value for the raster layer. This value will be added to the layer's attributes meta dictionary under the "nodata" key.
Warning
rioxarray does not recognize the "nodata" value when reading from a zarr file (because zarr uses the _FillValue encoding internally). To get the correct "nodata" value back when reading a LayeredFile, you can either 1) read from da.rio.encoded_nodata or 2) check the layer's attributes for the "nodata" key, and if present, use da.rio.write_nodata to write the nodata value so that da.rio.nodata gives the right value.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
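For example, a minimal sketch of a layers-to-file config (the file and layer names here are hypothetical):
{
  "execution_control": {
    "option": "local"
  },
  "fp": "./routing_layers.zarr",
  "layers": {
    "friction": "./friction.tif",
    "barriers": "./barriers.tif"
  },
  "check_tiff": true
}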
pipeline#
Execute multiple steps in an analysis pipeline.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
reVRt pipeline [OPTIONS]
Options
- -c, --config_file <config_file>#
Path to the pipeline configuration file. This argument can be left out, but one and only one file with "pipeline" in the name should exist in the directory and contain the config information. Below is a sample template config (JSON):
{
  "pipeline": [
    {"layers-to-file": "./config_layers_to_file.json"},
    {"layers-from-file": "./config_layers_from_file.json"},
    {"build-routing-layers": "./config_build_routing_layers.json"},
    {"route-characterization": "./config_route_characterization.json"},
    {"script": "./config_script.json"}
  ],
  "logging": {
    "log_file": null,
    "log_level": "INFO"
  }
}
The same template in YAML:
pipeline:
- layers-to-file: ./config_layers_to_file.json
- layers-from-file: ./config_layers_from_file.json
- build-routing-layers: ./config_build_routing_layers.json
- route-characterization: ./config_route_characterization.json
- script: ./config_script.json
logging:
  log_file: null
  log_level: INFO
And in TOML:
[[pipeline]]
layers-to-file = "./config_layers_to_file.json"

[[pipeline]]
layers-from-file = "./config_layers_from_file.json"

[[pipeline]]
build-routing-layers = "./config_build_routing_layers.json"

[[pipeline]]
route-characterization = "./config_route_characterization.json"

[[pipeline]]
script = "./config_script.json"

[logging]
log_level = "INFO"
Parameters#
- pipeline : list of dicts
A list of dictionaries, where each dictionary represents one step in the pipeline. Each dictionary should have one of two configurations:
A single key-value pair, where the key is the name of the CLI command to run and the value is the path to a config file containing the configuration for that command.
Exactly two key-value pairs, where one key is "command", with a value that gives the name of the command to execute, and the other key is a unique user-defined name for the pipeline step, with a value that points to a config file containing the configuration for the command specified under "command". This configuration allows users to specify duplicate commands as part of their pipeline execution (see the example after this parameter list).
- logging : dict, optional
Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the submission portion of the pipeline. Note, however, that each step (command) will also record the submission step log output to a common "project" log file, so this input is only necessary if you want a different (lower) level of verbosity than the log_level specified in the config for the step of the pipeline being executed.
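As a sketch, the step names and config paths below are hypothetical, but the structure shows how the same command can appear twice in one pipeline:
{
  "pipeline": [
    {"command": "script", "preprocess": "./config_preprocess.json"},
    {"build-routing-layers": "./config_build_routing_layers.json"},
    {"command": "script", "postprocess": "./config_postprocess.json"}
  ]
}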
- --cancel#
Flag to cancel all jobs associated with a given pipeline.
- --monitor#
Flag to monitor pipeline jobs continuously. Default is not to monitor (kick off jobs and exit).
- -r, --recursive#
Flag to recursively submit pipelines, starting from the current directory and checking every sub-directory therein. The -c option will be completely ignored if you use this option. Instead, the code will check every sub-directory for exactly one file with the word pipeline in it. If found, that file is assumed to be the pipeline config and is used to kick off the pipeline. In any other case, the directory is skipped.
- --background#
Flag to monitor pipeline jobs continuously in the background. Note that the stdout/stderr will not be captured, but you can set a pipeline ‘log_file’ to capture logs.
reset-status#
Reset the pipeline/job status (progress) for a given directory (defaults to ./
). Multiple directories can be supplied to reset the status of each.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
reVRt reset-status [DIRECTORY]...
Options
- -f, --force#
Force pipeline status reset even if jobs are queued/running
- -a, --after-step <after_step>#
Reset pipeline starting after the given pipeline step. The status of this step will remain unaffected, but the status of steps following it will be reset completely.
Arguments
- DIRECTORY#
Optional argument(s)
route-characterization#
Execute the route-characterization
step from a config file.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
reVRt route-characterization [OPTIONS]
Options
- -c, --config_file <config_file>#
Required. Path to the route-characterization configuration file. Below is a sample template config (JSON; YAML and TOML renderings follow):
{
  "execution_control": {
    "option": "local",
    "allocation": "[REQUIRED IF ON HPC]",
    "walltime": "[REQUIRED IF ON HPC]",
    "qos": "normal",
    "memory": null,
    "queue": null,
    "feature": null,
    "conda_env": null,
    "module": null,
    "sh_script": null,
    "keep_sh": false,
    "num_test_nodes": null,
    "max_workers": 1,
    "memory_limit_per_worker": "auto"
  },
  "log_directory": "./logs",
  "log_level": "INFO",
  "layers": "[REQUIRED]",
  "default_route_fp": null,
  "default_copy_properties": null,
  "default_row_width_key": null,
  "default_chunks": null,
  "row_widths": null,
  "row_width_ranges": null
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null keep_sh: false num_test_nodes: null max_workers: 1 memory_limit_per_worker: auto log_directory: ./logs log_level: INFO layers: '[REQUIRED]' default_route_fp: null default_copy_properties: null default_row_width_key: null default_chunks: null row_widths: null row_width_ranges: null
log_directory = "./logs" log_level = "INFO" layers = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal" keep_sh = false max_workers = 1 memory_limit_per_worker = "auto"
Parameters#
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default,
"normal"
.- memory:
(int, optional) Node memory max limit (in GB). By default,
None
, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set toNone
) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).- max_workers:
(int, optional) Number of parallel workers to use for computation. If
None
or >1, processing is performed in parallel (using Dask). If your paths span large areas, keep this value low (~10) to avoid running into memory errors. By default,1
.- memory_limit_per_worker:
(str, float, int, or None, default=”auto”) Sets the memory limit per worker. This only applies if
max_workers != 1
. IfNone
or0
, no limit is applied. If"auto"
, the total system memory is split evenly between the workers. If a float, that fraction of the system memory is used per worker. If a string giving a number of bytes (like “1GiB”), that amount is used per worker. If an int, that number of bytes is used per worker. By default,"auto"
- queue:
(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default,
None
, which uses “test_queue”.- feature:
(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default,
None
, which does not specify any additional flags.- conda_env:
(str, optional) Name of conda environment to activate. By default,
None
, which does not load any environments.- module:
(str, optional) Module to load. By default,
None
, which does not load any modules.- sh_script:
(str, optional) Extra shell script to run before command call. By default,
None
, which does not run any scripts.- keep_sh:
(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default,
False
, which purges the submission scripts after each job is submitted.- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default,
None
, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- layers : dict or list of dicts
A single dictionary or a list of dictionaries specifying the statistics to compute (see the worked example at the end of this section). Each dictionary should contain the following keys:
geotiff_fp: (REQUIRED) Path to the raster file.
route_fp: (REQUIRED) Path to the vector file of routes. Must contain a "geometry" column and the row_width_key column (used to map to path ROW width).
stats: (OPTIONAL) Names of all statistics to compute. Statistics must be one of the members of Stat or FractionalStat, or must start with the percentile_ prefix and end with an int or float representing the percentile to compute (e.g. percentile_10.5). If only one statistic is to be computed, you can provide it directly as a string. Otherwise, provide a list of statistic names or a string with the names separated by a space. You can also provide the string "ALL" or "*" to specify that all statistics should be computed (i.e. all options from both Stat and FractionalStat). If no input, empty input, or None is provided, then only the base stats ("count", "min", "max", "mean") are configured. To summarize, all of the following are valid inputs:
stats: "*" or stats: "ALL" or stats: "All"
stats: "min"
stats: "min max"
stats: ["min"]
stats: ["min", "max", "percentile_10.5"]
nodata : (OPTIONAL) Value in the raster that represents nodata. This value will not show up in any statistics except for the nodata statistic itself, which computes the number of nodata values within the buffered routes. Note that this value is used in addition to any NODATA value in the raster’s metadata.
all_touched : (OPTIONAL) Boolean flag indicating whether to include every raster cell touched by a geometry (True), or only those having a center point within the polygon (False). By default, True.
category_map : (OPTIONAL) Dictionary mapping raster values to new names. If given, this mapping will be applied to the pixel count dictionary, so you can use it to map raster values to human-readable category names.
multiplier_scalar: (OPTIONAL) Optional multiplier value to apply to the layer before computing statistics. This is useful if you want to scale the values in the raster before computing statistics. By default, 1.0.
prefix: (OPTIONAL) A string representing a prefix to add to each stat name. If you wish to have the prefix separated by a delimiter, you must include it in this string (e.g. prefix="test_").
copy_properties: (OPTIONAL) List of column names to copy over from the vector file of routes.
row_width_key: (OPTIONAL) Name of the column in the vector file of routes used to map to the ROW widths. By default, "voltage".
chunks : (OPTIONAL) chunks keyword argument to pass down to rioxarray.open_rasterio(). Use this to control the Dask chunk size. By default, "auto".
- default_route_fp : path-like, optional
Default path to the vector file of routes. This will be used only if no route_fp is provided in a layer's stats dictionary. Must contain a "geometry" column and the row_width_key column (used to map to path ROW width). By default, None.
- default_copy_properties : iterable of str, optional
Default iterable of column names to copy over from the zone feature. This will be used only if no copy_properties is provided in a layer's stats dictionary. By default, None.
- default_row_width_key : str, optional
Default name of the column in the vector file of routes used to map to the ROW widths. This will be used only if no row_width_key is provided in a layer's stats dictionary. By default, None.
- default_chunks : tuple or str, optional
Default chunks keyword argument to pass down to rioxarray.open_rasterio(). This will be used only if no chunks is provided in a layer's stats dictionary. Use this to control the Dask chunk size. By default, None, which uses "auto" as the final chunk input.
- row_widths : dict or path-like, optional
A dictionary specifying the row widths in the following format: {"row_width_id": row_width_meters}. The row_width_id is a value used to match each route with a particular ROW width (this is typically a voltage). The value should be found under the row_width_key entry of the route_fp.
Important
At least one of row_widths or row_width_ranges must be provided.
Warning
Routes without a valid voltage in the row_widths or row_width_ranges input will not be characterized.
If a path is provided, it should point to a JSON file containing the row width dictionary as specified above. By default, None.
- row_width_ranges : list, optional
Optional list of dictionaries, where each dictionary contains the keys "min", "max", and "width". This can be used to specify row widths based on ranges of values (e.g. voltage). For example, the following input:
[
  {"min": 0, "max": 70, "width": 20},
  {"min": 70, "max": 150, "width": 30},
  {"min": 200, "max": 350, "width": 40},
  {"min": 400, "max": 500, "width": 50}
]
would map voltages in the range 0 <= volt < 70 to a row width of 20 meters, 70 <= volt < 150 to a row width of 30 meters, 200 <= volt < 350 to a row width of 40 meters, and so on.
Important
Any values in the row_widths dict will take precedence over these ranges. So if a voltage of 138 kV is mapped to a row width of 25 meters in the row_widths dict, that value will be used instead of the 30 meter width specified by the ranges above.
If a path is provided, it should point to a JSON file containing the list of dictionaries. By default, None.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
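To make the layers input concrete, here is a hedged sketch of a single-layer route-characterization config; the file paths, copied column, and ROW width values are hypothetical:
{
  "execution_control": {
    "option": "local",
    "max_workers": 1
  },
  "layers": [
    {
      "geotiff_fp": "./land_cover.tif",
      "route_fp": "./routes.gpkg",
      "stats": ["min", "max", "mean"],
      "row_width_key": "voltage",
      "copy_properties": ["route_id"]
    }
  ],
  "row_widths": {"138": 30, "230": 38}
}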
script#
Execute the script
step from a config file.
This command runs one or more terminal commands/scripts as part of a pipeline step.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
reVRt script [OPTIONS]
Options
- -c, --config_file <config_file>#
Required. Path to the script configuration file. Below is a sample template config (JSON; YAML and TOML renderings follow):
{
  "execution_control": {
    "option": "local",
    "allocation": "[REQUIRED IF ON HPC]",
    "walltime": "[REQUIRED IF ON HPC]",
    "qos": "normal",
    "memory": null,
    "queue": null,
    "feature": null,
    "conda_env": null,
    "module": null,
    "sh_script": null,
    "keep_sh": false,
    "num_test_nodes": null
  },
  "log_directory": "./logs",
  "log_level": "INFO",
  "cmd": "[REQUIRED]"
}
execution_control: option: local allocation: '[REQUIRED IF ON HPC]' walltime: '[REQUIRED IF ON HPC]' qos: normal memory: null queue: null feature: null conda_env: null module: null sh_script: null keep_sh: false num_test_nodes: null log_directory: ./logs log_level: INFO cmd: '[REQUIRED]'
log_directory = "./logs" log_level = "INFO" cmd = "[REQUIRED]" [execution_control] option = "local" allocation = "[REQUIRED IF ON HPC]" walltime = "[REQUIRED IF ON HPC]" qos = "normal" keep_sh = false
Parameters#
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default,
"normal"
.- memory:
(int, optional) Node memory max limit (in GB). By default,
None
, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set toNone
) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).- queue:
(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default,
None
, which uses “test_queue”.- feature:
(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default,
None
, which does not specify any additional flags.- conda_env:
(str, optional) Name of conda environment to activate. By default,
None
, which does not load any environments.- module:
(str, optional) Module to load. By default,
None
, which does not load any modules.- sh_script:
(str, optional) Extra shell script to run before command call. By default,
None
, which does not run any scripts.- keep_sh:
(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default,
False
, which purges the submission scripts after each job is submitted.- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default,
None
, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
- log_directory : str
Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {"DEBUG", "INFO", "WARNING", "ERROR"}
String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- cmd : str | list
A single command represented as a string or a list of command strings to execute on a node. If the input is a list, each command string in the list will be executed on a separate node. For example, to run a python script, simply specify
"cmd": "python my_script.py"
This will run the python file "my_script.py" (in the project directory) on a single node.
Important
It is inefficient to run scripts that only use a single processor on HPC nodes for extended periods of time. Always make sure your long-running scripts use Python's multiprocessing library wherever possible to make the most use of shared HPC resources.
To run multiple commands in parallel, supply them as a list:
"cmd": [
  "python /path/to/my_script/py -a -out out_file.txt",
  "wget https://website.org/latest.zip"
]
This input will run two commands (a python script with the specified arguments and a wget command to download a file from the web), each on its own node and in parallel as part of this pipeline step. Note that commands are always executed from the project directory.
Note that you may remove any keys with a null value if you do not intend to update them yourself.
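Putting it together, a complete (if minimal) script config might look like the sketch below; the script name is hypothetical:
{
  "execution_control": {
    "option": "local"
  },
  "cmd": "python run_my_analysis.py"
}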
status#
Display the status of a project FOLDER.
By default, the status of the current working directory is displayed.
The general structure for calling this CLI command is given below
(add --help
to print help info to the terminal).
reVRt status [OPTIONS] [FOLDER]
Options
- -ps, --pipe_steps <pipe_steps>#
Filter status for the given pipeline step(s). Multiple steps can be specified by repeating this option (e.g. -ps step1 -ps step2 ...). By default, the status of all pipeline steps is displayed.
- -s, --status <status>#
Filter jobs for the requested status(es). Allowed options (case-insensitive) include:
Failed:
failure
fail
failed
f
Running:
running
run
r
Submitted:
submitted
submit
sb
pending
pend
p
Success:
successful
success
s
Not submitted:
unsubmitted
unsubmit
u
not_submitted
ns
Multiple status keys can be specified by repeating this option (e.g.
-s status1 -s status2 ...
). By default, all status values are displayed.
- -i, --include <include>#
Extra status keys to include in the print output for each job. Multiple status keys can be specified by repeating this option (e.g. -i key1 -i key2 ...). By default, no extra keys are displayed.
- -r, --recursive#
Option to perform a recursive search of directories (starting with the input directory). The status of every nested directory is reported.
Arguments
- FOLDER#
Optional argument
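For example, to check only the failed and running jobs of the build-routing-layers step in the current directory, you might run something like:
$ reVRt status -ps build-routing-layers -s failed -s running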
template-configs#
Generate template config files for requested COMMANDS. If no COMMANDS are given, config files for the entire pipeline are generated.
The general structure for calling this CLI command is given below (add --help
to print help info to the terminal).
reVRt template-configs [COMMANDS]...
Options
- -t, --type <type>#
Configuration file type to generate. Allowed options (case-insensitive): json5, json, toml, yaml, yml.
- Default: 'json'
Arguments
- COMMANDS#
Optional argument(s)
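For example, to generate YAML template configs for just two of the commands, you might run something like:
$ reVRt template-configs build-routing-layers route-characterization -t yaml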