reVRt#

reVRt Command Line Interface.

Typically, a good place to start is to set up a reVRt job with a pipeline config that points to several reVRt modules that you want to run in serial.

To begin, you can generate some template configuration files using:

$ reVRt template-configs

By default, this generates template JSON configuration files, though you can request JSON5, YAML, or TOML configuration files instead. You can run $ reVRt template-configs --help on the command line to see all available options for the template-configs command. Once the template configuration files have been generated, you can fill them out by referring to the module CLI documentation (if available) or the help pages of the module CLIs for more details on the config options for each CLI command:

$ reVRt --help

$ reVRt layers-to-file --help

$ reVRt layers-from-file --help

$ reVRt build-routing-layers --help

$ reVRt route-characterization --help

$ reVRt script --help
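
If you only need templates for a subset of commands, or prefer a different configuration file format, you can pass the command names and the --type option directly to template-configs. For example:

$ reVRt template-configs build-routing-layers route-characterization --type yaml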

After appropriately filling out the configuration files for each module you want to run, you can call the reVRt pipeline CLI using:

$ reVRt pipeline -c config_pipeline.json

This command will run each pipeline step in sequence.

Note

You will need to re-submit the pipeline command above after each completed pipeline step.
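
If you prefer not to re-submit the pipeline manually after each step, you can instead use the --monitor flag (documented under the pipeline command below) to monitor pipeline jobs continuously:

$ reVRt pipeline -c config_pipeline.json --monitor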

To check the status of the pipeline, you can run:

$ reVRt status

This will print a report to the command line detailing the progress of the current pipeline. See $ reVRt status --help for all status command options.

If you need to parameterize the pipeline execution, you can use the batch command. For details on setting up a batch config file, see the documentation or run:

$ reVRt batch --help

on the command line. Once you set up a batch config file, you can execute it using:

$ reVRt batch -c config_batch.json

For more information on getting started, see the How to Run a Model Powered by GAPs guide.

The general structure of the reVRt CLI is given below.

reVRt [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose#

Flag to turn on debug logging. Default is not verbose.

--version#

Show the version and exit.

batch#

Execute an analysis pipeline over a parametric set of inputs.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt batch [OPTIONS]

Options

-c, --config_file <config_file>#

Required Path to the batch configuration file. Below is a sample template config

{
    "logging": {
        "log_file": null,
        "log_level": "INFO"
    },
    "pipeline_config": "[REQUIRED]",
    "sets": [
        {
            "args": "[REQUIRED]",
            "files": "[REQUIRED]",
            "set_tag": "set1"
        },
        {
            "args": "[REQUIRED]",
            "files": "[REQUIRED]",
            "set_tag": "set2"
        }
    ]
}

Parameters#

loggingdict, optional

Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the batch command. Note that each pipeline job submitted via batch has its own logging key that will initialize pipeline step logging. Therefore, it’s only ever necessary to use this input if you want logging information about the batching portion of the execution.

pipeline_configstr

Path to the pipeline configuration defining the commands to run for every parametric set.

setslist of dicts

A list of dictionaries, where each dictionary defines a “set” of parametric runs. Each dictionary should have the following keys:

argsdict

A dictionary defining the arguments across all input configuration files to parameterize. Each argument to be parametrized should be a key in this dictionary, and the value should be a list of the parameter values to run for this argument (single-item lists are allowed and can be used to vary a parameter value across sets).

"args": {
    "input_constant_1": [
        18.02,
        19.04
    ],
    "path_to_a_file": [
        "/first/path.h5",
        "/second/path.h5",
        "/third/path.h5"
    ]
}

This example would run a total of six pipelines, one with each of the following arg combinations:

input_constant_1=18.02, path_to_a_file="/first/path.h5"
input_constant_1=18.02, path_to_a_file="/second/path.h5"
input_constant_1=18.02, path_to_a_file="/third/path.h5"
input_constant_1=19.04, path_to_a_file="/first/path.h5"
input_constant_1=19.04, path_to_a_file="/second/path.h5"
input_constant_1=19.04, path_to_a_file="/third/path.h5"

Remember that the keys in the args dictionary should be part of (at least) one of your other configuration files.

fileslist

A list of paths to the configuration files that contain the arguments to be updated for every parametric run. Arguments can be spread out over multiple files. For example:

"files": [
    "./config_run.yaml",
    "./config_analyze.json"
]

set_tagstr, optional

Optional string defining a set tag that will prefix each job tag for this set. This tag does not need to include an underscore, as that is provided during concatenation.
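
Putting these keys together, a minimal sets entry (reusing the example args and files shown above, with an illustrative set tag) might look like:

"sets": [
    {
        "args": {
            "input_constant_1": [18.02, 19.04]
        },
        "files": ["./config_run.yaml"],
        "set_tag": "constants"
    }
]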

--dry#

Flag to do a dry run (make batch dirs and update files without running the pipeline).

--cancel#

Flag to cancel all jobs associated with the batch_jobs.csv file in the current batch config directory.

--delete#

Flag to delete all batch job sub directories associated with the batch_jobs.csv file in the current batch config directory.

--monitor-background#

Flag to monitor all batch pipelines continuously in the background. Note that the stdout/stderr will not be captured, but you can set a pipeline "log_file" to capture logs.

build-routing-layers#

Execute the build-routing-layers step from a config file.

You can re-run this function on an existing file to add new layers without overwriting existing layers or needing to change your original config.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt build-routing-layers [OPTIONS]

Options

-c, --config_file <config_file>#

Required Path to the build-routing-layers configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "keep_sh": false,
        "num_test_nodes": null,
        "max_workers": 1,
        "memory_limit_per_worker": "auto"
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "routing_file": "[REQUIRED]",
    "template_file": null,
    "input_layer_dir": ".",
    "output_tiff_dir": ".",
    "masks_dir": ".",
    "layers": null,
    "dry_costs": null,
    "merge_friction_and_barriers": null,
    "create_kwargs": null
}

Parameters#

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:

({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).

allocation:

(str) HPC project (allocation) handle.

walltime:

(int) Node walltime request in hours.

qos:

(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".

memory:

(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).

max_workers:

(int, optional) Number of parallel workers to use for file creation. If None or >1, processing is performed in parallel using Dask. By default, 1.

memory_limit_per_worker:

(str, float, int, or None, default=”auto”) Sets the memory limit per worker. This only applies if max_workers != 1. If None or 0, no limit is applied. If "auto", the total system memory is split evenly between the workers. If a float, that fraction of the system memory is used per worker. If a string giving a number of bytes (like “1GiB”), that amount is used per worker. If an int, that number of bytes is used per worker. By default, "auto".

queue:

(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.

feature:

(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.

conda_env:

(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.

module:

(str, optional) Module to load. By default, None, which does not load any modules.

sh_script:

(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.

keep_sh:

(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default, False, which purges the submission scripts after each job is submitted.

num_test_nodes:

(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
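
For example, a minimal execution_control block for a parallel HPC run might look like the following sketch (the allocation name is a placeholder, and the worker settings are purely illustrative):

"execution_control": {
    "option": "kestrel",
    "allocation": "your_hpc_allocation",
    "walltime": 4,
    "max_workers": 4,
    "memory_limit_per_worker": "4GiB"
}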

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

routing_filepath-like

Path to GeoTIFF/Zarr file to store cost layers in. If the file does not exist, it will be created based on the template_file input.

template_filepath-like, optional

Path to template GeoTIFF (*.tif or *.tiff) or Zarr (*.zarr) file containing the profile and transform to be used for the layered costs file. If None, then the routing_file is assumed to exist on disk already. By default, None.

input_layer_dirpath-like, optional

Directory to search for input layers in, if not found in current directory. By default, '.'.

output_tiff_dirpath-like, optional

Directory where cost layers should be saved as GeoTIFF. By default, ".".

masks_dirpath-like, optional

Directory for storing/finding mask GeoTIFFs (wet, dry, landfall, wet+, dry+). By default, ".".

layerslist of LayerConfig, optional

Configuration for layers to be built and added to the file. At least one of layers, dry_costs, or merge_friction_and_barriers must be defined. By default, None.

dry_costsDryCosts, optional

Configuration for dry cost layers to be built and added to the file. At least one of layers, dry_costs, or merge_friction_and_barriers must be defined. By default, None.

merge_friction_and_barriersMergeFrictionBarriers, optional

Configuration for merging friction and barriers and adding to the layered costs file. At least one of layers, dry_costs, or merge_friction_and_barriers must be defined. By default, None.

create_kwargsdict, optional

Additional keyword arguments to pass to LayeredFile.create_new() when creating a new layered file. Do not include template_file; it will be ignored. By default, None.

Note that you may remove any keys with a null value if you do not intend to update them yourself.

layers-from-file#

Execute the layers-from-file step from a config file.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt layers-from-file [OPTIONS]

Options

-c, --config_file <config_file>#

Required Path to the layers-from-file configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "keep_sh": false,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "fp": "[REQUIRED]",
    "layers": null,
    "profile_kwargs": null,
    "out_layer_dir": null
}

Parameters#

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:

({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).

allocation:

(str) HPC project (allocation) handle.

walltime:

(int) Node walltime request in hours.

qos:

(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".

memory:

(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).

queue:

(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.

feature:

(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.

conda_env:

(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.

module:

(str, optional) Module to load. By default, None, which does not load any modules.

sh_script:

(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.

keep_sh:

(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default, False, which purges the submission scripts after each job is submitted.

num_test_nodes:

(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

fppath-like

Path to layered file on disk.

layerslist, optional

List of layer names to extract. Layer names must match layers in the fp, otherwise an error will be raised. If None, extracts all layers from the LayeredFile. By default, None.

profile_kwargsdict, optional

Additional keyword arguments to pass into writing each raster. The following attributes are ignored (they are set using properties of the source LayeredFile):

  • nodata

  • transform

  • crs

  • count

  • width

  • height

By default, None.

out_layer_dirpath-like, optional

Path to output directory into which layers should be saved as GeoTIFFs. This directory will be created if it does not already exist. If not provided, will use the config directory as output. By default, None.
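
As a concrete illustration, a minimal layers-from-file config that extracts two layers to GeoTIFFs might look like the following (the file path and layer names are placeholders):

{
    "execution_control": {
        "option": "local"
    },
    "fp": "./my_layered_file.zarr",
    "layers": ["friction", "barriers"],
    "out_layer_dir": "./layer_tiffs"
}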

Note that you may remove any keys with a null value if you do not intend to update them yourself.

layers-to-file#

Execute the layers-to-file step from a config file.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt layers-to-file [OPTIONS]

Options

-c, --config_file <config_file>#

Required Path to the layers-to-file configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "keep_sh": false,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "fp": "[REQUIRED]",
    "layers": "[REQUIRED]",
    "check_tiff": true,
    "descriptions": null,
    "overwrite": false,
    "nodata": null
}

Parameters#

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:

({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).

allocation:

(str) HPC project (allocation) handle.

walltime:

(int) Node walltime request in hours.

qos:

(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".

memory:

(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).

queue:

(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.

feature:

(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.

conda_env:

(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.

module:

(str, optional) Module to load. By default, None, which does not load any modules.

sh_script:

(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.

keep_sh:

(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default, False, which purges the submission scripts after each job is submitted.

num_test_nodes:

(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

fppath-like

Path to layered file on disk.

layerslist | dict

Dictionary mapping layer names to GeoTIFF filepaths. Each GeoTIFF will be loaded into the LayeredFile under the layer name. If a list of GeoTIFF filepaths is provided, the file name stems are used as the layer names.
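
For example, either of the following inputs (layer and file names are placeholders) would load two GeoTIFFs into the file:

"layers": {
    "slope": "./slope.tif",
    "land_use": "./land_use.tif"
}

or, using the list form, in which case the layer names "slope" and "land_use" are taken from the file name stems:

"layers": [
    "./slope.tif",
    "./land_use.tif"
]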

check_tiffbool, optional

Flag to check tiff profile and coordinates against layered file profile and coordinates. By default, True.

overwritebool, default=False

Option to overwrite layer data if layer already exists in LayeredFile.

Important

When overwriting data, the encoding (and therefore things like data type, nodata value, etc) is not allowed to change. If you need to overwrite an existing layer with a new type of data, manually remove it from the file first.

By default, False.

nodataint | float, optional

Optional nodata value for the raster layer. This value will be added to the layer’s attributes meta dictionary under the “nodata” key.

Warning

rioxarray does not recognize the “nodata” value when reading from a zarr file (because zarr uses the _FillValue encoding internally). To get the correct “nodata” value back when reading a LayeredFile, you can either 1) read from da.rio.encoded_nodata or 2) check the layer’s attributes for the "nodata" key, and if present, use da.rio.write_nodata to write the nodata value so that da.rio.nodata gives the right value.

Note that you may remove any keys with a null value if you do not intend to update them yourself.

pipeline#

Execute multiple steps in an analysis pipeline.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt pipeline [OPTIONS]

Options

-c, --config_file <config_file>#

Path to the pipeline configuration file. This argument can be left out, but one and only one file with “pipeline” in the name should exist in the directory and contain the config information. Below is a sample template config

{
    "pipeline": [
        {
            "layers-to-file": "./config_layers_to_file.json"
        },
        {
            "layers-from-file": "./config_layers_from_file.json"
        },
        {
            "build-routing-layers": "./config_build_routing_layers.json"
        },
        {
            "route-characterization": "./config_route_characterization.json"
        },
        {
            "script": "./config_script.json"
        }
    ],
    "logging": {
        "log_file": null,
        "log_level": "INFO"
    }
}

Parameters#

pipelinelist of dicts

A list of dictionaries, where each dictionary represents one step in the pipeline. Each dictionary should have one of two configurations:

  • A single key-value pair, where the key is the name of the CLI command to run, and the value is the path to a config file containing the configuration for that command

  • Exactly two key-value pairs: the "command" key, whose value is the name of the command to execute, and a second key giving a unique, user-defined name for the pipeline step, whose value is the path to a config file containing the configuration for that command. This form allows users to run the same command more than once in a pipeline (see the sample below).
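
For example, to run the script command twice as two separate pipeline steps (the step names and config paths here are illustrative):

"pipeline": [
    {
        "command": "script",
        "prepare-inputs": "./config_prepare_script.json"
    },
    {
        "command": "script",
        "post-process": "./config_post_script.json"
    }
]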

loggingdict, optional

Dictionary containing keyword-argument pairs to pass to init_logger. This initializes logging for the submission portion of the pipeline. Note, however, that each step (command) will also record the submission step log output to a common “project” log file, so it’s only ever necessary to use this input if you want a different (lower) level of verbosity than the log_level specified in the config for the step of the pipeline being executed.

--cancel#

Flag to cancel all jobs associated with a given pipeline.

--monitor#

Flag to monitor pipeline jobs continuously. Default is not to monitor (kick off jobs and exit).

-r, --recursive#

Flag to recursively submit pipelines, starting from the current directory and checking every sub-directory therein. The -c option will be completely ignored if you use this option. Instead, the code will check every sub-directory for exactly one file with the word pipeline in it. If found, that file is assumed to be the pipeline config and is used to kick off the pipeline. In any other case, the directory is skipped.

--background#

Flag to monitor pipeline jobs continuously in the background. Note that the stdout/stderr will not be captured, but you can set a pipeline ‘log_file’ to capture logs.

reset-status#

Reset the pipeline/job status (progress) for a given directory (defaults to ./). Multiple directories can be supplied to reset the status of each.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt reset-status [DIRECTORY]...

Options

-f, --force#

Force pipeline status reset even if jobs are queued/running

-a, --after-step <after_step>#

Reset pipeline starting after the given pipeline step. The status of this step will remain unaffected, but the status of steps following it will be reset completely.

Arguments

DIRECTORY#

Optional argument(s)

route-characterization#

Execute the route-characterization step from a config file.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt route-characterization [OPTIONS]

Options

-c, --config_file <config_file>#

Required Path to the route-characterization configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "keep_sh": false,
        "num_test_nodes": null,
        "max_workers": 1,
        "memory_limit_per_worker": "auto"
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "layers": "[REQUIRED]",
    "default_route_fp": null,
    "default_copy_properties": null,
    "default_row_width_key": null,
    "default_chunks": null,
    "row_widths": null,
    "row_width_ranges": null
}

Parameters#

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:

({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).

allocation:

(str) HPC project (allocation) handle.

walltime:

(int) Node walltime request in hours.

qos:

(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".

memory:

(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).

max_workers:

(int, optional) Number of parallel workers to use for computation. If None or >1, processing is performed in parallel (using Dask). If your paths span large areas, keep this value low (~10) to avoid running into memory errors. By default, 1.

memory_limit_per_worker:

(str, float, int, or None, default=”auto”) Sets the memory limit per worker. This only applies if max_workers != 1. If None or 0, no limit is applied. If "auto", the total system memory is split evenly between the workers. If a float, that fraction of the system memory is used per worker. If a string giving a number of bytes (like “1GiB”), that amount is used per worker. If an int, that number of bytes is used per worker. By default, "auto".

queue:

(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.

feature:

(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.

conda_env:

(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.

module:

(str, optional) Module to load. By default, None, which does not load any modules.

sh_script:

(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.

keep_sh:

(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default, False, which purges the submission scripts after each job is submitted.

num_test_nodes:

(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

layersdict or list of dict

A single dictionary or a list of dictionaries specifying the statistics to compute. Each dictionary should contain the following keys (a sample entry is shown after this list):

  • geotiff_fp: (REQUIRED) Path to the raster file.

  • route_fp: (REQUIRED) Path to the vector file of routes. Must contain a “geometry” column and the row_width_key column (used to map to path ROW width).

  • stats: (OPTIONAL) Names of all statistics to compute. Statistics must be one of the members of Stat or FractionalStat, or must start with the percentile_ prefix and end with an int or float representing the percentile to compute (e.g. percentile_10.5). If only one statistic is to be computed, you can provide it directly as a string. Otherwise, provide a list of statistic names or a string with the names separated by a space. You can also provide the string "ALL" or "*" to specify that all statistics should be computed (i.e. all options from both Stat and FractionalStat). If no input, empty input, or None is provided, then only the base stats (“count”, “min”, “max”, “mean”) are configured. To summarize, all of the following are valid inputs:

    • stats: "*" or stats="ALL" or stats="All"

    • stats: "min"

    • stats: "min max"

    • stats: ["min"]

    • stats: ["min", "max", "percentile_10.5"]

  • nodata : (OPTIONAL) Value in the raster that represents nodata. This value will not show up in any statistics except for the nodata statistic itself, which computes the number of nodata values within the buffered routes. Note that this value is used in addition to any NODATA value in the raster’s metadata.

  • all_touched : (OPTIONAL) Boolean flag indicating whether to include every raster cell touched by a geometry (True), or only those having a center point within the polygon (False). By default, True.

  • category_map : (OPTIONAL) Dictionary mapping raster values to new names. If given, this mapping will be applied to the pixel count dictionary, so you can use it to map raster values to human-readable category names.

  • multiplier_scalar: (OPTIONAL) Optional multiplier value to apply to layer before computing statistics. This is useful if you want to scale the values in the raster before computing statistics. By default, 1.0.

  • prefix: (OPTIONAL) A string representing a prefix to add to each stat name. If you wish to have the prefix separated by a delimiter, you must include it in this string (e.g. prefix="test_").

  • copy_properties: (OPTIONAL) List of column names to copy over from the vector file of routes.

  • row_width_key: (OPTIONAL) Name of column in vector file of routes used to map to the ROW widths. By default, "voltage".

  • chunks : (OPTIONAL) chunks keyword argument to pass down to rioxarray.open_rasterio(). Use this to control the Dask chunk size. By default, "auto".
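
For example, a single layers entry (with placeholder file and column names) might look like:

"layers": [
    {
        "geotiff_fp": "./land_cover.tif",
        "route_fp": "./routes.gpkg",
        "stats": ["min", "max", "mean", "percentile_90"],
        "row_width_key": "voltage",
        "prefix": "land_cover_"
    }
]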

default_route_fppath-like, optional

Default path to the vector file of routes. This will be used only if no route_fp is provided in a layer’s stats dictionary. Must contain a “geometry” column and the row_width_key column (used to map to path ROW width). By default, None.

default_copy_propertiesiterable of str, optional

Default iterable of column names to copy over from the zone feature. This will be used only if no copy_properties is provided in a layer’s stats dictionary. By default, None.

default_row_width_keystr, optional

Default name of column in vector file of routes used to map to the ROW widths. This will be used only if no row_width_key is provided in a layer’s stats dictionary. By default, None.

default_chunkstuple or str, optional

Default chunks keyword argument to pass down to rioxarray.open_rasterio(). This will be used only if no chunks is provided in a layer’s stats dictionary. Use this to control the Dask chunk size. By default, None, which uses "auto" as the final chunk input.

row_widthsdict or path-like, optional

A dictionary specifying the row widths in the following format: {"row_width_id": row_width_meters}. The row_width_id is a value used to match each route with a particular ROW width (this is typically a voltage). The value should be found under the row_width_key entry of the route_fp.

Important

At least one of row_widths or row_width_ranges must be provided.

Warning

Routes without a valid voltage in the row_widths or row_width_ranges input will not be characterized.

If a path is provided, it should point to a JSON file containing the row width dictionary as specified above. By default, None.
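
For example, a row_widths dictionary keyed on voltage might look like the following (the widths here are purely illustrative):

"row_widths": {
    "138": 25,
    "230": 38,
    "500": 60
}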

row_width_rangeslist, optional

Optional list of dictionaries, where each dictionary contains the keys “min”, “max”, and “width”. This can be used to specify row widths based on ranges of values (e.g. voltage). For example, the following input:

[
    {"min": 0, "max": 70, "width": 20},
    {"min": 70, "max": 150, "width": 30},
    {"min": 200, "max": 350, "width": 40},
    {"min": 400, "max": 500, "width": 50},
]

would map voltages in the range 0 <= volt < 70 to a row width of 20 meters, 70 <= volt < 150 to a row width of 30 meters, 200 <= volt < 350 to a row width of 40 meters, and so on.

Important

Any values in the row_widths dict will take precedence over these ranges. So if a voltage of 138 kV is mapped to a row width of 25 meters in the row_widths dict, that value will be used instead of the 30 meter width specified by the ranges above.

If a path is provided, it should point to a JSON file containing the list of dictionaries. By default, None.

Note that you may remove any keys with a null value if you do not intend to update them yourself.

script#

Execute the script step from a config file.

This command runs one or more terminal commands/scripts as part of a pipeline step.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt script [OPTIONS]

Options

-c, --config_file <config_file>#

Required Path to the script configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "keep_sh": false,
        "num_test_nodes": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "cmd": "[REQUIRED]"
}

Parameters#

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:

({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).

allocation:

(str) HPC project (allocation) handle.

walltime:

(int) Node walltime request in hours.

qos:

(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".

memory:

(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).

queue:

(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.

feature:

(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.

conda_env:

(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.

module:

(str, optional) Module to load. By default, None, which does not load any modules.

sh_script:

(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.

keep_sh:

(bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default, False, which purges the submission scripts after each job is submitted.

num_test_nodes:

(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

cmdstr | list

A single command represented as a string or a list of command strings to execute on a node. If the input is a list, each command string in the list will be executed on a separate node. For example, to run a python script, simply specify

"cmd": "python my_script.py"

This will run the python file “my_script.py” (in the project directory) on a single node.

Important

It is inefficient to run scripts that only use a single processor on HPC nodes for extended periods of time. Always make sure your long-running scripts use Python’s multiprocessing library wherever possible to make the most use of shared HPC resources.

To run multiple commands in parallel, supply them as a list:

"cmd": [
    "python /path/to/my_script/py -a -out out_file.txt",
    "wget https://website.org/latest.zip"
]

This input will run two commands (a python script with the specified arguments and a wget command to download a file from the web), each on their own node and in parallel as part of this pipeline step. Note that commands are always executed from the project directory.

Note that you may remove any keys with a null value if you do not intend to update them yourself.

status#

Display the status of a project FOLDER.

By default, the status of the current working directory is displayed.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt status [OPTIONS] [FOLDER]

Options

-ps, --pipe_steps <pipe_steps>#

Filter status for the given pipeline step(s). Multiple steps can be specified by repeating this option (e.g. -ps step1 -ps step2 ...). By default, the status of all pipeline steps is displayed.

-s, --status <status>#

Filter jobs for the requested status(es). Allowed options (case-insensitive) include:

  • Failed: failure fail failed f

  • Running: running run r

  • Submitted: submitted submit sb pending pend p

  • Success: successful success s

  • Not submitted: unsubmitted unsubmit u not_submitted ns

Multiple status keys can be specified by repeating this option (e.g. -s status1 -s status2 ...). By default, all status values are displayed.
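
For example, to display only failed and running jobs for a hypothetical route-characterization step:

$ reVRt status -ps route-characterization -s failed -s running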

-i, --include <include>#

Extra status keys to include in the print output for each job. Multiple status keys can be specified by repeating this option (e.g. -i key1 -i key2 ...). By default, no extra keys are displayed.

-r, --recursive#

Option to perform a recursive search of directories (starting with the input directory). The status of every nested directory is reported.

Arguments

FOLDER#

Optional argument

template-configs#

Generate template config files for requested COMMANDS. If no COMMANDS are given, config files for the entire pipeline are generated.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reVRt template-configs [COMMANDS]...

Options

-t, --type <type>#

Configuration file type to generate. Allowed options (case-insensitive): json5 json toml yaml yml.

Default:

'json'

Arguments

COMMANDS#

Optional argument(s)
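
For example, to generate only a TOML template config for the script command:

$ reVRt template-configs script -t toml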