reV rep-profiles

Execute the rep-profiles step from a config file.

The reV rep-profiles step computes representative generation profiles for each supply curve point output by reV supply curve aggregation. Representative profiles can either be a spatial aggregation of generation profiles or actual generation profiles that most closely resemble an aggregated profile (selected based on an error metric).

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reV rep-profiles [OPTIONS]

Options

-c, --config_file <config_file>

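Required. Path to the rep-profiles configuration file. Below is a sample template config in JSON, YAML, and TOML form (the TOML version omits the null-valued keys, since TOML has no null type):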

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null,
        "max_workers": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "gen_fpath": "[REQUIRED]",
    "rev_summary": "[REQUIRED]",
    "reg_cols": "[REQUIRED]",
    "cf_dset": "cf_profile",
    "rep_method": "meanoid",
    "err_method": "rmse",
    "weight": "gid_counts",
    "n_profiles": 1,
    "aggregate_profiles": false,
    "save_rev_summary": true,
    "scaled_precision": false,
    "analysis_years": null
}

execution_control:
  option: local
  allocation: '[REQUIRED IF ON HPC]'
  walltime: '[REQUIRED IF ON HPC]'
  qos: normal
  memory: null
  queue: null
  feature: null
  conda_env: null
  module: null
  sh_script: null
  num_test_nodes: null
  max_workers: null
log_directory: ./logs
log_level: INFO
gen_fpath: '[REQUIRED]'
rev_summary: '[REQUIRED]'
reg_cols: '[REQUIRED]'
cf_dset: cf_profile
rep_method: meanoid
err_method: rmse
weight: gid_counts
n_profiles: 1
aggregate_profiles: false
save_rev_summary: true
scaled_precision: false
analysis_years: null

log_directory = "./logs"
log_level = "INFO"
gen_fpath = "[REQUIRED]"
rev_summary = "[REQUIRED]"
reg_cols = "[REQUIRED]"
cf_dset = "cf_profile"
rep_method = "meanoid"
err_method = "rmse"
weight = "gid_counts"
n_profiles = 1
aggregate_profiles = false
save_rev_summary = true
scaled_precision = false

[execution_control]
option = "local"
allocation = "[REQUIRED IF ON HPC]"
walltime = "[REQUIRED IF ON HPC]"
qos = "normal"

Parameters

execution_control : dict

Dictionary containing execution control arguments. Allowed arguments are:

option:: ({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
allocation:: (str) HPC project (allocation) handle.
walltime:: (int) Node walltime request in hours.
qos:: (str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".
memory:: (int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
max_workers:: (int, optional) Number of parallel rep profile workers. 1 will run serial, while None will use all available. By default, None.
queue:: (str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.
feature:: (str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.
conda_env:: (str, optional) Name of conda environment to activate. By default, None, which does not load any environments.
module:: (str, optional) Module to load. By default, None, which does not load any modules.
sh_script:: (str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.
num_test_nodes:: (str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
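As a rough illustration of the rule above, the following Python sketch checks an execution_control dictionary for the keys this section lists as required. The helper name missing_execution_keys and the example allocation handle are hypothetical; this is not how reV validates configs internally.

```python
# Keys this page lists as required when running on an HPC system.
HPC_REQUIRED_KEYS = ("allocation", "walltime")
# Placeholder string used in the template config above.
PLACEHOLDER = "[REQUIRED IF ON HPC]"


def missing_execution_keys(execution_control):
    """Return required execution_control keys that are absent or unset.

    Hypothetical helper for illustration only; not part of reV.
    """
    if "option" not in execution_control:
        return ["option"]
    if execution_control["option"] == "local":
        # Local runs need nothing beyond "option".
        return []
    return [key for key in HPC_REQUIRED_KEYS
            if execution_control.get(key) in (None, PLACEHOLDER)]


local_cfg = {"option": "local"}
hpc_cfg = {"option": "kestrel", "allocation": "my_project"}  # made-up handle

print(missing_execution_keys(local_cfg))  # []
print(missing_execution_keys(hpc_cfg))    # ['walltime']
```

Treating the template placeholder as "unset" is a design choice of this sketch, so that an unedited template is flagged before submission.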

log_directory : str

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level : {“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

gen_fpath : str

Filepath to reV generation output HDF5 file to extract cf_dset dataset from.

Note

If executing reV from the command line, this path can contain brackets {} that will be filled in by the analysis_years input. Alternatively, this input can be set to "PIPELINE", which will parse this input from one of these preceding pipeline steps: multi-year, collect, generation, or supply-curve-aggregation. However, note that duplicate executions of any of these commands within the pipeline may invalidate this parsing, meaning the gen_fpath input will have to be specified manually.

rev_summary : str | pd.DataFrame

Aggregated reV supply curve summary file. Must include the following columns:

res_gids : string representation of python list containing the resource GID values corresponding to each supply curve point.

gen_gids : string representation of python list containing the reV generation GID values corresponding to each supply curve point.

weight column (name based on weight input) : string representation of python list containing the resource GID weights for each supply curve point.

Note

If executing reV from the command line, this input can be set to "PIPELINE", which will parse this input from one of these preceding pipeline steps: supply-curve-aggregation or supply-curve. However, note that duplicate executions of any of these commands within the pipeline may invalidate this parsing, meaning the rev_summary input will have to be specified manually.
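To make the "string representation of python list" requirement concrete, here is a minimal sketch that reads such columns back into Python lists using only the standard library. The CSV excerpt, its values, and the read_summary helper are invented for illustration, and the weight column is assumed to be named "gid_counts" as in the template config.

```python
import ast
import csv
import io

# Hypothetical two-row excerpt of a supply curve summary CSV.
SUMMARY_CSV = '''sc_gid,res_gids,gen_gids,gid_counts
0,"[100, 101]","[0, 1]","[3000, 1096]"
1,"[102]","[2]","[4096]"
'''

LIST_COLUMNS = ("res_gids", "gen_gids", "gid_counts")


def read_summary(text, list_columns=LIST_COLUMNS):
    """Parse summary rows, converting list-like strings to real lists."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in list_columns:
            # literal_eval safely parses "[100, 101]" into [100, 101].
            row[col] = ast.literal_eval(row[col])
        rows.append(row)
    return rows


rows = read_summary(SUMMARY_CSV)
for row in rows:
    # Each weight should line up with a resource GID in the same row.
    if len(row["res_gids"]) != len(row["gid_counts"]):
        raise ValueError(f"Mismatched lengths for sc_gid {row['sc_gid']}")

print(rows[0]["gen_gids"], rows[0]["gid_counts"])
```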

reg_cols : str | list

Label(s) for a categorical region column(s) to extract profiles for. For example, "state" will extract a rep profile for each unique entry in the "state" column in rev_summary. To get a profile for each supply curve point, try setting reg_cols to a primary key such as "sc_gid".
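The effect of reg_cols can be pictured as a group-by over the summary table. The sketch below uses made-up rows and a hypothetical group_by_region helper; it assumes that a list of columns defines regions by their unique combinations, which is an interpretation rather than something stated here.

```python
from collections import defaultdict

# Hypothetical summary rows; only the columns needed for grouping.
rows = [
    {"sc_gid": 0, "state": "Colorado"},
    {"sc_gid": 1, "state": "Colorado"},
    {"sc_gid": 2, "state": "Utah"},
]


def group_by_region(rows, reg_cols):
    """Map each unique region key to the sc_gid values that belong to it."""
    if isinstance(reg_cols, str):
        reg_cols = [reg_cols]
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in reg_cols)
        groups[key].append(row["sc_gid"])
    return dict(groups)


by_state = group_by_region(rows, "state")
by_point = group_by_region(rows, "sc_gid")

print(by_state)       # {('Colorado',): [0, 1], ('Utah',): [2]}
print(len(by_point))  # 3
```

With "state", two regions result and so two representative profiles would be expected; with "sc_gid", every supply curve point becomes its own region.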

cf_dset : str, optional

Dataset name to pull generation profiles from. This dataset must be present in the gen_fpath HDF5 file. By default, "cf_profile".

Note

If executing reV from the command line, this name can contain brackets {} that will be filled in by the analysis_years input (e.g. "cf_profile-{}").

rep_method : {‘mean’, ‘meanoid’, ‘median’, ‘medianoid’}, optional

Method identifier for calculation of the representative profile. By default, 'meanoid'.

err_method : {‘mbe’, ‘mae’, ‘rmse’}, optional

Method identifier for calculation of error from the representative profile. If this input is None, the representative meanoid / medianoid profile will be returned directly. By default, 'rmse'.
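One way to picture how rep_method and err_method interact is the unweighted sketch below: compute the mean profile of a region, then pick the actual profile with the smallest RMSE against it. The function names and sample profiles are hypothetical, and the code is a conceptual illustration, not reV's implementation.

```python
import math


def mean_profile(profiles):
    """Element-wise (unweighted) mean across equal-length profiles."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


def rmse(profile, target):
    """Root-mean-square error between a profile and a target profile."""
    squared = sum((p - t) ** 2 for p, t in zip(profile, target))
    return math.sqrt(squared / len(target))


def closest_to_mean(profiles):
    """Index of the actual profile with the lowest RMSE to the mean."""
    target = mean_profile(profiles)
    errors = [rmse(profile, target) for profile in profiles]
    return min(range(len(profiles)), key=errors.__getitem__)


# Hypothetical capacity factor profiles for one region (3 time steps).
profiles = [
    [0.10, 0.20, 0.30],
    [0.20, 0.30, 0.40],
    [0.60, 0.70, 0.80],
]

best = closest_to_mean(profiles)
print(best)  # 1
```

The mean profile here is roughly [0.3, 0.4, 0.5], so the second profile (index 1) has the smallest error and would be the representative one under these assumptions.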

weight : str, optional

Column in rev_summary used to apply weights when computing mean profiles. The supply curve table data in the weight column should have weight values corresponding to the res_gids in the same row (i.e. string representation of python list containing weight values).

Important

You’ll often want to set this value to something other than None (typically "gid_counts" if running on standard reV outputs). Otherwise, the unique generation profiles within each supply curve point are weighted equally. For example, if you have a 64x64 supply curve point, and one generation profile takes up 4095 (99.98%) 90m cells while a second generation profile takes up only one 90m cell (0.02%), they will contribute equally to the meanoid profile unless these weights are specified.

By default, SupplyCurveField.GID_COUNTS (shown as "gid_counts" in the template config above).
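The 4095-to-1 example above can be worked through numerically. The sketch below compares an equally weighted mean with a mean weighted by cell counts; the profile values and the weighted_mean_profile helper are made up for illustration.

```python
def weighted_mean_profile(profiles, weights=None):
    """Element-wise mean of profiles; equal weights if none are given."""
    if weights is None:
        weights = [1] * len(profiles)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values)) / total
        for values in zip(*profiles)
    ]


# Two hypothetical generation profiles within one 64x64 supply curve point.
profiles = [
    [0.40, 0.50],  # covers 4095 of the 4096 cells
    [0.00, 0.10],  # covers a single cell
]
gid_counts = [4095, 1]

unweighted = weighted_mean_profile(profiles)
weighted = weighted_mean_profile(profiles, gid_counts)

print([round(v, 4) for v in unweighted])  # [0.2, 0.3]
print([round(v, 4) for v in weighted])    # [0.3999, 0.4999]
```

Without weights, the single-cell profile pulls the mean halfway toward itself; with the cell counts as weights, the mean stays close to the dominant profile.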

n_profiles : int, optional

Number of representative profiles to save to the output file. By default, 1.

aggregate_profiles : bool, optional

Flag to calculate the aggregate (weighted meanoid) profile for each supply curve point. This behavior is in lieu of finding the single profile per region closest to the meanoid. If you set this flag to True, the rep_method, err_method, and n_profiles inputs will be forcibly set to the default values. By default, False.

save_rev_summary : bool, optional

Flag to save full reV supply curve table to rep profile output. By default, True.

scaled_precision : bool, optional

Flag to scale cf_profiles by 1000 and save as uint16. By default, False.
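The storage trade-off behind this flag can be sketched as follows: capacity factors are multiplied by 1000 and kept as unsigned 16-bit integers, which limits precision to three decimal places. The helper names are hypothetical, and rounding to the nearest integer is an assumption of this sketch; the exact conversion reV applies is not specified here.

```python
from array import array

SCALE = 1000
UINT16_MAX = 65535


def scale_profile(cf_profile, scale=SCALE):
    """Scale capacity factors and store them as unsigned 16-bit integers."""
    # Rounding (rather than truncating) is an assumption of this sketch.
    scaled = [int(round(cf * scale)) for cf in cf_profile]
    if any(value < 0 or value > UINT16_MAX for value in scaled):
        raise ValueError("Scaled values do not fit in uint16")
    # Typecode "H" is an unsigned short (2 bytes on common platforms).
    return array("H", scaled)


def unscale_profile(stored, scale=SCALE):
    """Recover approximate capacity factors from the stored integers."""
    return [value / scale for value in stored]


stored = scale_profile([0.0, 0.1234, 0.5, 1.0])
print(list(stored))  # [0, 123, 500, 1000]
print(unscale_profile(stored))
```

Note that 0.1234 comes back as 0.123 after the round trip, so anything beyond the third decimal place is lost.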

analysis_years : int | list, optional

A single year or list of years to perform analysis for. These years will be used to fill in any brackets {} in the cf_dset or gen_fpath inputs. If None, the cf_dset and gen_fpath inputs are assumed to be the full dataset name and the full path to the single resource file to be processed, respectively. Note that only one of cf_dset or gen_fpath is allowed to contain brackets ({}) to be filled in by the analysis years. By default, None.
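The bracket-filling behavior can be approximated with Python's str.format. The fill_years helper and the file names below are hypothetical, and the sketch only mirrors the rules stated above (None means no substitution, and only one input may contain brackets); it is not reV's own code.

```python
def fill_years(gen_fpath, cf_dset, analysis_years=None):
    """Return one (gen_fpath, cf_dset) pair per analysis year."""
    if analysis_years is None:
        # Inputs are taken as the full path and full dataset name.
        return [(gen_fpath, cf_dset)]
    if isinstance(analysis_years, int):
        analysis_years = [analysis_years]
    if "{}" in gen_fpath and "{}" in cf_dset:
        raise ValueError("Only one of gen_fpath or cf_dset may contain {}")
    # str.format leaves strings without brackets unchanged.
    return [(gen_fpath.format(year), cf_dset.format(year))
            for year in analysis_years]


# One generation file per year (made-up file names).
per_file = fill_years("./gen_{}.h5", "cf_profile", [2012, 2013])
# One multi-year file with one dataset per year.
per_dset = fill_years("./multi_year.h5", "cf_profile-{}", 2012)

print(per_file)
print(per_dset)
```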

Note that you may remove any keys with a null value if you do not intend to update them yourself.
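If you generate configs programmatically, the null-valued keys can be stripped before writing the file. The sketch below does this for a subset of the template; drop_null_keys is a hypothetical helper, not a reV function.

```python
import json


def drop_null_keys(config):
    """Return a copy of config without keys whose value is None."""
    pruned = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = drop_null_keys(value)
        if value is not None:
            pruned[key] = value
    return pruned


# Subset of the template config shown earlier on this page.
template = {
    "execution_control": {
        "option": "local",
        "qos": "normal",
        "memory": None,
        "queue": None,
        "max_workers": None,
    },
    "log_level": "INFO",
    "gen_fpath": "[REQUIRED]",
    "rev_summary": "[REQUIRED]",
    "reg_cols": "[REQUIRED]",
    "analysis_years": None,
}

pruned = drop_null_keys(template)
print(json.dumps(pruned, indent=4))
```

The resulting JSON keeps only the keys with explicit values; the "[REQUIRED]" placeholders still need to be replaced with real inputs before running.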