reV rep-profiles
Execute the rep-profiles step from a config file.
reV rep profiles compute representative generation profiles for each supply curve point output by reV supply curve aggregation. Representative profiles can either be a spatial aggregation of generation profiles or actual generation profiles that most closely resemble an aggregated profile (selected based on an error metric).
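The second kind of representative profile (an actual profile chosen by an error metric) can be illustrated with a minimal sketch. This is not reV's implementation: the function name `select_representative` and the sample data are hypothetical, the aggregated profile is taken to be a plain unweighted mean, and RMSE is used as the error metric.

```python
import math


def select_representative(profiles):
    """Return (index, rmse) of the profile closest to the mean profile.

    Hypothetical helper for illustration only.
    """
    n_steps = len(profiles[0])
    # Element-wise unweighted mean across all profiles.
    mean_profile = [
        sum(p[t] for p in profiles) / len(profiles) for t in range(n_steps)
    ]

    def rmse(profile):
        squared = sum((x - m) ** 2 for x, m in zip(profile, mean_profile))
        return math.sqrt(squared / n_steps)

    errors = [rmse(p) for p in profiles]
    # The representative profile is the one with the smallest error.
    best = min(range(len(profiles)), key=errors.__getitem__)
    return best, errors[best]


# Three made-up capacity factor profiles with three time steps each.
profiles = [
    [0.0, 0.2, 0.4],
    [0.1, 0.3, 0.5],
    [0.5, 0.7, 0.9],
]
best_index, best_error = select_representative(profiles)
```

Here the second profile (index 1) is selected because it lies closest to the mean of the three.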
The general structure for calling this CLI command is given below (add --help to print help info to the terminal).
reV rep-profiles [OPTIONS]
Options
- -c, --config_file <config_file>
Required. Path to the rep-profiles configuration file. Below is a sample template config, shown in JSON, YAML, and TOML form.
JSON:
    {
        "execution_control": {
            "option": "local",
            "allocation": "[REQUIRED IF ON HPC]",
            "walltime": "[REQUIRED IF ON HPC]",
            "qos": "normal",
            "memory": null,
            "queue": null,
            "feature": null,
            "conda_env": null,
            "module": null,
            "sh_script": null,
            "num_test_nodes": null,
            "max_workers": null
        },
        "log_directory": "./logs",
        "log_level": "INFO",
        "gen_fpath": "[REQUIRED]",
        "rev_summary": "[REQUIRED]",
        "reg_cols": "[REQUIRED]",
        "cf_dset": "cf_profile",
        "rep_method": "meanoid",
        "err_method": "rmse",
        "weight": "gid_counts",
        "n_profiles": 1,
        "aggregate_profiles": false,
        "save_rev_summary": true,
        "scaled_precision": false,
        "analysis_years": null
    }
YAML:
    execution_control:
      option: local
      allocation: '[REQUIRED IF ON HPC]'
      walltime: '[REQUIRED IF ON HPC]'
      qos: normal
      memory: null
      queue: null
      feature: null
      conda_env: null
      module: null
      sh_script: null
      num_test_nodes: null
      max_workers: null
    log_directory: ./logs
    log_level: INFO
    gen_fpath: '[REQUIRED]'
    rev_summary: '[REQUIRED]'
    reg_cols: '[REQUIRED]'
    cf_dset: cf_profile
    rep_method: meanoid
    err_method: rmse
    weight: gid_counts
    n_profiles: 1
    aggregate_profiles: false
    save_rev_summary: true
    scaled_precision: false
    analysis_years: null
TOML:
    log_directory = "./logs"
    log_level = "INFO"
    gen_fpath = "[REQUIRED]"
    rev_summary = "[REQUIRED]"
    reg_cols = "[REQUIRED]"
    cf_dset = "cf_profile"
    rep_method = "meanoid"
    err_method = "rmse"
    weight = "gid_counts"
    n_profiles = 1
    aggregate_profiles = false
    save_rev_summary = true
    scaled_precision = false

    [execution_control]
    option = "local"
    allocation = "[REQUIRED IF ON HPC]"
    walltime = "[REQUIRED IF ON HPC]"
    qos = "normal"
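As a rough sketch of how such a config might be produced programmatically, the snippet below fills in the required keys of the template and writes the result as JSON using only the Python standard library. The file names (`gen_2012.h5`, `sc_agg.csv`, `config_rep_profiles.json`) and the choice of `"state"` as the region column are placeholder assumptions, not values taken from reV.

```python
import json
import tempfile
from pathlib import Path

# Keys and defaults mirror the template above; paths are hypothetical.
config = {
    "execution_control": {"option": "local"},
    "log_directory": "./logs",
    "log_level": "INFO",
    "gen_fpath": "./gen_2012.h5",    # hypothetical generation output file
    "rev_summary": "./sc_agg.csv",   # hypothetical supply curve summary
    "reg_cols": ["state"],           # assumed region column
    "cf_dset": "cf_profile",
    "rep_method": "meanoid",
    "err_method": "rmse",
    "weight": "gid_counts",
    "n_profiles": 1,
}

# Write to a temporary directory so the sketch has no lasting side effects.
out_dir = Path(tempfile.mkdtemp())
config_path = out_dir / "config_rep_profiles.json"
config_path.write_text(json.dumps(config, indent=4))
```

The resulting file could then be passed to the command through the -c option, for example: reV rep-profiles -c config_rep_profiles.json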
Parameters
- execution_control : dict
Dictionary containing execution control arguments. Allowed arguments are:
- option:
({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
- allocation:
(str) HPC project (allocation) handle.
- walltime:
(int) Node walltime request in hours.
- qos:
(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".
- memory:
(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
- max_workers:
(int, optional) Number of parallel rep profile workers. 1 will run serial, while None will use all available. By default, None.
- queue:
(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.
- feature:
(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.
- conda_env:
(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.
- module:
(str, optional) Module to load. By default, None, which does not load any modules.
- sh_script:
(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.
- num_test_nodes:
(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.
Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
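The requirement rule just stated can be expressed as a small pre-flight check. This is only a sketch: `missing_execution_keys` is a hypothetical helper, not part of reV, and the allocation handle `"myproj"` is made up.

```python
def missing_execution_keys(execution_control):
    """List required execution_control keys that are absent or null.

    Hypothetical helper: "option" is always required, while
    "allocation" and "walltime" are required for any non-local option.
    """
    if execution_control.get("option") is None:
        return ["option"]
    if execution_control["option"] == "local":
        return []
    required_on_hpc = ("allocation", "walltime")
    return [k for k in required_on_hpc if execution_control.get(k) is None]


local_missing = missing_execution_keys({"option": "local"})
hpc_missing = missing_execution_keys(
    {"option": "kestrel", "allocation": "myproj"}  # walltime left out
)
```

With these inputs, the local config reports nothing missing, while the HPC config reports that walltime still needs to be set.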
- log_directory : str
Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".
- log_level : {“DEBUG”, “INFO”, “WARNING”, “ERROR”}
String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".
- gen_fpath : str
Filepath to reV generation output HDF5 file to extract cf_dset dataset from.
Note
If executing reV from the command line, this path can contain brackets {} that will be filled in by the analysis_years input. Alternatively, this input can be set to "PIPELINE", which will parse this input from one of these preceding pipeline steps: multi-year, collect, generation, or supply-curve-aggregation. However, note that duplicate executions of any of these commands within the pipeline may invalidate this parsing, meaning the gen_fpath input will have to be specified manually.
- rev_summary : str | pd.DataFrame
Aggregated reV supply curve summary file. Must include the following columns:
- res_gids: string representation of python list containing the resource GID values corresponding to each supply curve point.
- gen_gids: string representation of python list containing the reV generation GID values corresponding to each supply curve point.
- weight column (name based on weight input): string representation of python list containing the resource GID weights for each supply curve point.
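Because these columns hold string representations of Python lists, a reader of the summary table typically has to parse them back into lists. A minimal sketch using `ast.literal_eval` from the standard library follows; the row values are invented, and `gid_counts` is assumed to be the weight column name.

```python
import ast

# One made-up row of a supply curve summary table.
row = {
    "res_gids": "[10, 11, 12]",
    "gen_gids": "[0, 1, 2]",
    "gid_counts": "[4095, 1, 30]",  # assumed weight column
}

# literal_eval safely parses Python literals without executing code.
parsed = {name: ast.literal_eval(value) for name, value in row.items()}

# Each resource GID should line up with one generation GID and one weight.
lengths = {len(values) for values in parsed.values()}
is_aligned = len(lengths) == 1
```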
Note
If executing reV from the command line, this input can be set to "PIPELINE", which will parse this input from one of these preceding pipeline steps: supply-curve-aggregation or supply-curve. However, note that duplicate executions of any of these commands within the pipeline may invalidate this parsing, meaning the rev_summary input will have to be specified manually.
- reg_cols : str | list
Label(s) for a categorical region column(s) to extract profiles for. For example, "state" will extract a rep profile for each unique entry in the "state" column in rev_summary. To get a profile for each supply curve point, try setting reg_cols to a primary key such as "sc_gid".
- cf_dset : str, optional
Dataset name to pull generation profiles from. This dataset must be present in the gen_fpath HDF5 file. By default, "cf_profile".
Note
If executing reV from the command line, this name can contain brackets {} that will be filled in by the analysis_years input (e.g. "cf_profile-{}").
- rep_method : {‘mean’, ‘meanoid’, ‘median’, ‘medianoid’}, optional
Method identifier for calculation of the representative profile. By default, 'meanoid'.
- err_method : {‘mbe’, ‘mae’, ‘rmse’}, optional
Method identifier for calculation of error from the representative profile. If this input is None, the representative meanoid / medianoid profile will be returned directly. By default, 'rmse'.
- weight : str, optional
Column in rev_summary used to apply weights when computing mean profiles. The supply curve table data in the weight column should have weight values corresponding to the res_gids in the same row (i.e. string representation of python list containing weight values).
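To see how much the weights can change a mean profile, consider a toy case with two generation profiles covering 4095 cells and 1 cell, respectively. The sketch below is illustrative only: `mean_profile` is a hypothetical helper, the profile values are invented, and reV's own weighting code may differ.

```python
def mean_profile(profiles, weights=None):
    """Weighted element-wise mean; equal weights if none are given.

    Hypothetical helper for illustration only.
    """
    if weights is None:
        weights = [1.0] * len(profiles)
    total = sum(weights)
    n_steps = len(profiles[0])
    return [
        sum(w * p[t] for w, p in zip(weights, profiles)) / total
        for t in range(n_steps)
    ]


profiles = [[0.2, 0.4], [0.8, 1.0]]  # two made-up profiles, two time steps
gid_counts = [4095, 1]               # cells covered by each profile

unweighted = mean_profile(profiles)
weighted = mean_profile(profiles, gid_counts)
```

Without weights, the mean sits halfway between the two profiles. With the cell counts as weights, it stays very close to the first profile, which covers nearly the whole supply curve point.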
Important
You’ll often want to set this value to something other than None (typically "gid_counts" if running on standard reV outputs). Otherwise, the unique generation profiles within each supply curve point are weighted equally. For example, if you have a 64x64 supply curve point, and one generation profile takes up 4095 (99.98%) 90m cells while a second generation profile takes up only one 90m cell (0.02%), they will contribute equally to the meanoid profile unless these weights are specified.
By default, SupplyCurveField.GID_COUNTS.
- n_profiles : int, optional
Number of representative profiles to save to the output file. By default, 1.
- aggregate_profiles : bool, optional
Flag to calculate the aggregate (weighted meanoid) profile for each supply curve point. This behavior is in lieu of finding the single profile per region closest to the meanoid. If you set this flag to True, the rep_method, err_method, and n_profiles inputs will be forcibly set to the default values. By default, False.
- save_rev_summary : bool, optional
Flag to save full reV supply curve table to rep profile output. By default, True.
- scaled_precision : bool, optional
Flag to scale cf_profiles by 1000 and save as uint16. By default, False.
- analysis_years : int | list, optional
A single year or list of years to perform analysis for. These years will be used to fill in any brackets {} in the cf_dset or gen_fpath inputs. If None, the cf_dset and gen_fpath inputs are assumed to be the full dataset name and the full path to the single resource file to be processed, respectively. Note that only one of cf_dset or gen_fpath is allowed to contain brackets ({}) to be filled in by the analysis years. By default, None.
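The bracket substitution described for analysis_years can be pictured with Python string formatting. This is a sketch of the idea rather than reV's internal logic: `fill_years` is a hypothetical helper and the years are arbitrary.

```python
def fill_years(template, analysis_years):
    """Expand a "{}" template once per analysis year.

    Hypothetical helper: with analysis_years=None the input is
    returned unchanged, matching the "full name" case described above.
    """
    if analysis_years is None:
        return [template]
    if isinstance(analysis_years, int):
        analysis_years = [analysis_years]
    return [template.format(year) for year in analysis_years]


dsets = fill_years("cf_profile-{}", [2012, 2013])
single = fill_years("cf_profile-{}", 2012)
unchanged = fill_years("cf_profile", None)
```

With a list of two years, the template expands to "cf_profile-2012" and "cf_profile-2013".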
Note that you may remove any keys with a null value if you do not intend to update them yourself.
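If a template is generated with all keys present, the null-valued keys can be stripped before saving. The following is a minimal sketch with a hypothetical helper, `drop_null_keys`, applied to a shortened, made-up version of the template.

```python
def drop_null_keys(config):
    """Return a copy of config without keys whose value is None.

    Hypothetical helper; nested dictionaries are cleaned recursively.
    """
    return {
        key: drop_null_keys(value) if isinstance(value, dict) else value
        for key, value in config.items()
        if value is not None
    }


template = {
    "execution_control": {"option": "local", "memory": None, "queue": None},
    "log_level": "INFO",
    "analysis_years": None,
}
cleaned = drop_null_keys(template)
```

In this example, only the option and log_level entries remain after cleaning.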