reV generation

Execute the generation step from a config file.

reV generation analysis runs SAM simulations by piping in renewable energy resource data (usually from the NSRDB or WTK), loading the SAM config, and then executing the PySAM compute module for a given technology. See the documentation for the reV SAM class (e.g. reV.SAM.generation.WindPower, reV.SAM.generation.PvWattsv8, reV.SAM.generation.Geothermal, etc.) for info on the allowed and/or required SAM config file inputs. If economic parameters are supplied in the SAM config, then you can bundle a “follow-on” econ calculation by just adding the desired econ output keys to the output_request. You can request reV to run the analysis for one or more “sites”, which correspond to the meta indices in the resource data (also commonly called the gid's).

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reV generation [OPTIONS]

Options

-c, --config_file <config_file>

Required Path to the generation configuration file. Below is a sample template config

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "nodes": 1,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "keep_sh": false,
        "num_test_nodes": null,
        "max_workers": 1,
        "sites_per_worker": null,
        "memory_utilization_limit": 0.4,
        "timeout": 1800,
        "pool_size": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "technology": "[REQUIRED]",
    "project_points": "[REQUIRED]",
    "sam_files": "[REQUIRED]",
    "resource_file": "[REQUIRED]",
    "low_res_resource_file": null,
    "output_request": [
        "cf_mean"
    ],
    "site_data": null,
    "curtailment": null,
    "gid_map": null,
    "drop_leap": false,
    "scale_outputs": true,
    "write_mapped_gids": false,
    "bias_correct": null,
    "analysis_years": null
}

execution_control:
  option: local
  allocation: '[REQUIRED IF ON HPC]'
  walltime: '[REQUIRED IF ON HPC]'
  qos: normal
  memory: null
  nodes: 1
  queue: null
  feature: null
  conda_env: null
  module: null
  sh_script: null
  keep_sh: false
  num_test_nodes: null
  max_workers: 1
  sites_per_worker: null
  memory_utilization_limit: 0.4
  timeout: 1800
  pool_size: null
log_directory: ./logs
log_level: INFO
technology: '[REQUIRED]'
project_points: '[REQUIRED]'
sam_files: '[REQUIRED]'
resource_file: '[REQUIRED]'
low_res_resource_file: null
output_request:
- cf_mean
site_data: null
curtailment: null
gid_map: null
drop_leap: false
scale_outputs: true
write_mapped_gids: false
bias_correct: null
analysis_years: null

log_directory = "./logs"
log_level = "INFO"
technology = "[REQUIRED]"
project_points = "[REQUIRED]"
sam_files = "[REQUIRED]"
resource_file = "[REQUIRED]"
output_request = [ "cf_mean",]
drop_leap = false
scale_outputs = true
write_mapped_gids = false

[execution_control]
option = "local"
allocation = "[REQUIRED IF ON HPC]"
walltime = "[REQUIRED IF ON HPC]"
qos = "normal"
nodes = 1
keep_sh = false
max_workers = 1
memory_utilization_limit = 0.4
timeout = 1800

Parameters

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:: ({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if desired hardware is not listed above. If “local”, no other HPC-specific keys in are required in execution_control (they are ignored if provided).
allocation:: (str) HPC project (allocation) handle.
walltime:: (int) Node walltime request in hours.
qos:: (str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".
memory:: (int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
nodes:: (int, optional) Number of nodes to split the project points across. Note that the total number of requested nodes for a job may be larger than this value if the command splits across other inputs. Default is 1.
max_workers:: (int, optional) Number of local workers to run on. If None, or if running from the command line and omitting this argument from your config file completely, this input is set to os.cpu_count(). Otherwise, the default is 1.
sites_per_worker:: (int, optional) Number of sites to run in series on a worker. None defaults to the resource file chunk size. By default, None.
memory_utilization_limit:: (float, optional) Memory utilization limit (fractional). Must be a value between 0 and 1. This input sets how many site results will be stored in-memory at any given time before flushing to disk. By default, 0.4.
timeout:: (int, optional) Number of seconds to wait for parallel run iteration to complete before returning zeros. By default, 1800 seconds.
pool_size:: (int, optional) Number of futures to submit to a single process pool for parallel futures. If None, the pool size is set to os.cpu_count() * 2. By default, None.
queue:: (str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.
feature:: (str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.
conda_env:: (str, optional) Name of conda environment to activate. By default, None, which does not load any environments.
module:: (str, optional) Module to load. By default, None, which does not load any modules.
sh_script:: (str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.
keep_sh:: (bool, optional) Option to keep the HPC submission script on disk. Only has effect if executing on HPC. By default, False, which purges the submission scripts after each job is submitted.
num_test_nodes:: (str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

technologystr

String indicating which SAM technology to analyze. Must be one of the keys of OPTIONS. The string should be lower-cased with spaces and underscores removed.

Input specifying which sites to process. A single integer representing the generation GID of a site may be specified to evaluate reV at a single location. A list or tuple of integers (or slice) representing the generation GIDs of multiple sites can be specified to evaluate reV at multiple specific locations. A string pointing to a project points CSV file may also be specified. Typically, the CSV contains the following columns:

gid: Integer specifying the generation GID of each site.

config: This is an optional column that contains a key from the sam_files input dictionary (see below) corresponding to the SAM configuration to use for each particular site. This value can also be None, "default", or left out completely if you specify only a single SAM configuration file as the sam_files input.

curtailment: This is an optional column that contains a key from the curtailment input dictionary (see below) corresponding to the curtailment to apply at that particular site. This value can also be None, "default", or left out completely if you specify only a single curtailment configuration file as the curtailment input.

capital_cost_multiplier: This is an optional multiplier input that, if included, will be used to regionally scale the capital_cost input in the SAM config. If you include this column in your CSV, you do not need to specify capital_cost, unless you would like that value to vary regionally and independently of the multiplier (i.e. the multiplier will still be applied on top of the capital_cost input).

The CSV file may also contain other site-specific inputs by including a column named after a config keyword (e.g. a column called wind_turbine_rotor_diameter may be included to specify a site-specific turbine diameter for each location). Columns that do not correspond to a config key may also be included, but they will be ignored. A DataFrame following the same guidelines as the CSV input (or a dictionary that can be used to initialize such a DataFrame) may be used for this input as well.

Note

By default, the generation GID of each site is assumed to match the resource GID to be evaluated for that site. However, unique generation GID’s can be mapped to non-unique resource GID’s via the gid_map input (see the documentation for gid_map for more details).

sam_filesdict | str

A dictionary mapping SAM input configuration ID(s) to SAM configuration(s). Keys are the SAM config ID(s) which correspond to the config column in the project points CSV. Values for each key are either a path to a corresponding SAM config file or a full dictionary of SAM config inputs. For example:

sam_files = {
    "default": "/path/to/default/sam.json",
    "onshore": "/path/to/onshore/sam_config.yaml",
    "offshore": {
        "sam_key_1": "sam_value_1",
        "sam_key_2": "sam_value_2",
        ...
    },
    ...
}

This input can also be a string pointing to a single SAM config file. In this case, the config column of the CSV points input should be set to None or left out completely. See the documentation for the reV SAM class (e.g. reV.SAM.generation.WindPower, reV.SAM.generation.PvWattsv8, reV.SAM.generation.Geothermal, etc.) for info on the allowed and/or required SAM config file inputs.

resource_filestr

Filepath to resource data. This input can be path to a single resource HDF5 file or a path including a wildcard input like /h5_dir/prefix*suffix (i.e. if your datasets like wind speed, wind direction, pressure, and so on are spread out over multiple files). In all cases, the resource data must be readable by rex.resource.Resource or rex.multi_file_resource.MultiFileResource. (i.e. the resource data conform to the rex data format). This means the data file(s) must contain a 1D time_index dataset indicating the UTC time of observation, a 1D meta dataset represented by a DataFrame with site-specific columns, and 2D resource datasets that match the dimensions of (time_index, meta). The time index must start at 00:00 of January 1st of the year under consideration, and its shape must be a multiple of 8760.

Note

If executing reV from the command line, this input string can contain brackets {} that will be filled in by the analysis_years input. If your datasets span multiple files (e.g. “wtk_wind_speed_2012.h5”, “wtk_pressure_2012.h5”, “wtk_wind_direction_2012.h5”), you may use a wildcard input along with brackets, like so: "wtk_*_{}.h5". Alternatively, this input can be a list of explicit files to process. In this case, the length of the list must match the length of the analysis_years input exactly, and the paths are assumed to align with the analysis_years (i.e. the first path corresponds to the first analysis year, the second path corresponds to the second analysis year, and so on). Wild cards are allowed, even if you list out the years explicitly (i.e. ["wtk_*_2012.h5", "wtk_*_2013.h5", ...])

Important

If you are using custom resource data (i.e. not NSRDB/WTK/Sup3rCC, etc.), ensure the following:

The data conforms to the rex data format.

The meta DataFrame is organized such that every row is a pixel and at least the columns latitude, longitude, timezone, and elevation are given for each location.

The time index and associated temporal data is in UTC.

The latitude is between -90 and 90 and longitude is between -180 and 180.

For solar data, ensure the DNI/DHI are not zero. You can calculate one of these these inputs from the other using the relationship

\[GHI = DNI * cos(SZA) + DHI\]

low_res_resource_filestr, optional

Optional low resolution resource file that will be dynamically mapped+interpolated to the nominal-resolution resource_file. This needs to be of the same format as resource_file - both files need to be handled by the same rex Resource handler (e.g. WindResource). All of the requirements from the resource_file apply to this input as well. If None, no dynamic mapping to higher resolutions is performed. By default, None.

output_requestlist | tuple, optional

List of output variables requested from SAM. Can be any of the parameters in the “Outputs” group of the PySAM module (e.g. PySAM.Windpower.Windpower.Outputs, PySAM.Pvwattsv8.Pvwattsv8.Outputs, PySAM.Geothermal.Geothermal.Outputs, etc.) being executed. This list can also include a select number of SAM config/resource parameters to include in the output: any key in any of the output attribute JSON files may be requested. If cf_mean is not included in this list, it will automatically be added. Time-series profiles requested via this input are output in UTC.

Note

If you are performing reV solar runs using PVWatts and would like reV to include AC capacity values in your aggregation/supply curves, then you must include the "dc_ac_ratio" time series as an output in output_request when running reV generation. The AC capacity outputs will automatically be added during the aggregation/supply curve step if the "dc_ac_ratio" dataset is detected in the generation file.

By default, ('cf_mean',).

site_datastr | pd.DataFrame, optional

Site-specific input data for SAM calculation. If this input is a string, it should be a path that points to a CSV file. Otherwise, this input should be a DataFrame with pre-extracted site data. Rows in this table should match the input sites via a gid column. The rest of the columns should match configuration input keys that will take site-specific values. Note that some or all site-specific inputs can be specified via the project_points input table instead. If None, no site-specific data is considered.

Note

This input is often used to provide site-based regional capital cost multipliers. reV does not ingest multipliers directly; instead, this file is expected to have a capital_cost column that gives the multiplier-adjusted capital cost value for each location. Therefore, you must re-create this input file every time you change your base capital cost assumption.

By default, None.

curtailmentdict | str, optional

Input for curtailment parameters, which can be one of:

Single string representing path to curtailment config file. In this case, the curtailment config is given the name “default” and applied everywhere (if the project points “curtailment” column is missing or all None) or only where the project points “curtailment” column contains a value of “default”

Dictionary mapping user-defined curtailment “names” to either A) strings (paths) or B) explicit namespaces of curtailment configurations (dicts). Mixing these two _is_ allowed.

The allowed key-value input pairs in the curtailment configuration are documented as properties of the reV.config.curtailment.Curtailment class. If None, no curtailment is modeled. You can select which curtailment gets applied to which site using the “curtailment” column key in the project points input. By default, None.

gid_mapdict | str, optional

Mapping of unique integer generation gids (keys) to single integer resource gids (values). This enables unique generation gids in the project points to map to non-unique resource gids, which can be useful when evaluating multiple resource datasets in reV (e.g., forecasted ECMWF resource data to complement historical WTK meteorology). This input can be a pre-extracted dictionary or a path to a JSON or CSV file. If this input points to a CSV file, the file must have the columns gid (which matches the project points) and gid_map (gids to extract from the resource input). If None, the GID values in the project points are assumed to match the resource GID values. By default, None.

drop_leapbool, optional

Drop leap day instead of final day of year when handling leap years. By default, False.

scale_outputsbool, optional

Flag to scale outputs in-place immediately upon Gen returning data. By default, True.

write_mapped_gidsbool, optional

Option to write mapped gids to output meta instead of resource gids. By default, False.

bias_correctstr | pd.DataFrame, optional

Optional DataFrame or CSV filepath to a wind or solar resource bias correction table. This has columns:

gid: GID of site (can be index name of dataframe)

method: function name from rex.bias_correction module

The gid field should match the true resource gid regardless of the optional gid_map input. Only windspeed or GHI + DNI + DHI are corrected, depending on the technology (wind for the former, PV or CSP for the latter). See the functions in the rex.bias_correction module for available inputs for method. Any additional kwargs required for the requested method can be input as additional columns in the bias_correct table e.g., for linear bias correction functions you can include scalar and adder inputs as columns in the bias_correct table on a site-by-site basis. If None, no corrections are applied. By default, None.

log_directorystr

Path to log output directory.

analysis_yearsint | list, optional

A single year or list of years to perform analysis for. These years will be used to fill in any brackets {} in the resource_file input. If None, the resource_file input is assumed to be the full path to the single resource file to be processed. By default, None.

Note that you may remove any keys with a null value if you do not intend to update them yourself.