reV bespoke

Execute the bespoke step from a config file.

Much like generation, reV bespoke analysis runs SAM simulations by piping in renewable energy resource data (usually from the WTK), loading the SAM config, and then executing the PySAM.Windpower.Windpower compute module. However, unlike reV generation, bespoke analysis is performed at the supply curve grid resolution, and the plant layout is optimized for every supply curve point based on an optimization objective specified by the user. See the NREL publication on the bespoke methodology for more information.

See the documentation for the reV SAM class (e.g. reV.SAM.generation.WindPower, reV.SAM.generation.PvWattsv8, reV.SAM.generation.Geothermal, etc.) for info on the allowed and/or required SAM config file inputs.

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reV bespoke [OPTIONS]
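
For example, if the configuration file described below is saved as config_bespoke.json (a hypothetical filename), the call would look like:

reV bespoke -c config_bespoke.json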

Options

-c, --config_file <config_file>

Required. Path to the bespoke configuration file. Below is a sample template config:

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "nodes": 1,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null,
        "max_workers": null
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "excl_fpath": "[REQUIRED]",
    "res_fpath": "[REQUIRED]",
    "tm_dset": "[REQUIRED]",
    "objective_function": "[REQUIRED]",
    "capital_cost_function": "[REQUIRED]",
    "fixed_operating_cost_function": "[REQUIRED]",
    "variable_operating_cost_function": "[REQUIRED]",
    "balance_of_system_cost_function": "[REQUIRED]",
    "project_points": "[REQUIRED]",
    "sam_files": "[REQUIRED]",
    "min_spacing": "5x",
    "wake_loss_multiplier": 1,
    "ga_kwargs": null,
    "output_request": [
        "system_capacity",
        "cf_mean"
    ],
    "ws_bins": [
        0.0,
        20.0,
        5.0
    ],
    "wd_bins": [
        0.0,
        360.0,
        45.0
    ],
    "excl_dict": null,
    "area_filter_kernel": "queen",
    "min_area": null,
    "resolution": 64,
    "excl_area": null,
    "data_layers": null,
    "pre_extract_inclusions": false,
    "eos_mult_baseline_cap_mw": 200,
    "prior_run": null,
    "gid_map": null,
    "bias_correct": null,
    "pre_load_data": false
}

Parameters

execution_controldict

Dictionary containing execution control arguments. Allowed arguments are:

option:

({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).

allocation:

(str) HPC project (allocation) handle.

walltime:

(int) Node walltime request in hours.

qos:

(str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".

memory:

(int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).

nodes:

(int, optional) Number of nodes to split the project points across. Note that the total number of requested nodes for a job may be larger than this value if the command splits across other inputs. Default is 1.

max_workers:

(int, optional) Number of local workers to run on. If None, uses all available cores (typically 36). By default, None.

queue:

(str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.

feature:

(str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.

conda_env:

(str, optional) Name of conda environment to activate. By default, None, which does not load any environments.

module:

(str, optional) Module to load. By default, None, which does not load any modules.

sh_script:

(str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.

num_test_nodes:

(str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.

log_directorystr

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level{“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

excl_fpathstr | list | tuple

Filepath to exclusions data HDF5 file. The exclusions HDF5 file should contain the layers specified in excl_dict and data_layers. These layers may also be spread out across multiple HDF5 files, in which case this input should be a list or tuple of filepaths pointing to the files containing the layers. Note that each data layer must be uniquely defined (i.e. only appear once and in a single input file).

res_fpathstr

Unix shell style path to wind resource HDF5 file in NREL WTK format. Can also be a path including a wildcard input like /h5_dir/prefix*suffix to run bespoke on multiple years of resource data. Can also be an explicit list of resource HDF5 file paths, which themselves can contain wildcards. If multiple files are specified in this way, they must have the same coordinates but can have different time indices (i.e. different years). This input must be readable by rex.multi_year_resource.MultiYearWindResource (i.e. the resource data conform to the rex data format). This means the data file(s) must contain a 1D time_index dataset indicating the UTC time of observation, a 1D meta dataset represented by a DataFrame with site-specific columns, and 2D resource datasets that match the dimensions of (time_index, meta). The time index must start at 00:00 of January 1st of the year under consideration, and its shape must be a multiple of 8760.

tm_dsetstr

Dataset name in the excl_fpath file containing the techmap (exclusions-to-resource mapping data). This data layer links the supply curve GIDs to the generation GIDs that are used to evaluate the performance metrics of each wind plant. By default, the generation GIDs are assumed to match the resource GIDs, but this mapping can be customized via the gid_map input (see the documentation for gid_map for more details).

Important

This dataset uniquely couples the (typically high-resolution) exclusion layers to the (typically lower-resolution) resource data. Therefore, a separate techmap must be used for every unique combination of resource and exclusion coordinates.

objective_functionstr

The objective function of the optimization written out as a string. This expression should compute the objective to be minimized during layout optimization. Variables available for computation are:

  • n_turbines: the number of turbines

  • system_capacity: wind plant capacity

  • aep: annual energy production

  • avg_sl_dist_to_center_m: Average straight-line distance to the supply curve point center from all turbine locations (in m). Useful for computing plant BOS costs.

  • avg_sl_dist_to_medoid_m: Average straight-line distance to the medoid of all turbine locations (in m). Useful for computing plant BOS costs.

  • nn_conn_dist_m: Total BOS connection distance using nearest-neighbor connections. This variable is only available for the balance_of_system_cost_function equation.

  • fixed_charge_rate: user input fixed_charge_rate if included as part of the SAM system config.

  • capital_cost: plant capital cost as evaluated by capital_cost_function

  • fixed_operating_cost: plant fixed annual operating cost as evaluated by fixed_operating_cost_function

  • variable_operating_cost: plant variable annual operating cost as evaluated by variable_operating_cost_function

  • balance_of_system_cost: plant balance of system cost as evaluated by balance_of_system_cost_function

  • self.wind_plant: the SAM wind plant object, through which all SAM variables can be accessed

capital_cost_functionstr

The plant capital cost function written out as a string. This expression must return the total plant capital cost in $. This expression has access to the same variables as the objective_function argument above.

fixed_operating_cost_functionstr

The plant annual fixed operating cost function written out as a string. This expression must return the fixed operating cost in $/year. This expression has access to the same variables as the objective_function argument above.

variable_operating_cost_functionstr

The plant annual variable operating cost function written out as a string. This expression must return the variable operating cost in $/kWh. This expression has access to the same variables as the objective_function argument above. You can set this to “0” to effectively ignore variable operating costs.

balance_of_system_cost_functionstr

The plant balance-of-system cost function written out as a string. This expression must return the balance-of-system cost in $. This expression has access to the same variables as the objective_function argument above. You can set this to “0” to effectively ignore balance-of-system costs.
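
For illustration only, below is a minimal sketch of what these expressions might look like, using the variables listed above. All cost coefficients are placeholders (not recommended values), system_capacity is assumed to be in kW per the SAM windpower model, and fixed_charge_rate is assumed to be present in the SAM config. The strings are shown as Python assignments here; in the config JSON they would be the corresponding key values:

# Placeholder cost coefficients for illustration only.
capital_cost_function = "2000 * system_capacity"            # assumed ~$2,000/kW installed
fixed_operating_cost_function = "40 * system_capacity"      # assumed ~$40/kW-yr
variable_operating_cost_function = "0"                      # effectively ignore variable O&M
balance_of_system_cost_function = "300 * nn_conn_dist_m"    # hypothetical $/m of connection distance

# A simple LCOE-style objective to minimize (illustrative form only):
objective_function = (
    "(fixed_charge_rate * (capital_cost + balance_of_system_cost)"
    " + fixed_operating_cost) / aep + variable_operating_cost"
)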

project_pointsint | list | tuple | str | dict | pd.DataFrame | slice

Input specifying which sites to process. A single integer representing the supply curve GID of a site may be specified to evaluate reV at a supply curve point. A list or tuple of integers (or slice) representing the supply curve GIDs of multiple sites can be specified to evaluate reV at multiple specific locations. A string pointing to a project points CSV file may also be specified. Typically, the CSV contains the following columns:

  • gid: Integer specifying the supply curve GID of each site.

  • config: Key in the sam_files input dictionary (see below) corresponding to the SAM configuration to use for each particular site. This value can also be None (or left out completely) if you specify only a single SAM configuration file as the sam_files input.

The CSV file may also contain site-specific inputs by including a column named after a config keyword (e.g. a column called capital_cost may be included to specify a site-specific capital cost value for each location). Columns that do not correspond to a config key may also be included, but they will be ignored. The CSV file input can also have these extra, optional columns:

  • capital_cost_multiplier

  • fixed_operating_cost_multiplier

  • variable_operating_cost_multiplier

  • balance_of_system_cost_multiplier

These particular inputs are treated as multipliers to be applied to the respective cost curves (capital_cost_function, fixed_operating_cost_function, variable_operating_cost_function, and balance_of_system_cost_function) both during and after the optimization. A DataFrame following the same guidelines as the CSV input (or a dictionary that can be used to initialize such a DataFrame) may be used for this input as well. If you would like to obtain all available reV supply curve points to run, you can use the reV.supply_curve.extent.SupplyCurveExtent class like so:

import pandas as pd
from reV.supply_curve.extent import SupplyCurveExtent

excl_fpath = "..."
resolution = ...
tm_dset = "..."
with SupplyCurveExtent(excl_fpath, resolution) as sc:
    points = sc.valid_sc_points(tm_dset).tolist()
    points = pd.DataFrame({"gid": points})
    points["config"] = "default"  # or a list of config choices

# Use the points directly or save them to csv for CLI usage
points.to_csv("project_points.csv", index=False)

sam_filesdict | str

A dictionary mapping SAM input configuration ID(s) to SAM configuration(s). Keys are the SAM config ID(s) which correspond to the config column in the project points CSV. Values for each key are either a path to a corresponding SAM config file or a full dictionary of SAM config inputs. For example:

sam_files = {
    "default": "/path/to/default/sam.json",
    "onshore": "/path/to/onshore/sam_config.yaml",
    "offshore": {
        "sam_key_1": "sam_value_1",
        "sam_key_2": "sam_value_2",
        ...
    },
    ...
}

This input can also be a string pointing to a single SAM config file. In this case, the config column of the CSV points input should be set to None or left out completely. See the documentation for the reV SAM class (e.g. reV.SAM.generation.WindPower, reV.SAM.generation.PvWattsv8, reV.SAM.generation.Geothermal, etc.) for info on the allowed and/or required SAM config file inputs.

min_spacingfloat | int | str, optional

Minimum spacing between turbines (in meters). This input can also be a string like “5x”, which is interpreted as 5 times the turbine rotor diameter. By default, "5x".

wake_loss_multiplierfloat, optional

A multiplier used to scale the annual energy lost due to wake losses.

Warning

This multiplier will ONLY be applied during the optimization process and will NOT come through in output values such as the hourly profiles, aep, any of the cost functions, or even the output objective.

By default, 1.

ga_kwargsdict, optional

Dictionary of keyword arguments to pass to GA initialization. If None, default initialization values are used. See GeneticAlgorithm for a description of the allowed keyword arguments. By default, None.

output_requestlist | tuple, optional

Outputs requested from the SAM windpower simulation after the bespoke plant layout optimization. Can be any of the parameters in the “Outputs” group of the PySAM module PySAM.Windpower.Windpower.Outputs. This list can also include a select number of SAM config/resource parameters to include in the output: any key in any of the output attribute JSON files may be requested. Time-series profiles requested via this input are output in UTC. This input can also be used to request resource means like "ws_mean", "windspeed_mean", "temperature_mean", and "pressure_mean". By default, ('system_capacity', 'cf_mean').

ws_binstuple, optional

A 3-entry tuple with (start, stop, step) for the windspeed binning of the wind joint probability distribution. The stop value is inclusive, so ws_bins=(0, 20, 5) would result in four bins with bin edges (0, 5, 10, 15, 20). By default, (0.0, 20.0, 5.0).

wd_binstuple, optional

A 3-entry tuple with (start, stop, step) for the wind direction binning of the wind joint probability distribution. The stop value is inclusive, so wd_bins=(0, 360, 90) would result in four bins with bin edges (0, 90, 180, 270, 360). By default, (0.0, 360.0, 45.0).
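
As a quick illustration (not reV's internal code), the inclusive-stop bin edges implied by the default inputs can be reproduced with numpy:

import numpy as np

# ws_bins = (0.0, 20.0, 5.0) -> edges [0, 5, 10, 15, 20] (4 bins)
ws_edges = np.arange(0.0, 20.0 + 5.0, 5.0)

# wd_bins = (0.0, 360.0, 45.0) -> edges [0, 45, ..., 360] (8 bins)
wd_edges = np.arange(0.0, 360.0 + 45.0, 45.0)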

excl_dictdict, optional

Dictionary of exclusion keyword arguments of the format {layer_dset_name: {kwarg: value}}, where layer_dset_name is a dataset in the exclusion h5 file and the kwarg: value pair is a keyword argument to the reV.supply_curve.exclusions.LayerMask class. For example:

excl_dict = {
    "typical_exclusion": {
        "exclude_values": 255,
    },
    "another_exclusion": {
        "exclude_values": [2, 3],
        "weight": 0.5
    },
    "exclusion_with_nodata": {
        "exclude_range": [10, 100],
        "exclude_nodata": True,
        "nodata_value": -1
    },
    "partial_setback": {
        "use_as_weights": True
    },
    "height_limit": {
        "exclude_range": [0, 200]
    },
    "slope": {
        "include_range": [0, 20]
    },
    "developable_land": {
        "force_include_values": 42
    },
    "more_developable_land": {
        "force_include_range": [5, 10]
    },
    ...
}

Note that all the keys given in this dictionary should be datasets of the excl_fpath file. If None or empty dictionary, no exclusions are applied. By default, None.

area_filter_kernel{“queen”, “rook”}, optional

Contiguous area filter method to use on final exclusions mask. The filters are defined as:

# Queen:     # Rook:
[[1,1,1],    [[0,1,0],
 [1,1,1],     [1,1,1],
 [1,1,1]]     [0,1,0]]

These filters define how neighboring pixels are “connected”. Once pixels in the final exclusion layer are connected, the area of each resulting cluster is computed and compared against the min_area input. Any cluster with an area less than min_area is excluded from the final mask. This argument has no effect if min_area is None. By default, "queen".

min_areafloat, optional

Minimum area (in km2) required to keep an isolated cluster of (included) land within the resulting exclusions mask. Any clusters of land with areas less than this value will be marked as exclusions. See the documentation for area_filter_kernel for an explanation of how the area of each land cluster is computed. If None, no area filtering is performed. By default, None.

resolutionint, optional

Supply Curve resolution. This value defines how many pixels are in a single side of a supply curve cell. For example, a value of 64 would generate a supply curve where the side of each supply curve cell is 64x64 exclusion pixels. By default, 64.
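
For example, assuming 90 m exclusion pixels (a common resolution for CONUS reV exclusion layers), a resolution of 64 yields supply curve cells roughly 5.76 km on a side, or about 33 km2 each.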

excl_areafloat, optional

Area of a single exclusion mask pixel (in km2). If None, this value will be inferred from the profile transform attribute in excl_fpath. By default, None.

data_layersdict, optional

Dictionary of aggregation data layers of the format:

data_layers = {
    "output_layer_name": {
        "dset": "layer_name",
        "method": "mean",
        "fpath": "/path/to/data.h5"
    },
    "another_output_layer_name": {
        "dset": "input_layer_name",
        "method": "mode",
        # optional "fpath" key omitted
    },
    ...
}

The "output_layer_name" is the column name under which the aggregated data will appear in the meta DataFrame of the output file. The "output_layer_name" does not have to match the dset input value. The latter should match the layer name in the HDF5 from which the data to aggregate should be pulled. The method should be one of {"mode", "mean", "min", "max", "sum", "category"}, describing how the high-resolution data should be aggregated for each supply curve point. fpath is an optional key that can point to an HDF5 file containing the layer data. If left out, the data is assumed to exist in the file(s) specified by the excl_fpath input. If None, no data layer aggregation is performed. By default, None.

pre_extract_inclusionsbool, optional

Optional flag to pre-extract/compute the inclusion mask from the excl_dict input. It is typically faster to compute the inclusion mask on the fly with parallel workers. By default, False.

eos_mult_baseline_cap_mwint | float, optional

Baseline plant capacity (MW) used to calculate economies of scale (EOS) multiplier from the capital_cost_function. EOS multiplier is calculated as the $-per-kW of the wind plant divided by the $-per-kW of a plant with this baseline capacity. By default, 200 (MW), which aligns the baseline with ATB assumptions. See here: https://tinyurl.com/y85hnu6h.
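
As a hedged illustration of that ratio (placeholder numbers only, not reV's internal implementation):

# Illustrative EOS multiplier calculation with placeholder numbers.
plant_cap_mw = 120             # optimized bespoke plant capacity
plant_capital_cost = 2.2e8     # $, from capital_cost_function for this plant
baseline_cap_mw = 200          # eos_mult_baseline_cap_mw
baseline_capital_cost = 3.6e8  # $, capital_cost_function evaluated at the baseline capacity

eos_mult = (plant_capital_cost / (plant_cap_mw * 1e3)) / (
    baseline_capital_cost / (baseline_cap_mw * 1e3)
)  # $/kW of the bespoke plant divided by $/kW at the baseline capacity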

prior_runstr, optional

Optional filepath to a bespoke output HDF5 file belonging to a prior run. If specified, this module will only run the timeseries power generation step and assume that all of the wind plant layouts are fixed from the prior run. The metadata of this file must contain the following columns (automatically satisfied if the HDF5 file was generated by reV bespoke):

  • capacity: Capacity of the plant, in MW.

  • turbine_x_coords: A string representation of a python list containing the X coordinates (in m; origin of cell at bottom left) of the turbines within the plant (supply curve cell).

  • turbine_y_coords: A string representation of a python list containing the Y coordinates (in m; origin of cell at bottom left) of the turbines within the plant (supply curve cell).

If None, no previous run data is considered. By default, None.

gid_mapstr | dict, optional

Mapping of unique integer generation gids (keys) to single integer resource gids (values). This enables unique generation gids in the project points to map to non-unique resource gids, which can be useful when evaluating multiple resource datasets in reV (e.g., forecasted ECMWF resource data to complement historical WTK meteorology). This input can be a pre-extracted dictionary or a path to a JSON or CSV file. If this input points to a CSV file, the file must have the columns gid (which matches the project points) and gid_map (gids to extract from the resource input). If None, the GID values in the project points are assumed to match the resource GID values. By default, None.
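
A hedged sketch of the two accepted formats (the gid values below are purely illustrative):

# As a pre-extracted dictionary: {generation gid: resource gid}
gid_map = {0: 10, 1: 10, 2: 12}

# As a CSV file with "gid" and "gid_map" columns (same illustrative mapping):
# gid,gid_map
# 0,10
# 1,10
# 2,12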

bias_correctstr | pd.DataFrame, optional

Optional DataFrame or CSV filepath to a wind or solar resource bias correction table. This has columns:

  • gid: GID of site (can be index name of dataframe)

  • method: function name from rex.bias_correction module

The gid field should match the true resource gid regardless of the optional gid_map input. Only windspeed or GHI + DNI + DHI are corrected, depending on the technology (wind for the former, PV or CSP for the latter). See the functions in the rex.bias_correction module for available inputs for method. Any additional kwargs required for the requested method can be input as additional columns in the bias_correct table (e.g., for linear bias correction functions you can include scalar and adder inputs as columns on a site-by-site basis). If None, no corrections are applied. By default, None.
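
For example, a minimal sketch of a linear windspeed correction table. The method name "lin_ws" is assumed to be available in your installed rex.bias_correction module, and the scalar/adder values are placeholders:

import pandas as pd

bias_correct = pd.DataFrame({
    "gid": [0, 1, 2],              # true resource gids
    "method": ["lin_ws"] * 3,      # assumed linear windspeed correction method
    "scalar": [1.02, 0.98, 1.00],  # extra kwargs for the method, one column each
    "adder": [0.0, 0.1, -0.05],
})
bias_correct.to_csv("bias_correct.csv", index=False)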

pre_load_databool, optional

Option to pre-load resource data. This step can be time-consuming up front, but it drastically reduces the number of parallel reads to the res_fpath HDF5 file(s) and can yield a significant overall speedup on systems with slow parallel I/O capabilities. Pre-loaded data can use a significant amount of RAM, so be sure to split execution across many nodes (e.g. 100 nodes, 36 workers each for CONUS) or request large amounts of memory for a smaller number of nodes. By default, False.

Note that you may remove any keys with a null value if you do not intend to update them yourself.