reV supply-curve-aggregation

Execute the supply-curve-aggregation step from a config file.

reV supply curve aggregation combines a high-resolution (e.g. 90m) exclusion dataset with a (typically) lower resolution (e.g. 2km) generation dataset by mapping all data onto the high-resolution grid and aggregating it by a large factor (e.g. 64 or 128). The result is coarsely-gridded data that summarizes capacity and generation potential as well as associated economics under a particular land access scenario. This module can also summarize extra data layers during the aggregation process, allowing for complementary land characterization analysis.
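The mapping-and-aggregation idea can be illustrated with a toy example. This is only a sketch in plain Python, not reV code: the 4x4 grid, the `inclusion`, `techmap`, and `cf_mean` values, and the `aggregate_cell` helper are all hypothetical, and the inclusion-weighted mean is assumed as the aggregation rule.

```python
# Toy illustration only: reV itself operates on HDF5 datasets.
resolution = 2  # aggregation factor (64 or 128 in practice)

# High-resolution inclusion mask (1.0 = included, 0.0 = excluded).
inclusion = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

# Techmap: the generation gid that each high-resolution pixel maps to.
techmap = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]

# Mean capacity factor per generation gid (the lower-resolution data).
cf_mean = {0: 0.30, 1: 0.40, 2: 0.35, 3: 0.45}


def aggregate_cell(row0, col0):
    """Inclusion-weighted mean capacity factor for one supply curve cell."""
    weight_sum = 0.0
    cf_sum = 0.0
    for r in range(row0, row0 + resolution):
        for c in range(col0, col0 + resolution):
            w = inclusion[r][c]
            weight_sum += w
            cf_sum += w * cf_mean[techmap[r][c]]
    # A fully excluded cell has no capacity factor to report.
    return cf_sum / weight_sum if weight_sum > 0 else None


n_rows, n_cols = len(inclusion), len(inclusion[0])
summary = {
    (r // resolution, c // resolution): aggregate_cell(r, c)
    for r in range(0, n_rows, resolution)
    for c in range(0, n_cols, resolution)
}
print(summary)
```

Each key in `summary` is the (row, column) index of a coarse supply curve cell, and each value summarizes only the non-excluded pixels inside it.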

The general structure for calling this CLI command is given below (add --help to print help info to the terminal).

reV supply-curve-aggregation [OPTIONS]

Options

-c, --config_file <config_file>

Required. Path to the supply-curve-aggregation configuration file. Below is a sample template config, shown in JSON, YAML, and TOML formats:

{
    "execution_control": {
        "option": "local",
        "allocation": "[REQUIRED IF ON HPC]",
        "walltime": "[REQUIRED IF ON HPC]",
        "qos": "normal",
        "memory": null,
        "queue": null,
        "feature": null,
        "conda_env": null,
        "module": null,
        "sh_script": null,
        "num_test_nodes": null,
        "max_workers": null,
        "sites_per_worker": 100
    },
    "log_directory": "./logs",
    "log_level": "INFO",
    "excl_fpath": "[REQUIRED]",
    "tm_dset": "[REQUIRED]",
    "econ_fpath": null,
    "excl_dict": null,
    "area_filter_kernel": "queen",
    "min_area": null,
    "resolution": 64,
    "excl_area": null,
    "res_fpath": null,
    "gids": null,
    "pre_extract_inclusions": false,
    "res_class_dset": null,
    "res_class_bins": null,
    "cf_dset": "cf_mean-means",
    "lcoe_dset": "lcoe_fcr-means",
    "h5_dsets": null,
    "data_layers": null,
    "power_density": null,
    "friction_fpath": null,
    "friction_dset": null,
    "cap_cost_scale": null,
    "recalc_lcoe": true,
    "gen_fpath": null,
    "args": null
}

execution_control:
  option: local
  allocation: '[REQUIRED IF ON HPC]'
  walltime: '[REQUIRED IF ON HPC]'
  qos: normal
  memory: null
  queue: null
  feature: null
  conda_env: null
  module: null
  sh_script: null
  num_test_nodes: null
  max_workers: null
  sites_per_worker: 100
log_directory: ./logs
log_level: INFO
excl_fpath: '[REQUIRED]'
tm_dset: '[REQUIRED]'
econ_fpath: null
excl_dict: null
area_filter_kernel: queen
min_area: null
resolution: 64
excl_area: null
res_fpath: null
gids: null
pre_extract_inclusions: false
res_class_dset: null
res_class_bins: null
cf_dset: cf_mean-means
lcoe_dset: lcoe_fcr-means
h5_dsets: null
data_layers: null
power_density: null
friction_fpath: null
friction_dset: null
cap_cost_scale: null
recalc_lcoe: true
gen_fpath: null
args: null

log_directory = "./logs"
log_level = "INFO"
excl_fpath = "[REQUIRED]"
tm_dset = "[REQUIRED]"
area_filter_kernel = "queen"
resolution = 64
pre_extract_inclusions = false
cf_dset = "cf_mean-means"
lcoe_dset = "lcoe_fcr-means"
recalc_lcoe = true

[execution_control]
option = "local"
allocation = "[REQUIRED IF ON HPC]"
walltime = "[REQUIRED IF ON HPC]"
qos = "normal"
sites_per_worker = 100

Parameters

execution_control : dict

Dictionary containing execution control arguments. Allowed arguments are:

option: ({‘local’, ‘kestrel’, ‘eagle’, ‘awspc’, ‘slurm’, ‘peregrine’}) Hardware run option. Determines the type of job scheduler to use as well as the base AU cost. The “slurm” option is a catchall for HPC systems that use the SLURM scheduler and should only be used if the desired hardware is not listed above. If “local”, no other HPC-specific keys are required in execution_control (they are ignored if provided).
allocation: (str) HPC project (allocation) handle.
walltime: (int) Node walltime request in hours.
qos: (str, optional) Quality-of-service specifier. For Kestrel users: This should be one of {‘standby’, ‘normal’, ‘high’}. Note that ‘high’ priority doubles the AU cost. By default, "normal".
memory: (int, optional) Node memory max limit (in GB). By default, None, which uses the scheduler’s default memory limit. For Kestrel users: If you would like to use the full node memory, leave this argument unspecified (or set to None) if you are running on standard nodes. However, if you would like to use the bigmem nodes, you must specify the full upper limit of memory you would like for your job, otherwise you will be limited to the standard node memory size (250GB).
max_workers: (int, optional) Number of cores to run summary on. None is all available CPUs. By default, None.
sites_per_worker: (int, optional) Number of sc_points to summarize on each worker. By default, 100.
queue: (str, optional; PBS ONLY) HPC queue to submit job to. Examples include: ‘debug’, ‘short’, ‘batch’, ‘batch-h’, ‘long’, etc. By default, None, which uses “test_queue”.
feature: (str, optional) Additional flags for SLURM job (e.g. “-p debug”). By default, None, which does not specify any additional flags.
conda_env: (str, optional) Name of conda environment to activate. By default, None, which does not load any environments.
module: (str, optional) Module to load. By default, None, which does not load any modules.
sh_script: (str, optional) Extra shell script to run before command call. By default, None, which does not run any scripts.
num_test_nodes: (str, optional) Number of nodes to submit before terminating the submission process. This can be used to test a new submission configuration without submitting all nodes (i.e. only running a handful to ensure the inputs are specified correctly and the outputs look reasonable). By default, None, which submits all node jobs.

Only the option key is required for local execution. For execution on the HPC, the allocation and walltime keys are also required. All other options are populated with default values, as seen above.
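The required-key rule can be checked before submitting a job. The following is a minimal sketch; `check_execution_control` is a hypothetical helper written for illustration and is not part of reV, which performs its own validation.

```python
def check_execution_control(execution_control):
    """Return the required keys that are missing from ``execution_control``.

    Hypothetical helper: "option" is always required, while "allocation"
    and "walltime" are only required when not running locally.
    """
    required = ["option"]
    if execution_control.get("option") != "local":
        required += ["allocation", "walltime"]
    return [key for key in required if execution_control.get(key) is None]


local_cfg = {"option": "local"}
hpc_cfg = {"option": "kestrel", "walltime": 4}

print(check_execution_control(local_cfg))
print(check_execution_control(hpc_cfg))
```

An empty list means the block is complete for the selected run option.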

log_directory : str

Path to directory where logs should be written. Path can be relative and does not have to exist on disk (it will be created if missing). By default, "./logs".

log_level : {“DEBUG”, “INFO”, “WARNING”, “ERROR”}

String representation of desired logger verbosity. Suitable options are DEBUG (most verbose), INFO (moderately verbose), WARNING (only log warnings and errors), and ERROR (only log errors). By default, "INFO".

excl_fpath : str | list | tuple

Filepath to exclusions data HDF5 file. The exclusions HDF5 file should contain the layers specified in excl_dict and data_layers. These layers may also be spread out across multiple HDF5 files, in which case this input should be a list or tuple of filepaths pointing to the files containing the layers. Note that each data layer must be uniquely defined (i.e. only appear once and in a single input file).

tm_dset : str

Dataset name in the excl_fpath file containing the techmap (exclusions-to-resource mapping data). This data layer links the supply curve GIDs to the generation GIDs that are used to evaluate performance metrics such as mean_cf.

Important

This dataset uniquely couples the (typically high-resolution) exclusion layers to the (typically lower-resolution) resource data. Therefore, a separate techmap must be used for every unique combination of resource and exclusion coordinates.

Note

If executing reV from the command line, you can specify a name that is not in the exclusions HDF5 file, and reV will calculate the techmap for you. Note however that computing the techmap and writing it to the exclusion HDF5 file is a blocking operation, so you may only run a single reV aggregation step at a time this way.

econ_fpath : str, optional

Filepath to HDF5 file with reV econ output results containing an lcoe_dset dataset. If None, lcoe_dset should be a dataset in the gen_fpath HDF5 file that aggregation is executed on.

Note

If executing reV from the command line, this input can be set to "PIPELINE" to parse the output from one of these preceding pipeline steps: multi-year, collect, or generation. However, note that duplicate executions of any of these commands within the pipeline may invalidate this parsing, meaning the econ_fpath input will have to be specified manually.

By default, None.

excl_dict : dict | None

Dictionary of exclusion keyword arguments of the format {layer_dset_name: {kwarg: value}}, where layer_dset_name is a dataset in the exclusion h5 file and the kwarg: value pair is a keyword argument to the reV.supply_curve.exclusions.LayerMask class. For example:

excl_dict = {
    "typical_exclusion": {
        "exclude_values": 255,
    },
    "another_exclusion": {
        "exclude_values": [2, 3],
        "weight": 0.5
    },
    "exclusion_with_nodata": {
        "exclude_range": [10, 100],
        "exclude_nodata": True,
        "nodata_value": -1
    },
    "partial_setback": {
        "use_as_weights": True
    },
    "height_limit": {
        "exclude_range": [0, 200]
    },
    "slope": {
        "include_range": [0, 20]
    },
    "developable_land": {
        "force_include_values": 42
    },
    "more_developable_land": {
        "force_include_range": [5, 10]
    },
    "viewsheds": {
        "exclude_values": 1,
        "extent": {
            "layer": "federal_parks",
            "include_range": [1, 5]
        }
    }
    ...
}

Note that all the keys given in this dictionary should be datasets of the excl_fpath file. If None or empty dictionary, no exclusions are applied. By default, None.
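The example above uses Python syntax (for instance `True`), whereas a JSON config file needs `true`. One way to avoid hand-translating is to serialize the dictionary with the standard `json` module. This is just a sketch using two of the layer names from the example above; the layer names remain placeholders for datasets in your own exclusions file.

```python
import json

# Python-syntax exclusion dictionary (layer names are placeholders).
excl_dict = {
    "typical_exclusion": {"exclude_values": 255},
    "exclusion_with_nodata": {
        "exclude_range": [10, 100],
        "exclude_nodata": True,
        "nodata_value": -1,
    },
}

# json.dumps converts Python literals (True/False/None) to JSON ones.
config_fragment = json.dumps({"excl_dict": excl_dict}, indent=4)
print(config_fragment)
```

The printed fragment can be pasted into the JSON config under the "excl_dict" key.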

area_filter_kernel : {“queen”, “rook”}, optional

Contiguous area filter method to use on final exclusions mask. The filters are defined as:

# Queen:     # Rook:
[[1,1,1],    [[0,1,0],
 [1,1,1],     [1,1,1],
 [1,1,1]]     [0,1,0]]

These filters define how neighboring pixels are “connected”. Once pixels in the final exclusion layer are connected, the area of each resulting cluster is computed and compared against the min_area input. Any cluster with an area less than min_area is excluded from the final mask. This argument has no effect if min_area is None. By default, "queen".
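The difference between the two kernels can be seen on a small mask. The sketch below is a plain-Python flood fill written for illustration; it is not how reV implements the filter, and `filter_small_clusters`, the 3x3 mask, and the 90 m pixel area are assumptions made for the example.

```python
# Neighbor offsets implied by the two kernels (center pixel omitted).
QUEEN = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
ROOK = [(-1, 0), (0, -1), (0, 1), (1, 0)]


def filter_small_clusters(mask, neighbors, min_area, pixel_area):
    """Zero out included clusters whose total area is below ``min_area``."""
    n_rows, n_cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = set()
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Flood fill to collect one connected cluster of included pixels.
            cluster, stack = [], [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                cluster.append((y, x))
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            # Cluster area = pixel count times the area of one pixel.
            if len(cluster) * pixel_area < min_area:
                for y, x in cluster:
                    out[y][x] = 0
    return out


# Three included pixels that touch only at their corners.
mask = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
pixel_area = 0.0081  # km², i.e. an assumed 90 m x 90 m pixel

queen_result = filter_small_clusters(mask, QUEEN, 0.02, pixel_area)
rook_result = filter_small_clusters(mask, ROOK, 0.02, pixel_area)
print(queen_result)
print(rook_result)
```

Under the queen kernel the diagonal pixels form one cluster of about 0.024 km², which passes the 0.02 km² threshold. Under the rook kernel they are three separate single-pixel clusters, each too small to keep.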

min_area : float, optional

Minimum area (in km²) required to keep an isolated cluster of (included) land within the resulting exclusions mask. Any clusters of land with areas less than this value will be marked as exclusions. See the documentation for area_filter_kernel for an explanation of how the area of each land cluster is computed. If None, no area filtering is performed. By default, None.

resolution : int, optional

Supply curve resolution. This value defines how many exclusion pixels lie along a single side of a supply curve cell. For example, a value of 64 would generate a supply curve where each supply curve cell spans 64x64 exclusion pixels. By default, 64.
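As a rough worked example of what a given resolution implies, the arithmetic below assumes 90 m exclusion pixels and a hypothetical exclusion raster shape. Rounding the cell counts up (so partial cells at the raster edge still count) is also an assumption of this sketch.

```python
import math

resolution = 64
pixel_size_m = 90          # assumed exclusion pixel size
excl_shape = (1000, 2000)  # hypothetical exclusion raster (rows, cols)

# Side length and pixel count of one supply curve cell.
cell_side_km = resolution * pixel_size_m / 1000
pixels_per_cell = resolution ** 2

# Number of supply curve cells needed to cover the raster.
n_sc_rows = math.ceil(excl_shape[0] / resolution)
n_sc_cols = math.ceil(excl_shape[1] / resolution)

print(cell_side_km, pixels_per_cell, n_sc_rows, n_sc_cols)
```

With these inputs each cell is about 5.76 km on a side and contains 4096 exclusion pixels.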

excl_area : float, optional

Area of a single exclusion mask pixel (in km²). If None, this value will be inferred from the profile transform attribute in excl_fpath. By default, None.
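For reference, a pixel area can be derived from an affine transform as follows. This sketch assumes the common six-element ordering (a, b, c, d, e, f) with units of meters and no rotation; the transform values and the `pixel_area_km2` helper are hypothetical, and the exact way reV reads the attribute is not shown here.

```python
def pixel_area_km2(transform):
    """Pixel area in km² from a 6-element affine transform in meters.

    Assumes the (a, b, c, d, e, f) ordering, where ``a`` is the pixel
    width and ``e`` is the (usually negative) pixel height.
    """
    a, _, _, _, e, _ = transform[:6]
    return abs(a * e) / 1e6


# Hypothetical transform for a 90 m grid.
transform = (90.0, 0.0, -2400000.0, 0.0, -90.0, 3200000.0)
excl_area = pixel_area_km2(transform)
print(excl_area)
```

If the exclusion file lacks a usable transform, passing excl_area explicitly avoids relying on this inference.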

res_fpath : str, optional

Filepath to HDF5 resource file (e.g. WTK or NSRDB). This input is required if the techmap dataset is to be created or if the gen_fpath input to the summarize or run methods is None. By default, None.

gids : list, optional

List of supply curve point gids to get summary for. If you would like to obtain all available reV supply curve points to run, you can use the reV.supply_curve.extent.SupplyCurveExtent class like so:

import pandas as pd
from reV.supply_curve.extent import SupplyCurveExtent

excl_fpath = "..."
resolution = ...
tm_dset = "..."
with SupplyCurveExtent(excl_fpath, resolution) as sc:
    gids = sc.valid_sc_points(tm_dset).tolist()
...

If None, supply curve aggregation is computed for all gids in the supply curve extent. By default, None.

pre_extract_inclusions : bool, optional

Optional flag to pre-extract/compute the inclusion mask from the excl_dict input. It is typically faster to compute the inclusion mask on the fly with parallel workers. By default, False.

res_class_dset : str, optional

Name of dataset in the reV generation HDF5 output file containing resource data. If None, no aggregated resource classification is performed (i.e. no mean_res output), and the res_class_bins input is ignored. By default, None.

res_class_bins : list, optional

Optional input to perform separate aggregations for various resource data ranges. If None, only a single aggregation per supply curve point is performed. Otherwise, this input should be a list of floats or ints representing the resource bin boundaries. One aggregation per resource value range is computed, and only pixels within the given resource range are aggregated. By default, None.
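To make the relationship between boundaries and ranges concrete, the sketch below turns a list of boundaries into consecutive ranges and assigns a value to one of them. The boundary values are hypothetical, and treating each range as closed on the left and open on the right is an assumption of this sketch rather than documented reV behavior.

```python
# Hypothetical boundaries, e.g. mean wind speed in m/s.
res_class_bins = [0, 6, 8, 100]

# N boundaries define N - 1 consecutive resource ranges.
bin_ranges = list(zip(res_class_bins[:-1], res_class_bins[1:]))


def res_class(value):
    """Index of the range containing ``value``, or None if outside all."""
    for i, (lower, upper) in enumerate(bin_ranges):
        if lower <= value < upper:
            return i
    return None


print(bin_ranges)
print(res_class(7.2))
```

Pixels whose resource value falls outside every range would not contribute to any of the per-range aggregations.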

cf_dset : str, optional

Dataset name from the reV generation HDF5 output file containing a 1D dataset of mean capacity factor values. This dataset will be mapped onto the high-resolution grid and used to compute the mean capacity factor for non-excluded area. By default, "cf_mean-means".

lcoe_dset : str, optional

Dataset name from the reV generation HDF5 output file containing a 1D dataset of mean LCOE values. This dataset will be mapped onto the high-resolution grid and used to compute the mean LCOE for non-excluded area, but only if the LCOE is not re-computed during processing (see the recalc_lcoe input for more info). By default, "lcoe_fcr-means".

h5_dsets : list, optional

Optional list of additional datasets from the reV generation/econ HDF5 output file to aggregate. If None, no extra datasets are aggregated.

Warning

This input is meant for passing through 1D datasets. If you specify a 2D or higher-dimensional dataset, you may run into memory errors. If you wish to aggregate 2D datasets, see the rep-profiles module.

By default, None.

data_layers : dict, optional

Dictionary of aggregation data layers of the format:

data_layers = {
    "output_layer_name": {
        "dset": "layer_name",
        "method": "mean",
        "fpath": "/path/to/data.h5"
    },
    "another_output_layer_name": {
        "dset": "input_layer_name",
        "method": "mode",
        # optional "fpath" key omitted
    },
    ...
}

The "output_layer_name" is the column name under which the aggregated data will appear in the output CSV file. The "output_layer_name" does not have to match the dset input value. The latter should match the layer name in the HDF5 from which the data to aggregate should be pulled. The method should be one of {"mode", "mean", "min", "max", "sum", "category"}, describing how the high-resolution data should be aggregated for each supply curve point. fpath is an optional key that can point to an HDF5 file containing the layer data. If left out, the data is assumed to exist in the file(s) specified by the excl_fpath input. If None, no data layer aggregation is performed. By default, None.
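The effect of the method choice can be seen by applying each aggregation to the same handful of pixel values. This sketch uses only the standard library and hypothetical pixel values; it ignores exclusions and omits the "category" method, whose output format is not described here.

```python
import statistics

# Plain-Python stand-ins for five of the documented methods.
AGGREGATORS = {
    "mean": statistics.mean,
    "mode": statistics.mode,
    "min": min,
    "max": max,
    "sum": sum,
}


def aggregate_layer(pixel_values, method):
    """Reduce the pixel values of one supply curve point to a single value."""
    return AGGREGATORS[method](pixel_values)


# Hypothetical high-resolution values falling inside one supply curve point.
pixels = [3, 3, 5, 9]
results = {method: aggregate_layer(pixels, method) for method in AGGREGATORS}
print(results)
```

Continuous layers such as slope usually suit "mean", while categorical layers such as land cover codes are better summarized with "mode".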

power_density : float | str, optional

Power density value (in MW/km²) or filepath to variable power density CSV file containing the following columns:

gid : resource gid (typically wtk or nsrdb gid)

power_density : power density value (in MW/km²)

If None, a constant power density is inferred from the generation meta data technology. By default, None.
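A variable power density file with the two columns listed above could look like the CSV text in the sketch below. The gids, density values, and included area are hypothetical, and computing capacity as included area times power density is the usual reading of a MW/km² value rather than a statement about reV internals.

```python
import csv
import io

# Hypothetical contents of a variable power density CSV file.
csv_text = """gid,power_density
0,3.0
1,4.5
"""

# Map each resource gid to its power density in MW/km².
power_density_by_gid = {
    int(row["gid"]): float(row["power_density"])
    for row in csv.DictReader(io.StringIO(csv_text))
}

included_area_km2 = 12.0  # hypothetical non-excluded area for resource gid 1
capacity_mw = included_area_km2 * power_density_by_gid[1]
print(capacity_mw)
```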

friction_fpath : str, optional

Filepath to friction surface data (cost based exclusions). Must be paired with the friction_dset input below. The friction data must be the same shape as the exclusions. Friction input creates a new output column "mean_lcoe_friction" which is the nominal LCOE multiplied by the friction data. If None, no friction data is aggregated. By default, None.
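As a small numerical illustration of the "mean_lcoe_friction" column, the sketch below multiplies a nominal LCOE by a friction value. The numbers are hypothetical, and averaging the friction over the pixels of a supply curve point before multiplying is an assumption of this sketch.

```python
mean_lcoe = 40.0  # hypothetical nominal LCOE in $/MWh

# Hypothetical friction values for the included pixels of one point.
friction = [1.0, 1.2, 1.5, 1.1]
mean_friction = sum(friction) / len(friction)

# Nominal LCOE multiplied by the friction data.
mean_lcoe_friction = mean_lcoe * mean_friction
print(mean_lcoe_friction)
```

A friction value of 1.0 leaves the LCOE unchanged, and larger values penalize the point's cost.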

friction_dset : str, optional

Dataset name in friction_fpath for the friction surface data. Must be paired with the friction_fpath above. If None, no friction data is aggregated. By default, None.

cap_cost_scale : str, optional

Optional LCOE scaling equation to implement “economies of scale”. Equations must be in python string format and must return a scalar value to multiply the capital cost by. Independent variables in the equation should match the names of the columns in the reV supply curve aggregation output table (see the documentation of SupplyCurveAggregation for details on available outputs). If None, no economies of scale are applied. By default, None.
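The sketch below shows what such an equation string might look like and how it yields a scalar multiplier. The equation, the "capacity" column name, and all values are hypothetical, and the use of `eval` here only illustrates the idea; how reV evaluates the string internally is not shown.

```python
# Hypothetical scaling equation; "capacity" stands in for an output column.
equation = "2 * capacity ** -0.3"

# Hypothetical row of the aggregation output table.
row = {"capacity": 100.0, "mean_cf": 0.4}

# Evaluate with the row's columns as the only available variables.
scale = eval(equation, {"__builtins__": {}}, row)

capital_cost = 150_000_000.0  # hypothetical unscaled capital cost in $
scaled_capital_cost = capital_cost * scale
print(scale, scaled_capital_cost)
```

In this example the multiplier is roughly 0.5, so a larger plant is assigned a lower capital cost than a simple linear scaling would give.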

recalc_lcoe : bool, optional

Flag to re-calculate the LCOE from the multi-year mean capacity factor and annual energy production data. This requires several datasets to be aggregated in the h5_dsets input:

system_capacity

fixed_charge_rate

capital_cost

fixed_operating_cost

variable_operating_cost

If any of these datasets are missing from the reV generation HDF5 output, or if recalc_lcoe is set to False, the mean LCOE will be computed from the data stored under the lcoe_dset instead. By default, True.
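For orientation, the datasets listed above are the inputs to the standard fixed charge rate LCOE formula, sketched below. The input values are hypothetical, and the units (capacity in kW, costs in $, variable cost in $/MWh) are assumptions of this example; check the units in your own generation output before comparing numbers.

```python
def lcoe_fcr(fixed_charge_rate, capital_cost, fixed_operating_cost,
             annual_energy_production, variable_operating_cost):
    """LCOE ($/MWh) from the fixed charge rate formula.

    Assumes costs in $, annual energy in MWh, and variable cost in $/MWh.
    """
    return ((fixed_charge_rate * capital_cost + fixed_operating_cost)
            / annual_energy_production + variable_operating_cost)


system_capacity_kw = 20_000  # hypothetical
mean_cf = 0.4                # multi-year mean capacity factor

# Annual energy production in MWh from capacity and capacity factor.
aep_mwh = system_capacity_kw / 1000 * mean_cf * 8760

lcoe = lcoe_fcr(
    fixed_charge_rate=0.07,
    capital_cost=30_000_000,
    fixed_operating_cost=600_000,
    annual_energy_production=aep_mwh,
    variable_operating_cost=0.0,
)
print(round(lcoe, 2))
```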

gen_fpath : str, optional

Filepath to HDF5 file with reV generation output results. If None, a simple aggregation without any generation, resource, or cost data is performed.

Note

If executing reV from the command line, this input can be set to "PIPELINE" to parse the output from one of these preceding pipeline steps: multi-year, collect, or econ. However, note that duplicate executions of any of these commands within the pipeline may invalidate this parsing, meaning the gen_fpath input will have to be specified manually.

By default, None.

args : tuple | list, optional

List of columns to include in summary output table. None defaults to all available args defined in the SupplyCurveAggregation documentation. By default, None.

Note that you may remove any keys with a null value if you do not intend to update them yourself.
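Removing the null keys by hand is error-prone for a template this long, so a short script can do it. The sketch below works on an abbreviated copy of the template; the `drop_nulls` helper is hypothetical, and the "[REQUIRED]" placeholders still need to be filled in before the config is usable.

```python
import json

# Abbreviated copy of the template config shown earlier.
template = {
    "execution_control": {
        "option": "local",
        "memory": None,
        "queue": None,
        "sites_per_worker": 100,
    },
    "log_directory": "./logs",
    "excl_fpath": "[REQUIRED]",
    "tm_dset": "[REQUIRED]",
    "econ_fpath": None,
    "resolution": 64,
    "gen_fpath": None,
}


def drop_nulls(obj):
    """Recursively remove dictionary keys whose value is None."""
    if isinstance(obj, dict):
        return {k: drop_nulls(v) for k, v in obj.items() if v is not None}
    return obj


config = drop_nulls(template)
print(json.dumps(config, indent=4))
```

The printed JSON can be saved to a file and passed to the command with the -c option.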