Supply Curve + Temporal Profiles

Prerequisites:

  • Required: None

  • Recommended: None


Introduction

One major advantage of reV supply curve outputs is that they come with temporal capacity factor profiles that can be used for downstream analysis. In this guide, we will demonstrate how to take a supply curve CSV and extract the corresponding temporal profiles from the accompanying HDF5 file.

We will demonstrate two ways to do this: with the custom rex library (recommended) or with the h5py library.

Downloading the data

Before we dive into the code, we first have to download some supply curve and temporal data from Siting Lab. In particular, we will be using data from GDS [GDS23a] and GDS [GDS23b].

The download step can take some time, since the temporal HDF5 files are quite large. If you have already downloaded the data, you can skip this step (just make sure the path variables below are set correctly). We’ll start by defining the local file path destinations:

# Modify these to point to your local files if you have already downloaded them
PV_SUPPLY_CURVE_FILEPATH = "reference_access_2030_moderate_supply-curve.csv"
WIND_SUPPLY_CURVE_FILEPATH = "reference_access_2030_moderate_115hh_170rd_supply-curve.csv"
PV_CF_FILEPATH = "reference_access_2030_moderate_rep-profiles_2012.h5"
WIND_CF_FILEPATH = "reference_access_2030_moderate_115hh_170rd_rep-profiles_2012.h5"

Next, we can write a short download function using urllib, which is part of the Python standard library.

import urllib.request
from pathlib import Path

# Set the web links for each file (you shouldn't have to modify this at all)
URLS = {
    PV_SUPPLY_CURVE_FILEPATH: "https://data.openei.org/files/6001/reference_access_2030_moderate_supply-curve.csv",
    WIND_SUPPLY_CURVE_FILEPATH: "https://data.openei.org/files/6119/reference_access_2030_moderate_115hh_170rd_supply-curve%20(1).csv",
    PV_CF_FILEPATH: "https://data.openei.org/files/6001/reference_access_2030_moderate_rep-profiles_2012.h5",
    WIND_CF_FILEPATH: "https://data.openei.org/files/6119/reference_access_2030_moderate_115hh_170rd_rep-profiles_2012%20(1).h5",
}

def download_sc_file(local_filepath):
    """Download the file registered in URLS for this local path, unless it already exists."""
    if local_filepath not in URLS:
        print(f"Unknown destination file: {str(local_filepath)!r}")
        return

    if Path(local_filepath).exists():
        print(f"{str(local_filepath)!r} already exists!")
        return

    urllib.request.urlretrieve(URLS[local_filepath], local_filepath)
    print(f"Downloaded {str(local_filepath)!r}!")

Now, we can download the files by calling this function on each of our target file destinations:

files_to_download = [
    PV_SUPPLY_CURVE_FILEPATH,
    WIND_SUPPLY_CURVE_FILEPATH,
    PV_CF_FILEPATH,
    WIND_CF_FILEPATH,
]

from multiprocessing.pool import ThreadPool

# Download the files concurrently, one thread per file
with ThreadPool(len(files_to_download)) as p:
    p.map(download_sc_file, files_to_download)
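
After the downloads finish, it can be useful to confirm that each file actually landed on disk and is not empty, since download_sc_file skips any path that already exists. The following is a minimal sketch of such a check; the helper names report_downloads and problem_files are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def report_downloads(filepaths):
    """Return {filepath: size_in_bytes}, with None for missing files."""
    sizes = {}
    for filepath in filepaths:
        path = Path(filepath)
        # A missing file is reported as None; an empty file as 0 bytes
        sizes[str(filepath)] = path.stat().st_size if path.is_file() else None
    return sizes


def problem_files(filepaths):
    """List the files that are missing or empty (hypothetical helper)."""
    return [fp for fp, size in report_downloads(filepaths).items() if not size]
```

Calling problem_files(files_to_download) should return an empty list when every file is present and non-empty. A non-empty file can still be truncated, so this check only catches the most obvious failures.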

Using h5py

You can also use the standard h5py library to read the temporal profiles. The access pattern remains largely the same, except that you have to perform more processing as you load in the data.

Let’s begin the example once again by reading in the supply curve CSV using pandas (this time for land-based wind):

import numpy as np
import pandas as pd
import plotly.graph_objects as go

reference_wind_supply_curve = pd.read_csv(WIND_SUPPLY_CURVE_FILEPATH)
reference_wind_supply_curve.head()
area_sq_km capacity_mw cnty_fips country county dist_km elevation latitude lcot longitude ... reinforcement_dist_km state timezone total_lcoe trans_cap_cost_per_mw reg_mult turbine_capacity sc_point_gid hub_height windspeed_m_per_s
0 0.218700 54 53073 United States Whatcom 18.433006 843.500000 49.008 22.455529 -122.048 ... 225.081896 Washington -8 56.537274 3.204940e+05 1.048863 6000 800 115 6.49
1 16.823700 198 53009 United States Clallam 103.197032 172.680000 48.347 23.066024 -124.653 ... 312.365922 Washington -8 48.777093 6.680578e+05 1.056176 6000 1543 115 7.67
2 9.744219 60 53009 United States Clallam 45.533792 179.500000 48.376 36.894848 -124.495 ... 380.864701 Washington -8 64.306488 1.171806e+06 1.055045 6000 1544 115 7.35
3 3.507300 42 53009 United States Clallam 49.799519 124.166664 48.224 52.876482 -124.761 ... 380.864701 Washington -8 84.894720 1.770461e+06 1.056924 6000 1922 115 6.88
4 39.058200 276 53009 United States Clallam 144.164214 189.341460 48.253 27.324301 -124.604 ... 268.619792 Washington -8 56.309941 8.713656e+05 1.055727 6000 1923 115 7.03

5 rows × 24 columns

Extracting info by SC point GID

As before, extraction is fairly straightforward if we know the sc_point_gid we are interested in:

reference_wind_supply_curve[
    reference_wind_supply_curve["sc_point_gid"] == 1543
]
area_sq_km capacity_mw cnty_fips country county dist_km elevation latitude lcot longitude ... reinforcement_dist_km state timezone total_lcoe trans_cap_cost_per_mw reg_mult turbine_capacity sc_point_gid hub_height windspeed_m_per_s
1 16.8237 198 53009 United States Clallam 103.197032 172.68 48.347 23.066024 -124.653 ... 312.365922 Washington -8 48.777093 668057.7913 1.056176 6000 1543 115 7.67

1 rows × 24 columns

When we go to read the HDF5 file, we first have to look through all the datasets and shapes contained within:

import h5py

with h5py.File(WIND_CF_FILEPATH, "r") as fh:
    for dset in fh.keys():
        print(f"{dset}: {fh[dset].shape}")
meta: (49875,)
rep_profiles_0: (8760, 49875)
time_index: (8760,)

As before, the profiles we are interested in are stored under the dataset name “rep_profiles_0”. Let’s extract the correct one now, along with the time index. Note the extra steps we have to take to scale the profiles appropriately and convert the time index into a pandas.DatetimeIndex. The custom rex library performs all of this processing under the hood and therefore drastically reduces the complexity of I/O.

sc_point_gid = 2334

with h5py.File(WIND_CF_FILEPATH, "r") as fh:
    meta = pd.DataFrame(fh["meta"][:])
    ind = meta[meta["sc_point_gid"] == sc_point_gid].index
    profile_scale_factor = fh["rep_profiles_0"].attrs["scale_factor"]
    profile = fh["rep_profiles_0"][:, ind] / profile_scale_factor
    ti = pd.to_datetime(fh["time_index"][:].astype(str))

profile.shape

fig = go.Figure()
fig.add_trace(go.Scatter(x=ti, y=profile[:, 0], line={"width": 4}))
fig.update_layout(
    title=f"CF Profile for SC Point GID: {sc_point_gid:,}",
    xaxis_title="Time index (hours)",
    yaxis_title="Capacity factor (AC)",
    template="none",
)
fig.update_xaxes(type="date", range=["2012-01-01", "2012-01-06"])
fig.show()
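
To make the two manual steps concrete, here is a small sketch that uses plain Python values in place of the HDF5 datasets. The raw integers, the scale factor of 1000, and the byte-string timestamps are made-up illustrative values; check the real file's scale_factor attribute and time_index encoding rather than assuming them.

```python
from datetime import datetime

# Made-up stand-ins for what h5py returns (assumed, not taken from the real file)
raw_profile = [0, 250, 1000]  # integer-encoded capacity factors
scale_factor = 1000           # stand-in for the dataset's "scale_factor" attribute
raw_time_index = [
    b"2012-01-01 00:00:00",
    b"2012-01-01 01:00:00",
    b"2012-01-01 02:00:00",
]

# Step 1: divide by the scale factor to recover capacity factors
profile = [value / scale_factor for value in raw_profile]

# Step 2: decode the byte strings and parse them into datetime objects
time_index = [
    datetime.fromisoformat(stamp.decode("utf-8")) for stamp in raw_time_index
]

for stamp, cf in zip(time_index, profile):
    print(stamp.isoformat(), cf)
```

In the notebook code, the same two steps are carried out by the division by profile_scale_factor and by pd.to_datetime applied to the decoded time index.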

Reading in multiple profiles at once can be done as well:

sc_point_gids_of_interest = [2334, 34357, 23399, 97844]
points = reference_wind_supply_curve[
    reference_wind_supply_curve["sc_point_gid"].isin(sc_point_gids_of_interest)
]

with h5py.File(WIND_CF_FILEPATH, "r") as fh:
    meta = pd.DataFrame(fh["meta"][:])
    inds = meta[meta["sc_point_gid"].isin(sc_point_gids_of_interest)].index
    profile_scale_factor = fh["rep_profiles_0"].attrs["scale_factor"]
    profiles = fh["rep_profiles_0"][:, inds] / profile_scale_factor
    ti = pd.to_datetime(fh["time_index"][:].astype(str))

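profiles.shape

fig = go.Figure()
# Label each column using the order stored in meta: the selected columns follow
# the row order of meta, which may differ from the order of the requested list
for point, profile in zip(meta.loc[inds, "sc_point_gid"], profiles.T):
    fig.add_trace(
        go.Scatter(x=ti, y=profile, name=str(point), line={"width": 4})
    )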

fig.update_layout(
    title="CF Profiles",
    xaxis_title="Time index (hours)",
    yaxis_title="Capacity factor (AC)",
    legend_title_text="SC Point GID",
    template="none",
)
fig.update_xaxes(type="date", range=["2012-01-01", "2012-01-06"])
fig.show()
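
When several columns are selected at once, they come back in the row order of meta, which need not match the order of the requested list. h5py also expects a list of indices to be in increasing order. The sketch below shows one way to keep labels and columns aligned. It uses a made-up list in place of the real meta table, and the variable names are illustrative.

```python
# Made-up stand-in for the "sc_point_gid" column of the HDF5 meta table
meta_gids = [800, 1543, 2334, 23399, 34357, 97844]
requested = [2334, 34357, 23399, 97844]

# Map each GID to its row position in meta (its column in the profile dataset)
position_of = {gid: pos for pos, gid in enumerate(meta_gids)}

# Sort the column positions so they form an increasing index list
columns = sorted(position_of[gid] for gid in requested)

# Labels listed in the same order as the selected columns
labels = [meta_gids[col] for col in columns]

print(columns)
print(labels)
```

Deriving the labels from the selected positions, rather than reusing the requested list, keeps each profile paired with the right SC point GID even when the request is not sorted.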

Extracting info by Lat/Lon

Extracting info for a particular lat/lon is trickier without rex, but still doable. You have a few options. The simplest, shown below, is to sort the supply curve by distance to your desired lat/lon. A more sophisticated and likely more accurate approach is to use a cKDTree to look up the closest lat/lon pair to the point in question. This approach is not covered in this notebook.

my_lat, my_lon = 48.985001, -119.624001

reference_wind_supply_curve["dist_to_my_point"] = np.hypot(
    reference_wind_supply_curve["latitude"] - my_lat,
    reference_wind_supply_curve["longitude"] - my_lon
)
point = reference_wind_supply_curve.sort_values(
    by="dist_to_my_point"
).iloc[[0]]
point
area_sq_km capacity_mw cnty_fips country county dist_km elevation latitude lcot longitude ... state timezone total_lcoe trans_cap_cost_per_mw reg_mult turbine_capacity sc_point_gid hub_height windspeed_m_per_s dist_to_my_point
20 9.169119 174 53047 United States Okanogan 39.078734 753.7826 48.985 17.410387 -119.624 ... Washington -8 56.62692 248247.3016 1.057722 6000 2334 115 5.65 0.000001

1 rows × 25 columns
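
The distance computed above is measured in degrees, which treats a degree of longitude as equal to a degree of latitude. Near 49°N a degree of longitude spans only about two thirds of the ground distance of a degree of latitude, so the ranking can be off for points that are not almost exact matches. As a hedged alternative, the sketch below ranks points by great-circle (haversine) distance using only the standard library. The function names and the spherical Earth radius of 6371 km are assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_point(points, lat, lon):
    """Return the (gid, lat, lon) tuple closest to the query location."""
    return min(points, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


# A few (sc_point_gid, latitude, longitude) rows copied from the tables above
points = [
    (800, 49.008, -122.048),
    (1543, 48.347, -124.653),
    (2334, 48.985, -119.624),
]
print(nearest_point(points, 48.985001, -119.624001))
```

For a full supply curve, the same formula can be applied column-wise with NumPy instead of looping over tuples.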

Once you know the sc_point_gid, you can proceed as before:

with h5py.File(WIND_CF_FILEPATH, "r") as fh:
    meta = pd.DataFrame(fh["meta"][:])
    ind = meta[meta["sc_point_gid"] == point["sc_point_gid"].values[0]].index
    profile_scale_factor = fh["rep_profiles_0"].attrs["scale_factor"]
    profile = fh["rep_profiles_0"][:, ind] / profile_scale_factor
    ti = pd.to_datetime(fh["time_index"][:].astype(str))

profile.shape

fig = go.Figure()
fig.add_trace(go.Scatter(x=ti, y=profile[:, 0], line={"width": 4}))
fig.update_layout(
    title=f"CF Profile for SC Point GID: {point['sc_point_gid'].iloc[0]:,}",
    xaxis_title="Time index (hours)",
    yaxis_title="Capacity factor (AC)",
    template="none",
)
fig.update_xaxes(type="date", range=["2012-01-01", "2012-01-06"])
fig.show()
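
As one example of downstream analysis, a capacity factor profile can be turned into an hourly generation estimate by multiplying it by the capacity of the supply curve point. The sketch below uses made-up capacity factors and the capacity_mw value of 174 shown for SC point 2334 in the table above. It assumes that the profile and the capacity are expressed on the same basis, which should be verified for your data.

```python
# Made-up hourly capacity factors; a real profile would have 8,760 values
capacity_factors = [0.10, 0.35, 0.50, 0.25]
capacity_mw = 174  # capacity_mw of SC point 2334 in the supply curve table

# Hourly generation (MWh per one-hour step) = capacity factor x capacity
generation_mwh = [cf * capacity_mw for cf in capacity_factors]

# Mean capacity factor over the period
mean_cf = sum(capacity_factors) / len(capacity_factors)

print(generation_mwh)
print(round(mean_cf, 3))
```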

Conclusion

In this tutorial, we have walked through the basic steps required to extract temporal profiles for a given supply curve CSV. You should now be able to:

  • Read reV HDF5 files using rex

  • Read reV HDF5 files using h5py (not recommended)

  • Extract sc_point_gid for a latitude/longitude pair

  • Link supply curve points to temporal profiles using sc_point_gid