NREL Renewable Energy Resource Data
Welcome to the docs page for NREL’s renewable energy resource datasets! These docs apply to all of the NREL spatiotemporal meteorological datasets stored in HDF5 files including data for solar, wind, wave, and temperature variables. For example, these docs apply to these NREL data products (not an exhaustive list!):
The National Solar Radiation Database (NSRDB)
The Wind Integration National Dataset Toolkit (WIND Toolkit)
Other wind data stored in the WIND Toolkit AWS bucket (e.g., NOW-23, PR-100, Sup3rWind, international wind data, etc…)
High-resolution downscaled climate change data (Sup3rCC)
High Resolution Ocean Surface Wave Hindcast (US Wave) Data
Other spatiotemporal meteorological data from NREL!
Definitions
attributes
- Meta data associated with an NREL h5 file or a dataset within that h5 file. This can be information about how the file was created, the software versions used to create the data, physical units of datasets, scale factors for compressed integer storage, or something else.attributes
are stored in namespaces similar to python dictionaries for every h5 file and every dataset in every h5 file. This is not typically spatial meta data and is not related to themeta
dataset. For more information, see the h5py attributes docs.
chunks
- Data arrays in an h5 dataset are stored inchunks
which are subsets of the data array stored sequentially on disk. When reading an h5 file, you only have to read one chunk of data at a time, so if a file has a 1TB dataset with shape (8760, N) but the chunk shape is (8760, 100), you don’t have to read the full 1TB of data to access a singlegid
, you only have to read the single chunk of data (in this case a 8760x100 array). For more details, see the h5py chunks docs.
CLI
- Command Line Interface (CLI). A program you can run from a command line call in a shell e.g.,hsds
,hsls
, etc…
datasets
- Named arrays (e.g., “windspeed_100m”, “ghi”, “temperature_2m”, etc…) stored in an h5 file. These are frequently 2D arrays with dimensions (time, space) and can be sliced with a[idy, idx]
syntax. See the h5py dataset docs for details. We also refer to all our NREL data products as “datasets” so sorry for the confusion!
gid
- We commonly refer to locations in a spatiotemporal NREL dataset by the location’sgid
which is the spatial index of the location of interest (zero-indexed). For example, in a 2D dataset with shape (time, space),gid=99
(zero-indexed) would be the 100th column (1-indexed) in the 2D array.
h5
- File extension for the heirarchical data format (e.g., “HDF5”) that is widely used for spatiotemporal data at NREL. See the h5py library for more details.
h5pyd
- The python library that provides the HDF REST interface to NREL data hosted on the cloud. This allows for the public to access small parts of large cloud-hosted datasets. See the h5pyd library for more details.
hsds
- The highly scalable data service (HSDS) that we recommend to access small chunks of very large cloud-hosted NREL datasets. See the hsds library for more details.
meta
- Thedataset
in an NREL h5 file that contains information about the spatial axis. This is typically a pandas DataFrame with columns such as “latitude”, “longitude”, “state”, etc… The DataFrame is typically converted to a records array for storage in an h5dataset
. The length of the meta data should match the length of axis 1 of a 2D spatiotemporaldataset
.
S3
- Amazon Simple Storage Service (S3) is a basic cloud file storage system we use to store raw .h5 files in their full volume. Downloading files directly from S3 may not be the easiest way to access the data because each file tends to be multiple terabytes. Instead, you can stream small chunks of the files via HSDS.
scale_factor
- We frequently scale data by a multiplicative factor, round the data to integer precision, and store the data in integer arrays. Thescale_factor
is an attribute associated with the relevant h5dataset
that defines the multiplicative factor required to unscale the data from integer storage to the original physical units.
time_index
- Thedataset
in an NREL h5 file that contains information about the temporal axis. This is typically a pandas DatetimeIndex that has been converted to a string array for storage in an h5dataset
. The length of thisdataset
should match the length of axis 0 of a 2D spatiotemporaldataset
.
Data Format
NREL data is frequently provided in heirarchical data format (HDF5 or .h5).
Each file contains many datasets, with each dataset
representing a physical
variable or meta data. Datasets are commonly 2 dimensional time-series arrays
with dimensions (time, space). The temporal axis is defined by time_index
,
while the spatial axis is defined by meta
. For storage efficiency, we
commonly scale each dataset
by a multiplicative factor and store as an
integer. The scale_factor is provided in the scale_factor
attribute. The
units for each variable are also commonly provided as an attribute called
units
.
Data Location - NREL Users
If you are at NREL, the easiest way to access this data is on the NREL
high-performance computing system (HPC). Go to the NREL HPC website and request access via an NREL project with an
HPC allocation. Once you are on the HPC, you can find that datasets in the
/datasets/
directory (e.g., run the linux command $ ls /datasets/
). Go
through the directory tree until you find the .h5 files you are looking for.
This datasets directory should not be confused with a dataset
from an h5
file.
When using the rex
examples below, update the file paths with the relevant
NREL HPC file paths in /datasets/
.
Data Location - External Users
If you are not at NREL, you can’t just download these files. They are massive
and downloading the full files would crash your computer. The easiest way to
access this data is probably with fsspec
, which allows you to access files
directly on S3 with only one additional installation and no server setup.
However, this method is slow. The most performant method is via HSDS
.
HSDS
provides a solution to stream small chunks of the data to your laptop
or server for just the time or space domain you’re interested in.
See this docs page
for easy (but slow) access of the source .h5 files on s3 with fsspec
that
requires basically zero setup. To find relevant S3 files, you can explore the
S3 directory structure on OEDI or
with the AWS CLI
See this docs page for
instructions on how to set up HSDS for more performant data access that
requires a bit of setup. To find relevant HSDS files, you can use HSDS and
h5pyd to explore the NREL public data directory listings. For example, if you
are running an HSDS local server, you can use the CLI utility hsls
, for
example, run: $ hsls /nrel/
or $ hsls /nrel/nsrdb/v3/
. You can also use
h5pyd to do the same thing. In a python kernel, import h5pyd
and then run:
print(list(h5pyd.Folder('/nrel/')))
to list the /nrel/
directory.
There is also an experiment with using zarr, but the examples below may not work with these utilities and the zarr example is not regularly tested.
The Open Energy Data Initiative (OEDI) is also invaluable for finding the source s3 filepaths and for finding energy-relevant public datasets that are not necessarily spatiotemporal h5 data.
Data Access Examples
If you are on the NREL HPC, update the file paths with the relevant NREL HPC
file paths in /datasets/
.
If you are not at NREL, see the “Data Location - External Users” section above for S3 instructions or for how to setup HSDS and how to find the files that you’re interested in. Then update the file paths to the files you want either on HSDS or S3.
The rex Resource Class
Data access in rex is built on the Resource
class. The class can be used to
open and explore NREL h5 files, extract and automatically unscale data, and
retrieve time_index
and meta
datasets in their native pandas datatypes.
from rex import Resource
with Resource('/nrel/nsrdb/current/nsrdb_2020.h5') as res:
ghi = res['ghi', :, 500]
print(res.dsets)
print(res.attrs['ghi'])
print(res.time_index)
print(res.meta)
print(ghi)
Here, we are retrieving the ghi
dataset for all time indices (axis=0) for
gid
500 and also printing other useful meta data.
For a full description the Resource
class API see the docs here.
There are also special Resource
subclasses for many of the renewable energy
resource types. For a list of these classes and their corresponding
documentation, see the docs page here. For
example, the WindResource
class can be used to open files in the WIND
Toolkit bucket (including datasets like NOW-23 and Sup3rWind) and will
interpolate windspeeds to the desired hub height, even if the requested
windspeed is not available as a dataset
:
from rex import WindResource
with WindResource('/nrel/wtk/conus/wtk_conus_2007.h5') as res:
ws88 = res['windspeed_88m', :, 1000]
print(res.dsets)
print(ws88)
Here, notice that windspeed_88m
is not a dataset
available in the WIND
Toolkit file, but it can be requested by the WindResource
class, which
interpolates the windspeeds between the available 80 and 100 meter hub heights.
The rex Resource Extraction Class
There are also classes that implement additional quality-of-life features. For
example, you can use the ResourceX
class to retrieve a timeseries DataFrame
for a requested coordinate:
from rex import ResourceX
with ResourceX('/nrel/wtk/conus/wtk_conus_2007.h5') as res:
df = res.get_lat_lon_df('temperature_2m', (39.7407, -105.1686))
print(df)
Note that in this example, the ResourceX
object first has to download the
full meta
data, build a KDTree
, then query the tree. This takes a lot
of time for a single coordinate query. If you are querying multiple
coordinates, take a look at other methods like ResourceX.lat_lon_gid
that get the gid
for multiple coordinates at once. Also consider saving the
gid
indices you are interested in and reusing them instead of querying
these methods repeatedly.
You can also use a ResourceX
class specific to a given resource type (e.g.,
wind or solar) to retrieve a DataFrame with all variables you will need to run
the System Advisor Model (SAM). For example, try:
from rex import SolarX
with SolarX('/nrel/nsrdb/current/nsrdb_2020.h5') as res:
df = res.get_SAM_lat_lon((39.7407, -105.1686))
print(df)
For a full list of ResourceX
classes with additional features specific to
various renewable energy resource types, see the docs here.