GeoTIFFs to reV HDF5 Files#

Prerequisites:

  • A working reVX installation

  • The previous tutorial, which introduces reVX’s Geotiff handler

Introduction#

In the previous tutorial, we demonstrated how to use reVX’s Geotiff handler to manage GeoTIFF files.

In this tutorial, we will walk through getting GeoTIFF files into a reV-ready format using the LayeredH5 handler.

We’ll cover the following steps:

  1. Creating a Layered HDF5 file using a GeoTIFF template

  2. Writing layers to the Layered HDF5 file

  3. Extracting layers from the Layered HDF5 file

  4. All of the above from the command line

Let’s get started!

Downloading the data#

Before we dive into the code, we first have to download a sample TIFF from Siting Lab to use as an example of adding data to a layered HDF5 file. In particular, we will be using data from [GDS24a], [GDS24b], [GDS24c], and [GDS24d].

If you have already downloaded the data, you can skip this step (just make sure the path variables below are set correctly). We’ll start by defining the local file path destinations:

AIRPORT_HELIPORT_SETBACKS = "airport_heliport_setbacks.tif"
NEXRAD_GREEN_LOS = "NEXRAD_green_los.tif"
SETBACKS_PIPELINE_REFERENCE = "setbacks_pipeline_reference.tif"
SETBACKS_STRUCTURE_115HH_170RD = "setbacks_structure_115hh_170rd.tif"
SETBACKS_STRUCTURE_REFERENCE = "setbacks_structure_reference.tif"

Let’s also define the URL for each of these files:

FILE_URLS = {
    AIRPORT_HELIPORT_SETBACKS: "https://data.openei.org/files/6120/airport_heliport_setbacks.tif",
    NEXRAD_GREEN_LOS: "https://data.openei.org/files/6121/nexrad_4km.tif",
    SETBACKS_PIPELINE_REFERENCE: "https://data.openei.org/files/6125/setbacks_pipeline_115hh_170rd_extrapolated.tif",
    SETBACKS_STRUCTURE_115HH_170RD: "https://data.openei.org/files/6132/setbacks_structure_115hh_170rd_extrapolated.tif",
    SETBACKS_STRUCTURE_REFERENCE: "https://data.openei.org/files/6132/setbacks_structure_115hh_170rd.tif"
}

Next, we can use a Siting Lab utility function to download the data. This function uses urllib (which is part of the Python standard library) under the hood.

Note: The source TIFF files are large (90m resolution for all of CONUS), so we specify crop=True to crop the data immediately after downloading it, making it easier to work with. If you have a machine with sufficiently large memory (32GB+), or you are downloading the files for analysis purposes, you should set crop=False.

from multiprocessing.pool import ThreadPool


def download(local_filepath):
    # `download_tiff_file` is the Siting Lab utility function mentioned above
    url = FILE_URLS[local_filepath]
    download_tiff_file(url, local_filepath, crop=True)


with ThreadPool(len(FILE_URLS)) as p:
    p.map(download, FILE_URLS)
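
Since the helper uses urllib under the hood, a rough standard-library-only equivalent (without the cropping step) could look like the sketch below. Note that fetch_tiff is a hypothetical stand-in for illustration, not part of the Siting Lab utilities:

from urllib.request import urlretrieve


def fetch_tiff(local_filepath):
    """Hypothetical stand-in: download one full-size TIFF, no cropping."""
    urlretrieve(FILE_URLS[local_filepath], local_filepath)


# Example usage (downloads the full-size file):
# fetch_tiff(NEXRAD_GREEN_LOS)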

Working with Layered HDF5 files#

In this section, we will outline some basic workflows using the LayeredH5 class.

Creating the Layered HDF5 file from TIFF#

First, we will initialize the LayeredH5 object.

If the HDF5 file does not exist yet, we create it using the .create_new() method.

When creating a new HDF5, a template filepath must be specified. The template file is used to define the properties of the HDF5 file including:

  1. The profile information

  2. Coordinate reference system and projection

  3. The geographic extent and spatial resolution

All other files that are subsequently added to the HDF5 file will be transformed/adjusted to fit the properties of the template file before being written to the file.

from reVX.handlers.layered_h5 import LayeredH5

H5_PATH = "example.h5"

# Initialize the layered H5 handler, using the NEXRAD GeoTIFF as the template
h5 = LayeredH5(H5_PATH, template_file=NEXRAD_GREEN_LOS)

# If the file doesn't exist yet, create a new H5
h5.create_new()

Inspecting the HDF5 file (using the layers property), we see that the first two layers are latitude and longitude arrays. These are the coordinate locations for each grid cell defined by the template file pixels.

# Use the layers property to see the layers in the H5 file
h5.layers
['latitude', 'longitude']

Metadata about the HDF5 file can be retrieved using the .profile and .shape attributes:

print(f"H5 profile: {h5.profile}")
print(f"shape: {h5.shape}")
H5 profile: {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': 255.0, 'width': 2000, 'height': 2000, 'count': 1, 'crs': '+init=epsg:5070', 'transform': (90.0, 0.0, 1829980.2632930684, 0.0, -90.0, 2297068.2309463923), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'lzma', 'interleave': 'band'}
shape: (2000, 2000)
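
Because the template GeoTIFF defines these properties, the profile above should generally match the template file’s own profile. We can spot-check this with the Geotiff handler from the previous tutorial (output omitted here):

from reVX.handlers.geotiff import Geotiff

# Spot-check: compare the H5 profile against the template GeoTIFF's profile
with Geotiff(NEXRAD_GREEN_LOS) as geo:
    print(geo.profile)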

Writing layers to the HDF5 file#

Once the HDF5 file is created (or if it already exists), we can write NumPy arrays and GeoTIFF files into it using the .write_layer_to_h5() and .write_geotiff_to_h5() methods, respectively:

# Adding a NumPy array (read via the Geotiff handler)
with Geotiff(NEXRAD_GREEN_LOS) as geo:
    h5.write_layer_to_h5(
        values=geo.values,
        layer_name="nexrad_green_los",
        profile=geo.profile,
        description="NEXRAD Line of sight"
    )
# Adding a geotiff file directly
h5.write_geotiff_to_h5(
    geotiff=AIRPORT_HELIPORT_SETBACKS,
    layer_name="airport_heliport_setbacks",
    description="Setbacks from airports and heliports",
    replace=False
)

Now we can check to see what layers are currently in the HDF5 file:

# Checking current layers in the H5 file
h5.layers
['airport_heliport_setbacks', 'latitude', 'longitude', 'nexrad_green_los']

We can also add multiple GeoTIFFs to the H5 file using the .layers_to_h5() method. This method accepts two types of inputs:

  1. A list of GeoTIFF filepaths. In this case, the layer names in the HDF5 file will be the stems of the filenames.

  2. A dictionary mapping layer names to GeoTIFF filepaths (a sketch of this form appears after the example below).

Optionally, you can include a dictionary mapping layer names to layer descriptions (in text format) using the descriptions argument.

file_list = [
    SETBACKS_PIPELINE_REFERENCE,
    SETBACKS_STRUCTURE_115HH_170RD,
    SETBACKS_STRUCTURE_REFERENCE,
]

print(f"Adding {len(file_list)} file(s) to the h5...")
for fn in file_list:
    print(fn.split(".")[0])

h5.layers_to_h5(
    layers=file_list,
    replace=False,
    descriptions={
        "setbacks_pipeline_reference": "This dataset represents wind energy "
        "setback requirements from oil and gas pipelines. A setback "
        "requirement is a minimum distance from a pipeline that an energy "
        "project may be developed. As of April 2022, no ordinances were "
        "discovered for any counties. Such ordinances are likely to arise as "
        "regulations continue to expand. Therefore, this dataset applies a "
        "median setback equivalent to 1.1 times the turbine tip-height, "
        "sourced from trends in other infrastructure. The turbine parameters "
        "used were a hub-height of 115 meters and a rotor diameter of 170 "
        "meters, as obtained from the Annual Technology Baseline (ATB) 2022."
    }
)
Adding 3 file(s) to the h5...
setbacks_pipeline_reference
setbacks_structure_115hh_170rd
setbacks_structure_reference
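
For reference, the dictionary input form (option 2 above) lets you choose layer names that differ from the file stems. A sketch of that form, not executed here, might look like the following (named_layers is an arbitrary illustrative name):

# Hypothetical dictionary input: layer name -> GeoTIFF filepath
named_layers = {
    "pipeline_setbacks_reference": SETBACKS_PIPELINE_REFERENCE,
    "structure_setbacks_115hh_170rd": SETBACKS_STRUCTURE_115HH_170RD,
}
# h5.layers_to_h5(layers=named_layers, replace=False)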

We can check that our layers have indeed been added to the HDF5 file:

# Checking current layers in the h5
h5.layers
['airport_heliport_setbacks',
 'latitude',
 'longitude',
 'nexrad_green_los',
 'setbacks_pipeline_reference',
 'setbacks_structure_115hh_170rd',
 'setbacks_structure_reference']

We can also check that our description for setbacks_pipeline_reference has been properly added:

import h5py

with h5py.File(H5_PATH) as h5_fh:
    print(h5_fh["setbacks_pipeline_reference"].attrs["description"])
This dataset represents wind energy setback requirements from oil and gas pipelines. A setback requirement is a minimum distance from a pipeline that an energy project may be developed. As of April 2022, no ordinances were discovered for any counties. Such ordinances are likely to arise as regulations continue to expand. Therefore, this dataset applies a median setback equivalent to 1.1 times the turbine tip-height, sourced from trends in other infrastructure. The turbine parameters used were a hub-height of 115 meters and a rotor diameter of 170 meters, as obtained from the Annual Technology Baseline (ATB) 2022.
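
The layer values themselves can be read back the same way. Layers written by LayeredH5 are stored as 3D datasets with a leading band dimension (as the h5ls output later in this tutorial shows), so we index the first band:

# Read the raster values for a single layer directly with h5py
with h5py.File(H5_PATH) as h5_fh:
    pipeline_setbacks = h5_fh["setbacks_pipeline_reference"][0]

print(pipeline_setbacks.shape)  # (2000, 2000) for the cropped example data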

Extracting Layers from the HDF5 file#

Layers in the HDF5 file can also be extracted to GeoTIFFs. Simply call .layer_to_geotiff() to extract a single layer:

# Extracting a single layer
layer = "airport_heliport_setbacks"
out_filepath = "airport_heliport_setbacks_h5_extract.tif"
h5.layer_to_geotiff(layer=layer, geotiff=out_filepath)

Alternatively, you can call .extract_layers() to extract multiple layers. This method requires you to pass a dictionary mapping layer names to output filepaths:

# Extracting multiple layers
layers = {
    "nexrad_green_los": "nexrad_green_los_h5_extract.tif",
    "setbacks_pipeline_reference": "setbacks_pipeline_reference_h5_extract.tif"
}
h5.extract_layers(layers)

All of the layers in the HDF5 file can be extracted using the .extract_all_layers() method. To use it, simply pass an output directory where the extracted files should be written:

h5.extract_all_layers(out_dir=".")
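
To confirm the files were written, you can list the GeoTIFFs in the output directory using the standard library. Since we extracted to the current directory, this listing will also include the source TIFFs we downloaded earlier:

import glob

# List every GeoTIFF in the current directory (downloads + extracted layers)
print(sorted(glob.glob("*.tif")))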

Layered HDF5 file via CLI#

Alternatively, the command line can be used to create, add to, and extract layers from the layered HDF5 file.

Adding GeoTIFFs to the Layered HDF5 file#

First, we need to construct a JSON config file that maps layer names to GeoTIFF filepaths. This configuration file can optionally contain a dictionary of layer descriptions as well. For example, suppose we create a layers.json file with the following content:

{
    "layers": {
        "nexrad_green_los": "NEXRAD_green_los.tif",
        "setbacks_pipeline_reference": "./setbacks_pipeline_reference.tif"
    },
    "descriptions": {
        "setbacks_pipeline_reference": "This dataset represents wind energy setback requirements from oil and gas pipelines."
    }
}
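
If you prefer to generate this configuration programmatically, you can write it from Python with the standard json module. The sketch below reuses the path variables defined earlier in this tutorial:

import json

# Build the CLI config: layer names -> GeoTIFF paths, plus optional descriptions
cli_layers = {
    "layers": {
        "nexrad_green_los": NEXRAD_GREEN_LOS,
        "setbacks_pipeline_reference": SETBACKS_PIPELINE_REFERENCE,
    },
    "descriptions": {
        "setbacks_pipeline_reference": (
            "This dataset represents wind energy setback requirements "
            "from oil and gas pipelines."
        ),
    },
}

with open("layers.json", "w") as fh:
    json.dump(cli_layers, fh, indent=4)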

Then we can run the following command:

$ reVX exclusions -h5 example_cli.h5 layers-to-h5 --layers layers.json

We can check whether the write was successful using the h5ls command (you may have to run conda install h5py to get it):

$ h5ls example_cli.h5 
latitude                 Dataset {2000, 2000}
longitude                Dataset {2000, 2000}
nexrad_green_los         Dataset {1, 2000, 2000}
setbacks_pipeline_reference Dataset {1, 2000, 2000}

Extracting GeoTIFF layers from the HDF5 file#

To extract layers from the h5 file, we pass a list of layers to extract as well as the desired output directory as arguments to the command:

$ mkdir data
$ reVX exclusions -h5 example_cli.h5 layers-from-h5 -l nexrad_green_los,setbacks_pipeline_reference -o ./data

We can check whether the extraction was successful by listing the contents of the output directory:

$ ls data
nexrad_green_los.tif  setbacks_pipeline_reference.tif

Conclusion#

In this tutorial, we have walked through the basic steps to create, add layers to, and extract layers from a Layered HDF5 file. This type of file is used primarily as the exclusions layer input for reV supply curve aggregation. You should now be able to:

  • Create a layered HDF5 file

  • Add layers to the HDF5 file

  • Extract layers from the HDF5 file