sup3r.preprocessing.data_handling.exogenous_data_handling.ExogenousDataHandler

class ExogenousDataHandler(file_paths, feature, steps, models=None, exo_resolution=None, source_file=None, target=None, shape=None, temporal_slice=None, raster_file=None, max_delta=20, input_handler=None, exo_handler=None, cache_data=True, cache_dir='./exo_cache', res_kwargs=None)

Bases: object

Class to extract exogenous features for multistep forward passes, e.g., multiple topography arrays at different resolutions for multiple spatial enhancement steps.

Parameters:
  • file_paths (str | list) – A single source h5 or netCDF file to extract raster data from. The string can be a unix-style file path, which will be passed through glob.glob. This is typically low-resolution WRF output or GCM netCDF data intended to be sup3r-resolved.

  • feature (str) – Exogenous feature to extract from file_paths

  • models (list) – List of models used with the given steps list. These models are used to determine the input and output resolutions and the enhancement factors for each model step, which are then used to determine the aggregation factors. If agg factors and enhancement factors are provided in the steps list, the models list is not needed.

  • steps (list) – List of dictionaries containing info on which model to use for a given step index and what type of exo data the step requires, e.g.

      [{'model': 0, 'combine_type': 'input'},
       {'model': 0, 'combine_type': 'layer'}]

    Each step entry can also contain s_enhance, t_enhance, s_agg_factor, and t_agg_factor, e.g.

      [{'model': 0, 'combine_type': 'input', 's_agg_factor': 900,
        's_enhance': 1, 't_agg_factor': 5, 't_enhance': 1},
       {'model': 0, 'combine_type': 'layer', 's_agg_factor': 100,
        's_enhance': 3, 't_agg_factor': 5, 't_enhance': 1}]

    If they are not included, they will be computed using exo_resolution and the model attributes.

  • exo_resolution (dict) – Dictionary specifying the spatiotemporal resolution of the exo data source, e.g. {'spatial': '4km', 'temporal': '60min'}. This is used only if agg factors are not provided in the steps list.

  • source_file (str) – Filepath to the source wtk, nsrdb, or netcdf file from which to get hi-res data, which will be mapped to the enhanced grid of the file_paths input. Pixels from this file will be mapped to their nearest low-res pixel in the file_paths input. Accordingly, this source should have a significantly higher resolution than file_paths. Warnings will be raised if the low-resolution pixels in file_paths do not have unique nearest pixels from this exo source data.

  • target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.

  • shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.

  • temporal_slice (slice | None) – Slice used to extract an interval from the temporal dimension of the input data and the source data.

  • raster_file (str | None) – File for the raster_index array for the corresponding target and shape. If specified, the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None, raster_index will be calculated directly. Either need target+shape or raster_file.

  • max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances. By default 20.

  • input_handler (str) – Data handler class to use for the input data. Provide a string name matching a class in data_handling.py. If None, the correct handler will be guessed based on file type and time series properties.

  • exo_handler (str) – Feature extraction class to use for the source data. For example, if feature='topography' this should be either TopoExtractH5 or TopoExtractNC. If None, the correct handler will be guessed based on file type and time series properties.

  • cache_data (bool) – Flag to cache exogenous data in <cache_dir>/exo_cache/. This can speed up forward passes with large temporal extents.

  • cache_dir (str) – Directory for storing cache data. Default is './exo_cache'.

  • res_kwargs (dict | None) – Dictionary of kwargs passed to the lowest-level resource handler, e.g. xr.open_dataset(file_paths, **res_kwargs).
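The chunked retrieval described for max_delta can be pictured with the minimal sketch below. It assumes the raster is simply tiled row-major into blocks of at most max_delta pixels on each side; raster_chunks is a hypothetical helper written for illustration, not a sup3r function, and the library's actual extraction logic may differ.

```python
def raster_chunks(shape, max_delta=20):
    """Split a (rows, cols) raster shape into (row_slice, col_slice) chunks.

    Hypothetical helper: each chunk spans at most max_delta pixels along
    either axis, and the last chunk on each axis is clipped to the edge.
    """
    rows, cols = shape
    chunks = []
    for r0 in range(0, rows, max_delta):
        for c0 in range(0, cols, max_delta):
            chunks.append((slice(r0, min(r0 + max_delta, rows)),
                           slice(c0, min(c0 + max_delta, cols))))
    return chunks


# The (20, 20) / max_delta=10 case from the parameter description
chunks = raster_chunks((20, 20), max_delta=10)
print(len(chunks))  # 4
print(chunks[0])    # (slice(0, 10, None), slice(0, 10, None))
```

Retrieving smaller blocks keeps each lookup local, which is why a bounded chunk size helps on curvilinear grids where a single large rectangular request would be distorted.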

Methods

get_agg_factors(input_res, exo_res)

Compute the spatial and temporal aggregation factors for exo data given the input and exo data resolutions

get_exo_data(feature, s_enhance, t_enhance, ...)

Get the exogenous data for a given feature, e.g. topography

get_exo_handler(feature, source_file, ...)

Get exogenous feature extraction class for source file

input_check()

Make sure agg factors are provided or exo_resolution and models are provided.

Attributes

AVAILABLE_HANDLERS

input_check()

Make sure agg factors are provided, or that exo_resolution and models are provided. Make sure enhancement factors are provided, or that models are provided.
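The two rules above can be expressed as a small validation function. This is a minimal sketch, assuming the check operates on the steps list of dictionaries shown in the parameter description; check_steps and its error messages are hypothetical and are not the library's own implementation.

```python
def check_steps(steps, models=None, exo_resolution=None):
    """Hypothetical check mirroring the input_check description.

    Either every step carries its own agg factors, or both exo_resolution
    and models must be given. Either every step carries its own
    enhancement factors, or models must be given.
    """
    has_agg = all('s_agg_factor' in s and 't_agg_factor' in s for s in steps)
    has_enhance = all('s_enhance' in s and 't_enhance' in s for s in steps)
    if not has_agg and (exo_resolution is None or models is None):
        raise ValueError('Provide s_agg_factor/t_agg_factor in every step '
                         'or provide both exo_resolution and models.')
    if not has_enhance and models is None:
        raise ValueError('Provide s_enhance/t_enhance in every step '
                         'or provide models.')


# All factors given explicitly, so no models or exo_resolution are needed
steps = [{'model': 0, 'combine_type': 'input', 's_agg_factor': 900,
          's_enhance': 1, 't_agg_factor': 5, 't_enhance': 1}]
check_steps(steps)

# Missing factors and no models: the check raises
try:
    check_steps([{'model': 0, 'combine_type': 'input'}])
except ValueError as err:
    print(err)
```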

get_agg_factors(input_res, exo_res)

Compute the spatial and temporal aggregation factors for exo data given the input and exo data resolutions.

Parameters:
  • input_res (dict | None) – Input resolution, e.g. {'spatial': '30km', 'temporal': '60min'}

  • exo_res (dict | None) – Exogenous data resolution, e.g. {'spatial': '1km', 'temporal': '5min'}

Returns:

  • s_agg_factor (int) – Spatial aggregation factor for exogenous data extraction.

  • t_agg_factor (int) – Temporal aggregation factor for exogenous data extraction.
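The arithmetic behind these factors can be sketched with the example resolutions from the parameter list. This sketch assumes the spatial factor is an area ratio, (input_res / exo_res) squared, and the temporal factor a plain ratio; the squared convention is inferred from the steps example (900 for the input step, 100 for a step with s_enhance=3), not confirmed from the library source. The helpers parse_res and agg_factors are hypothetical, and unit conversion is deliberately left out.

```python
import re


def parse_res(res):
    """Split a resolution string like '30km' or '60min' into (value, unit)."""
    match = re.fullmatch(r'\s*([0-9.]+)\s*([a-zA-Z]+)\s*', res)
    if match is None:
        raise ValueError(f'Cannot parse resolution string: {res!r}')
    return float(match.group(1)), match.group(2)


def agg_factors(input_res, exo_res):
    """Hypothetical sketch of the get_agg_factors arithmetic.

    Assumes the spatial factor counts exo pixels per input pixel (area
    ratio, hence the square) and the temporal factor counts exo time
    steps per input time step.
    """
    in_s, in_s_unit = parse_res(input_res['spatial'])
    exo_s, exo_s_unit = parse_res(exo_res['spatial'])
    in_t, in_t_unit = parse_res(input_res['temporal'])
    exo_t, exo_t_unit = parse_res(exo_res['temporal'])
    # No unit conversion in this sketch, so units must already match
    if in_s_unit != exo_s_unit or in_t_unit != exo_t_unit:
        raise ValueError('Resolution units must match in this sketch.')
    s_agg_factor = int(round((in_s / exo_s) ** 2))
    t_agg_factor = int(round(in_t / exo_t))
    return s_agg_factor, t_agg_factor


s_agg, t_agg = agg_factors({'spatial': '30km', 'temporal': '60min'},
                           {'spatial': '1km', 'temporal': '5min'})
print(s_agg, t_agg)  # 900 12

# With a cumulative s_enhance of 3 the input grid becomes 10km, so under
# the assumed area-ratio convention the spatial factor drops to 100
print(s_agg // 3 ** 2)  # 100
```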

get_exo_data(feature, s_enhance, t_enhance, s_agg_factor, t_agg_factor)

Get the exogenous data for a given feature, e.g. topography

Parameters:
  • feature (str) – Name of feature to get exo data for

  • s_enhance (int) – Spatial enhancement for this exogenous data step (cumulative for all model steps up to the current step).

  • t_enhance (int) – Temporal enhancement for this exogenous data step (cumulative for all model steps up to the current step).

  • s_agg_factor (int) – Factor by which to aggregate the exo_source data to the spatial resolution of the file_paths input enhanced by s_enhance.

  • t_agg_factor (int) – Factor by which to aggregate the exo_source data to the temporal resolution of the file_paths input enhanced by t_enhance.

Returns:

data (np.ndarray) – 2D or 3D array of exo data with shape (lat, lon) or (lat, lon, temporal)
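The nearest-pixel mapping described for source_file, followed by aggregation onto the enhanced grid, can be sketched with NumPy alone. This is a minimal illustration under stated assumptions: distance is plain Euclidean in lat/lon, aggregation is a simple mean, and a brute-force search replaces whatever spatial index the real code uses. aggregate_to_grid is a hypothetical helper and returns only the 2D (lat, lon) case.

```python
import numpy as np


def aggregate_to_grid(hr_lat, hr_lon, hr_values, lr_lat, lr_lon):
    """Average hi-res exo pixels onto their nearest target grid pixel.

    hr_lat, hr_lon, hr_values: 1D arrays for the hi-res exo source pixels.
    lr_lat, lr_lon: 2D (rows, cols) arrays for the target (enhanced) grid.
    Returns a (rows, cols) array; cells with no mapped pixel are NaN.
    """
    grid = np.column_stack([lr_lat.ravel(), lr_lon.ravel()])
    points = np.column_stack([hr_lat, hr_lon])
    # Brute-force nearest neighbor in lat/lon space (fine for small grids)
    dist = ((points[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    nearest = dist.argmin(axis=1)
    sums = np.bincount(nearest, weights=hr_values, minlength=grid.shape[0])
    counts = np.bincount(nearest, minlength=grid.shape[0])
    with np.errstate(invalid='ignore', divide='ignore'):
        mean = sums / counts
    return mean.reshape(lr_lat.shape)


# 2x2 target grid and a 4x4 hi-res source; each target cell gets 4 pixels
lr_lat, lr_lon = np.meshgrid([0.0, 1.0], [0.0, 1.0], indexing='ij')
hr = np.array([-0.2, 0.2, 0.8, 1.2])
hr_lat, hr_lon = np.meshgrid(hr, hr, indexing='ij')
values = 100 * hr_lat + hr_lon  # synthetic "elevation" field
out = aggregate_to_grid(hr_lat.ravel(), hr_lon.ravel(), values.ravel(),
                        lr_lat, lr_lon)
print(out.shape)
```

If two target pixels competed for the same source pixels unevenly, some cells would receive no data and come back as NaN, which is the situation the source_file warning about non-unique nearest pixels is meant to flag.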

classmethod get_exo_handler(feature, source_file, exo_handler)

Get exogenous feature extraction class for source file

Parameters:
  • feature (str) – Name of feature to get exo handler for

  • source_file (str) – Filepath to the source wtk, nsrdb, or netcdf file from which to get hi-res (2km or 4km) data, which will be mapped to the enhanced grid of the file_paths input

  • exo_handler (str) – Feature extraction class to use for the source data. For example, if feature='topography' this should be either TopoExtractH5 or TopoExtractNC. If None, the correct handler will be guessed based on file type and time series properties.

Returns:

exo_handler (str) – Exogenous feature extraction class to use for source data.
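Guessing a handler from the file type can be illustrated with a lookup keyed on the feature name and file extension. This sketch is hypothetical: HANDLER_NAMES and guess_exo_handler are not sup3r names, only the two topography class names come from this documentation, and the time series properties mentioned above are ignored here. It also returns a class name string rather than the class itself.

```python
import os

# Hypothetical (feature, extension) -> handler name table; only the two
# topography entries are taken from this documentation.
HANDLER_NAMES = {
    ('topography', '.h5'): 'TopoExtractH5',
    ('topography', '.nc'): 'TopoExtractNC',
}


def guess_exo_handler(feature, source_file, exo_handler=None):
    """Return exo_handler if given, else guess a class name from file type."""
    if exo_handler is not None:
        return exo_handler
    ext = os.path.splitext(source_file)[1].lower()
    try:
        return HANDLER_NAMES[(feature.lower(), ext)]
    except KeyError:
        raise KeyError(f'No exo handler known for feature={feature!r} '
                       f'and file type {ext!r}') from None


# 'example_source.h5' is a placeholder path, not a real dataset
print(guess_exo_handler('topography', 'example_source.h5'))  # TopoExtractH5
```

An explicit exo_handler always wins, so the guess only applies when the caller passes None, matching the parameter description.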