sup3r.preprocessing.data_handling.exo_extraction.TopoExtractNC

class TopoExtractNC(file_paths, exo_source, s_enhance, t_enhance, t_agg_factor, target=None, shape=None, temporal_slice=None, raster_file=None, max_delta=20, input_handler=None, cache_data=True, cache_dir='./exo_cache/', ti_workers=1, distance_upper_bound=None, res_kwargs=None)

Bases: TopoExtractH5

TopoExtract for netCDF files: extracts high-resolution topography from a netCDF (.nc) exo_source file.

Parameters:
  • file_paths (str | list) – A single source h5 file to extract raster data from, or a list of netCDF files with an identical grid. The string can be a unix-style file path, which will be passed through glob.glob. This is typically low-resolution WRF output or GCM netCDF data: the source low-resolution data intended to be resolved by sup3r.

  • exo_source (str) – Filepath to the source data file from which high-resolution elevation data is read and mapped to the enhanced grid of the file_paths input. Pixels from this exo_source will be mapped to their nearest low-res pixel in the file_paths input. Accordingly, exo_source should have a significantly higher resolution than file_paths. Warnings will be raised if the low-resolution pixels in file_paths do not have unique nearest pixels from exo_source. The file format can be .h5 for TopoExtractH5 or .nc for TopoExtractNC.

  • s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low-resolution data from the file_paths input. For example, if getting topography data, file_paths has 100km data, and s_enhance is 4, this class will output a topography raster corresponding to the file_paths grid enhanced 4x to ~25km.

  • t_enhance (int) – Factor by which the Sup3rGan model will enhance the temporal dimension of low-resolution data from the file_paths input. For example, if getting sza (solar zenith angle) data, file_paths has hourly data, and t_enhance is 4, this class will output a sza raster corresponding to the file_paths temporally enhanced 4x to 15 min.

  • t_agg_factor (int) – Factor by which to aggregate / subsample the exo_source data to the resolution of the file_paths input enhanced by t_enhance. For example, if getting sza data, file_paths has hourly data, and t_enhance is 4, the target resolution is 15 min; if exo_source has a resolution of 5 min, t_agg_factor should be 3 so that only timesteps that are a multiple of 15 min are selected, e.g., [0, 5, 10, 15, 20, 25, 30][slice(0, None, 3)] = [0, 15, 30].

  • target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.

  • shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.

  • temporal_slice (slice | None) – Slice used to extract an interval from the temporal dimension of the input data and source data.

  • raster_file (str | None) – File for the raster_index array for the corresponding target and shape. If specified, the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None, raster_index will be calculated directly. Either need target+shape or raster_file.

  • max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances. Default is 20.

  • input_handler (str) – Data handler class to use for input data. Provide a string name to match a class in data_handling.py. If None, the correct handler will be guessed based on file type and time series properties.

  • cache_data (bool) – Flag to cache exogenous data in <cache_dir>/exo_cache/. This can speed up forward passes with large temporal extents when the exo data is time independent.

  • cache_dir (str) – Directory for storing cache data. Default is ‘./exo_cache/’.

  • ti_workers (int | None) – Maximum number of workers to use to get the full time index. Useful when there are many input files, each with a single time step. If this is greater than one, time indices for input files will be extracted in parallel and then concatenated to get the full time index. If input files do not all have time indices, or if there are few input files, this should be set to one.

  • distance_upper_bound (float | None) – Maximum distance to map high-resolution data from exo_source to the low-resolution file_paths input. None (default) will calculate this based on the median distance between points in exo_source.

  • res_kwargs (dict | None) – Dictionary of kwargs passed to the lowest-level resource handler, e.g. xr.open_dataset(file_paths, **res_kwargs).
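The enhancement and aggregation factors combine multiplicatively. The sketch below reproduces the worked examples from s_enhance, t_enhance and t_agg_factor with plain Python arithmetic. The helpers enhanced_step and infer_t_agg_factor are hypothetical illustrations, not part of sup3r, and hr_spatial_shape assumes the enhanced grid is exactly s_enhance times the low-resolution grid in each spatial dimension.

```python
from datetime import timedelta


def enhanced_step(lr_step, t_enhance):
    """Time step of the enhanced output: the low-res step divided by t_enhance."""
    return lr_step / t_enhance


def infer_t_agg_factor(lr_step, t_enhance, exo_step):
    """Number of exo_source steps per enhanced time step (hypothetical helper)."""
    ratio = enhanced_step(lr_step, t_enhance) / exo_step
    # t_agg_factor must be a whole number of exo_source steps
    if ratio != int(ratio):
        raise ValueError("enhanced time step is not a multiple of the exo_source step")
    return int(ratio)


# Spatial: 100 km input with s_enhance=4 -> ~25 km grid, 4x more rows and cols
lr_shape, s_enhance = (10, 20), 4
hr_spatial_shape = tuple(n * s_enhance for n in lr_shape)

# Temporal (sza example): hourly input, t_enhance=4 -> 15 min; exo_source at 5 min
t_agg_factor = infer_t_agg_factor(timedelta(hours=1), 4, timedelta(minutes=5))
exo_minutes = [0, 5, 10, 15, 20, 25, 30]
selected = exo_minutes[slice(0, None, t_agg_factor)]
print(hr_spatial_shape, t_agg_factor, selected)  # (40, 80) 3 [0, 15, 30]
```

An exo_source resolution that does not divide the enhanced time step (for example 7 min against 15 min) raises an error in this sketch, since no integer subsampling step exists.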

Methods

get_cache_file(feature, s_enhance, ...)

Get cache file name.

get_data()

Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance).

get_exo_raster(file_paths, s_enhance, ...[, ...])

Get the exo feature raster corresponding to the spatially enhanced grid from the file_paths input.

Attributes

data

Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance).

distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

hr_lat_lon

Lat/lon grid with shape (spatial_1, spatial_2, 2), with (lat, lon) ordering in the last dimension.

hr_shape

Get the high-resolution spatial shape tuple

hr_time_index

Get the full time index for aggregated source data

lr_lat_lon

Lat/lon grid with shape (spatial_1, spatial_2, 2), with (lat, lon) ordering in the last dimension.

lr_shape

Get the low-resolution spatial shape tuple

nn

Get the nearest neighbor indices

source_data

Get the 1D array of elevation data from the exo_source netCDF file.

source_handler

Get the DataHandlerNC object that handles the .nc source topography data file.

source_lat_lon

Get the 2D (n, 2) array of lat, lon data from the exo_source netCDF file.

source_temporal_slice

Get the temporal slice for the exo_source data corresponding to the input file temporal slice

source_time_index

Time index of the source exo data

tree

Get the KDTree built on the target (high-resolution) lat/lon grid, i.e., the file_paths input grid enhanced by s_enhance.

property source_handler

Get the DataHandlerNC object that handles the .nc source topography data file.

property source_data

Get the 1D array of elevation data from the exo_source netCDF file.

property source_lat_lon

Get the 2D (n, 2) array of lat, lon data from the exo_source netCDF file.

property data

Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance). The shape is (lats, lons, temporal, 1)

property distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.
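When distance_upper_bound is None, the parameter description says it is derived from the median distance between points in exo_source. The exact formula sup3r uses is not stated here; the sketch below assumes it is the median nearest-neighbour distance between exo_source points in raw (lat, lon) degrees, computed with scipy.spatial.cKDTree. The helper name median_nn_distance is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def median_nn_distance(source_lat_lon):
    """Median distance (degrees) from each exo_source point to its nearest other point.

    source_lat_lon: (n, 2) array of (lat, lon), the layout of source_lat_lon.
    """
    tree = cKDTree(source_lat_lon)
    # k=2: the first hit for every point is the point itself at distance 0
    dists, _ = tree.query(source_lat_lon, k=2)
    return float(np.median(dists[:, 1]))


# Toy regular grid with 0.01-degree spacing in both lat and lon
lats, lons = np.meshgrid(np.linspace(40.0, 40.09, 10),
                         np.linspace(-105.0, -104.91, 10), indexing="ij")
source_lat_lon = np.column_stack([lats.ravel(), lons.ravel()])
print(median_nn_distance(source_lat_lon))  # approximately 0.01
```

Using the median rather than the mean keeps a few isolated or duplicated source points from skewing the bound.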

get_cache_file(feature, s_enhance, t_enhance, t_agg_factor)

Get cache file name. This uses a time independent naming convention.

Parameters:
  • feature (str) – Name of feature to get cache file for

  • s_enhance (int) – Spatial enhancement for this exogenous data step (cumulative for all model steps up to the current step).

  • t_enhance (int) – Temporal enhancement for this exogenous data step (cumulative for all model steps up to the current step).

  • t_agg_factor (int) – Factor by which to aggregate the exo_source data to the temporal resolution of the file_paths input enhanced by t_enhance.

Returns:

cache_fp (str) – Name of cache file
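One reading of a time-independent naming convention is that the cache key omits the time index and temporal_slice, so the same cached raster can be reused by forward passes over different time ranges. The sketch below illustrates that idea only: the function hypothetical_cache_file, its target, shape and cache_dir arguments, and the filename format are all hypothetical and are not sup3r's actual naming scheme.

```python
import os


def hypothetical_cache_file(cache_dir, feature, s_enhance, t_enhance,
                            t_agg_factor, target, shape):
    """Build a cache path that encodes no time information (hypothetical format)."""
    target_str = "_".join(f"{v:.4f}" for v in target)
    shape_str = "x".join(str(v) for v in shape)
    fn = (f"exo_{feature}_{target_str}_{shape_str}"
          f"_s{s_enhance}_t{t_enhance}_tagg{t_agg_factor}.npy")
    return os.path.join(cache_dir, fn)


fp = hypothetical_cache_file("./exo_cache/", "topography", 4, 1, 1,
                             target=(39.5, -105.2), shape=(20, 20))
print(os.path.basename(fp))
# exo_topography_39.5000_-105.2000_20x20_s4_t1_tagg1.npy
```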

get_data()

Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance). The shape is (lats, lons, 1)
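Conceptually, get_data fills each high-resolution cell with the exo_source values that map to it. A minimal sketch of that aggregation, assuming nn holds, for every exo_source point, the flat index of its nearest high-resolution cell, with the value rows * cols marking points beyond distance_upper_bound (the convention of scipy's cKDTree.query). Averaging per cell and leaving empty cells as NaN are assumptions of this sketch; sup3r's handling of empty cells may differ. The helper aggregate_to_grid is hypothetical.

```python
import numpy as np


def aggregate_to_grid(source_data, nn, hr_shape):
    """Mean of source values per hr grid cell, returned with shape (rows, cols, 1).

    source_data: 1D array of exo_source values (e.g. elevation).
    nn: 1D int array of the same length, flat hr-cell index per source point;
        a value equal to rows * cols means "no cell within range".
    """
    n_target = hr_shape[0] * hr_shape[1]
    valid = nn < n_target  # drop points beyond distance_upper_bound
    sums = np.bincount(nn[valid], weights=source_data[valid], minlength=n_target)
    counts = np.bincount(nn[valid], minlength=n_target)
    out = np.full(n_target, np.nan)
    filled = counts > 0
    out[filled] = sums[filled] / counts[filled]
    return out.reshape(hr_shape[0], hr_shape[1], 1)


elev = np.array([100.0, 120.0, 300.0, 50.0, 999.0])
nn = np.array([0, 0, 1, 3, 4])  # 4 == 2 * 2, so the last point is ignored
raster = aggregate_to_grid(elev, nn, (2, 2))
print(raster[..., 0])  # rows: [110, 300] and [nan, 50]
```

np.bincount computes all per-cell sums and counts in one vectorised pass, which avoids a Python loop over millions of source pixels.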

classmethod get_exo_raster(file_paths, s_enhance, t_enhance, t_agg_factor, exo_source=None, target=None, shape=None, temporal_slice=None, raster_file=None, max_delta=20, input_handler=None, cache_data=True, cache_dir='./exo_cache/')

Get the exo feature raster corresponding to the spatially enhanced grid from the file_paths input.

Parameters:
  • file_paths (str | list) – A single source h5 file to extract raster data from, or a list of netCDF files with an identical grid. The string can be a unix-style file path, which will be passed through glob.glob.

  • s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low-resolution data from the file_paths input. For example, if file_paths has 100km data and s_enhance is 4, this class will output a topography raster corresponding to the file_paths grid enhanced 4x to ~25km.

  • t_enhance (int) – Factor by which the Sup3rGan model will enhance the temporal dimension of low-resolution data from the file_paths input. For example, if getting sza data, file_paths has hourly data, and t_enhance is 4, this class will output a sza raster corresponding to the file_paths temporally enhanced 4x to 15 min.

  • t_agg_factor (int) – Factor by which to aggregate the exo_source data to the resolution of the file_paths input enhanced by t_enhance. For example, if getting sza data, file_paths has hourly data, and t_enhance is 4, the desired resolution is 15 min; if exo_source has a resolution of 5 min, t_agg_factor should be 3 so that only every third timestep in the exo_source data is selected.

  • exo_source (str) – Filepath to the source WTK, NSRDB, or netCDF file from which high-resolution (2km or 4km) data is read and mapped to the enhanced grid of the file_paths input.

  • target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.

  • shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.

  • temporal_slice (slice | None) – Slice used to extract an interval from the temporal dimension of the input data and source data.

  • raster_file (str | None) – File for the raster_index array for the corresponding target and shape. If specified, the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None, raster_index will be calculated directly. Either need target+shape or raster_file.

  • max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances. Default is 20.

  • input_handler (str) – Data handler class to use for input data. Provide a string name to match a class in data_handling.py. If None, the correct handler will be guessed based on file type and time series properties.

  • cache_data (bool) – Flag to cache exogenous data in <cache_dir>/exo_cache/. This can speed up forward passes with large temporal extents when the exo data is time independent.

  • cache_dir (str) – Directory for storing cache data. Default is ‘./exo_cache/’.
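The max_delta example, a (20, 20) raster retrieved as four (10, 10) chunks, can be checked by enumerating the windows. The sketch below only lists (row, col) slices of at most max_delta per side; the helper raster_chunks is hypothetical, and sup3r's own raster search over curved grids is not reproduced.

```python
def raster_chunks(shape, max_delta):
    """Split a (rows, cols) raster into row/col slices of at most max_delta each."""
    rows, cols = shape
    return [(slice(r, min(r + max_delta, rows)), slice(c, min(c + max_delta, cols)))
            for r in range(0, rows, max_delta)
            for c in range(0, cols, max_delta)]


chunks = raster_chunks((20, 20), max_delta=10)
print(len(chunks))  # 4
print(chunks[0])    # (slice(0, 10, None), slice(0, 10, None))
```

When the shape is not a multiple of max_delta, the last chunks are simply smaller: (25, 20) with max_delta=10 yields six chunks, the last covering rows 20 to 25.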

Returns:

exo_raster (np.ndarray) – Exo feature raster with shape (hr_rows, hr_cols, h_temporal) corresponding to the shape of the spatiotemporally enhanced data from file_paths * s_enhance * t_enhance. The data units correspond to the source units in exo_source. This is usually meters when feature=‘topography’.

property hr_lat_lon

Lat/lon grid with shape (spatial_1, spatial_2, 2), with (lat, lon) ordering in the last dimension. This corresponds to the meta data from the file_paths input, spatially enhanced by s_enhance.

Returns:

ndarray

property hr_shape

Get the high-resolution spatial shape tuple

property hr_time_index

Get the full time index for aggregated source data

property lr_lat_lon

Lat/lon grid with shape (spatial_1, spatial_2, 2), with (lat, lon) ordering in the last dimension. This corresponds to the raw meta data from the file_paths input.

Returns:

ndarray

property lr_shape

Get the low-resolution spatial shape tuple

property nn

Get the nearest neighbor indices

property source_temporal_slice

Get the temporal slice for the exo_source data corresponding to the input file temporal slice
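Each low-resolution time step spans t_enhance * t_agg_factor exo_source steps, so an input slice can be mapped onto the source by scaling its bounds. A minimal sketch, assuming both time series start at the same timestamp and the input slice has no step; scale_temporal_slice is a hypothetical helper and not necessarily how sup3r computes this property.

```python
def scale_temporal_slice(temporal_slice, t_enhance, t_agg_factor):
    """Map a slice over low-res time steps to a slice over exo_source time steps."""
    factor = t_enhance * t_agg_factor
    start, stop = temporal_slice.start, temporal_slice.stop
    return slice(None if start is None else start * factor,
                 None if stop is None else stop * factor)


# Hourly input, t_enhance=4 (15 min), exo_source at 5 min (t_agg_factor=3)
src_slice = scale_temporal_slice(slice(2, 5), t_enhance=4, t_agg_factor=3)
exo_minutes = list(range(0, 24 * 60, 5))  # one day of 5-min exo_source steps
window = exo_minutes[src_slice]           # minutes 120 .. 295
hr_steps = window[::3]                    # 15-min steps 120 .. 285
print(src_slice, len(hr_steps))           # slice(24, 60, None) 12
```

The three input hours (2, 3, 4) become 36 source steps, and subsampling by t_agg_factor leaves 12 = 3 * t_enhance enhanced steps.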

property source_time_index

Time index of the source exo data

property tree

Get the KDTree built on the target (high-resolution) lat/lon grid, i.e., the file_paths input grid enhanced by s_enhance.
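The tree and nn properties fit together as a nearest-neighbour lookup: the tree indexes the flattened high-resolution lat/lon points, and nn is the result of querying it with every exo_source point. A minimal sketch using scipy.spatial.cKDTree on raw (lat, lon) degrees, with toy arrays standing in for hr_lat_lon and source_lat_lon; whether sup3r uses the same tree class and distance metric is an assumption. cKDTree.query returns the number of tree points as the index for queries with no neighbour within distance_upper_bound.

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy 2x2 high-resolution grid, shape (spatial_1, spatial_2, 2) like hr_lat_lon
hr_lat_lon = np.array([[[40.0, -105.0], [40.0, -104.9]],
                       [[39.9, -105.0], [39.9, -104.9]]])
tree = cKDTree(hr_lat_lon.reshape(-1, 2))  # flatten to (n_target, 2)

# Toy exo_source points, shape (n, 2) like source_lat_lon
source_lat_lon = np.array([[40.01, -105.01],   # near cell 0
                           [39.91, -104.89],   # near cell 3
                           [45.00, -100.00]])  # far from every cell
_, nn = tree.query(source_lat_lon, distance_upper_bound=0.05)
print(nn)  # [0 3 4]: 4 == number of hr cells, i.e. no cell within range
```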