sup3r.preprocessing.data_handling.exo_extraction.SzaExtract
- class SzaExtract(file_paths, exo_source, s_enhance, t_enhance, t_agg_factor, target=None, shape=None, temporal_slice=None, raster_file=None, max_delta=20, input_handler=None, cache_data=True, cache_dir='./exo_cache/', ti_workers=1, distance_upper_bound=None, res_kwargs=None)
Bases: ExoExtract
SzaExtract for H5 files
- Parameters:
file_paths (str | list) – A single source h5 file to extract raster data from, or a list of netcdf files with an identical grid. The string can be a unix-style file path, which will be passed through glob.glob. This is typically low-res WRF output or GCM netcdf data, i.e. the low-resolution source data intended to be sup3r resolved.
exo_source (str) – Filepath to source data file to get hi-res elevation data from which will be mapped to the enhanced grid of the file_paths input. Pixels from this exo_source will be mapped to their nearest low-res pixel in the file_paths input. Accordingly, exo_source should have a significantly higher resolution than file_paths. Warnings will be raised if the low-resolution pixels in file_paths do not have unique nearest pixels from exo_source. File format can be .h5 for TopoExtractH5 or .nc for TopoExtractNC.
s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low resolution data from file_paths input. For example, if getting topography data, file_paths has 100km data, and s_enhance is 4, this class will output a topography raster corresponding to the file_paths grid enhanced 4x to ~25km
t_enhance (int) – Factor by which the Sup3rGan model will enhance the temporal dimension of low resolution data from file_paths input. For example, if getting sza data, file_paths has hourly data, and t_enhance is 4, this class will output a sza raster corresponding to the file_paths temporally enhanced 4x to 15 min
t_agg_factor (int) – Factor by which to aggregate / subsample the exo_source data to the resolution of the file_paths input enhanced by t_enhance. For example, if getting sza data, file_paths have hourly data, and t_enhance is 4 resulting in a target resolution of 15 min and exo_source has a resolution of 5 min, the t_agg_factor should be 3 so that only timesteps that are a multiple of 15min are selected e.g., [0, 5, 10, 15, 20, 25, 30][slice(0, None, 3)] = [0, 15, 30]
target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.
shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.
temporal_slice (slice | None) – slice used to extract interval from temporal dimension for input data and source data
raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None raster_index will be calculated directly. Either need target+shape or raster_file.
max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances, by default 20
input_handler (str) – data handler class to use for input data. Provide a string name to match a class in data_handling.py. If None the correct handler will be guessed based on file type and time series properties.
cache_data (bool) – Flag to cache exogenous data in <cache_dir>/exo_cache/. This can speed up forward passes with large temporal extents when the exo data is time independent.
cache_dir (str) – Directory for storing cache data. Default is ‘./exo_cache’
ti_workers (int | None) – max number of workers to use to get full time index. Useful when there are many input files each with a single time step. If this is greater than one, time indices for input files will be extracted in parallel and then concatenated to get the full time index. If input files do not all have time indices or if there are few input files this should be set to one.
distance_upper_bound (float | None) – Maximum distance to map high-resolution data from exo_source to the low-resolution file_paths input. None (default) will calculate this based on the median distance between points in exo_source
res_kwargs (dict | None) – Dictionary of kwargs passed to lowest level resource handler. e.g. xr.open_dataset(file_paths, **res_kwargs)
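The relationship between t_enhance and t_agg_factor described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of sup3r: the helper name temporal_agg_factor and its arguments (time steps in minutes) are hypothetical and only reproduce the worked example from the t_agg_factor description.

```python
def temporal_agg_factor(lr_minutes, t_enhance, source_minutes):
    """Hypothetical helper (not a sup3r function): step used to subsample
    exo_source timesteps down to the temporally enhanced resolution."""
    # Target resolution after temporal enhancement, e.g. 60 / 4 = 15 min.
    target_minutes, remainder = divmod(lr_minutes, t_enhance)
    if remainder:
        raise ValueError("t_enhance must evenly divide the low-res time step")
    # Number of source steps per target step, e.g. 15 / 5 = 3.
    factor, remainder = divmod(target_minutes, source_minutes)
    if remainder or factor < 1:
        raise ValueError("source step must evenly divide the target step")
    return factor


# Source timestamps in minutes at 5 min resolution, as in the description.
source_times = [0, 5, 10, 15, 20, 25, 30]
t_agg_factor = temporal_agg_factor(lr_minutes=60, t_enhance=4, source_minutes=5)
selected = source_times[slice(0, None, t_agg_factor)]
print(t_agg_factor, selected)
```

With hourly input, t_enhance=4, and a 5 min exo_source, this gives a factor of 3 and keeps only the timesteps that fall on multiples of 15 min.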
Methods
get_cache_file(feature, s_enhance, ...)
Get cache file name
get_data()
Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance).
get_exo_raster(file_paths, s_enhance, ...[, ...])
Get the exo feature raster corresponding to the spatially enhanced grid from the file_paths input
Attributes
data
Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance).
distance_upper_bound
Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.
hr_lat_lon
Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension.
hr_shape
Get the high-resolution spatial shape tuple
hr_time_index
Get the full time index for aggregated source data
lr_lat_lon
Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension.
lr_shape
Get the low-resolution spatial shape tuple
nn
Get the nearest neighbor indices
source_data
Get the 1D array of sza data from the exo_source_h5
source_lat_lon
Get the 2D array (n, 2) of lat, lon data from the exo_source_h5
source_temporal_slice
Get the temporal slice for the exo_source data corresponding to the input file temporal slice
source_time_index
Get the full time index of the exo_source data
tree
Get the KDTree built on the target lat lon data from the file_paths input with s_enhance
- property source_data
Get the 1D array of sza data from the exo_source_h5
- get_data()
Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance). The shape is (lats, lons, temporal)
- property data
Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance). The shape is (lats, lons, temporal, 1)
- property distance_upper_bound
Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.
- get_cache_file(feature, s_enhance, t_enhance, t_agg_factor)
Get cache file name
- Parameters:
feature (str) – Name of feature to get cache file for
s_enhance (int) – Spatial enhancement for this exogenous data step (cumulative for all model steps up to the current step).
t_enhance (int) – Temporal enhancement for this exogenous data step (cumulative for all model steps up to the current step).
t_agg_factor (int) – Factor by which to aggregate the exo_source data to the temporal resolution of the file_paths input enhanced by t_enhance.
- Returns:
cache_fp (str) – Name of cache file
- classmethod get_exo_raster(file_paths, s_enhance, t_enhance, t_agg_factor, exo_source=None, target=None, shape=None, temporal_slice=None, raster_file=None, max_delta=20, input_handler=None, cache_data=True, cache_dir='./exo_cache/')
Get the exo feature raster corresponding to the spatially enhanced grid from the file_paths input
- Parameters:
file_paths (str | list) – A single source h5 file to extract raster data from, or a list of netcdf files with an identical grid. The string can be a unix-style file path, which will be passed through glob.glob.
s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low resolution data from file_paths input. For example, if file_paths has 100km data and s_enhance is 4, this class will output a topography raster corresponding to the file_paths grid enhanced 4x to ~25km
t_enhance (int) – Factor by which the Sup3rGan model will enhance the temporal dimension of low resolution data from file_paths input. For example, if getting sza data, file_paths has hourly data, and t_enhance is 4, this class will output a sza raster corresponding to the file_paths temporally enhanced 4x to 15 min
t_agg_factor (int) – Factor by which to aggregate the exo_source data to the resolution of the file_paths input enhanced by t_enhance. For example, if getting sza data, file_paths have hourly data, and t_enhance is 4 resulting in a desired resolution of 15 min and exo_source has a resolution of 5 min, the t_agg_factor should be 3 so that every third timestep in the exo_source data is selected.
exo_source (str) – Filepath to source wtk, nsrdb, or netcdf file to get hi-res (2km or 4km) data from which will be mapped to the enhanced grid of the file_paths input
target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.
shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.
temporal_slice (slice | None) – slice used to extract interval from temporal dimension for input data and source data
raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None raster_index will be calculated directly. Either need target+shape or raster_file.
max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances, by default 20
input_handler (str) – data handler class to use for input data. Provide a string name to match a class in data_handling.py. If None the correct handler will be guessed based on file type and time series properties.
cache_data (bool) – Flag to cache exogenous data in <cache_dir>/exo_cache/. This can speed up forward passes with large temporal extents when the exo data is time independent.
cache_dir (str) – Directory for storing cache data. Default is ‘./exo_cache’
- Returns:
exo_raster (np.ndarray) – Exo feature raster with shape (hr_rows, hr_cols, hr_temporal) corresponding to the shape of the spatiotemporally enhanced data from file_paths * s_enhance * t_enhance. The data units correspond to the source units in exo_source_h5. This is usually meters when feature=’topography’
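The max_delta parameter above limits how much of the raster is retrieved at once. The following is a minimal sketch of that kind of chunking, assuming simple non-overlapping tiles; chunk_slices is a hypothetical helper and is not how sup3r necessarily implements the retrieval.

```python
def chunk_slices(shape, max_delta):
    """Hypothetical helper (not a sup3r function): split a (rows, cols)
    raster into tiles no larger than max_delta along each axis."""
    rows, cols = shape
    chunks = []
    for r0 in range(0, rows, max_delta):
        for c0 in range(0, cols, max_delta):
            # Clip the final tile in each direction to the raster edge.
            row_slice = slice(r0, min(r0 + max_delta, rows))
            col_slice = slice(c0, min(c0 + max_delta, cols))
            chunks.append((row_slice, col_slice))
    return chunks


chunks = chunk_slices((20, 20), max_delta=10)
chunk_shapes = [(r.stop - r.start, c.stop - c.start) for r, c in chunks]
print(len(chunks), chunk_shapes)
```

For shape (20, 20) and max_delta=10 this yields four tiles of (10, 10), matching the example in the max_delta description.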
- property hr_lat_lon
Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension. This corresponds to the enhanced meta data from the file_paths input * s_enhance.
- Returns:
ndarray
- property hr_shape
Get the high-resolution spatial shape tuple
- property hr_time_index
Get the full time index for aggregated source data
- property lr_lat_lon
Lat lon grid for data in format (spatial_1, spatial_2, 2) Lat/Lon array with same ordering in last dimension. This corresponds to the raw meta data from the file_paths input.
- Returns:
ndarray
- property lr_shape
Get the low-resolution spatial shape tuple
- property nn
Get the nearest neighbor indices
- property source_lat_lon
Get the 2D array (n, 2) of lat, lon data from the exo_source_h5
- property source_temporal_slice
Get the temporal slice for the exo_source data corresponding to the input file temporal slice
- property source_time_index
Get the full time index of the exo_source data
- property tree
Get the KDTree built on the target lat lon data from the file_paths input with s_enhance
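The nn, tree, and distance_upper_bound properties together describe a nearest-neighbor mapping from exo_source points to the enhanced target grid. The following is a minimal sketch of that idea, with assumptions stated plainly: it swaps the KDTree for a brute-force search so that it needs only the standard library, it measures distance in plain degrees, and nearest_target is a hypothetical helper rather than a sup3r function.

```python
import math


def nearest_target(source_lat_lon, target_lat_lon, distance_upper_bound):
    """Hypothetical helper (not a sup3r function): for each source point,
    return the index of the nearest target point, or None if the nearest
    target is farther away than distance_upper_bound."""
    nn = []
    for s_lat, s_lon in source_lat_lon:
        best_idx, best_dist = None, math.inf
        for idx, (t_lat, t_lon) in enumerate(target_lat_lon):
            # Euclidean distance in degrees; an assumption for illustration.
            dist = math.hypot(s_lat - t_lat, s_lon - t_lon)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        nn.append(best_idx if best_dist <= distance_upper_bound else None)
    return nn


# Two target (enhanced grid) points and three source points, one far away.
target = [(40.0, -105.0), (40.0, -104.0)]
source = [(40.1, -105.1), (39.9, -104.1), (45.0, -100.0)]
nn = nearest_target(source, target, distance_upper_bound=0.5)
print(nn)
```

A tree-based search returns the same neighbors more efficiently on large grids; the brute-force loop is used here only to keep the sketch self-contained.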