sup3r.preprocessing.rasterizers.exo.BaseExoRasterizer
- class BaseExoRasterizer(feature: str | None = None, file_paths: str | None = None, source_files: str | None = None, source_handler_kwargs: dict | None = None, s_enhance: int = 1, t_enhance: int = 1, input_handler_name: str | None = None, input_handler_kwargs: dict | None = None, cache_dir: str | None = None, chunks: str | dict | None = 'auto', distance_upper_bound: int | None = None, fill_nans: bool = True, scale_factor: float = 1.0, max_workers: int = 1, verbose: bool = False)
Bases: ABC
Class to extract high-res (4km+) data rasters for new spatially-enhanced datasets (e.g. GCM files after spatial enhancement) using nearest neighbor mapping and aggregation from high-res datasets (e.g. WTK or NSRDB).
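The nearest neighbor mapping and aggregation described above can be illustrated with a small NumPy sketch. This is not the sup3r implementation: it assumes flattened (n, 2) lat/lon arrays, Euclidean distance in degree space, a brute-force search in place of the KDTree exposed by the tree property, and a simple mean as the aggregation. The function name nn_aggregate is hypothetical.

```python
import numpy as np


def nn_aggregate(source_lat_lon, source_values, target_lat_lon):
    """Average each source value onto its nearest target grid point.

    source_lat_lon: (n_source, 2) array of lat, lon
    source_values: (n_source,) array of raw source data
    target_lat_lon: (n_target, 2) array of lat, lon on the enhanced grid
    Returns an (n_target,) array; targets that receive no source points are NaN.
    """
    # Brute-force pairwise distances (a KDTree avoids this O(n * m) cost)
    diff = source_lat_lon[:, None, :] - target_lat_lon[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    nearest = dist.argmin(axis=1)  # nearest target index for each source point

    n_target = len(target_lat_lon)
    sums = np.bincount(nearest, weights=source_values, minlength=n_target)
    counts = np.bincount(nearest, minlength=n_target)
    out = np.full(n_target, np.nan)
    has_data = counts > 0
    out[has_data] = sums[has_data] / counts[has_data]
    return out


target = np.array([[40.0, -105.0], [40.0, -104.0]])
source = np.array([[40.1, -105.1], [39.9, -104.9], [40.0, -104.1]])
values = np.array([1.0, 3.0, 10.0])
result = nn_aggregate(source, values, target)
# result -> [2.0, 10.0]: the first target averages 1.0 and 3.0
```

Averaging is only one possible aggregation; it is used here because several high-res source pixels typically fall on each enhanced grid cell.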
- Parameters:
feature (str) – Name of exogenous feature to rasterize.
file_paths (str | list) – A single source h5 file to extract raster data from, or a list of netcdf files with identical grids. The string can be a unix-style file path, which will be passed through glob.glob. This is typically low-res WRF output or GCM netcdf data files, i.e. the source low-resolution data intended to be sup3r-resolved.
source_files (str | list | None) – Filepath(s) to source data file(s) to get hi-res exogenous data, which will be mapped to the enhanced grid of the file_paths input. Pixels from these files will be mapped to their nearest low-res pixel in the file_paths input. Accordingly, source_files should have a significantly higher resolution than file_paths. Warnings will be raised if the low-resolution pixels in file_paths do not have unique nearest pixels from source_files. File format can be .h5 or .nc.
s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low resolution data from file_paths input. For example, if getting topography data, file_paths has 100km data, and s_enhance is 4, this class will output a topography raster corresponding to the file_paths grid enhanced 4x to ~25km. This parameter is calculated automatically when running the forward pass with a config file.
t_enhance (int) – Factor by which the Sup3rGan model will enhance the temporal dimension of low resolution data from file_paths input. For example, if getting “sza” data, file_paths has hourly data, and t_enhance is 4, this class will output an “sza” raster corresponding to file_paths, temporally enhanced 4x to 15 min. This parameter is calculated automatically when running the forward pass with a config file.
input_handler_name (str) – Data handler class to use for input data. Provide a string name to match a data_handler or rasterizer imported into sup3r.preprocessing. If None, the correct handler will be guessed based on file type and time series properties.
input_handler_kwargs (dict | None) – Any kwargs for initializing the input_handler_name class.
source_handler_kwargs (dict | None) – Any kwargs for initializing the source handler (Loader).
cache_dir (str | None) – Directory to use for caching rasterized data. If None (default), no data will be cached. If a string is provided, the directory will be created if it does not exist and the rasterized data will be saved to it. This is useful for speeding up forward passes on large domains, since the rasterized data will be cached once and then reused for all forward passes on chunks of the full domain. Files will be saved to this directory with the name defined in the .cache_file property.
chunks (str | dict) – Dictionary of dimension chunk sizes for returned exo data, e.g. {‘time’: 100, ‘south_north’: 100, ‘west_east’: 100}. This can also just be “auto”. This is passed to .chunk() before returning exo data through the .data attribute.
distance_upper_bound (float | None) – Maximum distance to map high-resolution data from source_files to the low-resolution file_paths input. None (default) will calculate this based on the median distance between points in source_files.
fill_nans (bool) – Whether to fill nans in the output data. This should probably be True for all cases except for sparse observation data.
scale_factor (float) – Scale factor to apply to the raw data from the source_files. This is useful for scaling observation data which might systematically underestimate or overestimate the true value. For example, MADIS data is negatively biased compared to 10m WTK data.
max_workers (int) – Number of workers used for writing data to cache files. Gets passed to Cacher._write_single.
verbose (bool) – Whether to log output as each chunk is written to cache file.
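BaseExoRasterizer derives from ABC, so these arguments are normally passed to a concrete subclass. The dictionary below is a hypothetical set of keyword arguments mirroring the topography (100 km to ~25 km) and time-enhancement examples in the parameter descriptions. The feature name "topography", the file paths, and the cache directory are placeholders, not values confirmed by sup3r; the final check only confirms that every key matches a parameter name in the signature.

```python
# Parameter names copied from the BaseExoRasterizer signature
SIGNATURE_PARAMS = {
    "feature", "file_paths", "source_files", "source_handler_kwargs",
    "s_enhance", "t_enhance", "input_handler_name", "input_handler_kwargs",
    "cache_dir", "chunks", "distance_upper_bound", "fill_nans",
    "scale_factor", "max_workers", "verbose",
}

# Hypothetical settings: 4x spatial and 4x temporal enhancement.
# All names and paths below are placeholders.
exo_kwargs = {
    "feature": "topography",                 # assumed feature name
    "file_paths": "./gcm_data/*.nc",         # low-res inputs, globbed
    "source_files": "./hi_res_source.h5",    # hi-res exogenous source
    "s_enhance": 4,
    "t_enhance": 4,
    "cache_dir": "./exo_cache",              # created if it does not exist
    "chunks": {"time": 100, "south_north": 100, "west_east": 100},
    "distance_upper_bound": None,            # None: derived from source_files
    "fill_nans": True,
    "scale_factor": 1.0,
}

unknown = set(exo_kwargs) - SIGNATURE_PARAMS
assert not unknown, f"unexpected kwargs: {unknown}"
```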
Methods
get_data() – Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance).
get_distance_upper_bound() – Maximum distance (float) to map high-resolution data from source_files to the low-resolution file_paths input.
Attributes
STATIC_FEATURES
cache_dir
cache_file – Get cache file name.
chunks
coords – Get coords dictionary for initializing xr.Dataset.
data – Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance).
distance_upper_bound
feature
file_paths
fill_nans
hr_lat_lon – Lat/lon grid for the data, in the format (spatial_1, spatial_2, 2), with (lat, lon) ordering in the last dimension.
hr_shape – Get the high-resolution spatiotemporal shape.
hr_time_index – Get the full time index for aggregated source data.
input_handler_kwargs
input_handler_name
lr_shape – Get the low-resolution spatiotemporal shape.
max_workers
nn – Get the nearest neighbor indices.
s_enhance
scale_factor
source_data – Get the array of exogenous data from the source_files.
source_files
source_handler – Get the Loader object that handles the exogenous data file.
source_handler_kwargs
source_lat_lon – Get the 2D array (n, 2) of lat, lon data from the source_files.
t_enhance
tree – Get the KDTree built on the target lat lon data from the file_paths input, enhanced by s_enhance.
verbose
- property source_handler
Get the Loader object that handles the exogenous data file.
- property source_data
Get the array of exogenous data from the source_files
- property cache_file
Get cache file name
- Returns:
cache_fp (str) – Name of cache file. This is a netcdf file which will be saved with Cacher and loaded with Loader.
- property coords
Get coords dictionary for initializing xr.Dataset.
- property source_lat_lon
Get the 2D array (n, 2) of lat, lon data from the source_files
- property lr_shape
Get the low-resolution spatiotemporal shape
- property hr_shape
Get the high-resolution spatiotemporal shape
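Assuming both shapes are ordered (spatial_1, spatial_2, temporal), consistent with the (lats, lons, temporal, 1) layout of data, the high-resolution shape is the low-resolution shape with each spatial axis multiplied by s_enhance and the time axis by t_enhance. The helper name hr_shape_from_lr is hypothetical.

```python
def hr_shape_from_lr(lr_shape, s_enhance, t_enhance):
    """Return the (rows, cols, time) shape after spatiotemporal enhancement."""
    rows, cols, steps = lr_shape
    return (rows * s_enhance, cols * s_enhance, steps * t_enhance)


# A 10 x 20 grid of 100 km hourly data over 24 hours, enhanced 4x in space
# (~25 km) and 4x in time (15 min)
shape = hr_shape_from_lr((10, 20, 24), s_enhance=4, t_enhance=4)  # (40, 80, 96)
```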
- property hr_lat_lon
Lat/lon grid for the data, in the format (spatial_1, spatial_2, 2), with (lat, lon) ordering in the last dimension. This corresponds to the enhanced meta data from the file_paths input * s_enhance.
- Returns:
ndarray
- property hr_time_index
Get the full time index for aggregated source data
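A minimal sketch of subdividing a low-resolution time index by t_enhance, matching the hourly-to-15-minute example in the t_enhance parameter description. It assumes a regular time step, at least two input timestamps, and an enhanced index that starts at the first low-resolution timestamp; sup3r's own hr_time_index may be constructed differently. enhance_time_index is a hypothetical name.

```python
import numpy as np


def enhance_time_index(lr_times, t_enhance):
    """Split each regular low-res time step into t_enhance equal sub-steps."""
    lr_times = np.asarray(lr_times, dtype="datetime64[m]")
    # Assumes a regular step, measured here in whole minutes
    step_minutes = int((lr_times[1] - lr_times[0]) / np.timedelta64(1, "m"))
    hr_step = np.timedelta64(step_minutes // t_enhance, "m")
    n_hr = len(lr_times) * t_enhance
    return lr_times[0] + np.arange(n_hr) * hr_step


lr = np.array(["2020-01-01T00:00", "2020-01-01T01:00"], dtype="datetime64[m]")
hr = enhance_time_index(lr, t_enhance=4)
# hr runs from 00:00 to 01:45 in 15-minute steps (8 timestamps)
```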
- get_distance_upper_bound()
Maximum distance (float) to map high-resolution data from source_files to the low-resolution file_paths input.
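When distance_upper_bound is None, the bound is derived from the median distance between points in source_files. One plausible reading is sketched below: the median, over all source points, of the distance from each point to its closest other point, using brute-force search in degree space. This is an assumption for illustration; the exact rule in get_distance_upper_bound may differ. median_nn_spacing is a hypothetical name.

```python
import numpy as np


def median_nn_spacing(source_lat_lon):
    """Median distance from each source point to its closest other point."""
    pts = np.asarray(source_lat_lon, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    return float(np.median(dist.min(axis=1)))


# A regular 0.1-degree grid should give a spacing of about 0.1
lat, lon = np.meshgrid(np.linspace(40.0, 40.4, 5),
                       np.linspace(-105.0, -104.6, 5), indexing="ij")
grid = np.stack([lat.ravel(), lon.ravel()], axis=-1)  # shape (25, 2)
spacing = median_nn_spacing(grid)
```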
- property tree
Get the KDTree built on the target lat lon data from the file_paths input, enhanced by s_enhance.
- property nn
Get the nearest neighbor indices. This uses a single neighbor by default.
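The tree and nn properties pair each source point with its nearest point on the enhanced grid. The sketch below swaps the KDTree for a brute-force NumPy search so it runs without extra dependencies. It flattens hr_lat_lon from (spatial_1, spatial_2, 2) to (n, 2) and, following the convention of scipy's KDTree.query, marks source points farther than distance_upper_bound from every grid point with index n. The function name and the use of Euclidean distance in degrees are assumptions, not sup3r's implementation.

```python
import numpy as np


def nearest_target_indices(hr_lat_lon, source_lat_lon, distance_upper_bound):
    """Index of the nearest high-res grid point for each source point.

    hr_lat_lon has shape (spatial_1, spatial_2, 2) and is flattened to (n, 2),
    so each grid point gets one integer index. Source points farther than
    distance_upper_bound from every grid point get index n (no neighbor).
    """
    targets = hr_lat_lon.reshape(-1, 2)
    diff = source_lat_lon[:, None, :] - targets[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = dist.argmin(axis=1)
    too_far = dist.min(axis=1) > distance_upper_bound
    idx[too_far] = len(targets)
    return idx


hr_grid = np.array([[[40.0, -105.0], [40.0, -104.9]],
                    [[39.9, -105.0], [39.9, -104.9]]])  # (2, 2, 2)
src = np.array([[40.01, -104.91], [35.0, -100.0]])
nn = nearest_target_indices(hr_grid, src, distance_upper_bound=0.05)
# nn -> [1, 4]: the first point maps to grid cell (0, 1); the second is out of range
```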
- property data
Get a raster of source values corresponding to the high-resolution grid (the file_paths input grid * s_enhance * t_enhance). The shape is (lats, lons, temporal, 1).
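Once values have been aggregated onto the flattened high-resolution grid, they can be reshaped into the documented (lats, lons, temporal, 1) layout. This sketch assumes row-major flattening of the spatial grid, an input of shape (n_pixels, n_time), and that scale_factor is a plain multiplication of the values; to_exo_raster is a hypothetical helper, and where sup3r actually applies scale_factor is not specified in this reference.

```python
import numpy as np


def to_exo_raster(flat_values, hr_spatial_shape, scale_factor=1.0):
    """Reshape per-pixel values (n_pixels, n_time) into (lats, lons, time, 1)."""
    rows, cols = hr_spatial_shape
    n_time = flat_values.shape[-1]
    raster = flat_values.reshape(rows, cols, n_time) * scale_factor
    return raster[..., np.newaxis]  # trailing feature axis of length 1


flat = np.arange(12, dtype=float).reshape(6, 2)  # 6 pixels, 2 time steps
raster = to_exo_raster(flat, hr_spatial_shape=(2, 3), scale_factor=2.0)
# raster.shape -> (2, 3, 2, 1)
```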