sup3r.preprocessing.rasterizers.extended.Rasterizer#
- class Rasterizer(file_paths, features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, raster_file=None, max_delta=20, BaseLoader=None)[source]#
Bases: BaseRasterizer
Extended Rasterizer class which also handles the flattened data format used for some H5 files (e.g. Wind Toolkit or NSRDB data), and rasterizes directly from file paths rather than taking a Loader as input.
- Parameters:
file_paths (str | list | pathlib.Path) – file_paths input to LoaderClass
features (list | str) – Features to return in loaded dataset. If ‘all’ then all available features will be returned.
res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
chunks (dict | str | None) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(). Will be converted to a tuple when used in from_array(). These are the methods for H5 and NETCDF data, respectively. This argument can be “auto” in addition to a dictionary. If this is None then the data will not be chunked and instead loaded directly into memory.
target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.
shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.
time_slice (slice | list) – Slice specifying extent and step of temporal extraction, e.g. slice(start, stop, step). If equal to slice(None, None, 1) the full time dimension is selected. Can also be a list [start, stop, step].
threshold (float) – Nearest neighbor euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.
raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None and raster_index is not provided raster_index will be calculated directly. Either need target+shape, raster_file, or raster_index input.
max_delta (int) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances.
BaseLoader (Callable) – Optional base loader method update. This is a function which takes file_paths and **kwargs and returns an initialized base loader with those arguments. The default for h5 is a method which returns MultiFileWindX(file_paths, **kwargs) and for nc the default is xarray.open_mfdataset(file_paths, **kwargs)
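Two of the conventions in the parameter list can be illustrated without sup3r itself. The sketch below is not the library's implementation: `as_time_slice` and `chunk_slices` are hypothetical helper names introduced here, and the tiling strategy is only an assumption about how a raster might be retrieved in pieces no larger than max_delta.

```python
def as_time_slice(time_slice):
    """Accept a slice or a [start, stop, step] list and return a slice."""
    if isinstance(time_slice, slice):
        return time_slice
    return slice(*time_slice)


def chunk_slices(shape, max_delta):
    """Split a (rows, cols) raster into tiles of at most max_delta per side.

    Hypothetical helper: returns a list of (row_slice, col_slice) pairs.
    """
    rows, cols = shape
    row_slices = [
        slice(r, min(r + max_delta, rows)) for r in range(0, rows, max_delta)
    ]
    col_slices = [
        slice(c, min(c + max_delta, cols)) for c in range(0, cols, max_delta)
    ]
    return [(rs, cs) for rs in row_slices for cs in col_slices]


# shape=(20, 20) with max_delta=10 gives four (10, 10) tiles,
# as in the max_delta description.
tiles = chunk_slices((20, 20), max_delta=10)
print(len(tiles))

# A list [start, stop, step] is treated the same as the equivalent slice.
print(as_time_slice([0, 100, 2]))
```

Edge tiles are simply shorter when the shape is not a multiple of max_delta, for example a (25, 10) raster with max_delta=10 ends with a tile of 5 rows.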
Methods
check_target_and_shape(full_lat_lon) – The data is assumed to use a regular grid so if either target or shape is not given we can easily find the values that give the maximum extent.
get_closest_row_col(lat_lon, target) – Get closest indices to target lat lon
get_lat_lon() – Get the 2D array of coordinates corresponding to the requested target and shape.
get_raster_index() – Get set of slices or indices selecting the requested region from the contained data.
post_init_log([args_dict]) – Log additional arguments after initialization.
Get rasterized data.
Save raster index to cache file.
wrap(data) – Return a Sup3rDataset object or tuple of such.
Attributes
data – Return underlying data.
grid_shape – Return the grid_shape based on the raster_index, since self._grid_shape does not need to be provided as an input if the raster_file is.
lat_lon – Get 2D grid of coordinates with target as the lower left coordinate.
shape – Get shape of underlying data.
target – Return the true value based on the closest lat lon instead of the user provided value self._target, which is used to find the closest lat lon.
time_slice – Return time slice for rasterized time period.
- get_raster_index()[source]#
Get set of slices or indices selecting the requested region from the contained data.
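The raster_file parameter describes a load-or-compute pattern around the raster index: read it from the file if the file exists, otherwise compute it and write it out. A minimal sketch of that pattern follows. The function name `load_or_build_raster_index`, the `build` callback, and the JSON file format are all assumptions made for illustration; the source does not state how or where sup3r performs this caching.

```python
import json
import os
import tempfile


def load_or_build_raster_index(raster_file, build):
    """Return a cached raster index if raster_file exists, else build it.

    Hypothetical helper. `build` is a zero-argument callable that computes
    the index; JSON is used here only to keep the sketch dependency-free.
    """
    if raster_file is not None and os.path.exists(raster_file):
        with open(raster_file) as f:
            return json.load(f)
    raster_index = build()
    if raster_file is not None:
        with open(raster_file, "w") as f:
            json.dump(raster_index, f)
    return raster_index


def compute_index():
    # Stand-in for the real index calculation.
    print("computing raster index")
    return [[0, 1], [2, 3]]


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "raster_index.json")
    # First call computes and writes; second call only reads the cache.
    index_a = load_or_build_raster_index(cache_path, compute_index)
    index_b = load_or_build_raster_index(cache_path, compute_index)
    print(index_a == index_b)
```

With raster_file=None the helper computes the index every time and writes nothing, mirroring the "calculated directly" case in the parameter description.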
- get_lat_lon()[source]#
Get the 2D array of coordinates corresponding to the requested target and shape.
- check_target_and_shape(full_lat_lon)#
The data is assumed to use a regular grid so if either target or shape is not given we can easily find the values that give the maximum extent.
- property data#
Return underlying data.
- get_closest_row_col(lat_lon, target)#
Get closest indices to target lat lon
- Parameters:
lat_lon (ndarray) – Array of lat/lon with shape (spatial_1, spatial_2, 2). The last dimension is in order of (lat, lon).
target (tuple) – (lat, lon) for target coordinate
- Returns:
row (int) – row index for closest lat/lon to target lat/lon
col (int) – col index for closest lat/lon to target lat/lon
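The lookup described here is a nearest-neighbor search over the coordinate grid. The sketch below shows the idea in dependency-free Python using nested lists in place of an ndarray. It is not the sup3r implementation: `closest_row_col` is a hypothetical name, and folding the threshold check into the same function is an assumption (the source only says an error is raised when the nearest coordinate is farther than threshold from the target).

```python
import math


def closest_row_col(lat_lon, target, threshold=None):
    """Return (row, col) of the grid point nearest to target.

    lat_lon is a nested list shaped (rows, cols, 2) with the last axis
    ordered (lat, lon). Distance is plain Euclidean distance in degrees.
    """
    best = None
    for row, line in enumerate(lat_lon):
        for col, (lat, lon) in enumerate(line):
            dist = math.hypot(lat - target[0], lon - target[1])
            if best is None or dist < best[0]:
                best = (dist, row, col)
    dist, row, col = best
    if threshold is not None and dist > threshold:
        raise ValueError(
            f"Nearest coordinate is {dist:.3f} away, beyond threshold {threshold}"
        )
    return row, col


# Small 3 x 3 grid: latitude decreases with row, longitude increases with col.
grid = [
    [(lat, lon) for lon in (-105.0, -104.0, -103.0)]
    for lat in (41.0, 40.0, 39.0)
]
print(closest_row_col(grid, (39.9, -104.2)))
```

The target (39.9, -104.2) lies closest to the grid point (40.0, -104.0), so the call returns the middle row and column. Passing a threshold smaller than that distance, such as 0.1, raises the error instead.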
- property grid_shape#
Return the grid_shape based on the raster_index, since self._grid_shape does not need to be provided as an input if the raster_file is.
- property lat_lon#
Get 2D grid of coordinates with target as the lower left coordinate. (lats, lons, 2)
- post_init_log(args_dict=None)#
Log additional arguments after initialization.
- property shape#
Get shape of underlying data.
- property target#
Return the true value based on the closest lat lon instead of the user provided value self._target, which is used to find the closest lat lon.
- property time_slice#
Return time slice for rasterized time period.
- wrap(data)#
Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.