sup3r.preprocessing.rasterizers.extended.Rasterizer

class Rasterizer(file_paths, features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, raster_file=None, max_delta=20, BaseLoader=None)

Bases: BaseRasterizer

Extended Rasterizer class that also handles the flattened data format used for some H5 files (e.g. Wind Toolkit or NSRDB data), and that rasterizes directly from file paths rather than taking a Loader as input.

Parameters:
  • file_paths (str | list | pathlib.Path) – File path(s) passed as the file_paths input to LoaderClass.

  • features (list | str) – Features to return in loaded dataset. If ‘all’ then all available features will be returned.

  • res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.

  • chunks (dict | str | None) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(), the methods used for H5 and NETCDF data, respectively. Will be converted to a tuple when used in from_array(). This argument can also be “auto” instead of a dictionary. If this is None, the data will not be chunked and will instead be loaded directly into memory.

  • target (tuple) – (lat, lon) lower left corner of the raster. Either target+shape or raster_file is needed.

  • shape (tuple) – (rows, cols) grid size. Either target+shape or raster_file is needed.

  • time_slice (slice | list) – Slice specifying the extent and step of temporal extraction, e.g. slice(start, stop, step). If equal to slice(None, None, 1), the full time dimension is selected. Can also be a list [start, stop, step].

  • threshold (float) – Nearest-neighbor Euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.

  • raster_file (str | None) – File for the raster_index array corresponding to target and shape. If specified, the raster_index is loaded from the file if it exists, or written to the file if it does not yet exist. If None, the raster_index is calculated directly. Either target+shape or raster_file is needed.

  • max_delta (int) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances.

  • BaseLoader (Callable) – Optional override of the base loader method. This is a function that takes file_paths and **kwargs and returns a base loader initialized with those arguments. The default for H5 files is a method that returns MultiFileWindX(file_paths, **kwargs), and for NETCDF files it is xarray.open_mfdataset(file_paths, **kwargs).
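The max_delta example above, where a (20, 20) raster is retrieved as four (10, 10) pieces, can be sketched as plain tiling arithmetic. The helper raster_blocks is hypothetical and not part of sup3r; it only shows how a (rows, cols) raster splits into blocks of at most max_delta cells per side, and it does not reproduce how the Rasterizer locates coordinates within each block.

```python
def raster_blocks(shape, max_delta):
    """Split a (rows, cols) raster into blocks of at most max_delta per side.

    Hypothetical helper for illustration; returns (row_slice, col_slice) pairs.
    """
    n_rows, n_cols = shape
    blocks = []
    for r0 in range(0, n_rows, max_delta):
        for c0 in range(0, n_cols, max_delta):
            # Clip the last block in each direction to the raster edge
            blocks.append(
                (slice(r0, min(r0 + max_delta, n_rows)),
                 slice(c0, min(c0 + max_delta, n_cols)))
            )
    return blocks


# shape=(20, 20) with max_delta=10 gives four (10, 10) blocks
blocks = raster_blocks((20, 20), max_delta=10)
print(len(blocks))  # 4
```

When a dimension is not a multiple of max_delta, the last block along that dimension is simply smaller.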

Methods

check_target_and_shape(full_lat_lon)

The data is assumed to use a regular grid so if either target or shape is not given we can easily find the values that give the maximum extent.

get_closest_row_col(lat_lon, target)

Get the closest row and column indices to the target lat/lon.

get_lat_lon()

Get the 2D array of coordinates corresponding to the requested target and shape.

get_raster_index()

Get set of slices or indices selecting the requested region from the contained data.

post_init_log([args_dict])

Log additional arguments after initialization.

rasterize_data()

Get rasterized data.

save_raster_index()

Save raster index to cache file.

wrap(data)

Return a Sup3rDataset object or tuple of such.

Attributes

data

Return underlying data.

grid_shape

Return the grid_shape based on the raster_index, since self._grid_shape does not need to be provided as an input if the raster_file is.

lat_lon

Get 2D grid of coordinates with target as the lower left coordinate.

shape

Get shape of underlying data.

target

Return the true target based on the closest lat/lon, instead of the user-provided value self._target, which is used to find the closest lat/lon.

time_slice

Return time slice for rasterized time period.

rasterize_data()

Get rasterized data.

save_raster_index()

Save raster index to cache file.

get_raster_index()

Get set of slices or indices selecting the requested region from the contained data.
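For a regular 2D grid, selecting the requested region reduces to a pair of slices. The sketch below is illustrative only: raster_slices is a hypothetical helper, and it assumes row indices increase southward, so a block whose lower left cell is (row, col) extends to smaller row indices and larger column indices. It covers only the regular-grid case; flattened H5 data may need index arrays instead of slices.

```python
import numpy as np


def raster_slices(row, col, shape, grid_shape):
    """Return (row_slice, col_slice) for a block whose lower left cell is (row, col).

    Hypothetical helper. Assumes rows run north to south, so the block extends
    north (smaller row indices) and east (larger column indices) of the target.
    """
    n_rows, n_cols = shape
    top, right = row - n_rows + 1, col + n_cols
    if top < 0 or right > grid_shape[1]:
        raise ValueError("requested shape extends past the edge of the grid")
    return slice(top, row + 1), slice(col, right)


# Regular 5 x 4 grid: rows north to south, columns west to east
lats = np.linspace(40.0, 39.0, 5)
lons = np.linspace(-105.0, -104.0, 4)
lat_lon = np.dstack(np.meshgrid(lats, lons, indexing="ij"))  # (5, 4, 2)

rows, cols = raster_slices(row=4, col=0, shape=(3, 2), grid_shape=lat_lon.shape[:2])
subset = lat_lon[rows, cols]
print(subset.shape)  # (3, 2, 2)
```

The lower left entry of the subset, subset[-1, 0], is the target cell lat_lon[4, 0].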

get_lat_lon()

Get the 2D array of coordinates corresponding to the requested target and shape.

check_target_and_shape(full_lat_lon)

The data is assumed to use a regular grid so if either target or shape is not given we can easily find the values that give the maximum extent.
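On a regular grid, the maximum-extent defaults can be sketched as follows. This is an assumption-laden illustration, not the sup3r implementation: fill_target_and_shape and closest_row_col are hypothetical helpers, the last row of full_lat_lon is assumed to be the southernmost (so full_lat_lon[-1, 0] is the lower left corner), and the "maximum extent" for a given target is taken to be every cell north and east of it.

```python
import numpy as np


def closest_row_col(lat_lon, target):
    """Row/col of the grid point nearest to target (Euclidean in lat/lon)."""
    dist = np.hypot(lat_lon[..., 0] - target[0], lat_lon[..., 1] - target[1])
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


def fill_target_and_shape(full_lat_lon, target=None, shape=None):
    """Fill in a missing target and/or shape to give the maximum extent."""
    if target is None:
        # Assumed lower left corner of the full grid (last row, first column)
        target = tuple(float(v) for v in full_lat_lon[-1, 0])
    if shape is None:
        # Every cell north of and east of the target cell, inclusive
        row, col = closest_row_col(full_lat_lon, target)
        shape = (row + 1, full_lat_lon.shape[1] - col)
    return target, shape


lats = np.linspace(40.0, 39.0, 5)
lons = np.linspace(-105.0, -104.0, 4)
full_lat_lon = np.dstack(np.meshgrid(lats, lons, indexing="ij"))  # (5, 4, 2)

print(fill_target_and_shape(full_lat_lon))  # ((39.0, -105.0), (5, 4))
print(fill_target_and_shape(full_lat_lon, target=(39.5, -104.33)))
# ((39.5, -104.33), (3, 2))
```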

property data

Return underlying data.

Returns:

Sup3rDataset

See also

wrap()

get_closest_row_col(lat_lon, target)

Get the closest row and column indices to the target lat/lon.

Parameters:
  • lat_lon (ndarray) – Array of lat/lon with shape (spatial_1, spatial_2, 2). The last dimension is ordered (lat, lon).

  • target (tuple) – (lat, lon) for target coordinate

Returns:

  • row (int) – row index for closest lat/lon to target lat/lon

  • col (int) – col index for closest lat/lon to target lat/lon
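A nearest-neighbor lookup of this kind, combined with the threshold check described for the threshold parameter, might look like the sketch below. closest_row_col is a hypothetical stand-in rather than the library method; it measures plain Euclidean distance in degrees, and raising ValueError is an assumption, since the parameter description only says that "an error is raised".

```python
import numpy as np


def closest_row_col(lat_lon, target, threshold=None):
    """Return (row, col) of the lat_lon grid point nearest to target.

    lat_lon has shape (spatial_1, spatial_2, 2) with the last axis ordered
    (lat, lon). Distance is plain Euclidean distance in degrees.
    """
    lat_lon = np.asarray(lat_lon, dtype=float)
    dist = np.hypot(lat_lon[..., 0] - target[0], lat_lon[..., 1] - target[1])
    # argmin works on the flattened array; unravel back to 2D indices
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    if threshold is not None and dist[row, col] > threshold:
        # Assumed error type; the source only says "an error is raised"
        raise ValueError(
            f"Closest point is {dist[row, col]:.3f} away, exceeding {threshold}"
        )
    return int(row), int(col)


lats = np.linspace(40.0, 39.0, 5)
lons = np.linspace(-105.0, -104.0, 4)
lat_lon = np.dstack(np.meshgrid(lats, lons, indexing="ij"))  # (5, 4, 2)

print(closest_row_col(lat_lon, (39.52, -104.34)))  # (2, 2)
```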

property grid_shape

Return the grid_shape based on the raster_index, since self._grid_shape does not need to be provided as an input if the raster_file is.

property lat_lon

Get 2D grid of coordinates with target as the lower left coordinate, with shape (lats, lons, 2).

post_init_log(args_dict=None)

Log additional arguments after initialization.

property shape

Get shape of underlying data.

property target

Return the true target based on the closest lat/lon, instead of the user-provided value self._target, which is used to find the closest lat/lon.

property time_slice

Return time slice for rasterized time period.
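The time_slice argument accepts either a slice or a [start, stop, step] list. A minimal sketch of normalizing both forms to a slice, using a hypothetical helper normalize_time_slice that is not part of sup3r:

```python
def normalize_time_slice(time_slice):
    """Return a slice from either a slice or a [start, stop, step] list.

    Hypothetical helper for illustration only.
    """
    if isinstance(time_slice, slice):
        return time_slice
    if isinstance(time_slice, (list, tuple)) and len(time_slice) == 3:
        return slice(*time_slice)
    raise TypeError("time_slice must be a slice or [start, stop, step]")


hours = list(range(24))  # e.g. one day of hourly time steps
print(hours[normalize_time_slice([0, 12, 3])])  # [0, 3, 6, 9]
print(len(hours[normalize_time_slice(slice(None, None, 1))]))  # 24
```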

wrap(data)

Return a Sup3rDataset object or a tuple of such objects. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise it is a Sup3rDataset object, which wraps either a 2-tuple or a 1-tuple (i.e. len(data) == 2 or len(data) == 1). It is a 2-tuple when .data belongs to a dual container object like DualSampler, and a 1-tuple otherwise.