sup3r.preprocessing.accessor.Sup3rX#

class Sup3rX(ds: Dataset | Self)[source]#

Bases: object

Accessor for xarray - the suggested way to extend xarray functionality.

References

https://docs.xarray.dev/en/latest/internals/extending-xarray.html

Note

(1) This is an xr.Dataset style object with all xr.Dataset methods, plus more. The way to access these methods is either through appending .sx.<method> on an xr.Dataset or by wrapping an xr.Dataset with Sup3rX, e.g. Sup3rX(xr.Dataset(...)).<method>. Throughout the sup3r codebase we prefer to use the latter. The most important part of this interface is parsing __getitem__ calls of the form ds.sx[keys].

  1. keys can be a single feature name, a list of features, or numpy-style indexing for dimensions, e.g. ds.sx['u'][slice(0, 10), ...] or ds.sx[['u', 'v']][..., slice(0, 10)].

  2. If ds[keys] returns an xr.Dataset object then ds.sx[keys] will return a Sup3rX object, e.g. ds.sx[['u', 'v']] will return a Sup3rX instance but ds.sx['u'] will return an xr.DataArray.

  3. Providing only numpy-style indexing without features will return an array with all contained features in the last dimension and a spatiotemporal shape corresponding to the indices, e.g. ds.sx[slice(0, 10), 0, 1] will return an array of shape (10, 1, 1, n_features). This array will be a dask.array or numpy.array, depending on whether the data is still on disk or loaded into memory.

(2) The __getitem__ and __getattr__ methods will cast back to type(self) if self._ds.__getitem__ or self._ds.__getattr__ returns an instance of type(self._ds) (e.g. an xr.Dataset). This means we do not have to constantly append .sx for successive calls to accessor methods.
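The cast-back behaviour described in this note can be illustrated with a minimal, library-free sketch. ToyDataset and ToyWrapper are hypothetical stand-ins for xr.Dataset and Sup3rX; this is not the sup3r implementation, only the general pattern of re-wrapping results that have the type of the wrapped object.

```python
class ToyDataset:
    """Hypothetical stand-in for xr.Dataset: a mapping of names to values."""

    def __init__(self, data_vars):
        self.data_vars = dict(data_vars)

    def __getitem__(self, keys):
        # A single name returns the raw variable; a list returns a sub-dataset.
        if isinstance(keys, str):
            return self.data_vars[keys]
        return ToyDataset({k: self.data_vars[k] for k in keys})

    def drop(self, name):
        """Return a new dataset without ``name``."""
        return ToyDataset(
            {k: v for k, v in self.data_vars.items() if k != name}
        )


class ToyWrapper:
    """Hypothetical wrapper that casts dataset results back to itself."""

    def __init__(self, ds):
        # Accept either a raw dataset or an existing wrapper.
        self._ds = ds._ds if isinstance(ds, ToyWrapper) else ds

    def _cast(self, out):
        # Re-wrap only when the result has the type of the wrapped object.
        return type(self)(out) if isinstance(out, type(self._ds)) else out

    def __getitem__(self, keys):
        return self._cast(self._ds[keys])

    def __getattr__(self, name):
        # Only reached when normal lookup fails; guard against recursion
        # if ``_ds`` itself has not been set yet.
        if name == "_ds":
            raise AttributeError(name)
        attr = getattr(self._ds, name)
        if callable(attr):
            def wrapped(*args, **kwargs):
                return self._cast(attr(*args, **kwargs))
            return wrapped
        return self._cast(attr)


ds = ToyWrapper(ToyDataset({"u": [1, 2], "v": [3, 4], "w": [5, 6]}))
subset = ds[["u", "v"]]        # list of names -> wrapper again
single = ds["u"]               # single name -> raw variable
chained = ds.drop("w")[["u"]]  # successive calls stay wrapped
```

Because every dataset-typed result is re-wrapped, chained calls never need to re-enter the wrapper explicitly, which mirrors the point about not appending .sx repeatedly.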

Examples

>>> # To use as an accessor:
>>> ds = xr.Dataset(...)
>>> feature_data = ds.sx[features]
>>> ti = ds.sx.time_index
>>> lat_lon_array = ds.sx.lat_lon
>>> # Use as wrapper:
>>> ds = Sup3rX(xr.Dataset(data_vars={'windspeed': ...}, ...))
>>> np_array = ds['windspeed'].values
>>> dask_array = ds['windspeed'][...] == ds['windspeed'].as_array()

Initialize accessor.

Parameters:

ds (xr.Dataset | xr.DataArray) – xarray Dataset instance to access with the following methods.

Methods

add_dims_to_data_vars(vals)

Add dimensions to vals entries if needed.

as_array()

Return .data attribute of an xarray.DataArray with our standard dimension order (lats, lons, time, ..., features)

assign(vals)

Override xarray assign and assign_coords methods to enable update without explicitly providing dimensions if variable already exists.

coarsen(*args, **kwargs)

Override xr.Dataset.coarsen to cast back to Sup3rX object.

compute(**kwargs)

Load ._ds into memory.

flatten()

Flatten rasterized dataset so that there is only a single spatial dimension.

interpolate_na(**kwargs)

Use xr.DataArray.interpolate_na to fill NaN values with a dask compatible method.

isel(*args, **kwargs)

Override xr.Dataset.isel to cast back to Sup3rX object.

mean(**kwargs)

Get mean directly from dataset object.

normalize(means, stds)

Normalize dataset using given means and stds.

ordered(data)

Return data with dimensions in standard order (lats, lons, time, ..., features)

qa()

Check NaNs and stats for all features.

sample(idx)

Get sample from self._ds.

std(**kwargs)

Get std directly from dataset object.

to_dataarray()

Return xr.DataArray for the contained xr.Dataset.

unflatten(grid_shape)

Convert flattened dataset into rasterized dataset with the given grid shape.

update_ds(new_dset[, attrs])

Update self._ds with coords and data_vars replaced with those provided.

Attributes

dtype

Get dtype of underlying array.

features

Features in this container.

flattened

Check if the contained data is flattened 2D data or 3D rasterized data.

grid_shape

Return the shape of the spatial dimensions.

lat_lon

Base lat lon for contained data.

loaded

Check if data has been loaded as numpy arrays.

meta

Return dataframe of flattened lat / lon values.

name

Name of dataset.

shape

Get shape of underlying xr.DataArray, using our standard dimension order.

size

Get size of data contained to use in weight calculations.

target

Return the value of the lower left hand coordinate.

time_independent

Check if the contained data is time independent.

time_index

Base time index for contained data.

time_step

Get time step in seconds.

values

Return numpy values in standard dimension order (lats, lons, time, ..., features)

property values#

Return numpy values in standard dimension order (lats, lons, time, ..., features)

to_dataarray() → ndarray | Array[source]#

Return xr.DataArray for the contained xr.Dataset.

as_array()[source]#

Return .data attribute of an xarray.DataArray with our standard dimension order (lats, lons, time, ..., features)

compute(**kwargs)[source]#

Load ._ds into memory. This updates the internal xr.Dataset if it has not been loaded already.

property loaded#

Check if data has been loaded as numpy arrays.

property flattened#

Check if the contained data is flattened 2D data or 3D rasterized data.

property time_independent#

Check if the contained data is time independent.

update_ds(new_dset, attrs=None)[source]#

Update self._ds with coords and data_vars replaced with those provided. These are both provided as dictionaries {name: dask.array}.

Parameters:

new_dset (Dict[str, dask.array]) – Can contain any existing or new variable / coordinate as long as they all have a consistent shape.

Returns:

_ds (xr.Dataset) – Updated dataset with provided coordinates and data_vars with variables in our standard dimension order.

ordered(data)[source]#

Return data with dimensions in standard order (lats, lons, time, ..., features)
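As a rough illustration of what reordering to the standard order involves, the sketch below computes the axis permutation for a tuple of dimension names. The dimension names south_north, west_east and time appear elsewhere on this page; the name features and both helper functions are assumptions made for the example, not part of the sup3r API.

```python
# Spatial and temporal names are taken from this page; "features" is assumed.
STANDARD_ORDER = ("south_north", "west_east", "time")
FEATURE_DIM = "features"


def standard_dim_order(dims):
    """Return ``dims`` reordered as (lats, lons, time, ..., features)."""
    leading = [d for d in STANDARD_ORDER if d in dims]
    trailing = [d for d in dims if d == FEATURE_DIM]
    # Any other dimension (e.g. a vertical level) keeps its relative order.
    middle = [d for d in dims if d not in leading and d not in trailing]
    return tuple(leading + middle + trailing)


def transpose_axes(dims):
    """Return the axis permutation that puts ``dims`` in standard order."""
    return tuple(dims.index(d) for d in standard_dim_order(dims))


dims = ("time", "level", "west_east", "south_north")
ordered_dims = standard_dim_order(dims)
axes = transpose_axes(dims)
```

The resulting permutation is the kind of tuple that could be passed to a transpose call on an array whose axes follow dims.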

sample(idx)[source]#

Get sample from self._ds. The idx should be a tuple of slices for the dimensions (south_north, west_east, time) and a list of feature names. e.g. (slice(0, 3), slice(1, 10), slice(None), ['u_10m', 'v_10m'])
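To make the structure of idx concrete, the following sketch splits such a tuple into its dimension slices and feature list and works out the shape of the selection. The helper names and the grid size are hypothetical, and placing features last in the result is an assumption based on the standard dimension order described on this page.

```python
def split_sample_index(idx):
    """Split ``idx`` into (dimension slices, feature names)."""
    *dim_slices, features = idx
    if not all(isinstance(s, slice) for s in dim_slices):
        raise TypeError("All entries before the feature list must be slices.")
    return tuple(dim_slices), list(features)


def sample_shape(idx, full_shape):
    """Shape selected by ``idx`` from data with dimension sizes ``full_shape``."""
    dim_slices, features = split_sample_index(idx)
    # slice.indices resolves None bounds against the dimension length.
    lengths = [
        len(range(*s.indices(n))) for s, n in zip(dim_slices, full_shape)
    ]
    return (*lengths, len(features))


idx = (slice(0, 3), slice(1, 10), slice(None), ["u_10m", "v_10m"])
slices, features = split_sample_index(idx)
# Hypothetical grid of 20 x 20 points with 24 time steps.
shape = sample_shape(idx, full_shape=(20, 20, 24))
```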

property name#

Name of dataset. Used to label datasets when grouped in Data objects. e.g. for low / high res pairs or daily / hourly data.

isel(*args, **kwargs)[source]#

Override xr.Dataset.isel to cast back to Sup3rX object.

coarsen(*args, **kwargs)[source]#

Override xr.Dataset.coarsen to cast back to Sup3rX object.

mean(**kwargs)[source]#

Get mean directly from dataset object.

std(**kwargs)[source]#

Get std directly from dataset object.

normalize(means, stds)[source]#

Normalize dataset using given means and stds. These are provided as dictionaries.
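A minimal sketch of dictionary-driven normalization is shown below, assuming the conventional (value - mean) / std transform applied per feature. Plain Python lists stand in for the underlying arrays, and the function is a hypothetical illustration rather than the sup3r method itself.

```python
def normalize(data, means, stds):
    """Return a new dict with each feature shifted and scaled."""
    return {
        name: [(v - means[name]) / stds[name] for v in values]
        for name, values in data.items()
    }


data = {"u": [2.0, 4.0, 6.0], "v": [10.0, 20.0]}
means = {"u": 4.0, "v": 15.0}  # one mean per feature name
stds = {"u": 2.0, "v": 5.0}    # one standard deviation per feature name
normalized = normalize(data, means, stds)
```

Keying the statistics by feature name means the same means and stds dictionaries can be reused across datasets that contain those features.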

interpolate_na(**kwargs)[source]#

Use xr.DataArray.interpolate_na to fill NaN values with a dask compatible method.

add_dims_to_data_vars(vals)[source]#

Add dimensions to vals entries if needed. This is used to set values of self._ds, which can require dimensions to be explicitly specified for the data being set, e.g. self._ds['u_100m'] = (('south_north', 'west_east', 'time'), data). We make guesses on the correct dims if they are missing and give a warning. We also add attributes if they are available in vals.

Parameters:

vals (Dict[str, Union]) – Dictionary of feature names and arrays to use for setting feature data. When arrays have more than 2 dimensions, xarray needs explicit dimension info, so we need to add these if not provided.
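The guessing step can be sketched as follows. The rule used here (take the first ndim names of an assumed standard order) is an assumption for illustration and may differ from how sup3r actually chooses dimensions; add_dims and nesting_depth are hypothetical helpers, and nested lists stand in for arrays.

```python
import warnings

# Assumed standard order, using the dimension names shown on this page.
ASSUMED_DIMS = ("south_north", "west_east", "time")


def nesting_depth(data):
    """Number of dimensions of a nested list."""
    depth = 0
    while isinstance(data, list):
        depth += 1
        if not data:
            break
        data = data[0]
    return depth


def add_dims(vals):
    """Return ``vals`` with every entry in (dims, data) form."""
    out = {}
    for name, val in vals.items():
        if isinstance(val, tuple):
            out[name] = val  # dims were provided explicitly
            continue
        ndim = nesting_depth(val)
        if ndim > len(ASSUMED_DIMS):
            raise ValueError(f"Cannot guess dims for {name!r} ({ndim}D).")
        dims = ASSUMED_DIMS[:ndim]
        warnings.warn(f"No dims given for {name!r}; guessing {dims}.")
        out[name] = (dims, val)
    return out


vals = {
    "u_100m": [[[1.0, 2.0]]],                             # 3D, no dims given
    "topography": (("south_north", "west_east"), [[0.0]]),  # dims given
}
with_dims = add_dims(vals)
```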

assign(vals: Dict[str, ndarray | Array | tuple])[source]#

Override xarray assign and assign_coords methods to enable update without explicitly providing dimensions if variable already exists.

Parameters:

vals (dict) – Dictionary of variable names and either arrays or tuples of (dims, array). If dims are not provided this will try to use stored dims of the variable, if it exists already.
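The fallback to stored dims might look roughly like the sketch below. resolve_dims and stored_dims are hypothetical names, and raising for an unknown variable without dims is a simplification chosen for this example; the actual method may handle that case differently.

```python
def resolve_dims(vals, stored_dims):
    """Return ``vals`` with every entry in (dims, data) form.

    ``stored_dims`` maps existing variable names to their dimension names.
    """
    resolved = {}
    for name, val in vals.items():
        if isinstance(val, tuple):
            dims, data = val  # explicit (dims, array) pair
        elif name in stored_dims:
            dims, data = stored_dims[name], val  # reuse the stored dims
        else:
            raise KeyError(f"No dims given or stored for {name!r}.")
        resolved[name] = (tuple(dims), data)
    return resolved


stored_dims = {"u": ("south_north", "west_east", "time")}
resolved = resolve_dims(
    {"u": [[[0.0]]], "elevation": (("south_north", "west_east"), [[5.0]])},
    stored_dims,
)
```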

property features#

Features in this container.

property dtype#

Get dtype of underlying array.

property shape#

Get shape of underlying xr.DataArray, using our standard dimension order.

property size#

Get size of data contained to use in weight calculations.

property time_index#

Base time index for contained data.

property time_step#

Get time step in seconds.

property lat_lon: ndarray | Array#

Base lat lon for contained data.

property target#

Return the value of the lower left hand coordinate.

property grid_shape#

Return the shape of the spatial dimensions.

property meta#

Return dataframe of flattened lat / lon values. Can also be set to include additional data like elevation, country, state, etc.

unflatten(grid_shape)[source]#

Convert flattened dataset into rasterized dataset with the given grid shape.

flatten()[source]#

Flatten rasterized dataset so that there is only a single spatial dimension.
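The relationship between flatten and unflatten can be sketched with nested lists. This example assumes row-major ordering of the two spatial dimensions, which is an assumption made for illustration and not something this page states; both functions are hypothetical stand-ins.

```python
def flatten(grid):
    """Collapse a 2D grid (list of rows) into a single spatial dimension."""
    return [value for row in grid for value in row]


def unflatten(flat, grid_shape):
    """Rebuild a 2D grid of shape ``grid_shape`` from flattened values."""
    n_rows, n_cols = grid_shape
    if len(flat) != n_rows * n_cols:
        raise ValueError("grid_shape does not match the number of points.")
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


grid = [[1, 2, 3], [4, 5, 6]]
flat = flatten(grid)
restored = unflatten(flat, grid_shape=(2, 3))
```

The grid shape has to be supplied when unflattening because the flattened form no longer records how many rows and columns the raster had.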

qa()[source]#

Check NaNs and stats for all features.

__mul__(other)[source]#

Multiply Sup3rX object by other. Used to compute weighted means and stdevs.
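One way multiplication can support weighted statistics is sketched below: each container's mean is scaled by its share of the total size and the results are summed. ToyStat, its __add__ method and weighted_mean are hypothetical and only illustrate the arithmetic; they are not the sup3r implementation.

```python
class ToyStat:
    """Hypothetical container of per-feature statistics."""

    def __init__(self, values):
        self.values = dict(values)

    def __mul__(self, other):
        # Scale every feature by a scalar weight.
        return ToyStat({k: v * other for k, v in self.values.items()})

    __rmul__ = __mul__

    def __add__(self, other):
        # Feature-wise sum of two containers with the same features.
        return ToyStat({k: v + other.values[k] for k, v in self.values.items()})


def weighted_mean(means, sizes):
    """Combine per-container means using sizes as weights."""
    total = sum(sizes)
    result = means[0] * (sizes[0] / total)
    for mean, size in zip(means[1:], sizes[1:]):
        result = result + mean * (size / total)
    return result


means = [ToyStat({"u": 2.0}), ToyStat({"u": 4.0})]
sizes = [1, 3]  # e.g. the number of values in each container
combined = weighted_mean(means, sizes)
```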