sup3r.preprocessing.accessor.Sup3rX#
- class Sup3rX(ds: Dataset | Self)[source]#
Bases: object
Accessor for xarray - the suggested way to extend xarray functionality.
References
https://docs.xarray.dev/en/latest/internals/extending-xarray.html
Note
(1) This is an xr.Dataset style object with all xr.Dataset methods, plus more. The way to access these methods is either through appending .sx.<method> on an xr.Dataset or by wrapping an xr.Dataset with Sup3rX, e.g. Sup3rX(xr.Dataset(...)).<method>. Throughout the sup3r codebase we prefer to use the latter. The most important part of this interface is parsing __getitem__ calls of the form ds.sx[keys]. keys can be a single feature name, a list of features, or numpy style indexing for dimensions, e.g. ds.sx['u'][slice(0, 10), ...] or ds.sx[['u', 'v']][..., slice(0, 10)].
If ds[keys] returns an xr.Dataset object then ds.sx[keys] will return a Sup3rX object, e.g. ds.sx[['u','v']] will return a Sup3rX instance but ds.sx['u'] will return an xr.DataArray.
Providing only numpy style indexing without features will return an array with all contained features in the last dimension and a spatiotemporal shape corresponding to the indices, e.g. ds[slice(0, 10), 0, 1] will return an array of shape (10, 1, 1, n_features). This array will be a dask.array or numpy.array, depending on whether data is still on disk or loaded into memory.
(2) The __getitem__ and __getattr__ methods will cast back to type(self) if self._ds.__getitem__ or self._ds.__getattr__ returns an instance of type(self._ds) (e.g. an xr.Dataset). This means we do not have to constantly append .sx for successive calls to accessor methods.
Examples
>>> # To use as an accessor:
>>> ds = xr.Dataset(...)
>>> feature_data = ds.sx[features]
>>> ti = ds.sx.time_index
>>> lat_lon_array = ds.sx.lat_lon
>>> # Use as wrapper:
>>> ds = Sup3rX(xr.Dataset(data_vars={'windspeed': ...}, ...))
>>> np_array = ds['windspeed'].values
>>> dask_array = ds['windspeed'][...] == ds['windspeed'].as_array()
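The cast-back behaviour described in Note (2) can be illustrated without xarray. The following is a minimal sketch, not the Sup3rX implementation: Inner and Wrapper are hypothetical stand-ins for xr.Dataset and Sup3rX, and only the attribute-forwarding idea is taken from the note.

```python
class Inner:
    """Hypothetical stand-in for xr.Dataset, for illustration only."""

    def __init__(self, values):
        self.values = list(values)

    def head(self, n):
        # Returns the same type as self, much like a Dataset selection would.
        return Inner(self.values[:n])

    def total(self):
        # Returns a plain number, so the result should not be re-wrapped.
        return sum(self.values)


class Wrapper:
    """Hypothetical stand-in for Sup3rX: forwards lookups to the wrapped object."""

    def __init__(self, ds):
        # Accept either a raw Inner or an existing Wrapper.
        self._ds = ds._ds if isinstance(ds, Wrapper) else ds

    def _wrap(self, out):
        # Cast back to type(self) only when the result has the wrapped type.
        return type(self)(out) if isinstance(out, type(self._ds)) else out

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails; guard against
        # recursion if _ds has not been set yet.
        if name == "_ds":
            raise AttributeError(name)
        attr = getattr(self._ds, name)
        if not callable(attr):
            return self._wrap(attr)

        def method(*args, **kwargs):
            return self._wrap(attr(*args, **kwargs))

        return method


w = Wrapper(Inner([1, 2, 3, 4]))
chained = w.head(3).head(2)  # each call returns a Wrapper, so calls chain
```

Because every result of the wrapped type is re-wrapped, successive calls stay on the wrapper, which is the property the note describes as not having to append .sx repeatedly.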
Initialize accessor.
- Parameters:
ds (xr.Dataset | xr.DataArray) – xarray Dataset instance to access with the following methods
Methods
add_dims_to_data_vars(vals) – Add dimensions to vals entries if needed.
as_array() – Return .data attribute of an xarray.DataArray with our standard dimension order (lats, lons, time, ..., features).
assign(vals) – Override xarray assign and assign_coords methods to enable update without explicitly providing dimensions if variable already exists.
coarsen(*args, **kwargs) – Override xr.Dataset.coarsen to cast back to Sup3rX object.
compute(**kwargs) – Load ._ds into memory.
flatten() – Flatten rasterized dataset so that there is only a single spatial dimension.
interpolate_na(**kwargs) – Use xr.DataArray.interpolate_na to fill NaN values with a dask compatible method.
isel(*args, **kwargs) – Override xr.Dataset.isel to cast back to Sup3rX object.
mean(**kwargs) – Get mean directly from dataset object.
normalize(means, stds) – Normalize dataset using given means and stds.
ordered(data) – Return data with dimensions in standard order (lats, lons, time, ..., features).
qa() – Check NaNs and stats for all features.
sample(idx) – Get sample from self._ds.
std(**kwargs) – Get std directly from dataset object.
Return xr.DataArray for the contained xr.Dataset.
unflatten(grid_shape) – Convert flattened dataset into rasterized dataset with the given grid shape.
update_ds(new_dset[, attrs]) – Update self._ds with coords and data_vars replaced with those provided.
Attributes
dtype – Get dtype of underlying array.
features – Features in this container.
flattened – Check if the contained data is flattened 2D data or 3D rasterized data.
grid_shape – Return the shape of the spatial dimensions.
lat_lon – Base lat lon for contained data.
loaded – Check if data has been loaded as numpy arrays.
meta – Return dataframe of flattened lat / lon values.
name – Name of dataset.
shape – Get shape of underlying xr.DataArray, using our standard dimension order.
size – Get size of data contained to use in weight calculations.
target – Return the value of the lower left hand coordinate.
time_independent – Check if the contained data is time independent.
time_index – Base time index for contained data.
time_step – Get time step in seconds.
values – Return numpy values in standard dimension order (lats, lons, time, ..., features).
- property values#
Return numpy values in standard dimension order (lats, lons, time, ..., features).
- as_array()[source]#
Return the .data attribute of an xarray.DataArray with our standard dimension order (lats, lons, time, ..., features).
- compute(**kwargs)[source]#
Load ._ds into memory. This updates the internal xr.Dataset if it has not been loaded already.
- property loaded#
Check if data has been loaded as numpy arrays.
- property flattened#
Check if the contained data is flattened 2D data or 3D rasterized data.
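The difference between rasterized and flattened data can be sketched with plain numpy reshapes. This is only an illustration under the assumption that the spatial axes come first, as in the standard dimension order; the helper names below are hypothetical and do not reproduce the flatten and unflatten methods of Sup3rX.

```python
import numpy as np


def flatten_spatial(arr):
    """Collapse the two leading spatial axes (lats, lons) into one axis."""
    n_lats, n_lons = arr.shape[:2]
    return arr.reshape((n_lats * n_lons, *arr.shape[2:]))


def unflatten_spatial(arr, grid_shape):
    """Restore two spatial axes from a single flattened spatial axis."""
    n_lats, n_lons = grid_shape
    if arr.shape[0] != n_lats * n_lons:
        raise ValueError("grid_shape does not match the flattened spatial size")
    return arr.reshape((n_lats, n_lons, *arr.shape[1:]))


raster = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # (lats, lons, time)
flat = flatten_spatial(raster)                  # (space, time)
restored = unflatten_spatial(flat, grid_shape=(2, 3))
```

The round trip is lossless as long as the grid shape passed back in matches the original spatial shape.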
- property time_independent#
Check if the contained data is time independent.
- update_ds(new_dset, attrs=None)[source]#
Update self._ds with coords and data_vars replaced with those provided. These are both provided as dictionaries {name: dask.array}.
- Parameters:
new_dset (Dict[str, dask.array]) – Can contain any existing or new variable / coordinate as long as they all have a consistent shape.
- Returns:
_ds (xr.Dataset) – Updated dataset with provided coordinates and data_vars with variables in our standard dimension order.
- ordered(data)[source]#
Return data with dimensions in standard order (lats, lons, time, ..., features).
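A rough sketch of what reordering to a standard dimension order involves, using numpy only. The dimension names south_north, west_east and time are the ones mentioned for sample; the helper reorder and its return value are assumptions for illustration, not the ordered method itself.

```python
import numpy as np

# Assumed names for the leading dimensions of the standard order.
STANDARD_ORDER = ("south_north", "west_east", "time")


def reorder(arr, dims):
    """Transpose arr so known dims lead in standard order, others follow."""
    lead = [d for d in STANDARD_ORDER if d in dims]
    rest = [d for d in dims if d not in STANDARD_ORDER]
    new_dims = lead + rest
    axes = [dims.index(d) for d in new_dims]
    return np.transpose(arr, axes), tuple(new_dims)


data = np.zeros((5, 2, 3))  # stored as (time, south_north, west_east)
out, out_dims = reorder(data, ("time", "south_north", "west_east"))
```

Dimensions that are not part of the assumed standard order keep their relative position after the known ones.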
- sample(idx)[source]#
Get sample from self._ds. The idx should be a tuple of slices for the dimensions (south_north, west_east, time) and a list of feature names, e.g. (slice(0, 3), slice(1, 10), slice(None), ['u_10m', 'v_10m']).
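The shape of such an index tuple can be made concrete with a numpy-only sketch. Here a plain dict of arrays stands in for the dataset, t_2m is a made-up extra feature, and take_sample is a hypothetical helper; stacking features on the last axis follows the standard dimension order described above, but the code is not the Sup3rX implementation.

```python
import numpy as np


def take_sample(data, idx):
    """Slice each requested feature and stack features on the last axis."""
    *slices, features = idx
    return np.stack([data[f][tuple(slices)] for f in features], axis=-1)


rng = np.random.default_rng(0)
# Each feature has shape (south_north, west_east, time).
data = {name: rng.random((4, 12, 24)) for name in ("u_10m", "v_10m", "t_2m")}

idx = (slice(0, 3), slice(1, 10), slice(None), ["u_10m", "v_10m"])
out = take_sample(data, idx)
```

The result has one trailing axis entry per requested feature, in the order the feature names were given.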
- property name#
Name of dataset. Used to label datasets when grouped in Data objects, e.g. for low / high res pairs or daily / hourly data.
- normalize(means, stds)[source]#
Normalize dataset using given means and stds. These are provided as dictionaries.
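As an illustration of per-feature normalization with dictionaries, the sketch below assumes the usual (x - mean) / std convention and uses a plain dict of numpy arrays in place of the dataset. The helper normalize_features is hypothetical and is not the method documented here.

```python
import numpy as np


def normalize_features(data, means, stds):
    """Return a new dict with each feature shifted by its mean and scaled by its std."""
    return {f: (arr - means[f]) / stds[f] for f, arr in data.items()}


data = {"u": np.array([1.0, 2.0, 3.0]), "v": np.array([10.0, 20.0, 30.0])}
means = {f: float(arr.mean()) for f, arr in data.items()}
stds = {f: float(arr.std()) for f, arr in data.items()}

normed = normalize_features(data, means, stds)
```

Keying both dictionaries by feature name keeps the statistics attached to the right variable regardless of feature order.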
- interpolate_na(**kwargs)[source]#
Use xr.DataArray.interpolate_na to fill NaN values with a dask compatible method.
- add_dims_to_data_vars(vals)[source]#
Add dimensions to vals entries if needed. This is used to set values of self._ds, which can require dimensions to be explicitly specified for the data being set, e.g. self._ds['u_100m'] = (('south_north', 'west_east', 'time'), data). We make guesses on the correct dims if they are missing and give a warning. We also add attributes if available in vals.
- Parameters:
vals (Dict[str, Union]) – Dictionary of feature names and arrays to use for setting feature data. When arrays have more than 2 dimensions xarray needs explicit dimension info, so we need to add these if not provided.
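The idea of guessing missing dimensions and warning about it might look roughly like the sketch below. The guessing rule (take the first ndim names from an assumed order) and the helper add_dims are assumptions made for illustration; the actual rule used by add_dims_to_data_vars is not stated on this page. The names topography and v_100m are made-up examples.

```python
import warnings

import numpy as np

# Assumed guess order; hypothetical, not taken from the sup3r source.
ASSUMED_DIMS = ("south_north", "west_east", "time")


def add_dims(vals):
    """Return {name: (dims, array)}, guessing dims for bare arrays."""
    out = {}
    for name, val in vals.items():
        if isinstance(val, tuple):
            out[name] = val  # dims already supplied as (dims, array)
            continue
        arr = np.asarray(val)
        if arr.ndim > len(ASSUMED_DIMS):
            raise ValueError(f"Cannot guess dims for {name!r} with ndim={arr.ndim}")
        dims = ASSUMED_DIMS[: arr.ndim]
        warnings.warn(f"Guessing dims {dims} for {name!r}")
        out[name] = (dims, arr)
    return out


vals = {
    "u_100m": np.zeros((2, 3, 4)),   # no dims given, 3D
    "topography": np.zeros((2, 3)),  # no dims given, 2D
    "v_100m": (("south_north", "west_east", "time"), np.ones((2, 3, 4))),
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = add_dims(vals)
```

Entries that already carry dims pass through untouched, so only the guessed entries produce warnings.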
- assign(vals: Dict[str, ndarray | Array | tuple])[source]#
Override xarray assign and assign_coords methods to enable update without explicitly providing dimensions if variable already exists.
- Parameters:
vals (dict) – Dictionary of variable names and either arrays or tuples of (dims, array). If dims are not provided this will try to use stored dims of the variable, if it exists already.
- property features#
Features in this container.
- property dtype#
Get dtype of underlying array.
- property shape#
Get shape of underlying xr.DataArray, using our standard dimension order.
- property size#
Get size of data contained to use in weight calculations.
- property time_index#
Base time index for contained data.
- property time_step#
Get time step in seconds.
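For reference, a time step in seconds can be derived from a regular time index with numpy datetime arithmetic. This is a small sketch under the assumption of an evenly spaced index; it does not show how the time_step property is implemented.

```python
import numpy as np

# A regular hourly index, stored with second resolution.
times = np.array(
    ["2020-01-01T00:00", "2020-01-01T01:00", "2020-01-01T02:00"],
    dtype="datetime64[s]",
)

# Difference between consecutive stamps, converted to float seconds.
step_seconds = float(np.diff(times)[0] / np.timedelta64(1, "s"))
```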
- property lat_lon: ndarray | Array#
Base lat lon for contained data.
- property target#
Return the value of the lower left hand coordinate.
- property grid_shape#
Return the shape of the spatial dimensions.
- property meta#
Return dataframe of flattened lat / lon values. Can also be set to include additional data like elevation, country, state, etc.