sup3r.preprocessing.base.Sup3rDataset

class Sup3rDataset(**dsets: Mapping[str, Dataset | Sup3rX])

Bases: object

Interface for interacting with one or two xr.Dataset instances. This is a wrapper around one or two Sup3rX objects so that they work well with Dual objects (DualSampler, DualRasterizer, DualBatchHandler, etc.).

Examples

>>> # access high_res or low_res:
>>> hr = xr.Dataset(...)
>>> lr = xr.Dataset(...)
>>> ds = Sup3rDataset(low_res=lr, high_res=hr)
>>> ds.high_res; ds.low_res  # returns Sup3rX objects
>>> ds[feature]  # returns a tuple of DataArrays (low_res, high_res)
>>> # access hourly or daily:
>>> daily = xr.Dataset(...)
>>> hourly = xr.Dataset(...)
>>> ds = Sup3rDataset(daily=daily, hourly=hourly)
>>> ds.hourly; ds.daily  # returns Sup3rX objects
>>> ds[feature]  # returns a tuple of DataArrays (daily, hourly)
>>> # single resolution data access:
>>> xds = xr.Dataset(...)
>>> ds = Sup3rDataset(hourly=xds)
>>> ds.hourly  # returns Sup3rX object
>>> ds[feature]  # returns a single DataArray

Note

(1) This may seem similar to Collection, which can also contain multiple data members. However, members of Collection objects are completely independent, while here there are at most two members, which are related as low / high res versions of the same underlying data.

(2) Here we make an important choice to use high_res members to compute means / stds. It would be reasonable to instead use the average of high_res and low_res means / stds for aggregate stats, but we want to preserve the relationship between coarsened variables after normalization (e.g. temperature_2m, temperature_max_2m, temperature_min_2m). This means all of these variables should have the same means and stds, which ultimately come from the high_res, non-coarsened variable.
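The reasoning in note (2) can be illustrated without sup3r at all. The following is a minimal sketch in plain Python, with made-up temperature values and a hypothetical `normalize` helper. It compares normalizing the coarsened variables with shared high-res statistics against normalizing each variable with its own statistics. It is not the library's implementation.

```python
from statistics import fmean, pstdev

# Hypothetical hourly high-res temperatures (K): two "days" of four steps each.
hr_temp = [280.0, 282.0, 285.0, 290.0, 288.0, 284.0, 281.0, 279.0]
days = [hr_temp[i:i + 4] for i in range(0, len(hr_temp), 4)]

# Coarsened (daily) variables derived from the same high-res data.
coarse = {
    "temperature_2m": [fmean(d) for d in days],
    "temperature_max_2m": [max(d) for d in days],
    "temperature_min_2m": [min(d) for d in days],
}


def normalize(values, mean, std):
    """Standardize a list of values with the given mean and std."""
    return [(v - mean) / std for v in values]


# Shared statistics taken from the high-res, non-coarsened variable.
hr_mean, hr_std = fmean(hr_temp), pstdev(hr_temp)
shared = {k: normalize(v, hr_mean, hr_std) for k, v in coarse.items()}

# Alternative: each coarsened variable normalized with its own statistics.
own = {k: normalize(v, fmean(v), pstdev(v)) for k, v in coarse.items()}
```

With the shared statistics, the daily min, mean, and max keep their ordering after normalization. With per-variable statistics, each series is centered on zero separately, so the offsets between them are lost.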

Parameters:

dsets (Mapping[str, xr.Dataset | Sup3rX | Sup3rDataset]) – Sup3rDataset is initialized from flexible keyword arguments. The keys are used as names in a named tuple and the values are the dataset members. These names are also used to define attributes which point to the dataset members. You can provide name=data or name1=data1, name2=data2 and then access these datasets as .name1 or .name2. If dsets values are xr.Dataset objects, they are cast to Sup3rX objects first. We also check whether dsets values are Sup3rDataset objects; if they include only one data member, we use those members to reinitialize a Sup3rDataset.
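The keyword-to-attribute pattern described above can be sketched with the standard library. `MiniDataset` below is a hypothetical stand-in that stores plain dictionaries instead of Sup3rX objects. It only illustrates how kwargs names can become named-tuple fields and attributes, and it is not how sup3r itself is written.

```python
from collections import namedtuple


class MiniDataset:
    """Hypothetical illustration of a kwargs-backed, one-or-two member wrapper."""

    def __init__(self, **dsets):
        if not 1 <= len(dsets) <= 2:
            raise ValueError("expected one or two named members")
        # Keyword names become the fields of a named tuple (order is preserved).
        Members = namedtuple("Members", list(dsets))
        self._ds = Members(**dsets)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        members = self.__dict__.get("_ds")
        if members is not None and name in members._fields:
            return getattr(members, name)
        raise AttributeError(name)

    def __getitem__(self, feature):
        # One member: return its value. Two members: return a tuple of values.
        values = tuple(member[feature] for member in self._ds)
        return values[0] if len(values) == 1 else values


dual = MiniDataset(
    low_res={"u_10m": [1.0, 2.0]},
    high_res={"u_10m": [1.0, 1.5, 2.0, 2.5]},
)
single = MiniDataset(hourly={"u_10m": [3.0]})
```

Here `dual.high_res` returns the high-res member, `dual["u_10m"]` returns a `(low_res, high_res)` tuple, and `single["u_10m"]` returns a single value, mirroring the access patterns in the Examples section.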

Methods

compute(**kwargs)

Load data into memory for each data member.

isel(*args, **kwargs)

Return new Sup3rDataset with isel applied to each member.

mean(**kwargs)

Use the high_res members to compute the means.

normalize(means, stds)

Normalize dataset using the given means and stds.

rewrap(data)

Rewrap data as Sup3rDataset after calling parent method.

sample(idx)

Get samples from self._ds members.

std(**kwargs)

Use the high_res members to compute the stds.

Attributes

dtype

Get datatype of first member.

features

The features are determined by the set of features from all data members.

loaded

Check if all data members have been loaded into memory.

shape

We use the shape of the largest data member.

size

Return number of elements in the largest data member.

property dtype

Get datatype of first member. Assumed to be constant for all members.

rewrap(data)

Rewrap data as Sup3rDataset after calling parent method.

sample(idx)

Get samples from self._ds members. idx should be either a tuple containing slices for the dimensions (south_north, west_east, time) and a list of feature names, or, for dual datasets, a 2-tuple of such tuples.
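As a rough illustration of the index layout described here, the sketch below builds a single index and a dual 2-tuple of indices from plain Python slices. The helper names, the placement of the feature list as the last tuple element, and the scaling of high-res slices by enhancement factors (`s_enhance`, `t_enhance`) are all assumptions made for this example and are not taken from the sup3r source.

```python
def scale_slice(s, factor):
    """Scale a low-res slice to the matching high-res slice (assumed relation)."""
    return slice(s.start * factor, s.stop * factor)


def dual_index(lr_slices, lr_features, hr_features, s_enhance, t_enhance):
    """Build a hypothetical (low_res, high_res) index pair."""
    south_north, west_east, time = lr_slices
    lr_idx = (south_north, west_east, time, lr_features)
    hr_idx = (
        scale_slice(south_north, s_enhance),
        scale_slice(west_east, s_enhance),
        scale_slice(time, t_enhance),
        hr_features,
    )
    return lr_idx, hr_idx


# Single-member index: slices for (south_north, west_east, time) plus features.
single_idx = (slice(0, 10), slice(0, 10), slice(0, 24), ["u_10m", "v_10m"])

# Dual index: a 2-tuple of the same structure, one entry per member.
lr_idx, hr_idx = dual_index(
    (slice(5, 10), slice(0, 10), slice(0, 6)),
    ["u_10m"],
    ["u_10m"],
    s_enhance=2,
    t_enhance=4,
)
```

An object like `(lr_idx, hr_idx)` is what the description above calls a 2-tuple for dual datasets. Whether the real method expects exactly this element order should be checked against the source.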

isel(*args, **kwargs)

Return new Sup3rDataset with isel applied to each member.

property shape

We use the shape of the largest data member. These are assumed to be ordered as (low-res, high-res) if there are two members.

property features

The features are determined by the set of features from all data members.

property size

Return number of elements in the largest data member.
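To make "largest data member" concrete, here is a small sketch that picks the member with the most elements from a dictionary of shapes. The shapes and the (south_north, west_east, time, features) dimension layout are assumptions chosen for illustration. The actual properties may simply rely on the (low-res, high-res) ordering instead of comparing sizes.

```python
from math import prod

# Hypothetical member shapes: (south_north, west_east, time, features).
shapes = {"low_res": (10, 10, 24, 2), "high_res": (20, 20, 24, 2)}

# Pick the member with the most elements and report its shape and size.
largest = max(shapes, key=lambda name: prod(shapes[name]))
shape = shapes[largest]
size = prod(shape)
```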

mean(**kwargs)

Use the high_res members to compute the means. These are used for normalization during training.

std(**kwargs)

Use the high_res members to compute the stds. These are used for normalization during training.

normalize(means, stds)

Normalize dataset using the given means and stds. These are provided as dictionaries.
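A minimal sketch of dictionary-driven normalization follows, assuming the dictionaries are keyed by feature name (the description above only says they are dictionaries). The `normalize_features` helper and the sample values are hypothetical, and plain lists stand in for the xarray-backed members.

```python
def normalize_features(data, means, stds):
    """Standardize each feature with its own entry in means and stds."""
    return {
        name: [(v - means[name]) / stds[name] for v in values]
        for name, values in data.items()
    }


# Hypothetical feature data and per-feature statistics.
data = {"u_10m": [3.0, 5.0, 7.0], "temperature_2m": [280.0, 290.0]}
means = {"u_10m": 5.0, "temperature_2m": 285.0}
stds = {"u_10m": 2.0, "temperature_2m": 5.0}

normed = normalize_features(data, means, stds)
```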

compute(**kwargs)

Load data into memory for each data member.

property loaded

Check if all data members have been loaded into memory.