sup3r.preprocessing.base.Sup3rDataset
- class Sup3rDataset(**dsets: Mapping[str, Dataset | Sup3rX])
Bases: object
Interface for interacting with one or two xr.Dataset instances. This is a wrapper around one or two Sup3rX objects so they work well with Dual objects like DualSampler, DualRasterizer, DualBatchHandler, etc.
Examples
>>> # access high_res or low_res:
>>> hr = xr.Dataset(...)
>>> lr = xr.Dataset(...)
>>> ds = Sup3rDataset(low_res=lr, high_res=hr)
>>> ds.high_res; ds.low_res  # returns Sup3rX objects
>>> ds[feature]  # returns a tuple of DataArrays (low_res, high_res)

>>> # access hourly or daily:
>>> daily = xr.Dataset(...)
>>> hourly = xr.Dataset(...)
>>> ds = Sup3rDataset(daily=daily, hourly=hourly)
>>> ds.hourly; ds.daily  # returns Sup3rX objects
>>> ds[feature]  # returns a tuple of DataArrays (daily, hourly)

>>> # single resolution data access:
>>> xds = xr.Dataset(...)
>>> ds = Sup3rDataset(hourly=xds)
>>> ds.hourly  # returns a Sup3rX object
>>> ds[feature]  # returns a single DataArray
Note
(1) This may seem similar to Collection, which can also contain multiple data members, but members of Collection objects are completely independent, while here there are at most two members, which are related as low-res and high-res versions of the same underlying data.

(2) Here we make an important choice to use the high_res members to compute means and stds. It would be reasonable to instead use the average of the high_res and low_res means / stds as aggregate stats, but we want to preserve the relationship between coarsened variables after normalization (e.g. temperature_2m, temperature_max_2m, temperature_min_2m). This means all these variables should have the same means and stds, which ultimately come from the high_res, non-coarsened variable.
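The reasoning in note (2) can be checked with a toy, pure-Python sketch. This is not sup3r code: the data, the dictionary layout, and the helper standardize are hypothetical, and statistics.pstdev (population std) is an assumed choice of std. It coarsens a high_res series into daily mean / max / min and compares normalizing every variable with one shared mean / std from the high_res series against normalizing each coarsened variable with its own stats.

```python
from statistics import mean, pstdev

# Hypothetical high_res data: two days of 6-hourly temperature_2m values.
hourly = [0.0, 2.0, 4.0, 6.0, 10.0, 12.0, 14.0, 16.0]
days = [hourly[0:4], hourly[4:8]]

# Coarsened (low_res) variables derived from the same underlying data.
daily = {
    "temperature_2m": [mean(d) for d in days],
    "temperature_max_2m": [max(d) for d in days],
    "temperature_min_2m": [min(d) for d in days],
}


def standardize(values, mu, sigma):
    """Return (v - mu) / sigma for every value."""
    return [(v - mu) / sigma for v in values]


# Shared stats: one mean / std from the high_res, non-coarsened variable.
mu_hr, sigma_hr = mean(hourly), pstdev(hourly)
shared = {k: standardize(v, mu_hr, sigma_hr) for k, v in daily.items()}

# Separate stats (for contrast): each coarsened variable uses its own mean / std.
separate = {k: standardize(v, mean(v), pstdev(v)) for k, v in daily.items()}

print(shared["temperature_max_2m"], shared["temperature_min_2m"])
print(separate["temperature_max_2m"], separate["temperature_min_2m"])
```

With separate stats, the daily max and daily min both map to [-1.0, 1.0], so the information that the max exceeds the min is lost. With the shared high_res stats, the transform is the same positive affine map for every variable, so max > mean > min still holds after normalization.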
- Parameters:
dsets (Mapping[str, xr.Dataset | Sup3rX | Sup3rDataset]) –
Sup3rDataset is initialized from a flexible kwargs input. The keys are used as field names in a named tuple and the values are the dataset members. These names also define attributes that point to the corresponding dataset members: you can provide name=data or name1=data1, name2=data2 and then access these datasets as .name, or as .name1 and .name2. If dsets values are xr.Dataset objects, they are cast to Sup3rX objects first. We also check whether dsets values are Sup3rDataset objects; if they each include only one data member, we use those members to reinitialize a Sup3rDataset.
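The kwargs-to-named-tuple mechanism described above can be sketched in plain Python. This is a hypothetical stand-in, not the sup3r implementation: ToyDualDataset is an invented name, members are plain dicts mapping feature name to values instead of xr.Dataset / Sup3rX objects, and no casting is performed. It shows the three behaviours from the Examples: attribute access by member name, a tuple from ds[feature] for two members, and a bare value for one member.

```python
from collections import namedtuple


class ToyDualDataset:
    """Hypothetical stand-in for Sup3rDataset. Members are plain dicts
    (feature name -> values) rather than xr.Dataset / Sup3rX objects."""

    def __init__(self, **dsets):
        if not 1 <= len(dsets) <= 2:
            raise ValueError("expected one or two data members")
        # Keyword names become named-tuple fields; values become the members.
        self._ds = namedtuple("Members", list(dsets))(**dsets)

    def __getattr__(self, name):
        # Called only when normal lookup fails: expose members by name,
        # e.g. .low_res / .high_res or .daily / .hourly.
        ds = self.__dict__.get("_ds")
        if ds is not None and name in ds._fields:
            return getattr(ds, name)
        raise AttributeError(name)

    def __getitem__(self, feature):
        # Two members -> tuple in keyword order; one member -> bare value.
        out = tuple(member[feature] for member in self._ds)
        return out if len(out) > 1 else out[0]


hr = {"u": [1, 2, 3, 4]}
lr = {"u": [1.5, 3.5]}
ds = ToyDualDataset(low_res=lr, high_res=hr)
single = ToyDualDataset(hourly=hr)
print(ds["u"], single["u"])
```

Relying on keyword order (guaranteed for **kwargs since Python 3.7) is what makes the tuple ordering follow the order in which members were passed, e.g. (low_res, high_res).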
Methods
compute(**kwargs): Load data into memory for each data member.
isel(*args, **kwargs): Return a new Sup3rDataset with isel applied to each member.
mean(**kwargs): Use the high_res members to compute the means.
normalize(means, stds): Normalize the dataset using the given means and stds.
rewrap(data): Rewrap data as Sup3rDataset after calling the parent method.
sample(idx): Get samples from the self._ds members.
std(**kwargs): Use the high_res members to compute the stds.
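Several of these methods (compute, isel) follow a "call the same operation on each member, then rebuild the container" pattern. A minimal sketch of that pattern, under stated assumptions: Members, apply_to_members, and toy_isel are hypothetical names, members are plain lists, and toy_isel only mimics positional selection; none of this is the sup3r or xarray API.

```python
from collections import namedtuple

# Hypothetical two-member container mirroring the (low_res, high_res) layout.
Members = namedtuple("Members", ["low_res", "high_res"])


def apply_to_members(members, func, *args, **kwargs):
    """Call func(member, *args, **kwargs) on every member and return a new
    container of the same named-tuple type (a stand-in for rewrapping)."""
    return type(members)(*(func(m, *args, **kwargs) for m in members))


def toy_isel(member, start, stop):
    """Hypothetical isel-like positional selection on a plain list."""
    return member[start:stop]


ds = Members(low_res=[10, 20], high_res=[1, 2, 3, 4])
sub = apply_to_members(ds, toy_isel, 0, 1)
print(sub)
```

Because the same positional arguments are passed to every member, the selected extent differs between members of different resolution; any resolution-aware indexing would have to happen before this step.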
Attributes
dtype: Get datatype of first member.
features: The features are determined by the set of features from all data members.
loaded: Check if all data members have been loaded into memory.
shape: We use the shape of the largest data member.
size: Return number of elements in the largest data member.
- property dtype
Get the datatype of the first member. This is assumed to be constant across all members.
- sample(idx)
Get samples from the self._ds members. idx should be a tuple of slices for the dimensions (south_north, west_east, time) and a list of feature names; for dual datasets, idx should be a 2-tuple of such indices, one for each member.
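A pure-Python sketch of the two index forms accepted by sample. It assumes, as an illustration only, that a single index has the layout (south_north slice, west_east slice, time slice, feature list), and it uses nested lists in place of real arrays; _slice3d, sample_member, sample, and grid are hypothetical helpers, not sup3r functions.

```python
def _slice3d(arr, s, w, t):
    """Slice a nested list indexed as arr[south_north][west_east][time]."""
    return [[col[t] for col in row[w]] for row in arr[s]]


def sample_member(member, idx):
    """Assumed layout: idx = (south_north, west_east, time slices..., features)."""
    *dim_slices, features = idx
    return {f: _slice3d(member[f], *dim_slices) for f in features}


def sample(members, idx):
    """One member: idx is a single index. Two members: idx is a 2-tuple."""
    if len(members) == 2:
        return tuple(sample_member(m, i) for m, i in zip(members, idx))
    return sample_member(members[0], idx)


def grid(ny, nx, nt):
    """Toy values that encode their own position: 100*i + 10*j + k."""
    return [[[100 * i + 10 * j + k for k in range(nt)] for j in range(nx)]
            for i in range(ny)]


low_res = {"u": grid(2, 2, 2)}
high_res = {"u": grid(4, 4, 4)}
lr_idx = (slice(0, 1), slice(0, 1), slice(0, 2), ["u"])
hr_idx = (slice(0, 2), slice(0, 2), slice(0, 4), ["u"])

lr_sample, hr_sample = sample([low_res, high_res], (lr_idx, hr_idx))
single = sample([high_res], hr_idx)
```

In the dual case, the two indices select the same region at different resolutions (here a 1x1x2 low_res patch and a 2x2x4 high_res patch), which is why each member needs its own index.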
- property shape
We use the shape of the largest data member. If there are two members, they are assumed to be ordered as (low-res, high-res).
- property features
The features are determined by the set of features from all data members.
- property size
Return number of elements in the largest data member.
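The shape, features, and size properties can be illustrated with a short sketch over plain shape tuples and feature lists. Assumptions: "largest" is taken to mean "most elements", shapes are toy (south_north, west_east, time) tuples, and features keep first-seen order; largest_shape, total_size, and union_features are hypothetical helpers, not the sup3r implementation.

```python
from math import prod


def largest_shape(shapes):
    """Shape of the member with the most elements (assumed meaning of 'largest')."""
    return max(shapes, key=prod)


def total_size(shapes):
    """Number of elements in the largest member."""
    return prod(largest_shape(shapes))


def union_features(feature_lists):
    """Union of features across all members, keeping first-seen order."""
    seen = []
    for feats in feature_lists:
        for f in feats:
            if f not in seen:
                seen.append(f)
    return seen


shapes = [(10, 10, 24), (40, 40, 96)]  # toy (low_res, high_res) shapes
print(largest_shape(shapes), total_size(shapes))
print(union_features([["u", "v"], ["u", "v", "topography"]]))
```

Taking the maximum by element count does not rely on member order; with members ordered as (low-res, high-res), it simply picks the high_res shape.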
- mean(**kwargs)
Use the high_res members to compute the means. These are used for normalization during training.
- std(**kwargs)
Use the high_res members to compute the stds. These are used for normalization during training.
- normalize(means, stds)
Normalize the dataset using the given means and stds, which are provided as dictionaries.
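A sketch of the dictionary-based flow for mean, std, and normalize: compute per-feature stats from the high_res member only, then apply the same dictionaries to every member. Assumptions: members are plain dicts of feature name to values, the means / stds dictionaries are keyed by feature name, and statistics.pstdev stands in for the std; feature_stats and normalize_member are hypothetical helpers, not the sup3r API.

```python
from statistics import mean, pstdev


def feature_stats(member):
    """Per-feature means and stds from one member (feature -> list of values)."""
    means = {f: mean(v) for f, v in member.items()}
    stds = {f: pstdev(v) for f, v in member.items()}
    return means, stds


def normalize_member(member, means, stds):
    """Standardize every feature using the matching dictionary entries."""
    return {f: [(x - means[f]) / stds[f] for x in v] for f, v in member.items()}


high_res = {"u": [1.0, 3.0, 5.0, 7.0]}
low_res = {"u": [2.0, 6.0]}

# Stats come from the high_res member only, then are applied to both members.
means, stds = feature_stats(high_res)
normed = [normalize_member(m, means, stds) for m in (low_res, high_res)]
print(means, stds, normed)
```

Passing the stats in as dictionaries keeps the computation of statistics separate from their application, so the same high_res-derived values can be reused for the low_res member (or for other datasets).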
- property loaded
Check if all data members have been loaded into memory.