sup3r.preprocessing.samplers.cc.DualSamplerCC
- class DualSamplerCC(data: Sup3rDataset, sample_shape: tuple | None = None, batch_size: int = 16, s_enhance: int = 1, t_enhance: int = 24, feature_sets: Dict | None = None)
Bases: DualSampler
Special sampling of WTK or NSRDB data for climate change applications.
Note
This will always give daily / hourly data if t_enhance != 1. The number of days / hours in the samples is determined by t_enhance. For example, if t_enhance = 8 and sample_shape = (..., 24) there will be 3 days in the low res sample: lr_sample_shape = (..., 3). If 1 < t_enhance != 24, reduce_high_res_sub_daily() will be used to reduce a high res sample shape from (..., sample_shape[2] * 24 // t_enhance) to (..., sample_shape[2]).
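The shape arithmetic in the note above can be sketched as follows. This is only an illustration of the formulas stated in the note, not code from sup3r; the helper name cc_temporal_shapes is hypothetical, and the sketch deliberately covers only the t_enhance > 1 case.

```python
def cc_temporal_shapes(sample_shape, t_enhance):
    """Return (n_lr_days, n_expanded_hourly) for a high-res sample shape.

    Hypothetical helper, not part of sup3r. It mirrors the note:
    the expanded hourly chunk has sample_shape[2] * 24 // t_enhance steps,
    and each low-res (daily) step corresponds to 24 of those hours.
    """
    if t_enhance <= 1:
        raise ValueError("this sketch only covers t_enhance > 1")
    n_expanded_hourly = sample_shape[2] * 24 // t_enhance
    n_lr_days = n_expanded_hourly // 24
    return n_lr_days, n_expanded_hourly


# Example from the note: t_enhance = 8 and sample_shape = (..., 24)
lr_days, expanded = cc_temporal_shapes((20, 20, 24), t_enhance=8)
print(lr_days, expanded)  # 3 72
```

With t_enhance = 24 the same helper gives one low-res day and 24 hourly steps, which matches the statement that no reduction is needed in that case.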
- Parameters:
data (Sup3rDataset) – A Sup3rDataset instance with low-res and high-res data members.
sample_shape (tuple) – Size of arrays to sample from the high-res data. The sample shape for the low-res sampler will be determined from the enhancement factors.
s_enhance (int) – Spatial enhancement factor.
t_enhance (int) – Temporal enhancement factor.
feature_sets (Optional[dict]) – Optional dictionary describing how the full set of features is split between lr_only_features and hr_exo_features.
- lr_only_features (list | tuple) – List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.
- hr_exo_features (list | tuple) – List of feature names or patterns that should be included in the high-resolution observation but not expected to be output from the generative model. An example is high-res topography that is to be injected mid-network.
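As a rough illustration of the feature_sets dictionary described above, the sketch below builds one and derives which features would remain on the high-res side. The feature names are hypothetical, and the use of shell-style wildcards via fnmatchcase is an assumption made for this example; sup3r's own pattern matching may behave differently.

```python
from fnmatch import fnmatchcase

# Hypothetical feature names, for illustration only.
all_features = [
    "temperature_2m",
    "temperature_max_2m",
    "temperature_min_2m",
    "topography",
]

# The two keys are the ones documented for feature_sets.
feature_sets = {
    "lr_only_features": ["*_max_*", "*_min_*"],
    "hr_exo_features": ["topography"],
}


def matches_any(name, patterns):
    """True if name matches any shell-style pattern (assumed semantics)."""
    return any(fnmatchcase(name, p) for p in patterns)


# Features kept in the high-res observations: everything not low-res only.
hr_features = [
    f for f in all_features
    if not matches_any(f, feature_sets["lr_only_features"])
]

# Features expected as model output: high-res features that are not exogenous.
hr_out_features = [
    f for f in hr_features
    if not matches_any(f, feature_sets["hr_exo_features"])
]

print(hr_features)      # ['temperature_2m', 'topography']
print(hr_out_features)  # ['temperature_2m']
```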
See also
DualSampler
Methods
check_for_consistent_shapes() – Make sure container shapes and sample shapes are compatible with enhancement factors.
get_features(feature_sets) – Return default set of features composed from data vars in low res and high res data objects or the value provided through the feature_sets dictionary.
get_middle_days(high_res, sample_shape) – Get middle chunk of high_res data that will then be reduced to day time steps.
get_sample_index([n_obs]) – Get sample index for expanded hourly chunk which will be reduced to the given sample shape.
post_init_log([args_dict]) – Log additional arguments after initialization.
preflight() – Check if the sample_shape is larger than the requested raster size.
reduce_high_res_sub_daily(high_res[, csr_ind]) – Take an hourly high-res observation and reduce the temporal axis down to lr_sample_shape[2] * t_enhance time steps, using only daylight hours on the middle part of the high res data.
wrap(data) – Return a Sup3rDataset object or tuple of such.
Attributes
data – Return underlying data.
hr_exo_features – Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection.
hr_features – Get the high-resolution features corresponding to hr_features_ind.
hr_features_ind – Get the high-resolution feature channel indices that should be included for training.
hr_out_features – Get a list of high-resolution features that are intended to be output by the GAN.
hr_sample_shape – Shape of the data sample to select when __next__() is called.
lr_only_features – List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.
sample_shape – Shape of the data sample to select when __next__() is called.
shape – Get shape of underlying data.
- check_for_consistent_shapes()
Make sure container shapes and sample shapes are compatible with enhancement factors.
- reduce_high_res_sub_daily(high_res, csr_ind=0)
Take an hourly high-res observation and reduce the temporal axis down to lr_sample_shape[2] * t_enhance time steps, using only daylight hours on the middle part of the high res data.
- Parameters:
high_res (Union[np.ndarray, da.core.Array]) – 5D array with dimensions (n_obs, spatial_1, spatial_2, temporal, n_features) where temporal >= 24 (set by the data handler).
csr_ind (int) – Feature index of clearsky_ratio. e.g. self.data[…, csr_ind] -> cs_ratio
- Returns:
high_res (Union[np.ndarray, da.core.Array]) – 5D array with dimensions (n_obs, spatial_1, spatial_2, temporal, n_features) where temporal has been reduced down to the integer lr_sample_shape[2] * t_enhance. For example if hr_sample_shape[2] is 9 and t_enhance = 8, 72 hourly time steps will be reduced to 9 using the center daylight 9 hours from the second day.
Note
This only does something when 1 < t_enhance < 24. If t_enhance = 24 there is no need for reduction since every daily time step will have 24 hourly time steps in the high_res batch data. Of course, if t_enhance = 1, we are running for a spatial only model so this routine is unnecessary.
Needs review from @grantbuster.
- static get_middle_days(high_res, sample_shape)
Get middle chunk of high_res data that will then be reduced to day time steps. This has n_time_steps = 24 if sample_shape[-1] <= 24 otherwise n_time_steps = sample_shape[-1].
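The window-size rule above can be sketched with NumPy slicing. This is a minimal illustration, not the sup3r implementation: the function name middle_chunk is hypothetical, and centering the window on the temporal axis (axis 3 of the 5D array) is an assumption about how the "middle chunk" is located.

```python
import numpy as np


def middle_chunk(high_res, sample_shape):
    """Slice a centered window out of the temporal axis of a 5D array.

    Hypothetical sketch. The window length follows the documented rule:
    24 steps if sample_shape[-1] <= 24, otherwise sample_shape[-1].
    The centered start index is an assumption made for this example.
    """
    n_time_steps = 24 if sample_shape[-1] <= 24 else sample_shape[-1]
    n_total = high_res.shape[3]
    if n_time_steps >= n_total:
        # Nothing to trim when the data is no longer than the window.
        return high_res
    start = (n_total - n_time_steps) // 2
    return high_res[:, :, :, start:start + n_time_steps, :]


# (n_obs, spatial_1, spatial_2, temporal, n_features) with 72 hourly steps
high_res = np.zeros((2, 4, 4, 72, 3))
chunk = middle_chunk(high_res, sample_shape=(4, 4, 24))
print(chunk.shape)  # (2, 4, 4, 24, 3)
```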
- get_sample_index(n_obs=None)
Get sample index for expanded hourly chunk which will be reduced to the given sample shape.
- property data
Return underlying data.
- Returns:
See also
- get_features(feature_sets)
Return default set of features composed from data vars in low res and high res data objects or the value provided through the feature_sets dictionary.
- property hr_exo_features
Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection. These must come at the end of the high-res feature set. These can also be input to the model as low-res features.
- property hr_features
Get the high-resolution features corresponding to hr_features_ind.
- property hr_features_ind
Get the high-resolution feature channel indices that should be included for training. Any high-resolution features that are only included in the data handler to be coarsened for the low-res input are removed.
- property hr_out_features
Get a list of high-resolution features that are intended to be output by the GAN. Does not include high-resolution exogenous features.
- property hr_sample_shape: Tuple
Shape of the data sample to select when __next__() is called. Same as sample_shape.
- property lr_only_features
List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.
- post_init_log(args_dict=None)
Log additional arguments after initialization.
- preflight()
Check if the sample_shape is larger than the requested raster size.
- property shape
Get shape of underlying data.
- wrap(data)
Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.