sup3r.preprocessing.samplers.cc.DualSamplerCC#

class DualSamplerCC(data: Sup3rDataset, sample_shape: tuple | None = None, batch_size: int = 16, s_enhance: int = 1, t_enhance: int = 24, feature_sets: Dict | None = None)[source]#

Bases: DualSampler

Special sampling of WTK or NSRDB data for climate change applications

Note

This will always give daily / hourly data if t_enhance != 1. The number of days / hours in the samples is determined by t_enhance. For example, if t_enhance = 8 and sample_shape = (..., 24), there will be 3 days in the low-res sample: lr_sample_shape = (..., 3). If 1 < t_enhance and t_enhance != 24, reduce_high_res_sub_daily() will be used to reduce the high-res sample shape from (..., sample_shape[2] * 24 // t_enhance) to (..., sample_shape[2]).
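The shape arithmetic in this note can be sketched in plain Python. This is a minimal illustration, not the sampler's own code: cc_sample_shapes is a hypothetical helper, and dividing the spatial dimensions by s_enhance is an assumption based on the sample_shape parameter description.

```python
def cc_sample_shapes(sample_shape, s_enhance, t_enhance):
    """Hypothetical helper reproducing the shape arithmetic in the note."""
    s1, s2, t = sample_shape
    # Low-res sample shape implied by the enhancement factors
    # (assumption: spatial dims are divided by s_enhance).
    lr_sample_shape = (s1 // s_enhance, s2 // s_enhance, t // t_enhance)
    # Number of hourly steps in the expanded high-res chunk, before
    # reduce_high_res_sub_daily() brings it back down to sample_shape[2].
    if 1 < t_enhance and t_enhance != 24:
        expanded_t = t * 24 // t_enhance
    else:
        expanded_t = t
    return lr_sample_shape, expanded_t


# The example from the note: t_enhance = 8 and sample_shape = (..., 24)
lr_shape, expanded_t = cc_sample_shapes((20, 20, 24), s_enhance=2, t_enhance=8)
print(lr_shape, expanded_t)  # (10, 10, 3) 72
```

With t_enhance = 24 or t_enhance = 1 the expanded length equals sample_shape[2], so no reduction step is needed in this sketch.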

Parameters:
  • data (Sup3rDataset) – A Sup3rDataset instance with low-res and high-res data members

  • sample_shape (tuple) – Size of arrays to sample from the high-res data. The sample shape for the low-res sampler will be determined from the enhancement factors.

  • s_enhance (int) – Spatial enhancement factor

  • t_enhance (int) – Temporal enhancement factor

  • feature_sets (Optional[dict]) – Optional dictionary describing how the full set of features is split between lr_only_features and hr_exo_features.

    lr_only_features : list | tuple

    List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.

    hr_exo_features : list | tuple

    List of feature names or patterns that should be included in the high-resolution observation but not expected to be output from the generative model. An example is high-res topography that is to be injected mid-network.
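A feature_sets dictionary with the two keys described above might look like the following. The feature names are hypothetical placeholders, and the constructor call is left as a comment because it requires an existing Sup3rDataset; its keyword arguments are taken from the class signature at the top of this page.

```python
# Hypothetical feature names, chosen only for illustration.
feature_sets = {
    # Used as low-res input only, never as a high-res observation
    "lr_only_features": ["temperature_2m"],
    # Present in the high-res observation but not a model output
    "hr_exo_features": ["topography"],
}

# Assuming `data` is an existing Sup3rDataset with low-res and
# high-res members, the sampler could then be built as:
# sampler = DualSamplerCC(data, sample_shape=(20, 20, 24), batch_size=16,
#                         s_enhance=2, t_enhance=8,
#                         feature_sets=feature_sets)
```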

See also

DualSampler

Methods

check_for_consistent_shapes()

Make sure container shapes and sample shapes are compatible with enhancement factors.

get_features(feature_sets)

Return default set of features composed from data vars in low res and high res data objects or the value provided through the feature_sets dictionary.

get_middle_days(high_res, sample_shape)

Get middle chunk of high_res data that will then be reduced to day time steps.

get_sample_index([n_obs])

Get sample index for expanded hourly chunk which will be reduced to the given sample shape.

post_init_log([args_dict])

Log additional arguments after initialization.

preflight()

Check if the sample_shape is larger than the requested raster size

reduce_high_res_sub_daily(high_res[, csr_ind])

Take an hourly high-res observation and reduce the temporal axis down to lr_sample_shape[2] * t_enhance time steps, using only daylight hours on the middle part of the high res data.

wrap(data)

Return a Sup3rDataset object or tuple of such.

Attributes

data

Return underlying data.

hr_exo_features

Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection.

hr_features

Get the high-resolution features corresponding to hr_features_ind

hr_features_ind

Get the high-resolution feature channel indices that should be included for training.

hr_out_features

Get a list of high-resolution features that are intended to be output by the GAN.

hr_sample_shape

Shape of the data sample to select when __next__() is called.

lr_only_features

List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.

sample_shape

Shape of the data sample to select when __next__() is called.

shape

Get shape of underlying data.

check_for_consistent_shapes()[source]#

Make sure container shapes and sample shapes are compatible with enhancement factors.

reduce_high_res_sub_daily(high_res, csr_ind=0)[source]#

Take an hourly high-res observation and reduce the temporal axis down to lr_sample_shape[2] * t_enhance time steps, using only daylight hours on the middle part of the high res data.

Parameters:
  • high_res (Union[np.ndarray, da.core.Array]) – 5D array with dimensions (n_obs, spatial_1, spatial_2, temporal, n_features) where temporal >= 24 (set by the data handler).

  • csr_ind (int) – Feature index of clearsky_ratio, e.g. self.data[..., csr_ind] -> cs_ratio

Returns:

high_res (Union[np.ndarray, da.core.Array]) – 5D array with dimensions (n_obs, spatial_1, spatial_2, temporal, n_features) where temporal has been reduced down to the integer lr_sample_shape[2] * t_enhance. For example, if hr_sample_shape[2] is 9 and t_enhance = 8, 72 hourly time steps will be reduced to 9 using the 9 center daylight hours from the second day.

Note

This only does something when 1 < t_enhance < 24. If t_enhance = 24 there is no need for reduction, since every daily time step will have 24 hourly time steps in the high_res batch data. If t_enhance = 1, we are running a spatial-only model, so this routine is unnecessary.

*Needs review from @grantbuster
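The idea of keeping only the center daylight hours can be sketched with NumPy as below. This is an illustration under stated assumptions, not the library's implementation: reduce_to_center_daylight is a hypothetical function, night-time clearsky ratio is assumed to be NaN or zero, and the input is assumed to be already narrowed to the middle day(s).

```python
import numpy as np


def reduce_to_center_daylight(high_res, n_out, csr_ind=0):
    """Illustrative only: keep n_out consecutive steps centered on daylight.

    high_res has shape (n_obs, spatial_1, spatial_2, temporal, n_features).
    """
    n_t = high_res.shape[3]
    if n_out > n_t:
        raise ValueError("n_out cannot exceed the temporal dimension")
    csr = high_res[..., csr_ind]
    # Assumption: night-time clearsky ratio is NaN or zero.
    daylight = np.nan_to_num(csr, nan=0.0) > 0
    # Fraction of observations and pixels that are lit at each time step
    day_frac = daylight.mean(axis=(0, 1, 2))
    day_steps = np.flatnonzero(day_frac > 0.5)
    # Fall back to the middle of the chunk if no daylight is found
    center = int(day_steps[day_steps.size // 2]) if day_steps.size else n_t // 2
    # Clip the window so it stays inside the temporal axis
    start = min(max(center - n_out // 2, 0), n_t - n_out)
    return high_res[:, :, :, start:start + n_out, :]


# One synthetic day with daylight between hours 6 and 17 (inclusive)
demo = np.zeros((1, 2, 2, 24, 2))
demo[:, :, :, 6:18, 0] = 1.0
reduced = reduce_to_center_daylight(demo, n_out=8, csr_ind=0)
print(reduced.shape)
```

Here n_out plays the role of the reduced temporal length, and the returned window falls entirely inside the synthetic daylight period.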

static get_middle_days(high_res, sample_shape)[source]#

Get middle chunk of high_res data that will then be reduced to day time steps. This has n_time_steps = 24 if sample_shape[-1] <= 24 otherwise n_time_steps = sample_shape[-1].
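A centered temporal slice with the n_time_steps rule stated above could look like this sketch. The function name middle_chunk and the exact centering are assumptions for illustration; the actual method may position the window differently.

```python
import numpy as np


def middle_chunk(high_res, sample_shape):
    """Illustrative only: centered slice along the temporal axis (axis 3)."""
    # n_time_steps = 24 if sample_shape[-1] <= 24, else sample_shape[-1]
    n_time_steps = 24 if sample_shape[-1] <= 24 else sample_shape[-1]
    n_t = high_res.shape[3]
    # Assumption: the window is centered on the temporal axis.
    start = max((n_t - n_time_steps) // 2, 0)
    return high_res[:, :, :, start:start + n_time_steps, :]


# 72 hourly steps (3 days); the values encode the hour index
hours = np.arange(72, dtype=float).reshape(1, 1, 1, 72, 1)
mid = middle_chunk(hours, sample_shape=(20, 20, 9))
print(mid.shape, mid[0, 0, 0, 0, 0], mid[0, 0, 0, -1, 0])
```

For a 72-hour chunk and sample_shape[-1] = 9, this sketch keeps hours 24 through 47, which is the second day.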

get_sample_index(n_obs=None)[source]#

Get sample index for expanded hourly chunk which will be reduced to the given sample shape.

property data#

Return underlying data.

Returns:

Sup3rDataset

See also

wrap()

get_features(feature_sets)#

Return default set of features composed from data vars in low res and high res data objects or the value provided through the feature_sets dictionary.

property hr_exo_features#

Get a list of exogenous high-resolution features that are only used for training e.g., mid-network high-res topo injection. These must come at the end of the high-res feature set. These can also be input to the model as low-res features.

property hr_features#

Get the high-resolution features corresponding to hr_features_ind

property hr_features_ind#

Get the high-resolution feature channel indices that should be included for training. Any high-resolution features that are only included in the data handler to be coarsened for the low-res input are removed.

property hr_out_features#

Get a list of high-resolution features that are intended to be output by the GAN. Does not include high-resolution exogenous features.
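The relationship between hr_features_ind, hr_features, hr_exo_features and hr_out_features described above can be sketched as follows. This is a simplified illustration, not the sampler's code: split_features is a hypothetical helper, the feature names are made up, and names are matched exactly rather than by pattern.

```python
def split_features(all_features, lr_only_features, hr_exo_features):
    """Illustrative only: derive the high-res feature subsets by exact name."""
    # Channel indices kept for training: everything that is not low-res only
    hr_features_ind = [
        i for i, f in enumerate(all_features) if f not in lr_only_features
    ]
    hr_features = [all_features[i] for i in hr_features_ind]
    # Model outputs: high-res features minus the exogenous ones
    hr_out_features = [f for f in hr_features if f not in hr_exo_features]
    # Exogenous features are expected at the end of the high-res set
    n_exo = len(hr_features) - len(hr_out_features)
    assert hr_features[len(hr_features) - n_exo:] == [
        f for f in hr_features if f in hr_exo_features
    ], "hr_exo_features must come last"
    return hr_features_ind, hr_features, hr_out_features


ind, hr_feats, hr_out = split_features(
    all_features=["clearsky_ratio", "temperature_2m", "topography"],
    lr_only_features=["temperature_2m"],
    hr_exo_features=["topography"],
)
print(ind, hr_feats, hr_out)
```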

property hr_sample_shape: Tuple#

Shape of the data sample to select when __next__() is called. Same as sample_shape

property lr_only_features#

List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.

post_init_log(args_dict=None)#

Log additional arguments after initialization.

preflight()#

Check if the sample_shape is larger than the requested raster size

property sample_shape: Tuple#

Shape of the data sample to select when __next__() is called.

property shape#

Get shape of underlying data.

wrap(data)#

Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.