sup3r.preprocessing.batch_handlers.factory.BatchHandlerCC#

class BatchHandlerCC(train_containers, *, val_containers=None, sample_shape=None, batch_size=16, n_batches=64, s_enhance=1, t_enhance=1, means=None, stds=None, queue_cap=None, transform_kwargs=None, mode='lazy', feature_sets=None, **kwargs)#

Bases: DualBatchQueue

BatchHandler object built from two lists of Container objects, one with training data and one with validation data. These lists are used to initialize lists of Sampler objects, which then build batches at run time.

Notes

These lists of containers can hold data from the same underlying data source (e.g. CONUS WTK), for instance by initializing the train / val containers with different time periods and / or regions, or they can be used to sample from completely different data sources (e.g. train on CONUS WTK while validating on Canada WTK).

See also

Sampler, AbstractBatchQueue, StatsCollection

Parameters:
  • train_containers (List[Container]) – List of objects with a .data attribute, which will be used to initialize Sampler objects and then used to initialize a batch queue of training data. The data can be a Sup3rX or Sup3rDataset object.

  • val_containers (List[Container]) – List of objects with a .data attribute, which will be used to initialize Sampler objects and then used to initialize a batch queue of validation data. The data can be a Sup3rX or a Sup3rDataset object.

  • batch_size (int) – Number of observations / samples in a batch

  • n_batches (int) – Number of batches in an epoch, this sets the iteration limit for this object.

  • s_enhance (int) – Integer factor by which the spatial axes are to be enhanced.

  • t_enhance (int) – Integer factor by which the temporal axis is to be enhanced.

  • means (str | dict | None) – Usually a file path for loading / saving the computed means, or None to calculate the stats without saving them. Can also be a dict.

  • stds (str | dict | None) – Usually a file path for loading / saving the computed standard deviations, or None to calculate the stats without saving them. Can also be a dict.

  • queue_cap (int) – Maximum number of batches the batch queue can store.

  • transform_kwargs (Union[Dict, None]) – Dictionary of kwargs to be passed to self.transform. This method performs smoothing / coarsening.

  • mode (str) – Loading mode. Default is ‘lazy’, which only loads data into memory as batches are queued. ‘eager’ will load all data into memory right away.

  • feature_sets (Optional[dict]) – Optional dictionary describing how the full set of features is split between lr_only_features and hr_exo_features.

    lr_only_features (list | tuple)

    List of feature names or patterns that should only be included in the low-res training set and not the high-res observations.

    hr_exo_features (list | tuple)

    List of feature names or patterns that should be included in the high-resolution observation but are not expected to be output from the generative model. An example is high-res topography that is to be injected mid-network.

  • kwargs (dict) – Additional keyword arguments for BatchQueue and / or Samplers. These vary with the type of BatchQueue / Sampler given to the Factory. For example, a BatchHandlerDC object (data-centric batch handler) is built from a queue and sampler that take spatial and temporal weight / bin arguments, which determine how to weight spatiotemporal regions when sampling. Using ConditionalBatchQueue adds arguments for computing moments from batches and for padding batch data to enable these calculations.

  • sample_shape (tuple) – Size of arrays to sample from the high-res data. The sample shape for the low-res sampler will be determined from the enhancement factors.
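The feature_sets split described above can be illustrated with a small standalone sketch. It does not use sup3r: split_features is a hypothetical helper, glob-style matching of patterns is an assumption, and so is the choice to keep every feature in the low-res set.

```python
from fnmatch import fnmatchcase


def split_features(features, lr_only_features=(), hr_exo_features=()):
    """Hypothetical helper (not part of sup3r) that splits a feature list.

    Assumptions: patterns are glob-style, and the low-res set keeps every
    feature.
    """

    def matches(name, patterns):
        return any(fnmatchcase(name, pat) for pat in patterns)

    # Low-res training input: all features (assumed).
    lr_features = list(features)
    # High-res observations: everything except the low-res-only features.
    hr_features = [f for f in features if not matches(f, lr_only_features)]
    # Expected model output: high-res features minus the exogenous ones.
    hr_out_features = [f for f in hr_features if not matches(f, hr_exo_features)]
    return lr_features, hr_features, hr_out_features


features = ["u_10m", "v_10m", "pressure_0m", "topography"]
lr, hr, out = split_features(
    features,
    lr_only_features=["pressure_*"],
    hr_exo_features=["topography"],
)
print(lr, hr, out)
```

Here pressure_0m stays in the low-res input only, while topography appears in the high-res observation but is not expected as a model output.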

Methods

check_enhancement_factors()

Make sure each DualSampler has the same enhancement factors and they match those provided to the BatchQueue.

check_features()

Make sure all samplers have the same sets of features.

check_shared_attr(attr)

Check if all containers have the same value for attr.

enqueue_batch()

Build batch and send to queue.

enqueue_batches()

Callback function for queue thread.

get_batch()

Get batch from queue or directly from a Sampler through sample_batch.

get_container_index()

Get random container index based on weights

get_queue()

Return FIFO queue for storing batches.

get_random_container()

Get random container based on container weights

init_samplers(train_containers, ...)

Initialize samplers from given data containers.

log_queue_info()

Log info about queue size.

post_init_log([args_dict])

Log additional arguments after initialization.

post_proc(samples)

Performs some post-processing on dequeued samples before sending out for training.

preflight()

Run checks before kicking off the queue.

sample_batch()

Get random sampler from collection and return a batch of samples from that sampler.

start()

Start the val data batch queue in addition to the train batch queue.

stop()

Stop the val data batch queue in addition to the train batch queue.

transform(samples[, smoothing, smoothing_ignore])

Perform smoothing if requested.

wrap(data)

Return a Sup3rDataset object or tuple of such.

Attributes

container_weights

Get weights used to sample from different containers based on relative sizes

data

Return underlying data.

features

Get all features contained in data.

hr_shape

Shape of high resolution sample in a low-res / high-res pair.

lr_shape

Shape of low resolution sample in a low-res / high-res pair.

queue_shape

Shape of objects stored in the queue.

queue_thread

Get new queue thread.

running

Boolean to check whether to keep enqueueing batches.

shape

Get shape of underlying data.

class Batch(low_res, high_res)#

Bases: tuple

Create new instance of Batch(low_res, high_res)

__add__(value, /)#

Return self+value.

__mul__(value, /)#

Return self*value.

count(value, /)#

Return number of occurrences of value.

high_res#

Alias for field number 1

index(value, start=0, stop=9223372036854775807, /)#

Return first index of value.

Raises ValueError if the value is not present.

low_res#

Alias for field number 0
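Because Batch is a namedtuple with low_res and high_res fields, it can be accessed by name, by index, or by unpacking. The sketch below uses a stand-in namedtuple with the same field names and nested lists in place of real arrays; the values are made up for illustration.

```python
from collections import namedtuple

# Stand-in with the same field names as the documented Batch class.
Batch = namedtuple("Batch", ["low_res", "high_res"])

# Placeholder data; real batches would hold arrays.
batch = Batch(low_res=[[1.0, 2.0]], high_res=[[1.0, 1.5, 2.0, 2.5]])

# Field access by name, by position, and by tuple unpacking are equivalent.
low_res, high_res = batch
print(batch.low_res is batch[0], batch.high_res is batch[1])
```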

SAMPLER#

alias of DualSamplerCC

TRAIN_QUEUE#

alias of DualBatchQueue

VAL_QUEUE#

alias of DualBatchQueue

check_enhancement_factors()#

Make sure each DualSampler has the same enhancement factors and they match those provided to the BatchQueue.

check_features()#

Make sure all samplers have the same sets of features.

check_shared_attr(attr)#

Check if all containers have the same value for attr. If they do the collection effectively inherits those attributes.
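A shared-attribute check of this kind could look like the sketch below. It is a standalone function, not the sup3r method, and raising ValueError on a mismatch is an assumption; the documentation above does not say how a mismatch is reported.

```python
from types import SimpleNamespace


def check_shared_attr(containers, attr):
    """Return the value of ``attr`` shared by all containers.

    Sketch only: raising ValueError on disagreement is an assumption.
    """
    values = [getattr(c, attr) for c in containers]
    first = values[0]
    if any(v != first for v in values[1:]):
        raise ValueError(f"Containers disagree on {attr!r}: {values}")
    return first


# Hypothetical containers that agree on their feature list.
containers = [
    SimpleNamespace(features=["u_10m", "v_10m"]),
    SimpleNamespace(features=["u_10m", "v_10m"]),
]
shared = check_shared_attr(containers, "features")
print(shared)
```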

property container_weights#

Get weights used to sample from different containers based on relative sizes
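Weights based on relative sizes can be sketched as each container's size divided by the total. This is an illustration rather than the sup3r implementation, and what counts as the "size" of a container (here a plain element count) is an assumption.

```python
def container_weights(sizes):
    """Normalize container sizes into sampling weights that sum to 1."""
    total = sum(sizes)
    return [size / total for size in sizes]


# Hypothetical element counts for three containers.
weights = container_weights([600, 300, 100])
print(weights)
```

Larger containers are then sampled proportionally more often, so every underlying sample has roughly the same chance of being drawn.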

property data#

Return underlying data.

Returns:

Sup3rDataset

See also

wrap()

enqueue_batch()#

Build batch and send to queue.

enqueue_batches() → None#

Callback function for queue thread. While training, the queue is checked for empty spots and filled. In the training thread, batches are removed from the queue.
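The producer / consumer pattern described here can be sketched with the standard library alone. The names run_epoch, batch_queue and the string "batches" are hypothetical and only mirror the roles of queue_cap, n_batches and running; this is not the sup3r implementation.

```python
import queue
import threading


def run_epoch(n_batches=5, queue_cap=2):
    """Fill a bounded FIFO queue in a background thread and drain it."""
    batch_queue = queue.Queue(maxsize=queue_cap)
    running = threading.Event()
    running.set()

    def enqueue_batches():
        # Producer: keep filling empty spots while the handler is running.
        i = 0
        while running.is_set():
            try:
                batch_queue.put(f"batch-{i}", timeout=0.05)
                i += 1
            except queue.Full:
                # Queue is at capacity; retry the same batch.
                continue

    thread = threading.Thread(target=enqueue_batches, daemon=True)
    thread.start()

    # Consumer (the training loop): remove n_batches batches, then stop.
    batches = [batch_queue.get() for _ in range(n_batches)]
    running.clear()
    thread.join()
    return batches


batches = run_epoch()
print(batches)
```

The bounded queue keeps memory use capped at queue_cap batches while still letting the producer work ahead of the consumer.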

property features#

Get all features contained in data.

get_batch() → Batch#

Get batch from queue or directly from a Sampler through sample_batch.

get_container_index()#

Get random container index based on weights

get_queue()#

Return FIFO queue for storing batches.

get_random_container()#

Get random container based on container weights
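Weighted random selection of a container can be sketched with random.choices. The function below is a standalone illustration with hypothetical weights and a seeded generator for reproducibility; it is not the sup3r method.

```python
import random


def get_container_index(weights, rng):
    """Draw one container index with probability given by ``weights``."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch reproducible
weights = [0.6, 0.3, 0.1]  # hypothetical container weights

# Count how often each container is selected over many draws.
counts = [0, 0, 0]
for _ in range(10_000):
    counts[get_container_index(weights, rng)] += 1
print(counts)
```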

property hr_shape#

Shape of high resolution sample in a low-res / high-res pair, e.g. (spatial_1, spatial_2, temporal, features).

init_samplers(train_containers, val_containers, sample_shape, feature_sets, batch_size, sampler_kwargs)#

Initialize samplers from given data containers.

log_queue_info()#

Log info about queue size.

property lr_shape#

Shape of low resolution sample in a low-res / high-res pair, e.g. (spatial_1, spatial_2, temporal, features).
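The sample_shape parameter states that the low-res sample shape is determined from the enhancement factors. A minimal sketch of that relationship follows, assuming integer division of a (spatial_1, spatial_2, temporal) high-res shape. The helper name and the divisibility check are hypothetical.

```python
def lr_sample_shape(sample_shape, s_enhance, t_enhance):
    """Derive a low-res sample shape from a high-res one (sketch only)."""
    s1, s2, t = sample_shape
    # Hypothetical check: require the factors to divide the shape evenly.
    if s1 % s_enhance or s2 % s_enhance or t % t_enhance:
        raise ValueError("sample_shape must be divisible by the enhancement factors")
    return (s1 // s_enhance, s2 // s_enhance, t // t_enhance)


# A 20 x 20 x 24 high-res sample with 4x spatial and 6x temporal enhancement.
lr_shape = lr_sample_shape((20, 20, 24), s_enhance=4, t_enhance=6)
print(lr_shape)
```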

post_init_log(args_dict=None)#

Log additional arguments after initialization.

post_proc(samples) → Batch#

Performs some post-processing on dequeued samples before sending out for training. Post-processing can include coarsening of high-res data (if the Collection consists of Sampler objects and not DualSampler objects), smoothing, etc.

Returns:

Batch (namedtuple) – namedtuple with low_res and high_res attributes

preflight()#

Run checks before kicking off the queue.

property queue_shape#

Shape of objects stored in the queue.

property queue_thread#

Get new queue thread.

property running#

Boolean to check whether to keep enqueueing batches.

sample_batch()#

Get random sampler from collection and return a batch of samples from that sampler.

Notes

These samples are wrapped in an np.asarray call, so they have been loaded into memory.

property shape#

Get shape of underlying data.

start()#

Start the val data batch queue in addition to the train batch queue.

stop()#

Stop the val data batch queue in addition to the train batch queue.

transform(samples, smoothing=None, smoothing_ignore=None)#

Perform smoothing if requested.

Note

Unlike SingleBatchQueue, this does not include temporal or spatial coarsening.

wrap(data)#

Return a Sup3rDataset object or a tuple of such objects. The result is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise it is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (i.e. len(data) == 2 or len(data) == 1): a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.