gaps.collection.DatasetCollector
- class DatasetCollector(h5_file, source_files, gids, dataset_in, dataset_out=None, memory_utilization_limit=0.7, pass_through=False)
Bases: object
Collector for a single dataset.
- Parameters:
h5_file (path-like) – Path to the HDF5 file into which the dataset is to be collected.
source_files (list) – List of source filepaths.
gids (list) – List of gids to be collected.
dataset_in (str) – Name of dataset to collect.
dataset_out (str, optional) – Name of the dataset into which the collected data is to be written. If None, the output dataset name is assumed to match the input dataset name. By default, None.
memory_utilization_limit (float, optional) – Memory utilization limit (fractional). This sets how many sites will be collected at a time. By default, 0.7.
pass_through (bool, optional) – Flag to pass the dataset through from a single source file, assuming all of the source files hold identical copies of it. By default, False.
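As a rough illustration of how a fractional memory limit can translate into a number of sites collected at a time, here is a minimal sketch. It is not the gaps implementation: `sites_per_chunk`, `chunk_slices`, their arguments, and the sizing rule are all hypothetical, and the total memory is passed in explicitly rather than queried from the system.

```python
def sites_per_chunk(n_sites, rows_per_site, itemsize, total_memory_bytes,
                    memory_utilization_limit=0.7):
    """Hypothetical helper: how many sites fit in the memory budget at once.

    Assumes each site contributes ``rows_per_site`` values of ``itemsize``
    bytes each (an assumption of this sketch, not a documented gaps rule).
    """
    if not 0 < memory_utilization_limit <= 1:
        raise ValueError("memory_utilization_limit must be in (0, 1]")
    budget = total_memory_bytes * memory_utilization_limit
    bytes_per_site = rows_per_site * itemsize
    # Always collect at least one site, and never more than exist.
    return max(1, min(n_sites, int(budget // bytes_per_site)))


def chunk_slices(n_sites, chunk_size):
    """Split range(n_sites) into consecutive slices of at most chunk_size."""
    return [slice(start, min(start + chunk_size, n_sites))
            for start in range(0, n_sites, chunk_size)]
```

Under these assumptions, a lower `memory_utilization_limit` yields smaller chunks and therefore more read/write passes over the source files.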
Methods
collect_dataset(h5_file, source_files, gids, ...) – Collect a dataset from multiple source files into a single file.
Attributes
True if there are duplicate gids being collected.
List of gids corresponding to all sites to be combined.
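A duplicate-gid check of the kind described above could look like the following sketch. The function names are hypothetical and this is not taken from the gaps source; it only shows one common way to detect repeated entries in a list of gids.

```python
from collections import Counter


def has_duplicate_gids(gids):
    """Return True if any gid appears more than once (hypothetical helper)."""
    return len(set(gids)) != len(gids)


def find_duplicate_gids(gids):
    """Return the repeated gids, ordered by first appearance."""
    counts = Counter(gids)  # Counter keeps insertion order of first sightings
    return [gid for gid, n in counts.items() if n > 1]
```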
- classmethod collect_dataset(h5_file, source_files, gids, dataset_in, dataset_out=None, memory_utilization_limit=0.7, pass_through=False)
Collect a dataset from multiple source files into a single file.
- Parameters:
h5_file (path-like) – Path to the HDF5 file into which the dataset is to be collected.
source_files (list) – List of source filepaths.
gids (list) – List of gids to be collected.
dataset_in (str) – Name of dataset to collect.
dataset_out (str, optional) – Name of the dataset into which the collected data is to be written. If None, the output dataset name is assumed to match the input dataset name. By default, None.
memory_utilization_limit (float, optional) – Memory utilization limit (fractional). This sets how many sites will be collected at a time. By default, 0.7.
pass_through (bool, optional) – Flag to pass the dataset through from a single source file, assuming all of the source files hold identical copies of it. By default, False.
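A possible way to drive this classmethod for several datasets is sketched below. The wrapper `collect_datasets` and its `pass_through_names` and `collector` arguments are hypothetical conveniences, not part of gaps; only the `collect_dataset` call and its keyword names come from the signature documented above. The import is deferred so that a stand-in collector can be supplied when gaps is not installed.

```python
def collect_datasets(h5_file, source_files, gids, dataset_names,
                     pass_through_names=(), collector=None):
    """Collect several datasets into one file, one call per dataset.

    ``collector`` is a hypothetical hook: any object exposing a
    ``collect_dataset`` method with the documented signature.
    """
    if collector is None:
        # Imported lazily; requires the gaps package to be installed.
        from gaps.collection import DatasetCollector as collector
    for name in dataset_names:
        collector.collect_dataset(
            h5_file,
            source_files,
            gids,
            name,
            dataset_out=None,  # None: output name matches the input name
            memory_utilization_limit=0.7,  # documented default
            pass_through=name in pass_through_names,
        )
```

Datasets listed in `pass_through_names` are requested with `pass_through=True`, which is only appropriate when every source file holds an identical copy of that dataset.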