sup3r.postprocessing.collectors.base.BaseCollector

class BaseCollector(file_paths)

Bases: OutputMixin, ABC

Base collector class for H5/NetCDF collection.

Parameters:

file_paths (list | str) – Explicit list of file paths (str) that will be sorted and collected, or a single string containing a unix-style search pattern such as /search/patt*ern.<ext>. Files should have non-overlapping time_index and spatial domains.
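The two accepted forms of file_paths can be normalized into one sorted list. The following is a minimal sketch, assuming the string form is expanded with the standard-library glob module. The helper name resolve_file_paths is hypothetical and is not part of sup3r.

```python
import glob


def resolve_file_paths(file_paths):
    """Return a sorted list of paths from a list or a unix-style pattern.

    Hypothetical helper for illustration; not sup3r's implementation.
    """
    if isinstance(file_paths, str):
        # A single string is treated as a glob pattern, e.g. "/dir/chunk_*.h5"
        file_paths = glob.glob(file_paths)
    # Sorting gives a deterministic collection order for both input forms
    return sorted(file_paths)
```

Sorting both forms the same way means a caller gets the same ordering whether it passes an explicit list or a pattern.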

Methods

collect(*args, **kwargs)

Collect data files from a dir to one output file.

get_chunk_indices(file)

Get spatial and temporal chunk indices from the given file name.

get_dset_attrs(feature)

Get attributes for output feature

get_node_cmd(config)

Get a CLI call to collect data.

get_time_dim_name(filepath)

Get the name of the time dimension in the given file

write_data(out_file, dsets, time_index, ...)

Write list of datasets to out_file.

static get_chunk_indices(file)

Get spatial and temporal chunk indices from the given file name.

Returns:

  • temporal_chunk_index (str) – Zero padded integer for the temporal chunk index

  • spatial_chunk_index (str) – Zero padded integer for the spatial chunk index
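As an illustration of what parsing chunk indices from a file name can look like, here is a sketch that assumes file names follow the pattern <prefix>_<temporal>_<spatial>.<ext>. Both that naming scheme and the function name parse_chunk_indices are assumptions made for this example, not a statement of how sup3r implements it.

```python
import os


def parse_chunk_indices(file):
    """Split the last two underscore-delimited fields off the file stem.

    Assumes a naming scheme of <prefix>_<temporal>_<spatial>.<ext>.
    """
    stem = os.path.splitext(os.path.basename(file))[0]
    temporal_chunk_index, spatial_chunk_index = stem.split("_")[-2:]
    # Both values stay as zero-padded strings, matching the documented return
    return temporal_chunk_index, spatial_chunk_index


print(parse_chunk_indices("/tmp/sup3r_out_000003_000012.h5"))
# ('000003', '000012')
```

Keeping the indices as strings preserves the zero padding; call int() on them when a numeric sort key is needed.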

abstract classmethod collect(*args, **kwargs)

Collect data files from a dir to one output file.
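Because collect is an abstract classmethod, a concrete collector has to override it before it can be instantiated. The sketch below shows that pattern with stand-in classes. SketchCollector and ListingCollector are hypothetical names and do not reproduce sup3r's BaseCollector or its subclasses.

```python
from abc import ABC, abstractmethod


class SketchCollector(ABC):
    """Stand-in for a collector base class; not sup3r's BaseCollector."""

    def __init__(self, file_paths):
        self.file_paths = sorted(file_paths)

    @classmethod
    @abstractmethod
    def collect(cls, *args, **kwargs):
        """Collect data files to one output file."""


class ListingCollector(SketchCollector):
    """Toy subclass whose collect() only returns the sorted inputs."""

    @classmethod
    def collect(cls, file_paths, out_file):
        collector = cls(file_paths)
        # A real collector would read each file and write to out_file here
        return out_file, collector.file_paths
```

Instantiating SketchCollector directly raises TypeError, while ListingCollector.collect(["b.h5", "a.h5"], "out.h5") runs because the abstract method has been overridden.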

classmethod get_node_cmd(config)

Get a CLI call to collect data.

Parameters:

config (dict) – sup3r collection config with all necessary args and kwargs to run data collection.

static get_dset_attrs(feature)

Get attributes for output feature

Parameters:

feature (str) – Name of feature to write

Returns:

  • attrs (dict) – Dictionary of attributes for requested dset

  • dtype (str) – Data type for requested dset. Defaults to float32
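The documented return shape, an attribute dictionary plus a dtype that falls back to float32, can be sketched as a simple lookup. The KNOWN_ATTRS table, its feature name, and its attribute values are invented for illustration, and lookup_dset_attrs is a hypothetical name.

```python
# Hypothetical attribute table; real feature names and attributes will differ.
KNOWN_ATTRS = {
    "windspeed_100m": {"units": "m/s", "dtype": "uint16", "scale_factor": 100.0},
}


def lookup_dset_attrs(feature):
    """Return (attrs, dtype) for a feature, defaulting dtype to float32."""
    # Copy so callers cannot mutate the shared table
    attrs = dict(KNOWN_ATTRS.get(feature, {}))
    dtype = attrs.get("dtype", "float32")
    return attrs, dtype
```

With this table, an unknown feature yields an empty attribute dictionary and the float32 default, while windspeed_100m yields its stored attributes and uint16.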

static get_time_dim_name(filepath)

Get the name of the time dimension in the given file

Parameters:

filepath (str) – Path to the file

Returns:

time_key (str) – Name of the time dimension in the given file

classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)

Write list of datasets to out_file.

Parameters:

  • out_file (str) – Pre-existing H5 file output path

  • dsets (list) – list of datasets to write to out_file

  • time_index (pd.DatetimeIndex) – Pandas datetime index to use for file time_index.

  • data_list (list) – List of np.ndarray objects to write to out_file

  • meta (pd.DataFrame) – Full meta dataframe for the final output data.

  • global_attrs (dict | None) – Namespace of file-global attributes for the final output data. Defaults to None.