sup3r.postprocessing.collectors.base.BaseCollector
- class BaseCollector(file_paths)
Bases: OutputMixin, ABC
Base collector class for H5/NETCDF collection
- Parameters:
file_paths (list | str) – Explicit list of str file paths that will be sorted and collected or a single string with unix-style /search/patt*ern.<ext>. Files should have non-overlapping time_index and spatial domains.
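The two accepted forms of file_paths can be normalized to one sorted list, as in the minimal sketch below. The helper name resolve_file_paths is hypothetical and is not part of sup3r; it only illustrates the documented behavior using the standard-library glob module.

```python
import glob


def resolve_file_paths(file_paths):
    """Hypothetical helper (not part of sup3r): normalize file_paths.

    A single string is treated as a unix-style search pattern and expanded
    with glob; a list is used as given. The result is sorted either way.
    """
    if isinstance(file_paths, str):
        file_paths = glob.glob(file_paths)
    return sorted(file_paths)
```

Sorting gives a deterministic collection order regardless of how the paths were supplied.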
Methods
collect(*args, **kwargs) – Collect data files from a dir to one output file.
get_chunk_indices(file) – Get spatial and temporal chunk indices from the given file name.
get_dset_attrs(feature) – Get attributes for output feature.
get_node_cmd(config) – Get a CLI call to collect data.
get_time_dim_name(filepath) – Get the name of the time dimension in the given file.
write_data(out_file, dsets, time_index, ...) – Write list of datasets to out_file.
- static get_chunk_indices(file)
Get spatial and temporal chunk indices from the given file name.
- Returns:
temporal_chunk_index (str) – Zero padded integer for the temporal chunk index
spatial_chunk_index (str) – Zero padded integer for the spatial chunk index
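As a rough illustration of returning zero-padded indices as strings, the sketch below parses them from a file name. The naming scheme it assumes (names ending in `_<temporal>_<spatial>.<ext>`) and the function name parse_chunk_indices are assumptions for this example, not the confirmed sup3r convention.

```python
import os
import re


def parse_chunk_indices(file):
    """Hypothetical parser (not sup3r code).

    Assumes file names end in '_<temporal>_<spatial>.<ext>', for example
    'out_000002_000013.h5'. Indices are returned as strings so that the
    zero padding is preserved.
    """
    name = os.path.basename(file)
    match = re.search(r"_(\d+)_(\d+)\.\w+$", name)
    if match is None:
        raise ValueError(f"No chunk indices found in file name: {name}")
    temporal_chunk_index, spatial_chunk_index = match.groups()
    return temporal_chunk_index, spatial_chunk_index
```

Keeping the indices as strings means they sort lexicographically in the same order as their integer values, provided the padding width is constant.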
- abstract classmethod collect(*args, **kwargs)
Collect data files from a dir to one output file.
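Because collect is an abstract classmethod, a concrete collector has to provide its own implementation before it can be instantiated. The sketch below shows that pattern with stand-in classes; CollectorSketch and ListingCollector are hypothetical and do not reproduce sup3r code or its collect signature.

```python
from abc import ABC, abstractmethod


class CollectorSketch(ABC):
    """Stand-in for a base collector (hypothetical, not sup3r code)."""

    def __init__(self, file_paths):
        self.file_paths = sorted(file_paths)

    @classmethod
    @abstractmethod
    def collect(cls, *args, **kwargs):
        """Collect data files to one output file."""


class ListingCollector(CollectorSketch):
    """Hypothetical subclass that only reports what it would collect."""

    @classmethod
    def collect(cls, file_paths, out_file):
        collector = cls(file_paths)
        return {"out_file": out_file, "sources": collector.file_paths}
```

Instantiating CollectorSketch directly raises a TypeError, while ListingCollector.collect can be called on the class without creating an instance first.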
- classmethod get_node_cmd(config)
Get a CLI call to collect data.
- Parameters:
config (dict) – sup3r collection config with all necessary args and kwargs to run data collection.
- static get_dset_attrs(feature)
Get attributes for output feature.
- Parameters:
feature (str) – Name of feature to write
- Returns:
attrs (dict) – Dictionary of attributes for requested dset
dtype (str) – Data type for requested dset. Defaults to float32
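The return contract (an attribute dictionary plus a dtype that falls back to float32) can be pictured with the sketch below. The table EXAMPLE_ATTRS, its feature name and values, and the function lookup_dset_attrs are all hypothetical; the attribute table that sup3r actually uses is not shown in this page.

```python
# Hypothetical attribute table for illustration only (not sup3r's table).
EXAMPLE_ATTRS = {
    "windspeed_100m": {"units": "m/s", "dtype": "uint16", "scale_factor": 100.0},
}


def lookup_dset_attrs(feature, table=EXAMPLE_ATTRS):
    """Return (attrs, dtype) for a feature, defaulting dtype to float32."""
    # Copy so callers cannot mutate the shared table by accident.
    attrs = dict(table.get(feature, {}))
    dtype = attrs.get("dtype", "float32")
    return attrs, dtype
```

A feature missing from the table yields an empty attribute dictionary and the float32 default.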
- static get_time_dim_name(filepath)
Get the name of the time dimension in the given file
- Parameters:
filepath (str) – Path to the file
- Returns:
time_key (str) – Name of the time dimension in the given file
- classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)
Write list of datasets to out_file.
- Parameters:
out_file (str) – Pre-existing H5 file output path
dsets (list) – list of datasets to write to out_file
time_index (pd.DatetimeIndex) – Pandas datetime index to use for the file time_index.
data_list (list) – List of np.ndarray objects to write to out_file
meta (pd.DataFrame) – Full meta dataframe for the final output data.
global_attrs (dict) – Namespace of file-global attributes for the final output data.