sup3r.postprocessing.collectors.nc.CollectorNC
- class CollectorNC(file_paths)
Bases: BaseCollector
Sup3r NetCDF file collection framework
- Parameters:
file_paths (list | str) – Explicit list of str file paths that will be sorted and collected or a single string with unix-style /search/patt*ern.<ext>. Files should have non-overlapping time_index and spatial domains.
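The two accepted forms of file_paths can be illustrated with a minimal sketch. The helper resolve_file_paths and the demo file names are hypothetical and are not part of sup3r; the sketch only assumes that a single string is expanded as a glob pattern and that the resulting paths are sorted.

```python
import glob
import os
import tempfile


def resolve_file_paths(file_paths):
    """Return a sorted list of paths from an explicit list or a glob pattern.

    Hypothetical helper for illustration only, not the sup3r implementation.
    """
    if isinstance(file_paths, str):
        # A single string is treated as a unix-style search pattern.
        file_paths = glob.glob(file_paths)
    return sorted(file_paths)


# Demo on throwaway files (made-up names) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("out_000001_000000.nc", "out_000000_000000.nc"):
        open(os.path.join(tmp, name), "w").close()
    resolved = resolve_file_paths(os.path.join(tmp, "out_*.nc"))
    resolved_names = [os.path.basename(p) for p in resolved]

print(resolved_names)
```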
Methods
collect
(file_paths, out_file[, features, ...])Collect data files from a dir to one output file.
get_chunk_indices
(file)Get spatial and temporal chunk indices from the given file name.
get_dset_attrs
(feature)Get attributes for output feature
get_node_cmd
(config)Get a CLI call to collect data.
get_time_dim_name
(filepath)Get the name of the time dimension in the given file
group_spatial_chunks
()Group same spatial chunks together so each chunk has same spatial footprint but different times
write_data
(out_file, dsets, time_index, ...)Write list of datasets to out_file.
- classmethod collect(file_paths, out_file, features='all', log_level=None, log_file=None, overwrite=True, res_kwargs=None)
Collect data files from a dir to one output file.
- Filename requirements:
Should end with “.nc”
- Parameters:
file_paths (list | str) – Explicit list of str file paths that will be sorted and collected or a single string with unix-style /search/patt*ern.nc.
out_file (str) – File path of final output file.
features (list | str) – List of dsets to collect. If ‘all’ then all data_vars will be collected.
log_level (str | None) – Desired log level. None will not initialize logging.
log_file (str | None) – Target log file. None logs to stdout.
write_status (bool) – Flag to write status file once complete if running from pipeline.
job_name (str) – Job name for status file if running from pipeline.
overwrite (bool) – Whether to overwrite existing output file
res_kwargs (dict | None) – Dictionary of kwargs to pass to xarray.open_mfdataset.
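A call to collect might look like the sketch below, which uses only the import path and signature given on this page. The wrapper name collect_netcdf_chunks and the placeholder paths are assumptions for illustration. sup3r is imported inside the function so that the sketch can be loaded without the package installed.

```python
def collect_netcdf_chunks(file_paths, out_file, features="all"):
    """Thin illustrative wrapper around CollectorNC.collect.

    Calling this function requires sup3r to be installed.
    """
    # Imported lazily so that defining the wrapper does not require sup3r.
    from sup3r.postprocessing.collectors.nc import CollectorNC

    CollectorNC.collect(
        file_paths,          # explicit list or a single glob pattern ending in .nc
        out_file,            # path of the combined output file
        features=features,   # 'all' collects every data variable
        overwrite=True,      # replace an existing output file
    )


# Example call (placeholder paths, not real files):
# collect_netcdf_chunks("/path/to/chunks/*.nc", "/path/to/combined.nc")
```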
- group_spatial_chunks()
Group files that share a spatial chunk, so that each group covers the same spatial footprint at different times.
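The grouping idea can be sketched in plain Python. This is not the sup3r implementation: the function group_by_spatial_chunk is hypothetical, and it assumes that file names end in `_<temporal index>_<spatial index>.<ext>`, which this page does not state.

```python
import os
from collections import defaultdict


def group_by_spatial_chunk(file_paths):
    """Map each spatial chunk index to its files, ordered by name.

    Hypothetical helper. Assumes names end in `_<temporal>_<spatial>.<ext>`.
    """
    groups = defaultdict(list)
    for path in sorted(file_paths):
        stem = os.path.basename(path).split(".")[0]
        # The last underscore-separated token is taken as the spatial index.
        spatial_chunk_index = stem.split("_")[-1]
        groups[spatial_chunk_index].append(path)
    return dict(groups)


# Made-up file names: two temporal chunks for each of two spatial chunks.
files = [
    "out_000001_000000.nc",
    "out_000000_000001.nc",
    "out_000000_000000.nc",
    "out_000001_000001.nc",
]
groups = group_by_spatial_chunk(files)
print(groups)
```

Each value in the resulting dictionary holds files with one spatial footprint, sorted so that earlier temporal chunks come first.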
- static get_chunk_indices(file)
Get spatial and temporal chunk indices from the given file name.
- Returns:
temporal_chunk_index (str) – Zero-padded integer for the temporal chunk index
spatial_chunk_index (str) – Zero-padded integer for the spatial chunk index
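As a rough illustration of the return values, the sketch below parses both indices from a file name. The naming convention (names ending in `_<temporal index>_<spatial index>.<ext>`), the function parse_chunk_indices, and the example path are all assumptions and are not taken from this page.

```python
import os


def parse_chunk_indices(file):
    """Return (temporal_chunk_index, spatial_chunk_index) as strings.

    Hypothetical parser. Assumes names end in `_<temporal>_<spatial>.<ext>`.
    """
    stem = os.path.basename(file).split(".")[0]
    # The last two underscore-separated tokens are the two chunk indices.
    temporal_chunk_index, spatial_chunk_index = stem.split("_")[-2:]
    return temporal_chunk_index, spatial_chunk_index


# Made-up path used only to exercise the parser.
t_idx, s_idx = parse_chunk_indices("/scratch/run/sup3r_out_000003_000012.nc")
print(t_idx, s_idx)
```

The indices are returned as strings, so the zero padding is preserved.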
- static get_dset_attrs(feature)
Get attributes for output feature
- Parameters:
feature (str) – Name of feature to write
- Returns:
attrs (dict) – Dictionary of attributes for requested dset
dtype (str) – Data type for requested dset. Defaults to float32
- classmethod get_node_cmd(config)
Get a CLI call to collect data.
- Parameters:
config (dict) – sup3r collection config with all necessary args and kwargs to run data collection.
- static get_time_dim_name(filepath)
Get the name of the time dimension in the given file
- Parameters:
filepath (str) – Path to the file
- Returns:
time_key (str) – Name of the time dimension in the given file
- classmethod write_data(out_file, dsets, time_index, data_list, meta, global_attrs=None)
Write list of datasets to out_file.
- Parameters:
out_file (str) – Pre-existing H5 file output path
dsets (list) – list of datasets to write to out_file
time_index (pd.DatetimeIndex()) – Pandas datetime index to use for file time_index.
data_list (list) – List of np.ndarray objects to write to out_file
meta (pd.DataFrame) – Full meta dataframe for the final output data.
global_attrs (dict) – Namespace of file-global attributes for the final output data.