reVX.hybrid_stats.temporal_agg.TemporalAgg
- class TemporalAgg(src_fpath, dst_fpath, freq='1d', dsets=None, year=None, local_time=False, **resample_kwargs)
Bases: object
Class to temporally aggregate time-series data
- Parameters:
src_fpath (str) – Path to source h5 file
dst_fpath (str) – Path to destination h5 file to save aggregated datasets to.
freq (str, optional) – Aggregation frequency, by default ‘1d’
dsets (list, optional) – Datasets to aggregate; if None, aggregate all datasets in src_fpath, by default None
year (str | int, optional) – Year to extract the time index and datasets for; needed when running on a multi-year file, by default None
local_time (bool, optional) – If True, shift data to local time before aggregating; if False, data stays in UTC, by default False
resample_kwargs (dict, optional) – Kwargs for pandas.DataFrame.resample
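The effect of local_time on daily aggregation can be illustrated with a minimal standard-library sketch. It is not reVX code: the helper to_local and the fixed UTC offset of -7 hours are assumptions made for illustration, and the source does not say how reVX determines each site's offset. The point is only that shifting timestamps before binning can move a sample into a different day.

```python
from datetime import datetime, timedelta


def to_local(utc_time, utc_offset_hours):
    """Shift a UTC timestamp by a fixed offset (hypothetical helper)."""
    return utc_time + timedelta(hours=utc_offset_hours)


# 03:00 UTC on 2 January is 20:00 on 1 January at UTC-7 (assumed offset)
utc_time = datetime(2020, 1, 2, 3, 0)
local = to_local(utc_time, -7)

# The same sample lands in different daily bins depending on the time base
utc_day = utc_time.date()
local_day = local.date()
```

With freq='1d', this sample would contribute to the 2 January bin in UTC but to the 1 January bin after the shift.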
Methods
aggregate([method, max_workers, ...]) – Aggregate desired datasets and write to disk
run(src_fpath, dst_fpath[, freq, dsets, ...]) – Temporally aggregate the desired datasets in the src .h5 file to the given frequency using the given method.
Attributes
dsets – Datasets to aggregate
- property dsets
Datasets to aggregate
- Returns:
list
- aggregate(method='mean', max_workers=None, chunks_per_worker=5)
Aggregate desired datasets and write to disk
- Parameters:
method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’
max_workers (None | int, optional) – Number of workers to use; if 1, run in serial; if None, use all available cores, by default None
chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
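The difference between the two aggregation methods can be sketched with the standard library alone. This is a conceptual illustration, not the reVX implementation (the resample_kwargs description indicates that pandas resampling is involved); the function aggregate_daily and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_daily(times, values, method="mean"):
    """Group (time, value) pairs by calendar day and reduce each group."""
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")

    # Collect the values that fall on each calendar day
    bins = defaultdict(list)
    for t, v in zip(times, values):
        bins[t.date()].append(v)

    # Reduce each day's values with the requested method
    out = {}
    for day in sorted(bins):
        total = sum(bins[day])
        out[day] = total / len(bins[day]) if method == "mean" else total
    return out


# Two days of hourly data: 1.0 on the first day, 3.0 on the second
start = datetime(2020, 1, 1)
times = [start + timedelta(hours=h) for h in range(48)]
values = [1.0] * 24 + [3.0] * 24

daily_mean = aggregate_daily(times, values, method="mean")
daily_sum = aggregate_daily(times, values, method="sum")
```

A mean suits quantities such as wind speed or temperature, while a sum suits quantities that accumulate over the interval, such as energy.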
- classmethod run(src_fpath, dst_fpath, freq='1d', dsets=None, year=None, method='mean', max_workers=None, chunks_per_worker=5, local_time=False, **resample_kwargs)
Temporally aggregate the desired datasets in the src .h5 file to the given frequency using the given method. Save the aggregated datasets to the dst .h5 file.
- Parameters:
src_fpath (str) – Path to source h5 file
dst_fpath (str) – Path to destination h5 file to save aggregated datasets to.
freq (str, optional) – Aggregation frequency, by default ‘1d’
dsets (list, optional) – Datasets to aggregate; if None, aggregate all datasets in src_fpath, by default None
year (str | int, optional) – Year to extract the time index and datasets for; needed when running on a multi-year file, by default None
method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’
max_workers (None | int, optional) – Number of workers to use; if 1, run in serial; if None, use all available cores, by default None
chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
local_time (bool, optional) – If True, shift data to local time before aggregating; if False, data stays in UTC, by default False
resample_kwargs (dict, optional) – Kwargs for pandas.DataFrame.resample
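A call to the run classmethod might look like the sketch below. The argument names and defaults follow the signature documented above; the wrapper function run_daily_mean, the file paths, and the dataset name in the usage comment are hypothetical. The import sits inside the function so that the sketch can be loaded where reVX is not installed.

```python
def run_daily_mean(src_fpath, dst_fpath, dsets=None, year=None):
    """Aggregate datasets to daily means with TemporalAgg.run."""
    # Imported lazily so this sketch loads even where reVX is not installed
    from reVX.hybrid_stats.temporal_agg import TemporalAgg

    TemporalAgg.run(
        src_fpath,
        dst_fpath,
        freq="1d",
        dsets=dsets,
        year=year,
        method="mean",
        max_workers=1,  # 1 runs in serial, per the parameter description
        chunks_per_worker=5,
        local_time=False,  # keep the data in UTC
    )


# Hypothetical usage; the paths and dataset name are placeholders:
# run_daily_mean("source.h5", "daily.h5", dsets=["windspeed_100m"], year=2012)
```

Using run rather than constructing the class and calling aggregate keeps the whole operation in a single call, which is convenient for scripts.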