reVX.hybrid_stats.temporal_agg.TemporalAgg

class TemporalAgg(src_fpath, dst_fpath, freq='1d', dsets=None, year=None, local_time=False, **resample_kwargs)

Bases: object

Class to temporally aggregate time-series data

Parameters:

src_fpath (str) – Path to source h5 file
dst_fpath (str) – Path to destination h5 file to save aggregated datasets to.
freq (str, optional) – Aggregation frequency, by default ‘1d’
dsets (list, optional) – Datasets to aggregate; if None, aggregate all datasets in src_fpath, by default None
year (str | int, optional) – Year to extract the time index and datasets for; needed when running on a multi-year file, by default None
local_time (bool, optional) – Flag to shift data to local time before aggregating; if False, data stay in UTC, by default False
resample_kwargs (dict, optional) – Keyword arguments for pandas.DataFrame.resample
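To make the roles of freq and resample_kwargs concrete, here is a minimal pandas-only sketch. It is not reVX's implementation: the hourly data and site names are hypothetical, and the daily frequency is written as "1D". It only shows how a frequency string and extra keyword arguments such as closed and label change the bins that pandas.DataFrame.resample produces.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data for two sites over two days (UTC)
time_index = pd.date_range(
    "2012-01-01", periods=48, freq=pd.Timedelta(hours=1), tz="UTC"
)
data = pd.DataFrame(
    {"site_0": np.arange(48, dtype=float), "site_1": np.ones(48)},
    index=time_index,
)

# Daily bins with pandas defaults: each bin runs from 00:00 to 23:00
daily_mean = data.resample("1D").mean()

# Extra keyword arguments change the bin edges and labels; with
# closed="right" a sample at midnight closes the previous day's bin
daily_right = data.resample("1D", closed="right", label="right").mean()

print(daily_mean)
print(daily_right)
```

With the default bins the two days of site_0 average to 11.5 and 35.5. The closed="right" variant yields one extra leading bin that holds only the first midnight sample.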

Methods

`aggregate`([method, max_workers, ...])	Aggregate desired datasets and write to disk
`run`(src_fpath, dst_fpath[, freq, dsets, ...])	Temporally aggregate the desired datasets in the src .h5 file to the given frequency using the given method.

Attributes

`dsets`	Datasets to aggregate

property dsets

Datasets to aggregate

Returns: list

aggregate(method='mean', max_workers=None, chunks_per_worker=5)

Aggregate desired datasets and write to disk

Parameters:

method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’
max_workers (None | int, optional) – Number of workers to use; if 1, run in serial; if None, use all available cores, by default None
chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
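The difference between the two aggregation methods can be illustrated with plain pandas. This is a sketch under stated assumptions, not the code aggregate runs internally: aggregate_frame is a hypothetical helper, and the constant hourly series is made up for the example.

```python
import pandas as pd


def aggregate_frame(data, freq="1D", method="mean"):
    """Resample ``data`` to ``freq`` and reduce each bin with ``method``.

    Hypothetical helper: it mirrors the documented 'mean' / 'sum' choice
    but is not part of reVX.
    """
    if method not in ("mean", "sum"):
        raise ValueError(f"method must be 'mean' or 'sum', got {method!r}")

    resampler = data.resample(freq)
    # Look up .mean() or .sum() on the resampler by name
    return getattr(resampler, method)()


# Hypothetical hourly series with a constant value of 2.0 for one day
time_index = pd.date_range(
    "2012-01-01", periods=24, freq=pd.Timedelta(hours=1)
)
power = pd.DataFrame({"site_0": [2.0] * 24}, index=time_index)

daily_mean = aggregate_frame(power, method="mean")  # one row: 2.0
daily_sum = aggregate_frame(power, method="sum")    # one row: 48.0
```

A mean keeps the units of the source data, while a sum accumulates them over the bin, so the appropriate choice depends on the dataset being aggregated.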

classmethod run(src_fpath, dst_fpath, freq='1d', dsets=None, year=None, method='mean', max_workers=None, chunks_per_worker=5, local_time=False, **resample_kwargs)

Temporally aggregate the desired datasets in the src .h5 file to the given frequency using the given method. Save the aggregated datasets to the dst .h5 file.

Parameters:

src_fpath (str) – Path to source h5 file
dst_fpath (str) – Path to destination h5 file to save aggregated datasets to.
freq (str, optional) – Aggregation frequency, by default ‘1d’
dsets (list, optional) – Datasets to aggregate; if None, aggregate all datasets in src_fpath, by default None
year (str | int, optional) – Year to extract the time index and datasets for; needed when running on a multi-year file, by default None
method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’
max_workers (None | int, optional) – Number of workers to use; if 1, run in serial; if None, use all available cores, by default None
chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
local_time (bool, optional) – Flag to shift data to local time before aggregating; if False, data stay in UTC, by default False
resample_kwargs (dict, optional) – Keyword arguments for pandas.DataFrame.resample
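A call to run might look like the sketch below, which uses only the signature documented above. The wrapper aggregate_to_daily and the file paths in the usage note are hypothetical, and the example assumes reVX is installed when the function is actually called.

```python
def aggregate_to_daily(src_fpath, dst_fpath, year=None):
    """Write daily means of every dataset in src_fpath to dst_fpath.

    Hypothetical wrapper around TemporalAgg.run, shown for illustration.
    """
    # Imported lazily so this sketch can be loaded without reVX installed
    from reVX.hybrid_stats.temporal_agg import TemporalAgg

    TemporalAgg.run(
        src_fpath,
        dst_fpath,
        freq="1d",            # aggregate to daily values
        dsets=None,           # None aggregates all datasets in src_fpath
        year=year,            # needed for multi-year source files
        method="mean",        # or "sum"
        max_workers=1,        # 1 runs in serial
        chunks_per_worker=5,
        local_time=False,     # stay in UTC
    )
```

For example, aggregate_to_daily("/path/to/source.h5", "/path/to/daily.h5", year=2012) would aggregate a single year from a multi-year file, with both paths standing in for real files.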