reVX.hybrid_stats.temporal_agg.DatasetAgg
- class DatasetAgg(h5_fpath, dset, time_index=None, year=None, local_time=False)
Bases: object
Temporally aggregate a dataset
- Parameters:
h5_fpath (str) – Path to source h5 filepath
dset (str) – Dataset to aggregate
time_index (pandas.DatetimeIndex, optional) – Dataset datetime index. If None, it is extracted from h5_fpath. By default None
year (str | int, optional) – Year to extract time-index for if running on a multi-year file, by default None
local_time (bool) – Flag to shift data to local time before aggregating temporal data. Default is to stay in UTC.
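The local_time flag can be pictured with a small NumPy sketch. This is only an illustration of one common way to move UTC-indexed data onto local clock time, not necessarily how reVX does it. The function name shift_to_local, the utc_offsets argument, and the steps_per_hour argument are all hypothetical.

```python
import numpy as np


def shift_to_local(data, utc_offsets, steps_per_hour=1):
    """Roll each column of a (time, site) array by its UTC offset.

    Hypothetical helper: `utc_offsets` holds one offset in hours per site
    (e.g. -7 for UTC-7). Values that roll past either end wrap around.
    """
    data = np.asarray(data)
    shifted = np.empty_like(data)
    for col, offset in enumerate(utc_offsets):
        # A negative offset pulls later UTC values back to earlier rows.
        shift = int(round(offset * steps_per_hour))
        shifted[:, col] = np.roll(data[:, col], shift)
    return shifted


# Two sites with identical hourly UTC data: 0, 1, ..., 23
utc_data = np.tile(np.arange(24.0)[:, None], (1, 2))
local_data = shift_to_local(utc_data, utc_offsets=[0, -7])
```

In this sketch the second site sits at UTC-7, so its first local row holds the value recorded at 07:00 UTC. Because the roll wraps around, the last few rows of a shifted site come from the start of the record, which matters when aggregating the first and last periods.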
Methods
aggregate([freq, method, max_workers, ...]) – Aggregate dataset to desired frequency using desired method
run(h5_fpath, dset[, time_index, year, ...]) – Temporally aggregate dataset to given frequency using given method
- aggregate(freq='1d', method='mean', max_workers=None, chunks_per_worker=5, **resample_kwargs)
Aggregate dataset to desired frequency using desired method
- Parameters:
freq (str, optional) – Aggregation frequency, by default ‘1d’
method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’
max_workers (None | int, optional) – Number of workers to use. If 1, run in serial. If None, use all available cores. By default None
chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
resample_kwargs (dict, optional) – Kwargs for pandas.DataFrame.resample
- Returns:
dset_agg (ndarray) – Aggregated dataset array
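What freq, method, and resample_kwargs control can be sketched with plain pandas. The helper below, resample_aggregate, is a hypothetical stand-in written for illustration. It assumes the data are a (time, site) array with a DatetimeIndex and it is not the reVX implementation.

```python
import numpy as np
import pandas as pd


def resample_aggregate(df, freq="1D", method="mean", **resample_kwargs):
    """Resample a time-indexed DataFrame and return a plain ndarray.

    Hypothetical helper mirroring the documented options: `method` is
    either 'mean' or 'sum', and extra kwargs go to DataFrame.resample.
    """
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")
    resampler = df.resample(freq, **resample_kwargs)
    return getattr(resampler, method)().values


# Two days of hourly data for two sites
time_index = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
values = np.arange(96, dtype=float).reshape(48, 2)
frame = pd.DataFrame(values, index=time_index)

daily_mean = resample_aggregate(frame, freq="1D", method="mean")
daily_sum = resample_aggregate(frame, freq="1D", method="sum")
```

Both results have one row per day and one column per site. The mean suits state-like quantities such as wind speed, while the sum suits quantities that accumulate over the period.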
- classmethod run(h5_fpath, dset, time_index=None, year=None, freq='1d', method='mean', max_workers=None, chunks_per_worker=5, local_time=False, **resample_kwargs)
Temporally aggregate dataset to given frequency using given method
- Parameters:
h5_fpath (str) – Path to source h5 filepath
dset (str) – Dataset to aggregate
time_index (pandas.DatetimeIndex, optional) – Dataset datetime index. If None, it is extracted from h5_fpath. By default None
year (str | int, optional) – Year to extract time-index for if running on a multi-year file, by default None
freq (str, optional) – Aggregation frequency, by default ‘1d’
method (str, optional) – Aggregation method, either ‘mean’ or ‘sum’, by default ‘mean’
max_workers (None | int, optional) – Number of workers to use. If 1, run in serial. If None, use all available cores. By default None
chunks_per_worker (int, optional) – Number of chunks to extract on each worker, by default 5
local_time (bool) – Flag to shift data to local time before aggregating temporal data. Default is to stay in UTC.
resample_kwargs (dict, optional) – Kwargs for pandas.DataFrame.resample
- Returns:
agg_data (ndarray) – Dataset aggregated to given frequency using given method
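The interplay of max_workers and chunks_per_worker can be illustrated with a rough sketch that splits the site columns into chunks, aggregates each chunk on a worker, and stitches the results back together. This is one plausible reading of those parameters, not the reVX internals. The names chunked_aggregate and _agg_chunk are hypothetical, and threads stand in for whatever worker type the library actually uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def _agg_chunk(values, time_index, freq, method):
    """Aggregate one (time, site) chunk with pandas resample."""
    df = pd.DataFrame(values, index=time_index)
    return getattr(df.resample(freq), method)().values


def chunked_aggregate(data, time_index, freq="1D", method="mean",
                      max_workers=None, chunks_per_worker=5):
    """Hypothetical sketch: aggregate site columns in parallel chunks."""
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")
    workers = max_workers or os.cpu_count() or 1
    n_sites = data.shape[1]
    # Never create more chunks than there are site columns.
    n_chunks = max(1, min(n_sites, workers * chunks_per_worker))
    column_sets = np.array_split(np.arange(n_sites), n_chunks)

    def work(cols):
        return _agg_chunk(data[:, cols], time_index, freq, method)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so columns line up on output.
        parts = list(pool.map(work, column_sets))
    return np.hstack(parts)


rng = np.random.default_rng(0)
demo_index = pd.date_range("2020-01-01", periods=48,
                           freq=pd.Timedelta(hours=1))
demo_data = rng.random((48, 23))
demo_agg = chunked_aggregate(demo_data, demo_index, max_workers=2,
                             chunks_per_worker=5)
```

With reVX installed, the documented entry point for the same kind of result would be a call of the form DatasetAgg.run(h5_fpath, dset, freq='1d', method='mean'), following the signature above.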