nsrdb.aggregation.aggregation.Aggregation

class Aggregation(var, data_fpath, nn, w, final_ti)

Bases: object

Framework for performing spatiotemporal aggregation.

Parameters:

var (str) – Variable (dataset) name being aggregated.
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure that the aggregated profile has the correct length).

Methods

`cloud_property`(var, data_fpath, nn, w, ...)	Run cloud property aggregation, returning the mean cloud property only for timesteps that match the most common (mode) cloud type.
`cloud_property_avg`(cprop_source, ...)	Run cloud property aggregation based on output cloud type.
`cloud_type`(var, data_fpath, nn, w, final_ti)	Run cloud type aggregation, returning the most common cloud type.
`cloud_type_mode`(data, w)	Get the mode of a 2D cloud type array using a rolling time window.
`dhi`(var, i, fout)	Calculate the aggregated DHI from an aggregated output file.
`fill_flag`(var, data_fpath, nn, w, final_ti)	Run fill flag aggregation, returning the percentage of timesteps that were filled.
`format_out_arr`(arr)	Format the output array (round and flatten).
`mean`(var, data_fpath, nn, w, final_ti)	Run aggregation using a spatial average and a temporal moving-window average.
`point`(var, data_fpath, nn, w, final_ti)	Run aggregation by selecting only the closest site and timestep.
`reduce_timeseries`(arr)	Reduce a high res timeseries to a coarse timeseries.
`spatial_avg`(data)	Average the source data across the spatial extent.
`spatial_sum`(data)	Sum the source data across the spatial extent.
`time_avg`(inp)	Calculate the rolling time average for an input array or df.
`time_sum`(inp)	Calculate the rolling sum for an input array or df.

Attributes

`data`	Get the timeseries data for the specified var and sites.
`source_time_index`	Get the time index of the source data.

property source_time_index

Get the time index of the source data.

Returns: time_index (pd.DatetimeIndex) – DatetimeIndex of the source dataset.

property data

Get the timeseries data for the specified var and sites.

Returns: _data (np.ndarray) – Unscaled float data array with shape (ti, nn), where ti is the native time index length and nn is the number of neighbors in the self.nn attribute.
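The sketch below illustrates the column selection and unscaling described above on an in-memory array instead of the h5 file. The `read_unscaled` helper, its `scale_factor` argument, and the convention of dividing stored integers by that factor are assumptions for illustration, not the class's confirmed internals.

```python
import numpy as np


def read_unscaled(raw, nn, scale_factor):
    """Hypothetical helper: select neighbor columns and unscale them.

    raw: 2D array (ti, n_sites) standing in for the h5 dataset.
    nn: 1D array of site (column) indices.
    scale_factor: assumed per-dataset integer scale factor.
    """
    # Fancy indexing on the column axis gives shape (ti, len(nn))
    data = raw[:, nn].astype(np.float32)
    # Undo the (assumed) integer scaling applied when the data was stored
    return data / scale_factor


raw = np.array([[100, 200, 300],
                [110, 210, 310]], dtype=np.int16)
nn = np.array([0, 2])
out = read_unscaled(raw, nn, scale_factor=10.0)
print(out.shape)  # (2, 2)
print(out)
```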

static spatial_avg(data)

Average the source data across the spatial extent.

Returns: data (np.ndarray) – Unscaled float data array with shape (ti,), where ti is the native time index length; the data is averaged across all nn neighbors.

static spatial_sum(data)

Sum the source data across the spatial extent.

Returns: data (np.ndarray) – Unscaled float data array with shape (ti,), where ti is the native time index length; the data is summed across all nn neighbors.
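Both spatial reductions collapse the neighbor axis of a (ti, nn) array to shape (ti,). A minimal sketch using plain NumPy reductions along axis 1 follows; whether the real static methods ignore NaNs (for example via `np.nanmean`) is not stated here, so these standalone functions are illustrative only.

```python
import numpy as np


def spatial_avg(data):
    # Collapse the neighbor axis (axis 1) by averaging: (ti, nn) -> (ti,)
    return data.mean(axis=1)


def spatial_sum(data):
    # Collapse the neighbor axis (axis 1) by summing: (ti, nn) -> (ti,)
    return data.sum(axis=1)


data = np.array([[1.0, 3.0],
                 [2.0, 6.0],
                 [4.0, 8.0]])
print(spatial_avg(data))  # [2. 4. 6.]
print(spatial_sum(data))  # [ 4.  8. 12.]
```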

time_avg(inp)

Calculate the rolling time average for an input array or df.

Parameters: inp (np.ndarray | pd.DataFrame) – Input array or DataFrame with the data to average.
Returns: out (np.ndarray | pd.DataFrame) – Array or DataFrame with the same shape as the input, where each value is a moving average.

time_sum(inp)

Calculate the rolling sum for an input array or df.

Parameters: inp (np.ndarray | pd.DataFrame) – Input array or DataFrame with the data to sum.
Returns: out (np.ndarray | pd.DataFrame) – Array or DataFrame with the same shape as the input, where each value is a moving sum.
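The rolling average and rolling sum can be sketched with pandas `rolling`. This sketch assumes a centered window of `w` timesteps and `min_periods=1` so the output keeps the input's shape; the real instance methods take only `inp` (presumably reading the window size from the instance), and their exact window alignment is not documented here, so `w` is passed explicitly as an assumption.

```python
import numpy as np
import pandas as pd


def _rolling(inp, w, how):
    # Wrap arrays in a DataFrame so 1D and 2D inputs share one code path
    df = pd.DataFrame(inp)
    # Centered window; min_periods=1 keeps the edges defined
    out = getattr(df.rolling(w, center=True, min_periods=1), how)()
    if isinstance(inp, pd.DataFrame):
        return out
    # Restore the original array shape (e.g. (ti,) instead of (ti, 1))
    return out.to_numpy().reshape(np.shape(inp))


def time_avg(inp, w):
    return _rolling(inp, w, "mean")


def time_sum(inp, w):
    return _rolling(inp, w, "sum")


arr = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(time_avg(arr, 3))  # [1.5 2.  3.  4.  4.5]
print(time_sum(arr, 3))  # [ 3.  6.  9. 12.  9.]
```

Because `min_periods=1`, the first and last values average or sum over a truncated window rather than being NaN.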

static cloud_type_mode(data, w)

Get the mode of a 2D cloud type array using a rolling time window.

Parameters:

data (np.ndarray) – 2D array of integer cloud types.
w (int) – Temporal window over which to take the mode.

Returns:

data (np.ndarray) – Mode of cloud type.
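A rolling mode over a 2D cloud type array can be sketched with `np.unique(..., return_counts=True)`. This sketch assumes a centered window of `w` timesteps that pools all sites, returning shape (ti,), and breaks ties toward the smaller cloud type code; these are illustrative choices, not confirmed behavior of the real method.

```python
import numpy as np


def cloud_type_mode(data, w):
    """Rolling mode over a centered window of w timesteps, pooling all sites.

    data: 2D integer array shaped (ti, nn). Returns shape (ti,).
    """
    ti = data.shape[0]
    half = w // 2
    out = np.empty(ti, dtype=data.dtype)
    for i in range(ti):
        # Clip the window at the start and end of the series
        window = data[max(0, i - half):min(ti, i + half + 1)].ravel()
        values, counts = np.unique(window, return_counts=True)
        # np.unique sorts values, so argmax breaks ties toward the smaller type
        out[i] = values[np.argmax(counts)]
    return out


data = np.array([[0, 3],
                 [3, 3],
                 [3, 0],
                 [7, 7]])
print(cloud_type_mode(data, 3))  # [3 3 3 7]
```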

reduce_timeseries(arr)

Reduce a high res timeseries to a coarse timeseries.

Parameters: arr (np.ndarray) – High-resolution 2D NumPy array.
Returns: arr (np.ndarray) – Shortened 2D NumPy array whose length equals that of final_ti.
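One way to reduce a fine timeseries to the final length is to keep every k-th row. The sketch below assumes the source length is an integer multiple of the final length and that each coarse timestamp aligns with the first row of its block; `n_final` is a hypothetical stand-in for `len(final_ti)`, and the real method may align or select rows differently.

```python
import numpy as np


def reduce_timeseries(arr, n_final):
    """Keep every k-th row so the output has n_final rows.

    Assumes len(arr) is an integer multiple of n_final.
    """
    k = arr.shape[0] // n_final
    if k * n_final != arr.shape[0]:
        raise ValueError("source length must be a multiple of n_final")
    # Stride along the time axis, keeping all columns
    return arr[::k]


arr = np.arange(12).reshape(6, 2)  # 6 fine timesteps, 2 columns
print(reduce_timeseries(arr, 3))
# [[0 1]
#  [4 5]
#  [8 9]]
```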

static format_out_arr(arr)[source]: Format the output array (round and flatten).

classmethod point(var, data_fpath, nn, w, final_ti)

Run aggregation by selecting only the closest site and timestep.

Parameters:

var (str) – Variable (dataset) name being aggregated.
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure that the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n,) array of unscaled, rounded data from the closest of the nn sites, with a time series matching final_ti.
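The point method can be sketched as "take the closest site, then the nearest source timestep for each final timestamp". This sketch assumes column 0 of the (ti, nn) array is the closest site (as with a distance-sorted nearest-neighbor query) and uses pandas `Index.get_indexer(..., method="nearest")` for the timestep match; the helper name and its arguments are hypothetical.

```python
import numpy as np
import pandas as pd


def point(data, source_ti, final_ti):
    """data: (ti, nn) array whose column 0 is assumed to be the closest site."""
    closest_site = data[:, 0]
    # For each coarse timestamp, locate the nearest source timestamp
    idx = source_ti.get_indexer(final_ti, method="nearest")
    # Round and flatten to a 1D output
    return np.round(closest_site[idx]).flatten()


source_ti = pd.date_range("2020-01-01 00:00", periods=6, freq="5min")
final_ti = pd.date_range("2020-01-01 00:00", periods=2, freq="15min")
data = np.array([[10.0, 99.0], [11.0, 99.0], [12.0, 99.0],
                 [13.0, 99.0], [14.0, 99.0], [15.0, 99.0]])
print(point(data, source_ti, final_ti))  # [10. 13.]
```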

classmethod dhi(var, i, fout)

Calculate the aggregated DHI from an aggregated output file.

Parameters:

var (str) – Variable name, either “dhi” or “clearsky_dhi”.
i (int) – Site index in fout.
fout (str) – Filepath to the output file containing aggregated GHI, DNI, and SZA to calculate aggregated DHI.

Returns:

dhi (np.ndarray) – DHI calculated from the variables in fout.
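Since fout holds aggregated GHI, DNI, and SZA, a natural way to derive DHI is the standard closure relation GHI = DHI + DNI·cos(SZA). The sketch below assumes that relation, SZA in degrees, and clipping of small negative results to zero; it operates on in-memory arrays rather than reading site `i` from fout, and the helper name is hypothetical.

```python
import numpy as np


def dhi_from_closure(ghi, dni, sza_deg):
    # Closure relation: GHI = DHI + DNI * cos(SZA), solved for DHI
    cos_sza = np.cos(np.radians(sza_deg))
    dhi = ghi - dni * cos_sza
    # Guard against small negative values from rounding or averaging
    return np.clip(dhi, 0, None)


ghi = np.array([500.0, 0.0])
dni = np.array([600.0, 0.0])
sza = np.array([60.0, 95.0])  # second timestep is after sunset
print(dhi_from_closure(ghi, dni, sza))
```

Aggregating GHI and DNI separately and then recomputing DHI this way keeps the three irradiance components mutually consistent, which independent averaging of DHI would not guarantee.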

classmethod fill_flag(var, data_fpath, nn, w, final_ti)

Run fill flag aggregation, returning the percentage of timesteps that were filled.

Parameters:

var (str) – Variable (dataset) name being aggregated (fill_flag).
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure that the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n,) array of unscaled, rounded data from the nn sites, with a time series matching final_ti.
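A percentage-of-filled-timesteps aggregation can be sketched as: mark each source value as filled or not, average across sites, then take a moving average over the window and scale to percent. This sketch assumes any nonzero fill flag means "filled" and uses a centered window with `min_periods=1`; both the flag convention and the helper name are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_flag_pct(flags, w):
    """flags: (ti, nn) integer fill flags; nonzero is assumed to mean filled."""
    filled = (flags > 0).astype(float)
    # Fraction of neighbor sites that were filled at each timestep
    per_step = filled.mean(axis=1)
    # Centered moving average over w timesteps, expressed as a percentage
    pct = pd.Series(per_step).rolling(w, center=True, min_periods=1).mean()
    return 100 * pct.to_numpy()


flags = np.array([[0, 0],
                  [1, 0],
                  [1, 1],
                  [0, 0]])
print(fill_flag_pct(flags, 3))  # [25. 50. 50. 50.]
```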

classmethod cloud_type(var, data_fpath, nn, w, final_ti)

Run cloud type aggregation, returning the most common cloud type.

Parameters:

var (str) – Variable (dataset) name being aggregated (cloud_type).
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure that the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n,) array of unscaled, rounded data from the nn sites, with a time series matching final_ti.

static cloud_property_avg(cprop_source, ctype_source, ctype_out_full, w)

Run cloud property aggregation based on output cloud type.

Parameters:

cprop_source (np.ndarray) – Source (full resolution) cloud property data.
ctype_source (np.ndarray) – Source (full resolution) cloud type data.
ctype_out_full (np.ndarray) – Output (reduced resolution) cloud type data, interpolated to the same length as the source resolution.
w (int) – Window size.

Returns:

cprop_out (np.ndarray) – Average cloud property data in the window surrounding each timestep, including only timesteps where the output cloud type equals the source cloud type. The shape is the same as ctype_out_full.
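The masked moving average described above can be sketched by blanking out source timesteps whose cloud type disagrees with the (interpolated) output cloud type and then taking a NaN-skipping rolling mean. This sketch assumes 1D inputs of equal length, an elementwise comparison of `ctype_source` with `ctype_out_full`, and a centered window; it illustrates the idea rather than reproducing the real static method.

```python
import numpy as np
import pandas as pd


def cloud_property_avg(cprop_source, ctype_source, ctype_out_full, w):
    # Blank out source timesteps whose cloud type disagrees with the output
    cprop = np.where(ctype_source == ctype_out_full, cprop_source, np.nan)
    # Rolling mean skips NaNs; windows with no matching timesteps stay NaN
    out = pd.DataFrame(cprop).rolling(w, center=True, min_periods=1).mean()
    # Match the shape of ctype_out_full
    return out.to_numpy().reshape(np.shape(ctype_out_full))


cprop_source = np.array([10.0, 20.0, 30.0, 40.0])
ctype_source = np.array([3, 3, 0, 3])
ctype_out_full = np.array([3, 3, 3, 3])
out = cloud_property_avg(cprop_source, ctype_source, ctype_out_full, 3)
print(out)  # [15. 15. 30. 40.]
```

The third source value (30.0) is excluded because its cloud type (0) does not match the output cloud type (3), so it never contributes to any window.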

classmethod cloud_property(var, data_fpath, nn, w, final_ti, gid, fout)

Run cloud property aggregation, returning the mean cloud property only for timesteps that match the most common (mode) cloud type.

Parameters:

var (str) – Variable (dataset) name of the cloud property being aggregated.
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure that the aggregated profile has the correct length).
gid (int) – Site index in fout.
fout (str) – Filepath to the output file containing aggregated cloud type.

Returns:

data (np.ndarray) – Average cloud property data in the window surrounding each timestep, including only timesteps where the output (mode) cloud type equals the source cloud type. The array is (n,) and contains unscaled, rounded data from the nn sites, with a time series matching final_ti.

classmethod mean(var, data_fpath, nn, w, final_ti)

Run aggregation using a spatial average and a temporal moving-window average.

Parameters:

var (str) – Variable (dataset) name being aggregated.
data_fpath (str) – Filepath to h5 file containing source var data.
nn (np.ndarray) – 1D array of site (column) indices in data_fpath to aggregate.
w (int) – Window size for temporal aggregation.
final_ti (pd.DatetimeIndex) – Final datetime index (used to ensure that the aggregated profile has the correct length).

Returns:

data (np.ndarray) – (n,) array of unscaled, rounded data from the nn sites, with a time series matching final_ti.
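The mean aggregation chains the building blocks listed in the Methods table: spatial average, temporal moving average, reduction to the final length, and rounding/flattening. The standalone sketch below assumes a centered window, a source length that is an integer multiple of the final length, and stride-based reduction; `mean_agg` and `n_final` (standing in for `len(final_ti)`) are hypothetical names, and the real classmethod reads its data from the h5 file instead of taking an array.

```python
import numpy as np
import pandas as pd


def mean_agg(data, w, n_final):
    """data: (ti, nn) float array; n_final stands in for len(final_ti)."""
    # 1. Spatial average across neighbors: (ti, nn) -> (ti,)
    spatial = data.mean(axis=1)
    # 2. Centered moving-window average over w source timesteps
    smoothed = pd.Series(spatial).rolling(w, center=True, min_periods=1).mean()
    # 3. Keep one value per coarse timestep (assumes ti is a multiple of n_final)
    step = len(smoothed) // n_final
    reduced = smoothed.to_numpy()[::step][:n_final]
    # 4. Round and flatten to a 1D output
    return np.round(reduced).flatten()


data = np.array([[1.0, 3.0], [3.0, 5.0], [5.0, 7.0],
                 [7.0, 9.0], [9.0, 11.0], [11.0, 13.0]])
print(mean_agg(data, 3, 2))  # [3. 8.]
```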