nsrdb.utilities.sky_class.SkyClass

class SkyClass(fp_surf, fp_nsrdb, nsrdb_gid, clearsky_ratio=0.9, clear_time_frac=0.8, cloudy_time_frac=0.2, window_minutes=61, min_irradiance=0, sza_lim=89)

Bases: object

Utility class for retrieving SURFRAD validation data alongside NSRDB data, determining the sky class by comparison to predicted clearsky irradiance, and providing data in a ready-to-validate dataframe.

Parameters:
  • fp_surf (str) – Path to the SURFRAD h5 file.

  • fp_nsrdb (str) – Path to the NSRDB file. This can be a MultiFileResource path of the form /dir/prefix*suffix.h5.

  • nsrdb_gid (int) – GID (metadata index) of the site of interest in the fp_nsrdb file; it must correspond to the site in the fp_surf file.

  • clearsky_ratio (float) – Clearsky ratio (ground measurement / clearsky irradiance) above which a timestep is considered clear.

  • clear_time_frac (float) – Fraction of clear timesteps in an averaging window above which the whole window is considered clear. Windows with a clear fraction between cloudy_time_frac and clear_time_frac are classified as broken clouds.

  • cloudy_time_frac (float) – Fraction of clear timesteps in an averaging window below which the whole window is considered cloudy. Windows with a clear fraction between cloudy_time_frac and clear_time_frac are classified as broken clouds.

  • window_minutes (int) – Length, in minutes, of the moving-average window used for the sky classification. The corresponding number of timesteps is derived from the source time resolution of the SURFRAD measurements.

  • min_irradiance (float | int) – Minimum irradiance value. Timesteps where either the ground-measured or the NSRDB irradiance is less than this value are classified as missing.

  • sza_lim (int | float) – Maximum solar zenith angle. Timesteps with sza > sza_lim are classified as missing.
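The window_minutes parameter is given in minutes, but a moving average operates on timesteps. A minimal sketch of that conversion, assuming the window is simply rounded to the nearest whole number of timesteps (window_steps is a hypothetical helper, not part of nsrdb, and the library's exact rounding rule may differ):

```python
def window_steps(window_minutes, resolution_minutes):
    """Convert a window length in minutes into a number of timesteps for
    data sampled every ``resolution_minutes`` minutes.

    Hypothetical helper: rounds to the nearest whole timestep and never
    returns fewer than one step.
    """
    if resolution_minutes <= 0:
        raise ValueError("resolution_minutes must be positive")
    return max(1, int(round(window_minutes / resolution_minutes)))


# The default 61-minute window at two example SURFRAD resolutions
print(window_steps(61, 1))  # 1-minute data -> 61
print(window_steps(61, 3))  # 3-minute data -> 20
```

Under this assumption, the default window_minutes=61 covers 61 steps of 1-minute data but only about 20 steps of 3-minute data, which is why the resolution has to be taken into account.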

Methods

add_validation_data(df)

Add NSRDB and SURFRAD GHI and DNI data to a DataFrame.

calculate_sky_class(df)

Calculate the sky class (clear, cloudy, broken, missing) from the comparison df.

get_comparison_df()

Get a timeseries dataframe comparing the ground-measured GHI vs. the clearsky (REST2) GHI.

get_rest_inputs()

Get a dataframe of NSRDB variables required to run REST2.

run(fp_surf, fp_nsrdb, nsrdb_gid[, ...])

Classify the sky conditions for one SURFRAD site and return a ready-to-validate DataFrame.

run_rest(rest_inputs)

Run REST2 using a dataframe of input data and return clearsky GHI.

Attributes

ALIASES

REST_VARS

nsrdb

Get the initialized Resource or MultiFileResource handler

nsrdb_time_index

Get the DatetimeIndex from the NSRDB h5 file

surf_ghi

Get the SURFRAD GHI data with negative values set to NaN

surf_time_index

Get the DatetimeIndex from the SURFRAD h5 file

surfrad

Get the initialized Surfrad handler

property surfrad

Get the initialized Surfrad handler

property surf_time_index

Get the DatetimeIndex from the SURFRAD h5 file

property nsrdb

Get the initialized Resource or MultiFileResource handler

property nsrdb_time_index

Get the DatetimeIndex from the NSRDB h5 file

property surf_ghi

Get the SURFRAD GHI data with negative values set to NaN
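The surf_ghi property is described as replacing negative SURFRAD GHI values with NaN. A pure-Python illustration of that rule follows; negatives_to_nan is a hypothetical helper, and the real property most likely operates on array data rather than on lists.

```python
import math


def negatives_to_nan(values):
    """Return a copy of ``values`` with every negative entry replaced by NaN.

    Hypothetical stand-in for the behavior described for ``surf_ghi``.
    NaN inputs stay NaN because ``nan < 0`` is False.
    """
    return [math.nan if v < 0 else float(v) for v in values]


ghi = [-2.1, 0.0, 15.5, 430.0, -0.3]
print(negatives_to_nan(ghi))  # [nan, 0.0, 15.5, 430.0, nan]
```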

get_rest_inputs()

Get a dataframe of NSRDB variables required to run REST2.

Returns:

rest_inputs (pd.DataFrame) – Timeseries data indexed by the SURFRAD time_index (which may be missing timesteps), with a column for each input variable required by REST2 plus some extras (e.g. air_temperature).

run_rest(rest_inputs)

Run REST2 using a dataframe of input data and return clearsky GHI.

Parameters:

rest_inputs (pd.DataFrame) – Timeseries data indexed by the SURFRAD time_index (which may be missing timesteps), with a column for each input variable required by REST2 plus some extras (e.g. air_temperature).

Returns:

ghi (np.ndarray) – 2D (time, 1) array of clearsky GHI values calculated by REST2.

get_comparison_df()

Get a timeseries dataframe comparing the ground-measured GHI vs. the clearsky (REST2) GHI at the ground-measurement time_index.

Returns:

df (pd.DataFrame) – Timeseries data indexed by the SURFRAD time_index (which may be missing timesteps), with columns ghi_rest (clearsky), ghi_ground (SURFRAD), and “clear”. The clear column is a boolean flag (1 for clear) stored with float dtype so that it can hold NaN where ground measurements are missing.
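The “clear” flag combines the clearsky_ratio threshold with the missing-data rules from the constructor parameters. The sketch below assumes a strict > comparison against clearsky_ratio and, for brevity, applies min_irradiance only to the ground measurement. clear_flags is a hypothetical helper, not the nsrdb implementation, which may instead defer the sza_lim and min_irradiance checks to calculate_sky_class.

```python
import math


def clear_flags(ghi_ground, ghi_rest, sza, clearsky_ratio=0.9,
                min_irradiance=0, sza_lim=89):
    """Per-timestep float flag: 1.0 clear, 0.0 not clear, NaN missing.

    Hypothetical sketch: a timestep is clear when ghi_ground / ghi_rest
    exceeds ``clearsky_ratio``. It is missing when the ground value is NaN,
    the solar zenith angle exceeds ``sza_lim``, the ground value is below
    ``min_irradiance``, or the clearsky value makes the ratio undefined.
    """
    flags = []
    for ground, rest, zenith in zip(ghi_ground, ghi_rest, sza):
        if math.isnan(ground) or zenith > sza_lim or ground < min_irradiance:
            flags.append(math.nan)  # missing ground data, night, or too dim
        elif math.isnan(rest) or rest <= 0:
            flags.append(math.nan)  # clearsky ratio is undefined
        elif ground / rest > clearsky_ratio:
            flags.append(1.0)  # ground GHI is close to the clearsky GHI
        else:
            flags.append(0.0)  # ground GHI is well below the clearsky GHI
    return flags


ground = [500.0, 300.0, math.nan, 0.0]
rest = [520.0, 510.0, 505.0, 0.0]
zenith = [40.0, 42.0, 44.0, 95.0]
print(clear_flags(ground, rest, zenith))  # [1.0, 0.0, nan, nan]
```

Storing the flag as a float rather than a bool is what lets NaN mark the missing timesteps, matching the float dtype described for the “clear” column.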

calculate_sky_class(df)

Calculate the sky class (clear, cloudy, broken, missing) from the comparison df.

Parameters:

df (pd.DataFrame) – Timeseries data indexed by the SURFRAD time_index (which may be missing timesteps), with columns ghi_rest (clearsky), ghi_ground (SURFRAD), and “clear”. The clear column is a boolean flag (1 for clear) stored with float dtype so that it can hold NaN where ground measurements are missing.

Returns:

df (pd.DataFrame) – Same as the input, with a new column “sky_class” whose values (clear, cloudy, broken, missing) are calculated from the clear_time_frac and cloudy_time_frac inputs over a time window set by the window_minutes input. Note that sky_class == missing means that it is night or that ground measurement data are missing, and those timesteps should not be used for validation.
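The sky_class labels come from a moving average of the clear flag compared against clear_time_frac and cloudy_time_frac. A minimal sketch, assuming a centered window that ignores NaN flags and strict comparisons at both thresholds (sky_classes is hypothetical, and the nsrdb implementation may handle window edges and ties differently):

```python
import math


def sky_classes(clear, window, clear_time_frac=0.8, cloudy_time_frac=0.2):
    """Label each timestep 'clear', 'cloudy', 'broken', or 'missing'.

    Hypothetical sketch: ``clear`` holds per-timestep flags (1.0, 0.0, or
    NaN) and ``window`` is the window length in timesteps. The window spans
    ``window // 2`` steps on each side of the current one (exactly
    ``window`` steps when it is odd) and is truncated at the series edges.
    A timestep whose own flag is NaN is always 'missing'.
    """
    half = window // 2
    labels = []
    for i, flag in enumerate(clear):
        if math.isnan(flag):
            labels.append("missing")
            continue
        # Valid (non-NaN) flags inside the centered window
        valid = [v for v in clear[max(0, i - half):i + half + 1]
                 if not math.isnan(v)]
        frac = sum(valid) / len(valid)  # never empty: includes clear[i]
        if frac > clear_time_frac:
            labels.append("clear")
        elif frac < cloudy_time_frac:
            labels.append("cloudy")
        else:
            labels.append("broken")
    return labels


flags = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, math.nan]
print(sky_classes(flags, window=3))
# ['clear', 'clear', 'broken', 'broken', 'cloudy', 'cloudy', 'missing']
```

A window's clear fraction must exceed clear_time_frac to be labeled clear and fall below cloudy_time_frac to be labeled cloudy; anything in between is broken, which is why both thresholds are fractions of clear timesteps.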

add_validation_data(df)

Add NSRDB and SURFRAD GHI and DNI data to a DataFrame.

classmethod run(fp_surf, fp_nsrdb, nsrdb_gid, clearsky_ratio=0.9, clear_time_frac=0.8, cloudy_time_frac=0.2, window_minutes=61, min_irradiance=0, sza_lim=89)

Classify the sky conditions for one SURFRAD site and return a ready-to-validate DataFrame.

Parameters:
  • fp_surf (str) – Path to the SURFRAD h5 file.

  • fp_nsrdb (str) – Path to the NSRDB file. This can be a MultiFileResource path of the form /dir/prefix*suffix.h5.

  • nsrdb_gid (int) – GID (metadata index) of the site of interest in the fp_nsrdb file; it must correspond to the site in the fp_surf file.

  • clearsky_ratio (float) – Clearsky ratio (ground measurement / clearsky irradiance) above which a timestep is considered clear.

  • clear_time_frac (float) – Fraction of clear timesteps in an averaging window above which the whole window is considered clear. Windows with a clear fraction between cloudy_time_frac and clear_time_frac are classified as broken clouds.

  • cloudy_time_frac (float) – Fraction of clear timesteps in an averaging window below which the whole window is considered cloudy. Windows with a clear fraction between cloudy_time_frac and clear_time_frac are classified as broken clouds.

  • window_minutes (int) – Length, in minutes, of the moving-average window used for the sky classification. The corresponding number of timesteps is derived from the source time resolution of the SURFRAD measurements.

  • min_irradiance (float | int) – Minimum irradiance value. Timesteps where either the ground-measured or the NSRDB irradiance is less than this value are classified as missing.

  • sza_lim (int | float) – Maximum solar zenith angle. Timesteps with sza > sza_lim are classified as missing.

Returns:

df (pd.DataFrame) – Timeseries of validation data from fp_nsrdb and fp_surf, including sky classification strings (clear, cloudy, broken, missing), with the same DatetimeIndex as the NSRDB file. Note that sky_class == missing means that it is night or that ground measurement data are missing, and those timesteps should not be used for validation.
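Because timesteps labeled missing should be excluded from validation, a typical next step is to drop them and summarize errors per sky class. The sketch below works on plain dicts so that it runs without dependencies; mean_bias_by_sky_class and the ghi_nsrdb / ghi_ground key names are hypothetical, since the column names written by add_validation_data are not documented here.

```python
import math
from collections import defaultdict


def mean_bias_by_sky_class(rows, model_key="ghi_nsrdb", obs_key="ghi_ground"):
    """Mean bias error (model minus observation) for each sky class.

    ``rows`` is an iterable of dicts with a ``sky_class`` entry plus model
    and observation values; the default key names are hypothetical.
    Rows labeled 'missing' are skipped before any statistics are computed.
    """
    diffs = defaultdict(list)
    for row in rows:
        if row["sky_class"] == "missing":
            continue  # night or missing ground data: not used for validation
        diffs[row["sky_class"]].append(row[model_key] - row[obs_key])
    return {cls: sum(d) / len(d) for cls, d in diffs.items()}


rows = [
    {"sky_class": "clear", "ghi_nsrdb": 810.0, "ghi_ground": 800.0},
    {"sky_class": "clear", "ghi_nsrdb": 790.0, "ghi_ground": 800.0},
    {"sky_class": "cloudy", "ghi_nsrdb": 150.0, "ghi_ground": 120.0},
    {"sky_class": "missing", "ghi_nsrdb": 0.0, "ghi_ground": math.nan},
]
print(mean_bias_by_sky_class(rows))  # {'clear': 0.0, 'cloudy': 30.0}
```

On the actual DataFrame, the same idea would be a boolean mask such as df[df['sky_class'] != 'missing'] followed by a groupby('sky_class') aggregation.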