nsrdb.data_model.clouds.CloudVarSingleH5

class CloudVarSingleH5(fpath, pre_proc_flag=True, index=None, dsets=('cloud_type', 'cld_opd_dcomp', 'cld_reff_dcomp', 'cld_press_acha'), parallax_correct=True, solar_shading=True, remap_pc=True)

Bases: CloudVarSingle

Framework for .h5 single-file/single-timestep cloud data extraction.

Parameters:

fpath (str) – Full filepath for the cloud data at a single timestep.
pre_proc_flag (bool) – Flag to pre-process and sparsify data.
index (np.ndarray) – Nearest neighbor results array to extract a subset of the data.
dsets (tuple | list) – Source datasets to extract.
parallax_correct (bool) – Flag to adjust cloud coordinates so that clouds are directly above their coordinates rather than at the apparent location seen from the sensor.
solar_shading (bool) – Flag to adjust cloud coordinates so clouds are assigned to the coordinates they shade.
remap_pc (bool) – Flag to remap the parallax-corrected and solar-shading-corrected data back onto the original semi-regular GOES coordinates.
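A minimal usage sketch, based only on the signature and attributes documented on this page. The helper name `load_cloud_timestep` and the file path in the comment are hypothetical, and the import is deferred so the helper can be defined without nsrdb installed.

```python
def load_cloud_timestep(fpath, dsets=("cloud_type", "cld_opd_dcomp")):
    """Read selected cloud datasets for one timestep.

    Illustrative helper only; it is not part of nsrdb.
    """
    # Deferred import; the module path is taken from the title of this page.
    from nsrdb.data_model.clouds import CloudVarSingleH5

    cloud = CloudVarSingleH5(
        fpath,
        pre_proc_flag=True,      # pre-process and sparsify
        dsets=dsets,             # source datasets to extract
        parallax_correct=True,
        solar_shading=True,
        remap_pc=True,
    )
    grid = cloud.grid            # pd.DataFrame of latitude/longitude, or None
    data = cloud.source_data     # dict: dataset name -> 1D flattened array
    return grid, data


# Hypothetical call; the path is a placeholder, not a real file:
# grid, data = load_cloud_timestep("/path/to/cloud_timestep.h5")
```

Because `grid` may be None for a bad dataset, callers would likely want to check it before using the returned data.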

Methods

`clean_attrs`()	Try to clean unnecessary object attributes to reduce memory usage
`correct_coordinates`(fpath, grid[, ...])	Adjust grid lat/lon values based on solar position
`get_dset`(dset)	Get a single dataset from the source cloud data file.
`make_sparse`(grid, raw_grid)	Make the cloud grid sparse by removing NaN coordinates.
`pre_process`(dset, data, attrs[, ...])	Pre-process cloud data by filling missing values and unscaling.
`remap_pc_coords`()	Remap the parallax/shading corrected coordinates back onto the original "raw" coordinate system and set internal variables to do the same for the cloud data when processed through get_dset() and self.source_data
`remap_pc_data`(data)	Perform remapping of parallax/shading corrected data onto the raw/original cloud coordinate system including overlaying cloud shadow data over clear data.

Attributes

`GRID_LABELS`
`dsets`	Get a list of the available datasets in the cloud file.
`fpath`	Get the full file path for this cloud data timestep.
`grid`	Return the cloud data grid for the current timestep.
`source_data`	Get multiple-variable data dictionary from the cloud data file.
`tree`	Get the KDTree for the cloud data coordinates.

property dsets: Get a list of the available datasets in the cloud file.

classmethod correct_coordinates(fpath, grid, parallax_correct=True, solar_shading=True)

Adjust grid lat/lon values based on solar position

Parameters:

fpath (str) – Filepath to the cloud h5 file containing the datasets required for the solar position (solpo) coordinate adjustment.
grid (dict) – Dictionary with latitude and longitude keys and corresponding numpy array values.
parallax_correct (bool) – Flag to adjust cloud coordinates so that clouds are directly above their coordinates rather than at the apparent location seen from the sensor.
solar_shading (bool) – Flag to adjust cloud coordinates so clouds are assigned to the coordinates they shade.

Returns:

grid (dict) – Dictionary with latitude and longitude keys and corresponding numpy array values. Coordinates are adjusted for solar position so that clouds are linked to the coordinate that they are shading.

static pre_process(dset, data, attrs, sparse_mask=None, index=None)

Pre-process cloud data by filling missing values and unscaling.

Pre-processing steps (different for .nc vs .h5):

flatten (ravel)
convert to float32 (unless dset == cloud_type)
convert filled values to NaN (unless dset == cloud_type)
apply scale factor (multiply)
apply add offset (addition)
sparsify
extract only data at index

Parameters:

dset (str) – Dataset name.
data (np.ndarray) – Raw data extracted from the dataset in the cloud data source file.
attrs (dict) – Dataset attributes from the dataset in the cloud data source file.
sparse_mask (NoneType | pd.Series) – Optional boolean mask to apply to the data to sparsify.
index (np.ndarray) – Nearest neighbor results array to extract a subset of the data.

Returns:

data (np.ndarray) – Pre-processed data.
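The steps listed above can be sketched with NumPy as follows. This is an illustrative re-creation under stated assumptions, not the nsrdb implementation: the attribute keys `_FillValue`, `scale_factor` and `add_offset` are assumed names, and the function name `pre_process_sketch` is hypothetical.

```python
import numpy as np


def pre_process_sketch(dset, data, attrs, sparse_mask=None, index=None):
    """Follow the documented pre-processing steps in order (sketch only)."""
    data = np.asarray(data).ravel()                  # 1. flatten

    if dset != "cloud_type":
        data = data.astype(np.float32)               # 2. convert to float32
        fill = attrs.get("_FillValue")               # assumed attribute key
        if fill is not None:
            data[data == fill] = np.nan              # 3. fill values -> NaN

    data = data * attrs.get("scale_factor", 1)       # 4. apply scale factor
    data = data + attrs.get("add_offset", 0)         # 5. apply add offset

    if sparse_mask is not None:
        data = data[np.asarray(sparse_mask)]         # 6. sparsify
    if index is not None:
        data = data[index]                           # 7. extract data at index
    return data


raw = np.array([[10, -999], [30, 40]], dtype=np.int16)
attrs = {"_FillValue": -999, "scale_factor": 0.5, "add_offset": 1.0}
processed = pre_process_sketch("cld_opd_dcomp", raw, attrs)
```

The fill value is replaced before unscaling so that the scale and offset never turn a missing-data marker into a plausible-looking physical value.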

static make_sparse(grid, raw_grid)

Make the cloud grid sparse by removing NaN coordinates.

Parameters:

grid (pd.DataFrame) – GOES source coordinates (labels: ['latitude', 'longitude']).
raw_grid (pd.DataFrame | None) – Raw GOES source coordinates before parallax correction / solar shading or None if those algorithms are disabled.

Returns:

grid (pd.DataFrame) – Sparse GOES source coordinates with all NaN rows removed.
raw_grid (pd.DataFrame | None) – Raw GOES source coordinates before parallax correction / solar shading or None if those algorithms are disabled.
mask (pd.Series) – Boolean series; the mask to extract sparse data.
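A minimal pandas sketch of the sparsification described above, assuming that a row is dropped when either of its coordinates is NaN and that `raw_grid` shares the index of `grid`. The function name `make_sparse_sketch` is hypothetical and the real method may differ in detail.

```python
import numpy as np
import pandas as pd


def make_sparse_sketch(grid, raw_grid=None):
    """Drop rows with NaN coordinates and return the mask used (sketch only)."""
    # True where both latitude and longitude are valid.
    mask = ~grid[["latitude", "longitude"]].isna().any(axis=1)
    grid = grid[mask]
    if raw_grid is not None:
        # Apply the same mask so the raw coordinates stay aligned.
        raw_grid = raw_grid[mask]
    return grid, raw_grid, mask


coords = pd.DataFrame({
    "latitude": [39.0, np.nan, 40.0],
    "longitude": [-105.0, -105.5, -106.0],
})
sparse_grid, sparse_raw, mask = make_sparse_sketch(coords)
```

The returned mask has one entry per original row, so the same selection can later be applied to the flattened data arrays.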

get_dset(dset)

Get a single dataset from the source cloud data file.

Parameters:

dset (str) – Variable dataset name to retrieve from the cloud file.

Returns:

dset (np.ndarray) – 1D array of flattened data that should match the self.grid metadata.

clean_attrs(): Try to clean unnecessary object attributes to reduce memory usage

property fpath: Get the full file path for this cloud data timestep.

property grid

Return the cloud data grid for the current timestep.

Returns:

self._grid (pd.DataFrame | None) – GOES source coordinates (labels: ['latitude', 'longitude']). None if the dataset is bad.

remap_pc_coords(): Remap the parallax/shading corrected coordinates back onto the original “raw” coordinate system and set internal variables to do the same for the cloud data when processed through get_dset() and self.source_data

remap_pc_data(data)

Perform remapping of parallax/shading corrected data onto the raw/original cloud coordinate system including overlaying cloud shadow data over clear data.

Parameters:

data (np.ndarray) – 1D array of flattened data based on the original coordinate system ordering from the cloud file, possibly sparsified by the pre-processing of NaN data/coordinates.

Returns:

data (np.ndarray) – 1D array of flattened data that corresponds to the original coordinate system with no parallax/shading corrections, but rearranged so that it reflects these coordinate adjustments.

property source_data

Get multiple-variable data dictionary from the cloud data file.

Returns:

data (dict) – Dictionary of multiple cloud datasets. Keys are the cloud dataset names. Values are 1D (flattened/raveled) arrays of data.

property tree

Get the KDTree for the cloud data coordinates, e.g. cKDTree(self.grid[['latitude', 'longitude']]).

Returns:

cKDTree
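As a sketch of how such a tree could produce a nearest-neighbor index array for subsetting flattened cloud data, the example below builds a cKDTree from a small made-up grid and queries it with target coordinates. The helper name `nearest_cloud_index` and all coordinate values are hypothetical. Distances are plain Euclidean distances in degrees, which is an assumption about how the tree is used.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def nearest_cloud_index(grid, target_coords):
    """Return, for each target (lat, lon), the row index of the nearest grid point."""
    tree = cKDTree(grid[["latitude", "longitude"]].to_numpy())
    _, index = tree.query(target_coords, k=1)
    return index


# Made-up cloud grid and target sites, for illustration only.
cloud_grid = pd.DataFrame({
    "latitude": [39.0, 39.5, 40.0],
    "longitude": [-105.0, -105.5, -106.0],
})
targets = np.array([[39.4, -105.4], [40.1, -106.0]])
index = nearest_cloud_index(cloud_grid, targets)

# Subset a flattened data array that is ordered like the grid.
cloud_data = np.array([0.1, 0.2, 0.3])
subset = cloud_data[index]
```

Each entry of `index` is a row position in the grid, so the same array can be used to subset any flattened dataset that shares the grid ordering.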