flasc.data_processing.dataframe_manipulations#
Module containing methods for FLASC dataframe manipulations.
Functions
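df_drop_nan_rows | Drop all-nan rows.
df_find_and_fill_data_gaps_with_missing | Find and fill data gap and mark as missing data with NaN.
df_reduce_precision | Reduce dataframe precision.
df_sort_and_find_duplicates | This function sorts the dataframe and finds rows with equal time index.
df_sort_and_fix_duplicates | Sort dataframe and fill duplicates.
filter_df_by_ti | Filter a dataframe by turbulence intensity range.
filter_df_by_wd | Filter a dataframe by wind direction range.
filter_df_by_ws | Filter a dataframe by wind speed range.
get_column_mean | Get the mean of a column for a list of turbines.
get_num_turbines | Get the number of turbines in a dataframe.
is_day_or_night | Determine night or day in dataframe.
plot_sun_altitude_with_day_night_color | Plot sun altitude with day-night color differentiation.
set_pow_ref_by_n_closest_upstream_turbines | Add pow_ref column using N-nearest upstream turbines.
set_pow_ref_by_turbines | Add power reference column by list of turbines.
set_pow_ref_by_upstream_turbines | Add pow_ref column using upstream turbines.
set_pow_ref_by_upstream_turbines_in_radius | Add pow_ref column using upstream turbines within a radius.
set_ti_by_all_turbines | Add TI column using all turbines.
set_ti_by_turbines | Add TI column by list of turbines.
set_ti_by_upstream_turbines | Add TI column using upstream turbines.
set_ti_by_upstream_turbines_in_radius | Add TI column by upstream turbines within a radius.
set_wd_by_all_turbines | Add a wind direction column using all turbines.
set_wd_by_radius_from_turbine | Add wind direction column by turbines in radius.
set_wd_by_turbines | Add WD column by list of turbines.
set_ws_by_all_turbines | Add ws column by all turbines.
set_ws_by_n_closest_upstream_turbines | Add wind speed column by N closest upstream turbines.
set_ws_by_turbines | Add ws column by list of turbines.
set_ws_by_upstream_turbines | Add wind speed column using upstream turbines.
set_ws_by_upstream_turbines_in_radius | Add wind speed column using in-radius upstream turbines.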
- flasc.data_processing.dataframe_manipulations.filter_df_by_ws(df: DataFrame | FlascDataFrame, ws_range: List[float]) DataFrame | FlascDataFrame [source]#
Filter a dataframe by wind speed range.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements.
ws_range ([float, float]) -- Wind speed range [lower bound, upper bound].
- Returns:
Filtered dataframe.
- Return type:
pd.DataFrame | FlascDataFrame
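The range filters can be pictured with a minimal pandas sketch. This is not FLASC's implementation: the helper name filter_by_range is hypothetical, the dataframe is assumed to already hold a 'ws' column, and the half-open convention [lower, upper) is an assumption, since the docstring does not say whether the bounds are inclusive.

```python
import pandas as pd


def filter_by_range(df: pd.DataFrame, col: str, bounds: list) -> pd.DataFrame:
    """Keep rows where bounds[0] <= df[col] < bounds[1] (assumed convention)."""
    lower, upper = bounds
    mask = (df[col] >= lower) & (df[col] < upper)
    return df.loc[mask]


# Toy data: one reference wind speed channel and one power channel.
df_example = pd.DataFrame(
    {"ws": [3.0, 6.5, 8.0, 12.0], "pow_000": [50.0, 400.0, 900.0, 2000.0]}
)
df_filtered = filter_by_range(df_example, "ws", [6.0, 10.0])
```

The same pattern would apply to a 'wd' or 'ti' column; wind direction ranges that wrap through 360 degrees need extra care and are not handled in this sketch.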
- flasc.data_processing.dataframe_manipulations.filter_df_by_wd(df: DataFrame | FlascDataFrame, wd_range: List[float]) DataFrame | FlascDataFrame [source]#
Filter a dataframe by wind direction range.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements.
wd_range ([float, float]) -- Wind direction range [lower bound, upper bound].
- Returns:
Filtered dataframe.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.filter_df_by_ti(df: DataFrame | FlascDataFrame, ti_range: List[float]) DataFrame | FlascDataFrame [source]#
Filter a dataframe by turbulence intensity range.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements.
ti_range ([float, float]) -- Turbulence intensity range [lower bound, upper bound].
- Returns:
Filtered dataframe.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.get_num_turbines(df: DataFrame | FlascDataFrame) int [source]#
Get the number of turbines in a dataframe.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with turbine data
- Returns:
Number of turbines in the dataframe
- Return type:
int
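One way to picture the turbine count is to look for consecutively numbered columns that follow the wd_%03d, ws_%03d, ti_%03d, pow_%03d naming used throughout this module. The sketch below is an illustration only: count_turbines is a hypothetical helper, and counting on the 'pow' prefix is an assumption rather than a statement about how FLASC does it.

```python
import pandas as pd


def count_turbines(df: pd.DataFrame, prefix: str = "pow") -> int:
    """Count consecutive columns prefix_000, prefix_001, ... present in df."""
    n = 0
    while f"{prefix}_{n:03d}" in df.columns:
        n += 1
    return n


df_example = pd.DataFrame(
    {"pow_000": [1.0, 2.0], "pow_001": [1.5, 2.5], "pow_002": [0.5, 0.7]}
)
n_turbines = count_turbines(df_example)
```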
- flasc.data_processing.dataframe_manipulations.get_column_mean(df: DataFrame | FlascDataFrame, col_prefix: str = 'pow', turbine_list: List[int] | ndarray | None = None, circular_mean: bool = False) ndarray [source]#
Get the mean of a column for a list of turbines.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements.
col_prefix (str, optional) -- Column prefix to use. Defaults to "pow".
turbine_list ([list, array], optional) -- List of turbine numbers to use. If None, all turbines are used. Defaults to None.
circular_mean (bool, optional) -- Use circular mean. Defaults to False.
- Returns:
Mean of the column for the specified turbines.
- Return type:
np.array
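The circular_mean option matters for angular quantities such as wind direction, where an arithmetic mean of 350 and 10 degrees gives 180 instead of a value near north. The following is a minimal NumPy sketch of both averages, assuming the col_prefix_%03d column naming; column_mean is a hypothetical helper and the NaN handling shown here is an assumption, not FLASC's documented behavior.

```python
import numpy as np
import pandas as pd


def column_mean(df, col_prefix="wd", turbine_list=(0, 1), circular=False):
    """Row-wise mean over columns col_prefix_%03d; NaNs are ignored."""
    cols = [f"{col_prefix}_{ti:03d}" for ti in turbine_list]
    values = df[cols].to_numpy(dtype=float)
    if not circular:
        return np.nanmean(values, axis=1)
    # Average the unit vectors and convert back to an angle in [0, 360).
    rad = np.deg2rad(values)
    mean_angle = np.arctan2(
        np.nanmean(np.sin(rad), axis=1), np.nanmean(np.cos(rad), axis=1)
    )
    return np.rad2deg(mean_angle) % 360.0


df_example = pd.DataFrame({"wd_000": [350.0, 90.0], "wd_001": [10.0, 100.0]})
arithmetic_avg = column_mean(df_example, circular=False)
circular_avg = column_mean(df_example, circular=True)
```

For the first row the arithmetic mean points south, while the circular mean lands at north (numerically close to 0 or 360 degrees); for the second row both agree.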
- flasc.data_processing.dataframe_manipulations._set_col_by_turbines(col_out, col_prefix, df, turbine_numbers, circular_mean)[source]#
- flasc.data_processing.dataframe_manipulations._set_col_by_n_closest_upstream_turbines(col_out, col_prefix, df, N, df_upstream, circular_mean, turb_no, x_turbs, y_turbs, exclude_turbs=[])[source]#
- flasc.data_processing.dataframe_manipulations._set_col_by_upstream_turbines(col_out, col_prefix, df, df_upstream, circular_mean, exclude_turbs=[])[source]#
- flasc.data_processing.dataframe_manipulations._set_col_by_radius_from_turbine(col_out, col_prefix, df, turb_no, x_turbs, y_turbs, max_radius, circular_mean, include_itself=True)[source]#
- flasc.data_processing.dataframe_manipulations._set_col_by_upstream_turbines_in_radius(col_out, col_prefix, df, df_upstream, turb_no, x_turbs, y_turbs, max_radius, circular_mean, include_itself=True)[source]#
Add a column of averaged upstream turbine values.
Add a column called [col_out] to your dataframe, which is the mean of the columns [col_prefix]_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].
- Parameters:
col_out (str) -- Column name to be added to the dataframe.
col_prefix (str) -- Column prefix to use.
df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.
circular_mean (bool) -- Use circular mean.
include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called [col_out].
- Return type:
pd.DataFrame
- flasc.data_processing.dataframe_manipulations.set_wd_by_turbines(df: DataFrame | FlascDataFrame, turbine_numbers: List[int]) DataFrame | FlascDataFrame [source]#
Add WD column by list of turbines.
Add a column called 'wd' in your dataframe with value equal to the circular-averaged wind direction measurements of all the turbines in turbine_numbers.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'wd'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_wd_by_all_turbines(df: DataFrame | FlascDataFrame) DataFrame | FlascDataFrame [source]#
Add a wind direction column using all turbines.
Add a column called 'wd' in your dataframe with value equal to the circular-averaged wind direction measurements of all turbines.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'wd'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_wd_by_radius_from_turbine(df: DataFrame | FlascDataFrame, turb_no: int, x_turbs: List[float], y_turbs: List[float], max_radius: float, include_itself: bool = True) DataFrame | FlascDataFrame [source]#
Add wind direction column by turbines in radius.
Add a column called 'wd' to your dataframe, which is the mean of the columns wd_%03d for turbines that are within radius [max_radius] of the turbine of interest [turb_no].
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
max_radius (float) -- Maximum radius around turbine turb_no within which turbines are still considered relevant and used to calculate the averaged column quantity.
include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'wd'.
- Return type:
pd.DataFrame | FlascDataFrame
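The radius-based selection used by the *_radius_* functions can be sketched as a plain distance test on the turbine coordinates. This is a hedged illustration, not FLASC's code: turbines_within_radius is a hypothetical helper, and treating a turbine exactly at max_radius as inside the radius is an assumption.

```python
import numpy as np


def turbines_within_radius(turb_no, x_turbs, y_turbs, max_radius, include_itself=True):
    """Return indices of turbines no farther than max_radius from turbine turb_no."""
    x = np.asarray(x_turbs, dtype=float)
    y = np.asarray(y_turbs, dtype=float)
    dist = np.hypot(x - x[turb_no], y - y[turb_no])
    selected = np.flatnonzero(dist <= max_radius)
    if not include_itself:
        selected = selected[selected != turb_no]
    return selected.tolist()


# Four turbines on a line; turbine 1 is the turbine of interest.
x_example = [0.0, 500.0, 1000.0, 3000.0]
y_example = [0.0, 0.0, 0.0, 0.0]
near_with_self = turbines_within_radius(1, x_example, y_example, max_radius=600.0)
near_without_self = turbines_within_radius(
    1, x_example, y_example, max_radius=600.0, include_itself=False
)
```

The selected indices would then determine which wd_%03d columns enter the average.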
- flasc.data_processing.dataframe_manipulations.set_ws_by_turbines(df: DataFrame | FlascDataFrame, turbine_numbers: List[int]) DataFrame | FlascDataFrame [source]#
Add ws column by list of turbines.
Add a column called 'ws' in your dataframe with value equal to the mean wind speed measurements of all the turbines in turbine_numbers.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ws'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_ws_by_all_turbines(df: DataFrame | FlascDataFrame) DataFrame | FlascDataFrame [source]#
Add ws column by all turbines.
Add a column called 'ws' in your dataframe with value equal to the mean wind speed measurements of all turbines.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ws'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_ws_by_upstream_turbines(df: DataFrame | FlascDataFrame, df_upstream, exclude_turbs=[]) DataFrame | FlascDataFrame [source]#
Add wind speed column using upstream turbines.
Add a column called 'ws' in your dataframe with value equal to the averaged wind speed measurements of all the turbines upstream, excluding the turbines listed in exclude_turbs.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ws'.
- Return type:
pd.DataFrame | FlascDataFrame
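The upstream-based functions average only over the turbines that are unwaked for the wind direction of each row. A rough sketch of that idea follows. It rests on several assumptions that are not confirmed on this page: that df already holds a 'wd' column, that df_upstream has the columns 'wd_min', 'wd_max' and 'turbines', and that each sector is half-open [wd_min, wd_max). The helper mean_over_upstream is hypothetical.

```python
import numpy as np
import pandas as pd


def mean_over_upstream(df, df_upstream, col_prefix="ws", exclude_turbs=()):
    """Per-row mean of col_prefix_%03d over the turbines upstream for that row's 'wd'."""
    out = pd.Series(np.nan, index=df.index)
    for _, row in df_upstream.iterrows():
        turbs = [t for t in row["turbines"] if t not in exclude_turbs]
        in_sector = (df["wd"] >= row["wd_min"]) & (df["wd"] < row["wd_max"])
        if not turbs or not in_sector.any():
            continue
        cols = [f"{col_prefix}_{t:03d}" for t in turbs]
        out.loc[in_sector] = df.loc[in_sector, cols].mean(axis=1)
    return out


df_example = pd.DataFrame(
    {"wd": [90.0, 270.0], "ws_000": [8.0, 6.0], "ws_001": [7.0, 9.0], "ws_002": [6.0, 10.0]}
)
# Assumed layout: one row per wind direction sector with its upstream turbines.
df_upstream_example = pd.DataFrame(
    {"wd_min": [0.0, 180.0], "wd_max": [180.0, 360.0], "turbines": [[0, 1], [2]]}
)
ws_ref = mean_over_upstream(df_example, df_upstream_example)
```

In this toy case the first row averages turbines 0 and 1, and the second row uses turbine 2 only.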
- flasc.data_processing.dataframe_manipulations.set_ws_by_upstream_turbines_in_radius(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, turb_no: int, x_turbs: List[float], y_turbs: List[float], max_radius: float, include_itself: bool = True) DataFrame | FlascDataFrame [source]#
Add wind speed column using in-radius upstream turbines.
Add a column called 'ws' to your dataframe, which is the mean of the columns ws_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.
include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ws'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_ws_by_n_closest_upstream_turbines(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, turb_no: int, x_turbs: List[float], y_turbs: List[float], exclude_turbs: List[int] = [], N: int = 5) DataFrame | FlascDataFrame [source]#
Add wind speed column by N closest upstream turbines.
Add a column called 'ws' to your dataframe, which is the mean of the columns ws_%03d for the N closest turbines that are upstream of the turbine of interest [turb_no].
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.
N (int) -- Number of closest turbines to consider for the calculation of the averaged column quantity. Defaults to 5.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ws'.
- Return type:
pd.DataFrame | FlascDataFrame
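The "N closest" selection can be illustrated by ranking the upstream candidates by distance to the turbine of interest and keeping the first N. This is a sketch under assumptions: n_closest is a hypothetical helper, the candidate list stands in for the upstream turbines of one wind direction sector, and whether FLASC counts turb_no itself among the candidates is not stated here.

```python
import numpy as np


def n_closest(turb_no, candidates, x_turbs, y_turbs, N=5, exclude_turbs=()):
    """Return up to N candidate turbine indices sorted by distance to turb_no."""
    x = np.asarray(x_turbs, dtype=float)
    y = np.asarray(y_turbs, dtype=float)
    pool = [t for t in candidates if t not in exclude_turbs]
    pool.sort(key=lambda t: np.hypot(x[t] - x[turb_no], y[t] - y[turb_no]))
    return pool[:N]


# Five turbines on a line; turbine 4 is the turbine of interest.
x_example = [0.0, 400.0, 900.0, 1500.0, 2500.0]
y_example = [0.0, 0.0, 0.0, 0.0, 0.0]
closest = n_closest(4, [0, 1, 2, 3], x_example, y_example, N=2)
closest_excl = n_closest(4, [0, 1, 2, 3], x_example, y_example, N=2, exclude_turbs=[3])
```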
- flasc.data_processing.dataframe_manipulations.set_ti_by_turbines(df: DataFrame | FlascDataFrame, turbine_numbers: List[int]) DataFrame | FlascDataFrame [source]#
Add TI column by list of turbines.
Add a column called 'ti' in your dataframe with value equal to the averaged turbulence intensity measurements of all the turbines listed in turbine_numbers.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ti'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_ti_by_all_turbines(df: DataFrame | FlascDataFrame) DataFrame | FlascDataFrame [source]#
Add TI column using all turbines.
Add a column called 'ti' in your dataframe with value equal to the averaged turbulence intensity measurements of all turbines.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ti'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_ti_by_upstream_turbines(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, exclude_turbs: List[int] = []) DataFrame | FlascDataFrame [source]#
Add TI column using upstream turbines.
Add a column called 'ti' in your dataframe with value equal to the averaged turbulence intensity measurements of all the turbines upstream, excluding the turbines listed in exclude_turbs.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ti'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_ti_by_upstream_turbines_in_radius(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, turb_no: int, x_turbs: List[float], y_turbs: List[float], max_radius: float, include_itself: bool = True) DataFrame | FlascDataFrame [source]#
Add TI column by upstream turbines within a radius.
Add a column called 'ti' to your dataframe, which is the mean of the columns ti_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.
include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'ti'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_pow_ref_by_turbines(df: DataFrame | FlascDataFrame, turbine_numbers: List[int]) DataFrame | FlascDataFrame [source]#
Add power reference column by list of turbines.
Add a column called 'pow_ref' in your dataframe with value equal to the averaged power measurements of all the turbines listed in turbine_numbers.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_pow_ref_by_upstream_turbines(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, exclude_turbs: List[int] = []) DataFrame | FlascDataFrame [source]#
Add pow_ref column using upstream turbines.
Add a column called 'pow_ref' in your dataframe with value equal to the averaged power measurements of all the turbines upstream, excluding the turbines listed in exclude_turbs.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_pow_ref_by_upstream_turbines_in_radius(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, turb_no: int, x_turbs: List[float], y_turbs: List[float], max_radius: float, include_itself: bool = False) DataFrame | FlascDataFrame [source]#
Add pow_ref column using upstream turbines within a radius.
Add a column called 'pow_ref' to your dataframe, which is the mean of the columns pow_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.
include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to False.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.set_pow_ref_by_n_closest_upstream_turbines(df: DataFrame | FlascDataFrame, df_upstream: DataFrame, turb_no: int, x_turbs: List[float], y_turbs: List[float], exclude_turbs: bool = [], N: int = 5) DataFrame | FlascDataFrame [source]#
Add pow_ref column using N-nearest upstream turbines.
Add a column called 'pow_ref' to your dataframe, which is the mean of the columns pow_%03d for the N closest turbines that are upstream of the turbine of interest [turb_no].
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.
df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).
turb_no (int) -- Turbine number from which the radius should be calculated.
x_turbs ([list, array]) -- Array containing x locations of turbines.
y_turbs ([list, array]) -- Array containing y locations of turbines.
exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.
N (int) -- Number of closest turbines to consider for the calculation of the averaged column quantity. Defaults to 5.
- Returns:
Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.df_reduce_precision(df_in: DataFrame | FlascDataFrame, verbose: bool = False, allow_convert_to_integer: bool = True) DataFrame | FlascDataFrame [source]#
Reduce dataframe precision.
Reduce the precision in dataframes from float64 to float32, or possibly even further to int32, int16, int8 or even bool. This operation typically reduces the size of the dataframe by a factor of 2 without any real loss in precision. This can make particular operations and data storage much more efficient, and can also speed up calculations with these variables.
- Parameters:
df_in (pd.DataFrame | FlascDataFrame) -- Dataframe that needs to be reduced.
verbose (bool, optional) -- Print progress. Defaults to False.
allow_convert_to_integer (bool, optional) -- Allow reduction to integer type if possible. Defaults to True.
- Returns:
Reduced dataframe
- Return type:
pd.DataFrame | FlascDataFrame
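The "factor of 2" claim can be checked with a small pandas experiment that downcasts float64 columns to float32. This only illustrates the float64-to-float32 step with plain pandas calls; it does not reproduce the integer or bool conversions, and the column names and value ranges are made up for the example.

```python
import numpy as np
import pandas as pd

df_example = pd.DataFrame(
    {
        "ws_000": np.linspace(0.0, 25.0, 1000),
        "pow_000": np.linspace(0.0, 5000.0, 1000),
    }
)

# Downcast every float64 column to float32.
float_cols = df_example.select_dtypes(include="float64").columns
df_small = df_example.astype({c: "float32" for c in float_cols})

bytes_before = df_example.memory_usage(index=False).sum()
bytes_after = df_small.memory_usage(index=False).sum()
max_abs_error = (df_small - df_example).abs().to_numpy().max()
```

Here the column data shrinks to half its size, and the rounding error stays far below typical sensor resolution for values of this magnitude.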
- flasc.data_processing.dataframe_manipulations.df_drop_nan_rows(df: DataFrame | FlascDataFrame, verbose: bool = False) DataFrame | FlascDataFrame [source]#
Drop all-nan rows.
Remove rows from the dataframe in which all columns (besides 'time') have nan values.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Input pandas dataframe
verbose (bool, optional) -- Print progress. Defaults to False.
- Returns:
Dataframe with all-nan rows removed
- Return type:
pd.DataFrame | FlascDataFrame
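In plain pandas, the same idea can be expressed with dropna restricted to the non-time columns. This is a minimal sketch of the concept, assuming 'time' is a regular column; it is not taken from the FLASC source.

```python
import numpy as np
import pandas as pd

df_example = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:02"]
        ),
        "ws_000": [8.0, np.nan, np.nan],
        "pow_000": [900.0, np.nan, 750.0],
    }
)

# Drop a row only if every column other than 'time' is NaN.
data_cols = [c for c in df_example.columns if c != "time"]
df_clean = df_example.dropna(subset=data_cols, how="all")
```

The middle row is removed because both measurement columns are NaN; the last row is kept because it still holds a power value.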
- flasc.data_processing.dataframe_manipulations.df_find_and_fill_data_gaps_with_missing(df: DataFrame | FlascDataFrame, missing_data_buffer: float = 5.0) DataFrame | FlascDataFrame [source]#
Find and fill data gap and mark as missing data with NaN.
This function takes a pd.DataFrame object and looks for large jumps in the 'time' column. Rather than bridging such a jump by interpolating with a zero-order hold (ZOH), it treats the jump as an indication that measurements are missing. The function finds these time gaps and inserts an additional row 1 second after the start of each gap with all 'nan' values. This way, the data gap becomes populated with 'nan' values and the data will be ignored in any further analysis.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- Merged dataframe for all imported files
missing_data_buffer (float, optional) -- If a time gap is equal to or larger than this limit [s], the data in that gap is considered corrupted or missing. Defaults to 5.0.
- Returns:
The postprocessed dataframe in which all data within large time gaps are marked as missing ('nan').
- Return type:
pd.DataFrame | FlascDataFrame
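The gap-marking step described above can be sketched in pandas as follows. This is an illustration under assumptions, not the FLASC implementation: fill_gaps_with_nan_rows is a hypothetical helper, 'time' is assumed to be a datetime column, and timestamps are assumed to be unique.

```python
import pandas as pd


def fill_gaps_with_nan_rows(df, missing_data_buffer=5.0):
    """Insert an all-NaN row 1 s after the start of each gap >= missing_data_buffer [s]."""
    df = df.sort_values("time").set_index("time")
    # Time from each sample to the next one, in seconds.
    dt_next = df.index.to_series().diff().shift(-1).dt.total_seconds()
    gap_starts = df.index[(dt_next >= missing_data_buffer).to_numpy()]
    # Reindexing onto the extended time axis creates the NaN rows.
    new_index = df.index.union(gap_starts + pd.Timedelta(seconds=1))
    return df.reindex(new_index).rename_axis("time").reset_index()


df_example = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-01-01 00:00:00", "2024-01-01 00:00:01", "2024-01-01 00:00:30"]
        ),
        "ws_000": [8.0, 8.1, 7.9],
    }
)
df_filled = fill_gaps_with_nan_rows(df_example)
```

The 29-second jump after 00:00:01 exceeds the 5-second buffer, so a NaN row is placed at 00:00:02.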
- flasc.data_processing.dataframe_manipulations.df_sort_and_find_duplicates(df: DataFrame | FlascDataFrame)[source]#
This function sorts the dataframe and finds rows with equal time index.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- An (unsorted) dataframe
- Returns:
df (pd.DataFrame | FlascDataFrame) -- Dataframe sorted by time.
duplicate_entries_idx ([list of int]) -- List with indices of the former of two duplicate rows. The indices correspond to the time-sorted df.
- Return type:
pd.DataFrame | FlascDataFrame
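A minimal pandas sketch of the sort-then-detect idea is given below. It assumes 'time' is a regular column and flags the first row of each pair of consecutive rows with equal timestamps; the variable names mirror the docstring, but the code is illustrative rather than the FLASC source.

```python
import pandas as pd

df_example = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2024-01-01 00:02",
                "2024-01-01 00:00",
                "2024-01-01 00:02",
                "2024-01-01 00:01",
            ]
        ),
        "ws_000": [7.0, 8.0, None, 9.0],
    }
)

df_sorted = df_example.sort_values("time", kind="stable").reset_index(drop=True)

# Index of the former row of each pair of consecutive rows with equal 'time'.
same_as_next = df_sorted["time"].eq(df_sorted["time"].shift(-1))
duplicate_entries_idx = df_sorted.index[same_as_next].tolist()
```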
- flasc.data_processing.dataframe_manipulations.is_day_or_night(df: DataFrame | FlascDataFrame, latitude: float, longitude: float, sunrise_altitude: float = 0, sunset_altitude: float = 0, lag_hours: float = 0, datetime_column: str = 'time') DataFrame | FlascDataFrame [source]#
Determine night or day in dataframe.
Determine whether it's day or night for a given set of coordinates and UTC timestamp in a DataFrame.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- A Pandas DataFrame containing the time in UTC and other relevant data.
latitude (float) -- The latitude of the location for which to determine day or night.
longitude (float) -- The longitude of the location for which to determine day or night.
sunrise_altitude (float) -- The altitude of the sun to denote that sunrise has occurred [degrees].
sunset_altitude (float) -- The altitude of the sun to denote that sunset has occurred [degrees].
lag_hours (float, optional) -- The number of hours to lag behind the timestamp for the daylight determination. Default is 0.
datetime_column (str, optional) -- The name of the DataFrame column containing the timestamp in UTC. Default is 'time'.
- Returns:
The input DataFrame with two additional columns: 'sun_altitude' (the sun's altitude at the given timestamp) and 'is_day' (a boolean indicating whether it's daytime at the given timestamp).
- Return type:
pd.DataFrame | FlascDataFrame
- flasc.data_processing.dataframe_manipulations.plot_sun_altitude_with_day_night_color(df: DataFrame | FlascDataFrame, ax: axis | None = None) axis [source]#
Plot sun altitude with day-night color differentiation.
This function creates a plot of Sun Altitude over time, distinguishing between day and night periods with different background colors. The input DataFrame 'df' should contain time and sun_altitude columns, as well as a boolean 'is_day' column to indicate day and night periods.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- A DataFrame containing time, sun_altitude, and is_day columns.
ax (plt.axis, optional) -- An optional Matplotlib axis to use for the plot. If not provided, a new axis will be created.
- Returns:
The Matplotlib axis plotted on.
- Return type:
plt.axis
- flasc.data_processing.dataframe_manipulations.df_sort_and_fix_duplicates(df: DataFrame | FlascDataFrame) DataFrame | FlascDataFrame [source]#
Sort dataframe and fill duplicates.
This function sorts the dataframe and addresses duplicate rows (i.e., rows in which the time index is equal). It does this by merging the two rows, replacing the 'nan' entries of one row with the non-'nan' entries of the other row. If both rows somehow have different values for the same column, an exception is thrown.
- Parameters:
df (pd.DataFrame | FlascDataFrame) -- An (unsorted) dataframe
- Returns:
A time-sorted dataframe in which duplicate rows have been merged.
- Return type:
pd.DataFrame | FlascDataFrame
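The merge rule described above (fill the NaN entries of one duplicate with the values of the other, and fail on conflicting values) can be sketched with a pandas groupby. The helper merge_duplicate_times is hypothetical, the exception type is an assumption, and 'time' is assumed to be a regular column.

```python
import numpy as np
import pandas as pd


def merge_duplicate_times(df):
    """Merge rows sharing a 'time' value; raise if they disagree on a non-NaN entry."""
    conflicts = df.groupby("time").nunique(dropna=True).gt(1)
    if conflicts.to_numpy().any():
        raise ValueError("Duplicate rows hold different values for the same column.")
    # GroupBy.first() takes the first non-NaN entry per column within each group.
    return df.groupby("time", sort=True).first().reset_index()


df_example = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-01-01 00:01", "2024-01-01 00:00", "2024-01-01 00:01"]
        ),
        "ws_000": [np.nan, 8.0, 7.5],
        "pow_000": [600.0, 900.0, np.nan],
    }
)
df_merged = merge_duplicate_times(df_example)
```

The two rows at 00:01 complement each other, so they collapse into a single row with ws_000 = 7.5 and pow_000 = 600.0.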