dwind.utils#

Modules

array

Provides a series of generic NumPy and Pandas utility functions.

hpc

Provides the live timing table functionality for the Kestrel MultiProcess class.

loader

Provides the core data loading methods for importing scenario data from flat files or SQL.

Array#

Provides a series of generic NumPy and Pandas utility functions.

Functions

memory_downcaster(df)

Downcasts int and float columns to the lowest memory alternative possible.

split_by_index(arr, n_splits)

Split a DataFrame, Series, or array-like with np.array_split, but only return the start and stop indices, rather than the chunks themselves.

dwind.utils.array.memory_downcaster(df)[source]#

Downcasts int and float columns to the lowest memory alternative possible. For integers this means converting to either signed or unsigned 8-, 16-, 32-, or 64-bit integers, and for floats, converting to np.float32.

Parameters:

df (pd.DataFrame | pd.Series) – DataFrame or Series to have its memory footprint reduced.

Returns:

Reduced footprint version of the passed df.

Return type:

pd.DataFrame | pd.Series
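The downcasting behavior described above can be sketched with pandas' own downcast machinery. This is an illustrative equivalent, not the dwind implementation:

```python
import numpy as np
import pandas as pd

# Illustrative sketch of the described behavior (not the dwind source):
# pd.to_numeric picks the smallest safe integer dtype, and floats are cast
# to np.float32.
df = pd.DataFrame({
    "counts": np.arange(5, dtype=np.int64),  # small values: fits in uint8
    "values": np.linspace(0.0, 1.0, 5),      # float64 by default
})
small = df.copy()
small["counts"] = pd.to_numeric(small["counts"], downcast="unsigned")
small["values"] = small["values"].astype(np.float32)
```

Comparing `df.memory_usage(deep=True)` before and after shows the reduced footprint.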

dwind.utils.array.split_by_index(arr, n_splits)[source]#

Split a DataFrame, Series, or array-like with np.array_split, but only return the start and stop indices, rather than the chunks themselves. For Pandas objects, this is equivalent to arr.iloc[start:end], and for NumPy, arr[start:end]. Splits are made along the 0th dimension.

Parameters:
  • arr (pd.DataFrame | pd.Series | np.ndarray) – The array, data frame, or series to split.

  • n_splits (int) – The number of equal or near-equal splits.

Returns:

Tuple of the start indices and the stop indices for each split.

Return type:

tuple[np.ndarray, np.ndarray]
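The index-only behavior can be reproduced with np.array_split applied to the row indices (an assumed-equivalent sketch, not the dwind source):

```python
import numpy as np

# Split only the row indices, then keep each chunk's boundaries; this avoids
# materializing the chunks themselves.
arr = np.arange(10)
index_chunks = np.array_split(np.arange(arr.shape[0]), 3)
starts = np.array([chunk[0] for chunk in index_chunks])
stops = np.array([chunk[-1] + 1 for chunk in index_chunks])
# arr[starts[i]:stops[i]] then reproduces the i-th chunk on demand.
```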

HPC#

Provides the live timing table functionality for the Kestrel MultiProcess class.

Functions

convert_seconds_for_print(time)

Convert number of seconds to number of hours, minutes, and seconds.

generate_run_status_table(job_status)

Generate the job status run time statistics table.

get_finished_run_status(jobs)

Extracts a dictionary of job_id and status from the sacct output for a single job or series of jobs.

update_status(job_status)

Get an updated status and timing statistics for all running jobs on the HPC.

dwind.utils.hpc.convert_seconds_for_print(time)[source]#

Convert number of seconds to number of hours, minutes, and seconds.

Return type:

str
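A conversion like this is typically a pair of divmod calls; the sketch below is a hypothetical equivalent, and the exact output format of convert_seconds_for_print is an assumption:

```python
def seconds_to_hms(time: float) -> str:
    """Convert a number of seconds to an hours/minutes/seconds string."""
    minutes, seconds = divmod(int(time), 60)
    hours, minutes = divmod(minutes, 60)
    # Output format is an assumption, not the dwind source's exact string.
    return f"{hours}:{minutes:02d}:{seconds:02d}"
```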

dwind.utils.hpc.generate_run_status_table(job_status)[source]#

Generate the job status run time statistics table.

Parameters:

job_status (dict) – Dictionary keyed by job ID, with sub-keys “status”, “start_time” (initial or start-of-run-status time), “wait”, and “run”.

Returns:

rich.Table of human-readable statistics, and a bool that is True if all jobs are complete, otherwise False.

Return type:

tuple[Table, bool]

dwind.utils.hpc.get_finished_run_status(jobs)[source]#

Extracts a dictionary of job_id and status from the sacct output for a single job or series of jobs.

Parameters:

jobs (int | str | list[int | str]) – Single job ID or list of job IDs that have finished running.

Returns:

Dictionary of {job_id_1: status_1, …, job_id_N: status_N}.

Return type:

dict[str, str]
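Parsing sacct output into such a dictionary can be sketched as below. The two-column (JobID, State) layout is an assumption about the sacct format string used, and job steps such as "1001.batch" are skipped so only top-level job IDs remain:

```python
def parse_sacct_output(output: str) -> dict[str, str]:
    """Build {job_id: status} from two-column sacct output (assumed format)."""
    status = {}
    for line in output.strip().splitlines():
        job_id, state = line.split()[:2]
        if "." not in job_id:  # skip job steps like "1001.batch"
            status[job_id] = state
    return status

sample = "1001 COMPLETED\n1001.batch COMPLETED\n1002 FAILED"
```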

dwind.utils.hpc.update_status(job_status)[source]#

Get an updated status and timing statistics for all running jobs on the HPC.

Parameters:

job_status (dict) – Dictionary keyed by job ID, with sub-keys “status”, “start_time” (initial or start-of-run-status time), “wait”, and “run”.

Returns:

Dictionary of updated statuses and timing statistics for all currently queued and running jobs.

Return type:

dict
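The timing bookkeeping implied by the “wait” and “run” sub-keys can be sketched as follows; this is a hypothetical illustration using the field names from the job_status dictionary above, not the dwind implementation:

```python
import time

# Hypothetical job_status entries using the documented sub-keys.
job_status = {
    "1001": {"status": "RUNNING", "start_time": time.time() - 30, "wait": 0.0, "run": 0.0},
    "1002": {"status": "PENDING", "start_time": time.time() - 10, "wait": 0.0, "run": 0.0},
}
now = time.time()
for info in job_status.values():
    elapsed = now - info["start_time"]
    if info["status"] == "PENDING":
        info["wait"] = elapsed  # still queued: accumulate wait time
    else:
        info["run"] = elapsed   # running: wall-clock run time so far
```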

Loader#

Provides the core data loading methods for importing scenario data from flat files or SQL.

Functions

load_df(file_or_table, year[, sql_constructor])

Loads data from either a SQL table or file to a pandas DataFrame.

dwind.utils.loader.load_df(file_or_table, year, sql_constructor=None)[source]#

Loads data from either a SQL table or file to a pandas DataFrame.

Parameters:
  • file_or_table (str | Path) – File name or path object, or SQL table where the data are located.

  • year (dwind.config.Year, optional) – If used, only extracts the single year from a column called "year". Defaults to None.

  • sql_constructor (str | None, optional) – The SQL engine constructor string. Required if extracting from SQL. Defaults to None.
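The file-loading branch of such a dispatcher can be sketched as below. This is a hypothetical re-implementation: the supported file extensions are assumptions, and the SQL branch (which would build a SQLAlchemy engine from sql_constructor and call pd.read_sql) is omitted:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_df_sketch(file_or_table, year=None, sql_constructor=None):
    """Hypothetical sketch: load a CSV/Parquet file, optionally filter by year."""
    path = Path(file_or_table)
    if path.suffix == ".csv":
        df = pd.read_csv(path)
    elif path.suffix == ".parquet":
        df = pd.read_parquet(path)
    else:
        # The real function would use sql_constructor to read a SQL table here.
        raise NotImplementedError("SQL loading is not sketched in this example")
    if year is not None:
        df = df.loc[df["year"] == year]
    return df


with tempfile.TemporaryDirectory() as tmp:
    csv = Path(tmp) / "scenario.csv"
    pd.DataFrame({"year": [2024, 2025], "value": [1.0, 2.0]}).to_csv(csv, index=False)
    df = load_df_sketch(csv, year=2025)
```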