rex.utilities.execution.SmartParallelJob
- class SmartParallelJob(obj, execution_iter, n_workers=None, mem_util_lim=0.7)[source]
Bases:
object
Single node parallel compute manager with smart data flushing.
- Parameters:
obj (object) – Python object that will be submitted to futures. Must have methods run(arg) and flush(). run(arg) must take the iteration result of execution_iter as the single positional argument. Additionally, the results of obj.run(arg) will be passed to obj.out. obj.out will be passed None when the memory is to be cleared. It is advisable that obj.run() be a @staticmethod for dramatically faster submission in parallel.
execution_iter (iter) – Python iterator that controls the futures submitted in parallel.
n_workers (int) – Number of workers to use in parallel. None will use all available workers.
mem_util_lim (float) – Memory utilization limit (fractional). If the used memory divided by the total memory is greater than this value, the obj.out will be flushed and the local node memory will be cleared.
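The requirements on obj described above can be illustrated with a minimal sketch. The class name SquareJob, the JSON-lines output format, and the choice to let the out setter accept either a single result or a list of results are all assumptions made for illustration; they are not part of rex.

```python
import json


class SquareJob:
    """Hypothetical worker object; the name and file format are illustrative."""

    def __init__(self, path):
        self.path = path   # file that flush() appends to
        self._out = []     # in-memory buffer of results

    @staticmethod
    def run(arg):
        # Receives one item from execution_iter as its only positional argument.
        return {"arg": arg, "square": arg * arg}

    @property
    def out(self):
        return self._out

    @out.setter
    def out(self, value):
        if value is None:
            # None signals that the in-memory results should be cleared.
            self._out = []
        elif isinstance(value, list):
            # Assumption: results may arrive as a batch.
            self._out.extend(value)
        else:
            self._out.append(value)

    def flush(self):
        # Append the buffered results to disk, one JSON record per line.
        with open(self.path, "a") as f:
            for row in self._out:
                f.write(json.dumps(row) + "\n")
```

Because run() is a @staticmethod, only the argument needs to be sent to a worker, not the whole instance with its buffer.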
Methods
execute(obj, execution_iter[, n_workers, ...]) – Execute the smart parallel run with data flushing.
flush() – Flush obj.out to disk, set obj.out=None, and garbage collect.
gather_and_flush(i, futures[, force_flush]) – Wait on futures, potentially update obj.out and flush to disk.
run(**kwargs) – Run the smart parallel job.
Attributes
execution_iter – Get the iterator object that controls the parallel execution.
mem_util_lim – Get the memory utilization limit (fractional).
n_workers – Get the number of workers in the local cluster.
obj – Get the main python object that will be submitted to futures.
- property execution_iter
Get the iterator object that controls the parallel execution.
- Returns:
_execution_iter (iterable) – Iterable object that controls the processes of the parallel job.
- property mem_util_lim
Get the memory utilization limit (fractional).
- Returns:
_mem_util_lim (float) – Fractional memory utilization limit. If the used memory divided by the total memory is greater than this value, the obj.out will be flushed and the local node memory will be cleared.
- property n_workers
Get the number of workers in the local cluster.
- Returns:
_n_workers (int) – Number of workers. Default value is the number of CPUs.
- property obj
Get the main python object that will be submitted to futures.
- Returns:
_obj (Object) – Python object that will be submitted to futures. Must have methods run(arg) and flush(). run(arg) must take the iteration result of execution_iter as the single positional argument. Additionally, the results of obj.run(arg) will be passed to obj.out. obj.out will be passed None when the memory is to be cleared. It is advisable that obj.run() be a @staticmethod for dramatically faster submission in parallel.
- gather_and_flush(i, futures, force_flush=False)[source]
Wait on futures, potentially update obj.out and flush to disk.
- Parameters:
i (int | str) – Iteration number (for logging purposes).
futures (list) – List of parallel future objects to wait on or gather.
force_flush (bool) – Option to force a disk flush. Useful at the end of the iteration. If False, data is flushed to disk only if the memory utilization exceeds mem_util_lim.
- Returns:
futures (list) – List of parallel future objects. If the memory was flushed, the list is returned empty, as after futures.clear().
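The flush condition described for force_flush and mem_util_lim comes down to a single comparison. The function below is a hypothetical helper written for illustration, not a rex function. How rex measures used and total memory is not stated here, so both values are passed in explicitly.

```python
def should_flush(used_bytes, total_bytes, mem_util_lim=0.7, force_flush=False):
    """Return True if a flush is forced or used/total exceeds the limit."""
    utilization = used_bytes / total_bytes
    return bool(force_flush or utilization > mem_util_lim)


GIB = 1024 ** 3

# 12 GiB used of 16 GiB is a utilization of 0.75, above the default 0.7 limit.
over_limit = should_flush(12 * GIB, 16 * GIB)

# 8 GiB used of 16 GiB is 0.5, so a flush happens only if it is forced.
under_limit = should_flush(8 * GIB, 16 * GIB)
forced = should_flush(8 * GIB, 16 * GIB, force_flush=True)
```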
- run(**kwargs)[source]
Run the smart parallel job.
- Parameters:
kwargs (dict) – Keyword arguments to be passed to obj.run(). Makes it easier to have obj.run() as a @staticmethod.
- classmethod execute(obj, execution_iter, n_workers=None, mem_util_lim=0.7, **kwargs)[source]
Execute the smart parallel run with data flushing.
- Parameters:
obj (object) – Python object that will be submitted to futures. Must have methods run(arg) and flush(). run(arg) must take the iteration result of execution_iter as the single positional argument. Additionally, the results of obj.run(arg) will be passed to obj.out. obj.out will be passed None when the memory is to be cleared. It is advisable that obj.run() be a @staticmethod for dramatically faster submission in parallel.
execution_iter (iter) – Python iterator that controls the futures submitted in parallel.
n_workers (int) – Number of workers to scale the cluster to. None will use all available workers in a local cluster.
mem_util_lim (float) – Memory utilization limit (fractional). If the used memory divided by the total memory is greater than this value, the obj.out will be flushed and the local node memory will be cleared.
kwargs (dict) – Keyword arguments to be passed to obj.run(). Makes it easier to have obj.run() as a @staticmethod.
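A call to execute might look like the sketch below, which uses only the signature documented above. The Doubler class, its scale keyword, and the launch helper are hypothetical. The sketch assumes that extra keyword arguments reach obj.run() alongside the positional item, as the kwargs description states, and that results may be assigned to obj.out as a list.

```python
class Doubler:
    """Hypothetical job object used only for illustration."""

    def __init__(self):
        self._out = []
        self.flushed = []   # stands in for data written to disk

    @staticmethod
    def run(arg, scale=1):
        # arg comes from execution_iter; scale arrives through **kwargs.
        return arg * scale

    @property
    def out(self):
        return self._out

    @out.setter
    def out(self, value):
        if value is None:
            self._out = []            # clear memory
        elif isinstance(value, list):
            self._out.extend(value)   # assumption: batched results
        else:
            self._out.append(value)

    def flush(self):
        # A real job would write to disk here.
        self.flushed.extend(self._out)


def launch(n_items=100):
    # Imported lazily so the class above can be exercised without rex installed.
    from rex.utilities.execution import SmartParallelJob

    job = Doubler()
    SmartParallelJob.execute(job, iter(range(n_items)), n_workers=2,
                             mem_util_lim=0.7, scale=2)
    return job
```

If the workers are separate processes, launch() would normally be called under an `if __name__ == "__main__":` guard so that worker start-up does not re-run the script body.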