How to monitor Spark resource utilization

sparkctl integrates with rmon to monitor the CPU, memory, disk, and network utilization of a Spark cluster.

This page shows how to enable it.

$ sparkctl configure --resource-monitor

By default, this enables monitoring of all resource types at a 5-second sampling interval. Edit the resulting config.json to customize the monitored resource types or the interval.
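For example, the monitoring section of config.json might look like the following. The key names here are illustrative, not the actual schema; check the file that sparkctl generates for the real field names.

```json
{
  "resource_monitor": {
    "interval_seconds": 5,
    "resource_types": ["cpu", "memory", "disk", "network"]
  }
}
```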

Start the Spark cluster as normal.

$ sparkctl start

sparkctl will enable resource monitoring with rmon in the background on all compute nodes. When you stop the cluster after running your jobs:

$ sparkctl stop

sparkctl will stop rmon on all compute nodes, collect a summary of utilization statistics, and produce interactive HTML plots, written to ./stats-output/html by default.
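After stopping the cluster, you can list the generated plots and open one in a browser. This is just a convenience sketch; only the ./stats-output/html default path comes from sparkctl.

```shell
# The interactive plots land in ./stats-output/html by default;
# list them to pick one to open in a browser.
out_dir=./stats-output/html
ls "$out_dir"/*.html 2>/dev/null || echo "no plots found in $out_dir"
```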

Warning

If you don’t run sparkctl stop, rmon will keep running on all compute nodes indefinitely.
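If that happens, one way to clean up manually on each compute node might look like the following. This is a sketch, assuming the monitor runs as a process literally named rmon; it is not a sparkctl feature.

```shell
# Check for a stray rmon process and kill it if present
# (assumes the monitor process is named exactly `rmon`).
if pgrep -x rmon >/dev/null 2>&1; then
  pkill -x rmon
  rmon_status="killed stray rmon"
else
  rmon_status="no rmon running"
fi
echo "$rmon_status"
```

Running sparkctl stop remains the supported path, since it also collects the utilization stats and plots.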

Managed execution

sparkctl offers a managed execution mode to help prevent cases where you forget to shut down Spark workers and rmon.

Add the --wait option to the start command as follows:

$ sparkctl start --wait

sparkctl will print "Press Ctrl-C to shut down all Spark processes." to the console and then block without returning control to the shell.

Connect to the head node of the allocation and run your Spark jobs. When you’re done, press Ctrl-C in the original terminal to shut everything down.
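The wait-and-shut-down pattern behind --wait can be sketched in plain shell. This is a hypothetical illustration of the mechanism, not sparkctl's actual implementation: a SIGINT handler runs the cleanup, and the script blocks until the user interrupts it.

```shell
# Sketch of a trap-and-wait loop: install a SIGINT handler that shuts
# everything down, print the prompt, then block instead of returning
# to the shell. (Hypothetical internals, not sparkctl's real code.)
stopped=0
shutdown_all() {
  # the real tool would run the equivalent of `sparkctl stop` here
  stopped=1
}
trap shutdown_all INT
echo "Press Ctrl-C to shut down all Spark processes."
# Simulate the user's Ctrl-C after one second so this sketch terminates;
# the real tool would wait indefinitely.
( sleep 1; kill -INT $$ ) &
while [ "$stopped" -eq 0 ]; do sleep 1; done
```

Trapping INT (rather than exiting immediately) is what lets a single Ctrl-C tear down the Spark workers and rmon on every node before the command returns.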