Validating ML results using Tensorboard
Tensorboard provides visualization and tooling needed for machine learning, deep learning, and reinforcement learning experimentation:
- Tracking and visualizing metrics such as loss and accuracy.
- Visualizing the model graph (ops and layers).
- Viewing histograms of weights, biases, or other tensors as they change over time.
- Projecting embeddings to a lower dimensional space.
- Displaying images, text, and audio data.
- Profiling TensorFlow programs.
For RL it is useful to visualize metrics such as:
- Mean, min, and max reward values.
- Estimated Q-values.
- Algorithm-specific metrics (e.g. entropy for PPO).
To visualize results in Tensorboard, first cd to the directory where your results reside. E.g., if you ran experiments using ray, which by default writes its results to ~/ray_results, then do the following:
cd ~/ray_results/
There are three main methods for activating Tensorboard:
- If you included Tensorboard installation in an Anaconda environment, simply activate it:
module purge
conda activate <your_environment>
- You can also install Tensorboard in userspace using pip:
pip install tensorboard --user
- Or, install using container images:
ml singularity-container
singularity pull docker://tensorflow/tensorflow
singularity run tensorflow_latest.sif
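Whichever method you use, it can help to confirm that the tensorboard command is actually reachable before launching it. The following is a minimal sketch, not part of any official tooling: `have_cmd` is a hypothetical helper name introduced here for illustration. Note that with `pip install --user` the executable is typically placed in ~/.local/bin, which may need to be on your PATH.

```shell
# Hypothetical helper (name is an assumption): succeeds if the
# given command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd tensorboard; then
  # Print the installed version as a quick sanity check.
  tensorboard --version
else
  echo "tensorboard not found; activate your environment or install it first"
fi
```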
Then, initialize Tensorboard using a pre-specified port number of your choosing (e.g. 6006, 8008):
tensorboard --logdir=. --port 6006 --bind_all
If everything works properly, the terminal will show:
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.5.0 at http://localhost:6006/ (Press CTRL+C to quit)
Open a new terminal tab on your local machine and create a tunnel, using the same port number:
ssh -NfL 6006:localhost:6006 $USER@el1.hpc.nrel.gov
Finally, open the above localhost URL (http://localhost:6006/) in a browser, where all the aforementioned plots will be shown.
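Because the port in the tunnel has to match the port given to tensorboard, a small helper can keep the two in sync when you pick something other than 6006. This is only a sketch under stated assumptions: `tunnel_cmd` is a hypothetical function name, and the default host is simply the login node from the example above.

```shell
# Hypothetical helper (not part of any NREL tooling): print the SSH
# tunnel command for a given port, so it matches `tensorboard --port`.
tunnel_cmd() {
  port="${1:-6006}"                # port passed to tensorboard
  host="${2:-el1.hpc.nrel.gov}"    # login node from the example above
  user="${USER:-$(id -un)}"        # fall back to id if USER is unset
  echo "ssh -NfL ${port}:localhost:${port} ${user}@${host}"
}

# Print the tunnel command for port 8008.
tunnel_cmd 8008
```

To open the tunnel directly rather than just printing the command, run `eval "$(tunnel_cmd 8008)"` on your local machine, then browse to http://localhost:8008/.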