
`mnist-gpu-demo`: Training a TensorFlow MNIST Model on a Slurm GPU Node

This repository contains the code and scripts required to train a simple MNIST model for handwritten character recognition.

Repository Structure

tf.keras.datasets.mnist: dataset of labeled handwritten characters.
mnist-job.sh: Slurm script for training the MNIST model, to be run on the Yens.
mnist-train.py: Python script to train the model on the Yens GPU node.

Training Script Details

The training script mnist-train.py performs the following:

Loads the labeled MNIST handwriting data from tf.keras.datasets.mnist.
Defines a simple TensorFlow model for handwritten character recognition.
Trains the model on the Yens GPU node.
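
For the TensorFlow MNIST job named in this README's title, a training script along these lines could look roughly as follows. This is a minimal sketch, not the contents of `mnist-train.py`: the function names `build_model` and `train`, the layer sizes, and the epoch count are all assumptions chosen for illustration.

```python
def build_model(num_classes=10):
    """Small dense classifier for 28x28 grayscale images (hypothetical architecture)."""
    import tensorflow as tf  # imported lazily so this file loads without TensorFlow

    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


def train(epochs=5):
    """Train on MNIST and return [loss, accuracy] on the test split."""
    import tensorflow as tf

    # An empty list here means TensorFlow does not see a GPU on this node.
    print("GPUs visible:", tf.config.list_physical_devices("GPU"))

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = build_model()
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    model.fit(x_train, y_train, epochs=epochs, validation_data=(x_test, y_test))
    return model.evaluate(x_test, y_test, verbose=0)
```

Calling `train()` from the job script would run the whole pipeline. Printing the visible GPUs first makes it easy to confirm from the job's output file that the job actually landed on a GPU node.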

Usage

Modify the Slurm script to include your email address. Slurm will email you useful metrics such as queue time, runtime, and CPU and RAM utilization, and will alert you if the job fails.

Submit the Slurm script to start model training on the gpu partition on the Yens:

$ sbatch mnist-job.sh
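
If you would rather drive the submission from Python, for example to keep the job ID for later queries, the sketch below wraps `sbatch`. It assumes `sbatch` prints its usual `Submitted batch job <id>` line. The helper names `build_sbatch_command`, `parse_job_id`, and `submit` are hypothetical. Passing `--mail-user` and `--mail-type` on the command line is an alternative to editing the script, since command-line options take precedence over `#SBATCH` lines.

```python
import re
import subprocess


def build_sbatch_command(script="mnist-job.sh", email=None):
    """Return the sbatch argument list, optionally adding email notification flags."""
    cmd = ["sbatch"]
    if email:
        # Command-line options override #SBATCH directives inside the script.
        cmd += [f"--mail-user={email}", "--mail-type=ALL"]
    cmd.append(script)
    return cmd


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's confirmation line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"could not find a job ID in: {sbatch_output!r}")
    return match.group(1)


def submit(script="mnist-job.sh", email=None):
    """Submit the job and return its ID as a string."""
    result = subprocess.run(
        build_sbatch_command(script, email),
        capture_output=True,
        text=True,
        check=True,  # raise if sbatch rejects the submission
    )
    return parse_job_id(result.stdout)
```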

Monitor the job by checking the Slurm queue for your username, or query the accounting record of a specific job with sacct (replace 3 with your own job ID):

$ squeue -u $USER
$ sacct -j 3
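
The same accounting query can be scripted. The sketch below is one possible approach, assuming `sacct` is on the path. It asks for pipe-delimited output with `--parsable2` and `--noheader` so each line splits cleanly into fields. The field selection and the helper names `parse_sacct` and `job_status` are illustrative choices.

```python
import subprocess

# Standard sacct field names; adjust to whatever you want reported.
SACCT_FIELDS = ["JobID", "State", "Elapsed", "MaxRSS"]


def parse_sacct(text, fields=SACCT_FIELDS):
    """Turn pipe-delimited sacct output into a list of dicts, one per job step."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(dict(zip(fields, line.split("|"))))
    return rows


def job_status(job_id):
    """Query sacct for one job and return its parsed rows."""
    cmd = [
        "sacct",
        "-j", str(job_id),
        "--format=" + ",".join(SACCT_FIELDS),
        "--parsable2",  # pipe-delimited, no trailing delimiter
        "--noheader",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_sacct(out)
```

A job usually produces several rows (the job itself plus steps such as `.batch`), so callers typically look at the first row's `State` to decide whether training has finished.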

Monitoring Training

To monitor GPU utilization and other training metrics, log in to the compute node running the job and run:

$ nvtop
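
`nvtop` is interactive. For a scriptable snapshot of the same information, one option is to query `nvidia-smi`, which ships with the NVIDIA driver. The sketch below assumes `nvidia-smi` is available on the compute node and that every queried value is numeric. The helper names `parse_gpu_stats` and `gpu_stats` are hypothetical.

```python
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_stats(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU."""
    stats = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # With nounits, utilization is a percentage and memory is in MiB.
        util, used, total = (int(value.strip()) for value in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def gpu_stats():
    """Run nvidia-smi on the current node and return parsed per-GPU stats."""
    cmd = ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

A GPU utilization that stays near zero while the job is running is a common sign that the training script is executing on the CPU instead.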

Output

After training completes, check the output file mnist-train.output for training and evaluation metrics:

$ cat mnist-train.output
