Parameters for Evaluating Deep Learning Program’s GPU Performance

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the motherboard. In certain CPUs, they are embedded in the CPU die.

Examining and Scrutiny the correct GPU performance system of measurement can go a long way in assisting us to train and deploy deep learning applications. Here are the top 5 system of measurements we should monitor:

1. GPU Deployment

GPU deployment is one of the primary systems of measurements to observe during a deep learning training session. This system of measurement is readily accessible through popular GPU monitoring interfaces such as NVIDIA’s “NVIDIA-semi”. A GPU’s deployment is defined as the percentage of time one or more GPU kernels are running over the last second, which is analogous to a GPU being utilized by a deep learning program.

Monitoring our deep learning training sessions’ GPU deployment is one of the best statistics to determine if our GPU is being used. Moreover, monitoring the real-time deployment trend can help identify bottlenecks in our pre-processing and feature engineering pipelines that might be slowing down our training process.

2. GPU Memory Access and Deployment

Much like the GPU deployment, the state of our GPU’s memory is also a great indicator of how well our GPU is being used in our deep learning process. The NVIDIA-smi has a comprehensive list of the memory system of measurements that can be used to accelerate our model training. Like GPU deployment, the GPU memory deployment system of measurement is one of the key systems of measurements to monitor the training process. This system of measurement represents the percentage of time over the last second that the GPU’s memory controller was being utilized to either read or write from memory. Another system of measurements such as the available memory used memory and free memory can also prove important, as they provide insights into the efficiency of our deep learning program. Additionally, this system of measurements can be used to fine-tune the batch size for our training samples.

3. Power Usage and Temperatures

Power usage is an important aspect of GPU performance. The power draw on one of our GPUs gives us a statistic of how hard it is working, as well as how power-intensive our application would be. This can be especially important for testing deep learning applications for mobile devices, where power consumption is a significant concern.

The power usage is closely associated with the ambient temperature the GPU is being used in. Power draw measured by tools such as NVIDIA-smi is usually monitored at the card’s power supply unit and includes the power consumed by active cooling elements, memory, and compute units.

As our GPU’s temperature rises, the resistance value and other electronic components temperature increases, and fans spin faster, increasing the power draw. For deep learning, a GPU’s power consumption is also important because thermal stifling at high temperatures slows down the training process.

4. Training time

The training time is one of the major systems of measurements used in deep learning models to gauge and benchmark the performance of GPUs. It is important to keep the definition of the solution reliable and constant between all different GPUs. For categorization problems such as image classification using CNN and NLP applications using RNN, this could be a predefined accuracy that the model must meet. GPU features such as enabling mixed-precision and model optimizations such as tuning the input batch size play a very important role in training time.

5. Thought Process

While training time is important during the learning process, the time for inference is important for a deployed model in production. In neural networks, the time for inference is the time needed to make a forward pass through the neural network to come up with a result. The throughput is typically used to measure a GPU’s performance in making fast inferences.

The general system of measurement for throughput is given by the number of samples processed per second by the model on a GPU. However, the exact system of measurement can vary depending on the model architecture and the deep learning application.

For example, the throughput for a convolutional neural network for image classification would be calculated in images/second. In contrast, the throughput for a recurrent neural network being utilized in an NLP application could be done in tokens/second.

Architecture of GPU


Monitoring the right GPU performance system of measurements can save us a lot of drudgery and time, so we can focus on training or deploying our deep learning applications.

BIO: BIO: Arpit Bhushan Sharma is an Electrical and Electronics Engineer and pursuing his Bachelor of technology from the KIET Group of Institutions, Ghaziabad. He has a little bit of experience in python and machine learning and wants to make his career in Machine Learning. He loves to write technical articles on various aspects of data science on the Medium platform and Blogger Platform.

For Contacting:




Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store