Analyzing Performance of Neural Networks with PyTorch Profiler

Deep neural networks are complex! Literally it takes quite an amount of effort and time to make them work near to perfect. Despite the effort you put on fitting the model well with your data and getting an admirable accuracy you have to keep your eye on model efficiency and performance. Sometimes it’s a trade-off between the model accuracy and the efficiency in inference. In order to do this, analysing the memory and computation usage of the networks is essential. This is where profiling neural networks comes in to the scene.

Since PyTorch is my preferred deep learning framework, I’ve been using PyTorch profiler tool it had for a while on torch.autograd.profiler . It was pretty sleek and had some basic functionalities for profiling DNNs. Getting a major update PyTorch 1.8.1 announced PyTorch Profiler, the imporved performance debugging profiler for PyTorch DNNs.

One of the major improvements it has got is the performance visualisations attached with tensorboard. As mentioned in the release article, there are 5 major features included on PyTorch Profiler.

Distributed training view
Memory view
GPU utilization
Cloud storage support
Jump to course code

You don’t need to have extensive set of codes for analyzing the performance of the network. Just a set of simple Profiler API calls. To get the things started, let’s see how you can use PyTorch Profiler for analyzing execution time and memory consumption of the popular resnet18 architecture. You may need to have PyTorch 1.8.1 or higher to perform these actions.

import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

#init simple resnet model
model = models.resnet18().to(device)

#create a dummy input
inputs = torch.randn(5,3,224,224).to(device)

# Analyze execution time
with profile(activities=[
        ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

#print the output sorted with CPU execution time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))


#Analyzing memory consumption
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        profile_memory=True, record_shapes=True) as prof:
    model(inputs)

#print the output sorted with CPU memory consumption
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))