Analyzing Performance of Neural Networks with PyTorch Profiler – Part 2

PyTorch Profiler output for model training

In the previous post, we explored the basic concepts of PyTorch profiler and the newest capabilities comes with its recent updates. One of the coolest things I tried is the TensorBoard plugin comes with PyTorch Profiler. Yes.. you heard to right.. The well-known deep learning visualisation platform TensorBoard is having a Profiler plugin which makes network analysis much more easy.

I just tried the PyTorch Profiler official tutorials and seems the visualisations are pretty descriptive with analysis. I’ll do a complete deep dive with the tool in the next article.

One of the cool things I’ve noticed is the performance recommendations. Most of the recommendations make by the tool makes sense and am pretty sure they going to increase the model training performance.

In the meantime you can play around with the tool and see how convenient it is to use in your deep learning experiments. Here’s the script I used for starting the initial steps with the tool.

import torch
import torch.nn
import torch.optim
import torch.profiler
import torch.utils.data
import torchvision.datasets
import torchvision.models
import torchvision.transforms as T

#load data
transform = T.Compose(
    [T.Resize(224),
     T.ToTensor(),
     T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

#create model 
device = torch.device("cuda:0")
model = torchvision.models.resnet18(pretrained=True).cuda(device)
criterion = torch.nn.CrossEntropyLoss().cuda(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
model.train()


#train function
def train(data):
    inputs, labels = data[0].to(device=device), data[1].to(device=device)
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

#use profiler to record execution events
with torch.profiler.profile(
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        if step >= (1 + 1 + 3) * 2:
            break
        train(batch_data)
        prof.step()

Analyzing Performance of Neural Networks with PyTorch Profiler

Deep neural networks are complex! Literally it takes quite an amount of effort and time to make them work near to perfect. Despite the effort you put on fitting the model well with your data and getting an admirable accuracy you have to keep your eye on model efficiency and performance. Sometimes it’s a trade-off between the model accuracy and the efficiency in inference. In order to do this, analysing the memory and computation usage of the networks is essential. This is where profiling neural networks comes in to the scene.

Since PyTorch is my preferred deep learning framework, I’ve been using PyTorch profiler tool it had for a while on torch.autograd.profiler . It was pretty sleek and had some basic functionalities for profiling DNNs. Getting a major update PyTorch 1.8.1 announced PyTorch Profiler, the imporved performance debugging profiler for PyTorch DNNs.

One of the major improvements it has got is the performance visualisations attached with tensorboard. As mentioned in the release article, there are 5 major features included on PyTorch Profiler.

  1. Distributed training view
  2. Memory view
  3. GPU utilization
  4. Cloud storage support
  5. Jump to course code

You don’t need to have extensive set of codes for analyzing the performance of the network. Just a set of simple Profiler API calls. To get the things started, let’s see how you can use PyTorch Profiler for analyzing execution time and memory consumption of the popular resnet18 architecture. You may need to have PyTorch 1.8.1 or higher to perform these actions.

import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

#init simple resnet model
model = models.resnet18().to(device)

#create a dummy input
inputs = torch.randn(5,3,224,224).to(device)

# Analyze execution time
with profile(activities=[
        ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

#print the output sorted with CPU execution time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))


#Analyzing memory consumption
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        profile_memory=True, record_shapes=True) as prof:
    model(inputs)

#print the output sorted with CPU memory consumption
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
Output from the execution time analysis
Output from the memory consumption analysis

Will do discuss on using Profiler visualizations for analyzing model behaviours in the next post.