PyTorch Profiler | NaadiSpeaks

import torch import torch.nn import torch.optim import torch.profiler import torch.utils.data import torchvision.datasets import torchvision.models import torchvision.transforms as T #load data transform = T.Compose( [T.Resize(224), T.ToTensor(), T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform) train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True) #create model device = torch.device("cuda:0") model = torchvision.models.resnet18(pretrained=True).cuda(device) criterion = torch.nn.CrossEntropyLoss().cuda(device) optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9) model.train() #train function def train(data): inputs, labels = data[0].to(device=device), data[1].to(device=device) outputs = model(inputs) loss = criterion(outputs, labels) optimizer.zero_grad() loss.backward() optimizer.step() #use profiler to record execution events with torch.profiler.profile( schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2), on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'), record_shapes=True, profile_memory=True, with_stack=True ) as prof: for step, batch_data in enumerate(train_loader): if step >= (1 + 1 + 3) * 2: break train(batch_data) prof.step()

Deep neural networks are complex! Literally it takes quite an amount of effort and time to make them work near to perfect. Despite the effort you put on fitting the model well with your data and getting an admirable accuracy you have to keep your eye on model efficiency and performance. Sometimes it’s a trade-off between the model accuracy and the efficiency in inference. In order to do this, analysing the memory and computation usage of the networks is essential. This is where profiling neural networks comes in to the scene.

Since PyTorch is my preferred deep learning framework, I’ve been using PyTorch profiler tool it had for a while on torch.autograd.profiler . It was pretty sleek and had some basic functionalities for profiling DNNs. Getting a major update PyTorch 1.8.1 announced PyTorch Profiler, the imporved performance debugging profiler for PyTorch DNNs.

One of the major improvements it has got is the performance visualisations attached with tensorboard. As mentioned in the release article, there are 5 major features included on PyTorch Profiler.

Distributed training view
Memory view
GPU utilization
Cloud storage support
Jump to course code

You don’t need to have extensive set of codes for analyzing the performance of the network. Just a set of simple Profiler API calls. To get the things started, let’s see how you can use PyTorch Profiler for analyzing execution time and memory consumption of the popular resnet18 architecture. You may need to have PyTorch 1.8.1 or higher to perform these actions.

import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

#init simple resnet model
model = models.resnet18().to(device)

#create a dummy input
inputs = torch.randn(5,3,224,224).to(device)

# Analyze execution time
with profile(activities=[
        ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

#print the output sorted with CPU execution time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))


#Analyzing memory consumption
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        profile_memory=True, record_shapes=True) as prof:
    model(inputs)

#print the output sorted with CPU memory consumption
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))