I would say that training a deep neural network to a good accuracy is an art. During training, the model learns its parameters, such as the weights and the biases, from the training data. The training process itself is governed by the model's hyper-parameters: they control the behavior of training and have a significant impact on model accuracy and convergence.
The learning rate, the number of epochs, the number of hidden layers and hidden units, the activation functions, and the momentum are among the hyper-parameters we can adjust to make a neural network perform well.
Tuning the learning rate is vital for convergence: a learning rate that is too small makes training very slow and can lead to overfitting, while a learning rate that is too large makes training diverge. The typical way of finding the optimal learning rate is a grid search or a random search, which can be computationally expensive and time-consuming. Isn't there a smarter way to find the optimal learning rate?
Here I'm going to connect some dots: the process I followed to choose a good learning rate for my model, and a way of training a DNN with a different learning rate policy.
Many researchers are actively working in this area. In his paper "Cyclical Learning Rates for Training Neural Networks", Leslie N. Smith proposed the learning rate range test (LR range test) and Cyclical Learning Rates (CLR).
I'm not going to discuss the interesting theory behind the LR range test and CLR here, as fast.ai has a pretty good introduction to the method, and they even have an implementation of the LR range test that you can use off the shelf. I strongly recommend reading their post. I also found a nice PyTorch implementation of the LR range test by David Silva; feel free to pull it from here: https://github.com/davidtvs/pytorch-lr-finder
In 2018, in the paper "A Disciplined Approach to Neural Network Hyper-Parameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay", Smith introduced the 1cycle policy, which runs only a single cycle of training, compared to the several cycles of CLR. I strongly suggest taking a look at this blog post to get an idea of the 1cycle policy.
OK… now you've read it! But does it actually work?
I gave it a try with a simple transfer learning experiment. The dataset and the experiment I used are from the PyTorch documentation, which you can find here. These are the steps I followed during the experiment.
Yeah! I've pushed the experiment to GitHub, so feel free to use it. 😊
- Run the LR range test to find the maximum learning rate value to use for 1cycle learning (a sketch of this step follows below).
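Here is a minimal sketch of how the linked pytorch-lr-finder package can be used for this step. The tiny model and the random tensors are placeholders standing in for the actual network and data loader of the experiment.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder  # pip install torch-lr-finder

# Toy stand-ins for the real model and data used in the experiment.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 2, (64,))),
    batch_size=16, shuffle=True,
)
criterion = nn.CrossEntropyLoss()

# Start from a very small learning rate; the finder sweeps it upward
# exponentially while recording the training loss.
optimizer = optim.SGD(model.parameters(), lr=1e-7, momentum=0.9)
lr_finder = LRFinder(model, optimizer, criterion, device="cpu")
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)
lr_finder.plot()   # loss vs. learning rate; pick a value before the minimum
lr_finder.reset()  # restore the model and optimizer to their initial state
```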

Output from the LR finder
According to the graph, around 5e-3 is the largest learning rate value that could be used for training. So I chose 7e-3, which is a bit before the loss minimum, as my maximum learning rate for training.
- Run the training using a fixed learning rate (note that learning rate decay was used during training; see the first sketch below).
- Run the training according to the 1cycle policy (cyclical momentum and a cyclical learning rate were used; note that the learning rate and the momentum change at every mini-batch, not epoch-wise; see the second sketch below).
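First, a minimal sketch of the fixed-learning-rate baseline, reusing `model`, `train_loader`, and `criterion` from the sketch above. The step-decay numbers (multiply the LR by 0.1 every 7 epochs) follow the PyTorch transfer learning tutorial; treat the epoch count as a placeholder.

```python
from torch import optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)  # decay LR by 10x every 7 epochs

for epoch in range(25):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # the learning rate changes once per epoch
```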

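Second, a sketch of the 1cycle run. Recent PyTorch versions ship `torch.optim.lr_scheduler.OneCycleLR`, which implements the 1cycle policy with cyclical momentum (`cycle_momentum=True` by default); whether or not the original experiment used this exact scheduler, the idea is the same. `max_lr=7e-3` is the value picked from the LR range test above, and the epoch count is again a placeholder.

```python
from torch import optim
from torch.optim.lr_scheduler import OneCycleLR

epochs = 25
optimizer = optim.SGD(model.parameters(), lr=7e-3, momentum=0.9)
scheduler = OneCycleLR(
    optimizer,
    max_lr=7e-3,                        # from the LR range test
    steps_per_epoch=len(train_loader),
    epochs=epochs,
)

for epoch in range(epochs):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # LR and momentum change every mini-batch, not every epoch
```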
- Compare the validation accuracy and validation loss of each method.
Notice that the green line, which represents the run trained with the 1cycle policy, converges to a better validation accuracy and a lower validation loss.
These are the best validation accuracies of the two experiments:
- Fixed LR: 0.9411
- 1cycle: 0.9607
Tip: choose the batch size according to the computational capacity you have. The number of iterations in the 1cycle policy depends on the batch size, the number of epochs, and the size of the training set, as the short calculation below shows.
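For instance (with made-up numbers), the total number of 1cycle steps works out as:

```python
import math

# Illustrative numbers only.
dataset_size, batch_size, epochs = 4000, 32, 25

steps_per_epoch = math.ceil(dataset_size / batch_size)  # 125 mini-batches
total_steps = steps_per_epoch * epochs                   # 3125 scheduler steps
print(total_steps)
```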
Though this is a simple experiment, it suggests that the 1cycle policy does a good job of increasing the accuracy of neural network models and helps with super-convergence. Give it a try, and don't forget to share your experiences here. 😊
References
[1] Cyclical Learning Rates for Training Neural Networks
https://arxiv.org/abs/1506.01186
[2] A disciplined approach to neural network hyper-parameters: Part 1 — learning rate, batch size, momentum, and weight decay
https://arxiv.org/abs/1803.09820
[3] The 1cycle policy
https://sgugger.github.io/the-1cycle-policy.html
[4] PyTorch Learning Rate Finder
https://github.com/davidtvs/pytorch-lr-finder
[5] Transfer Learning Tutorial
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html