FAQs on Machine Learning Development – #AskNaadi Part 1

Happy 2022!

It's been almost 7 years since I started playing with machine learning and related domains. These are some FAQs that come to me from peers, along with my thoughts on them. Feel free to ask any questions or raise any concerns you have about the domain, and I'll try my best to add my thoughts on those. Note that all these answers are my personal opinions and experiences.

01. How to learn the theories behind machine learning?

The first thing I'd suggest is 'self-learning'. There are plenty of online resources out there where you can start studying on your own. Most of them are free; some may need a payment for the certification (that's totally up to you). I've listed some of the popular places to get a kickstart in learning AI. Just take a look here.

Next, keep practising. Never stop coding and training models in various domains. Kaggle is a good place to sharpen your skill set. Keep learning and keep practising at the same time.

02. Do we really need mathematics for ML?

Yes. To some extent you should know the theory behind probability and some basic mathematics. No need to worry a lot about that. As I said previously, there are plenty of places to catch up on your maths too.

03. Is there a difference between data analysis and machine learning?

Yes, there is. Data analysis is about finding patterns in existing data and drawing inferences from those patterns; it may include data visualization components too. When it comes to machine learning, you train a system to learn those patterns and try to predict the patterns to come.

04. Is the trend in AI/ML going to fade out in the near future?

Mmm.. I don't think so. I can't say for certain that AI is going to be 'the' future, but since all these technical advancements are going to generate an enormous amount of data, there has to be a way to understand the patterns in that data and get value out of it. So data science and machine learning are going to be the approaches to go for.

Right… those are some general questions I frequently get from people. Let’s move into some technicalities.

05. What’s the OS you use on your work rig?

Ubuntu! Yes, it's FOSS and it's super easy to set up all the dependencies I need on it. (I did a complete walkthrough of my setup previously; here it is.) Sometimes I use Windows too, but if Docker and the like are involved, Ubuntu is the choice I go with.

06. What’s your preferred programming language to perform machine learning experiments?

I’m a Python guy! (Have used R a little)

07. Any frameworks/ libraries you use most in your experiments?

Since I'm more into deep learning and computer vision, I use the PyTorch deep learning framework a lot. NumPy, scikit-learn, pandas and the other ML-related Python toolkits are always in my toolbox.

08. Machine learning is all about neural networks, right?

No, it's not! This is one of the biggest myths. Artificial neural networks (ANNs) are only one family of algorithms with which we can perform machine learning. There are plenty of other algorithms that are widely used for ML: decision trees, support vector machines and Naive Bayes are some popular ML algorithms that are not ANNs.
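As a quick illustration, here's a minimal scikit-learn sketch (assuming scikit-learn is installed) that trains two of those non-neural-network classifiers on the bundled Iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Load a small toy dataset and split it for training/testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Two classic ML algorithms that have nothing to do with neural networks
for model in (DecisionTreeClassifier(max_depth=3), SVC(kernel='rbf')):
    model.fit(X_train, y_train)
    print(type(model).__name__, 'accuracy:', model.score(X_test, y_test))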

09. Why do we need GPUs for training?

You need GPUs when you need to do parallel processing. The typical CPUs in our machines have only a handful of cores and can handle a limited number of threads simultaneously. A GPU, on the other hand, has thousands of small cores which can handle thousands of computational threads in parallel (for example, the Nvidia RTX 2080 Ti has 4352 CUDA cores). In deep learning we have to perform millions of calculations to train models, and running these workloads on GPUs is much faster and more efficient.
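In PyTorch, for instance, shifting the heavy tensor work onto a GPU takes only a couple of lines; this is a minimal sketch assuming a CUDA-capable card and driver are available:

import torch

# Pick the GPU if one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A large matrix multiplication: thousands of GPU cores work on it in parallel
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b

print('Ran on:', c.device)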

10. When to use / not to use deep learning?

This is a tricky question. Deep learning is very good at modelling non-linear data; that's why it performs really well in the computer vision and natural language processing domains. If you have such a task, or your feature space is really large and you have a massive amount of data, I'd suggest you go with deep learning. If not, sticking with traditional machine learning algorithms might be the better choice.

11. Do I need to know all the complex theories behind AI to develop intelligent applications?

Yes and no. In some cases you may have to understand the theory behind AI/ML in order to develop a machine learning based application; mostly, I would say the model training and validation phases need this knowledge. Let's say you are a software developer who's very good with .NET/Java, and you are developing an application which has a component where you have to read some text from a scanned document. You have to do it using computer vision, but fortunately you don't have to build the component from scratch. There are plenty of services which can be used as REST endpoints to complete the task. No need to worry about the underlying algorithms at all. Just use the JSON!
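To make that concrete, calling such a service usually boils down to a single HTTP request; this is a rough sketch with a hypothetical OCR endpoint and API key (the exact URL, headers and response shape depend on the provider you pick):

import requests

# Hypothetical OCR service endpoint and key -- replace with your provider's real values
OCR_URL = 'https://example-vision-service.com/ocr/analyze'
API_KEY = 'your-api-key'

with open('scanned_document.jpg', 'rb') as f:
    response = requests.post(
        OCR_URL,
        headers={'Authorization': f'Bearer {API_KEY}',
                 'Content-Type': 'application/octet-stream'},
        data=f.read(),
    )

# The service does the computer vision; you just consume the JSON it returns
print(response.json())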

12. Should I build all my models from scratch?

This has a yes/no answer too. The question comes up mostly in deep learning model development. In some complex scenarios you may have to develop your models from scratch, but in most cases the problem you have can be framed as an object detection, image classification, key-phrase extraction from text, etc. kind of problem. A good approach to move forward would be something like this (a minimal sketch of the first step follows the list):

  • Use a simple ANN and check that your data loading and related plumbing are working fine.
  • Use a pre-trained model and see how it performs (a widely used SOTA model would be the best choice).
  • If that's not working out, do transfer learning and check the accuracy of the trained model. (You should get good results most of the time by this step.)
  • Do some tweaks to the network and see if it works.
  • If none of these work, then think about building a novel model.
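For the first step, something as small as this PyTorch sketch is enough to confirm the data pipeline produces tensors of the shape you expect before any serious model gets involved (the image shape and class count here are placeholders):

import torch
import torch.nn as nn

# A deliberately tiny ANN -- its only job is to prove the pipeline runs end to end
sanity_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, 64),   # assumes 3x224x224 images; adjust to your data
    nn.ReLU(),
    nn.Linear(64, 8),               # assumes 8 classes; adjust to your task
)

# Stand-in for one batch from your DataLoader
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 8, (4,))

logits = sanity_model(images)
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()
print('Batch OK, loss =', loss.item())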

13. Is cloud based machine learning a good option?

In most industrial use cases, yes! Since most of the data in existing systems is already sitting in the cloud, and industries are heavily relying on cloud services these days, cloud based ML is a good approach. Obviously it comes with a price. When it comes to research phases, the price of purchasing computation power may be a problem. In those cases, my approach would be to do the research phase on-prem and move the deployment to the cloud.

14. I've huge computer vision datasets to train on. Shall I move all my stuff to the cloud?

Ehh… As I said previously, if you are planning a research project which runs for a long time and needs a lot of computational hours, I'd suggest going with a local setup first, finalizing the model and then moving to the cloud. (If dollars aren't your problem, no worries at all! Go for the cloud! Obviously it's easier and more reliable.)

15. Which cloud provider to choose?

There are a lot of cloud providers out there with various services related to ML. Some provide out-of-the-box services where you can just call an API to do the ML tasks (Microsoft Cognitive Services etc.). There are also services where you can use your own data to train existing models (the Custom Vision service on Azure etc.).

If you want end-to-end ML life cycle management, personally I find the Azure ML service a good solution, since you can use any of your ML-related frameworks and tools and just use the cloud to train, manage and deploy the models. I find the MLOps features that come with Azure Machine Learning pretty useful.

16. I've trained and deployed a pretty good machine learning model. I don't need to touch it again, right?

No way! You have to continuously check the model's performance and the accuracy it provides on the newest data that comes into the service. That incoming data may be skewed. It's always a good idea to train the model with more data, so it's better to have re-training MLOps pipelines that iteratively check your models.

17. My DL models take a lot of time to train. If I have more computation power, will things speed up?

Mm.. not all the time. I have seen cases where data loading takes more time than model training. Make sure you are using the correct coding approaches and sufficient memory and process management. Make sure you are not using old libraries which may be the cause of slow processing times. If your code is clean and clear, then try adjusting the computation power.
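As one common example, PyTorch's DataLoader can quietly become the bottleneck; a sketch like this (the values are illustrative, tune them for your machine) often helps before buying a bigger GPU:

from torch.utils.data import DataLoader

# `train_dataset` stands for whatever Dataset you already have
train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,     # load and decode images on several CPU processes in parallel
    pin_memory=True,   # speeds up host-to-GPU transfers when training on CUDA
)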

These are just a few questions I noted down. If you have any other questions or concerns in the domain of machine learning, deep learning or data science, just drop a comment below. I'll try to add my thoughts there.

Perspective Transformation of Coordinate Points on Polygons

I've blogged about some of the challenges I face in my deep learning based experiments and the approaches I used to overcome them in previous posts. This is going to be one of them, where I explain a technique I used for calculating the perspective relationship between two different planes in a computer vision application.

Background:

Computer vision is widely used in surveillance applications, object detection and sports analytics. Mapping the imagery/video footage generated from a single camera or a set of cameras to a relative space is one of the major tasks we may have to deal with. Mostly this need comes up when mapping people/object locations.

Use Case –

Imagine a sports analytics application where you capture a soccer game from a fixed camera and run a human detection algorithm on the image to find the player positions. That's quite straightforward (you can see that has been done in the following figure). The tricky part is mapping the player positions, which are in the camera space, to the actual soccer field coordinates and generating a graph of player positions relative to the soccer field (or you may want to normalize the location coordinates). What we need is an output similar to the bottom-left one in the figure.

How to do that?

We can clearly see that the soccer field is rectangular in shape. So, if we know the frame-space coordinates of the 4 corners of the field, we can easily transform any point inside that polygon into a given coordinate space. In geometry this is called a “perspective transformation”. (This is a bit different from an affine transformation, which is the more common application.)

Perspective transformation

If you are interested in digging deeper and seeing how this mathematical transformation happens, I strongly encourage you to follow this link and see the related matrix calculations behind the operation.

I found a pretty neat JavaScript snippet by Florian Segginger and ported the logic to a Python script.

import numpy as np
from matplotlib import pyplot as plt

# Target rectangle the points are mapped into. The unit square is used here,
# so the transformed coordinates come out normalized to the 0..1 range.
resultRect = {
  'p1': {'x': 0, 'y': 0},
  'p2': {'x': 1, 'y': 0},
  'p3': {'x': 1, 'y': 1},
  'p4': {'x': 0, 'y': 1}
}

# Holds the most recently transformed point (filled in by perspective_transform)
resultPoint = {'x': 0, 'y': 0}

# First, find the 3x3 transformation matrix that maps the unit square onto the
# deformed inputPolygon:
# [a b c]
# [d e f]
# [g h 1]
def perspective_transform(inputPolygon, point):
  # Corners of the camera-space polygon (listed in the same order as resultRect)
  x0 = inputPolygon['p1']['x']
  y0 = inputPolygon['p1']['y']
  x1 = inputPolygon['p2']['x']
  y1 = inputPolygon['p2']['y']
  x2 = inputPolygon['p3']['x']
  y2 = inputPolygon['p3']['y']
  x3 = inputPolygon['p4']['x']
  y3 = inputPolygon['p4']['y']

  # Note: this port maps into the unit square defined by resultRect above, so the
  # target corner coordinates do not appear explicitly in the calculation below.

  # Intermediate differences used to solve for the perspective terms (a13, a23)
  dx1 = x1 - x2
  dx2 = x3 - x2
  dx3 = x0 - x1 + x2 - x3
  dy1 = y1 - y2
  dy2 = y3 - y2
  dy3 = y0 - y1 + y2 - y3

  a13 = (dx3 * dy2 - dy3 * dx2) / (dx1 * dy2 - dy1 * dx2)
  a23 = (dx1 * dy3 - dy1 * dx3) / (dx1 * dy2 - dy1 * dx2)
  a11 = x1 - x0 + a13 * x1
  a21 = x3 - x0 + a23 * x3
  a31 = x0
  a12 = y1 - y0 + a13 * y1
  a22 = y3 - y0 + a23 * y3
  a32 = y0

  # Matrix that maps the unit square onto inputPolygon (row-vector convention)
  transformMatrix = [
    [a11, a12, a13],
    [a21, a22, a23],
    [a31, a32, 1]
  ]

  # Invert it so we can go the other way: from camera space back to the unit square
  transformMatrix = np.array(transformMatrix)
  inv = np.linalg.inv(transformMatrix)

  # Express the camera-space point in homogeneous coordinates
  pointMatrix = np.array([point['x'], point['y'], 1])

  # Row vector times inverse matrix gives the point in the target plane (homogeneous)
  resultMatrix = np.matmul(pointMatrix, inv)

  # Divide by the homogeneous coordinate to get the final 2D point
  resultPoint['x'] = resultMatrix[0] / resultMatrix[2]
  resultPoint['y'] = resultMatrix[1] / resultMatrix[2]

  return resultPoint

########

# Perform the transformation with an example: corners of the soccer field in the
# camera frame, and a single detected point to map onto the normalized field.

inputPolygon = {
  'p1': {'x': 158, 'y': 2044},
  'p2': {'x': 669, 'y': 573},
  'p3': {'x': 2797, 'y': 594},
  'p4': {'x': 3686, 'y': 2062}
}

point = {'x': 1800, 'y': 900}

resultPoint = perspective_transform(inputPolygon, point)

How to use this?

Pretty easy! You have to know 3 things.

1. The coordinates of the 4 corner points of the polygon you are transforming from

2. The coordinates of the point/points to be transformed

3. The 4 corner points of the target polygon (this can be a rectangle or any 4-point polygon)

The perspective_transform method takes the input polygon coordinates and the point coordinates, and outputs resultPoint relative to the resultRect we have defined. (In this code I've used a unit plane to map the points.)
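For instance, with the script above in scope, mapping a handful of detected player positions and plotting them on the normalized field could look like this (the player coordinates here are made up for illustration):

from matplotlib import pyplot as plt

# Hypothetical player positions detected in the camera frame
players = [
  {'x': 900, 'y': 1500},
  {'x': 1800, 'y': 900},
  {'x': 2600, 'y': 1200},
]

# Map each camera-space point into the normalized (0..1) field coordinates.
# dict(...) copies each result because perspective_transform reuses a single dict.
field_positions = [dict(perspective_transform(inputPolygon, p)) for p in players]

plt.scatter([p['x'] for p in field_positions], [p['y'] for p in field_positions])
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.title('Player positions on the normalized field')
plt.show()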

Feel free to use this method in your applications and let me know your thoughts on this. Cheers!

10 Tips for Designing & Developing Computer Vision Projects

Computer vision based applications have become one of the most popular research areas and have gained a lot of interest across different industrial domains. The popularity and advancement of deep learning have given a boost to the hype around computer vision.

Having been a researcher focused on computer vision based applications for nearly 3 years, here are some tips I'd give to a developer who's stepping into a computer vision related experiment or deployment.

Before going further into the discussion, you may need to get an idea of the difference between traditional computer vision approaches and deep learning based approaches. Here's a quick overview of that.

01. Do we really have to use deep learning based computer vision approaches to solve this?

This is the very first thing to consider! When you see a problem for the first time, you may think applying deep learning is the saviour. That's not true in some cases. You may be able to solve the problem easily with traditional techniques such as line detection filters, without wasting the time and energy of training a deep learning model for the task. Observe the problem thoroughly and then decide whether to move forward or not.

02. Analyze the input data and the desired output

Obviously, deep learning based computer vision models take images or videos as their input modalities. Before starting the project implementation, we should consider the following factors of the input data we have.

Size of the data –

Since DL models need a huge amount of data (in most cases) to train without overfitting, we need to make sure we have a good amount of data in hand for training. We can't specify exact numbers here; I'd say the more the better!

Quality of the data –

Some image inputs or video streams we get are blurred and do not cover the most important features we need to build the models. Getting images/videos in higher resolution is always better. When considering the quality of the data, it's also worth taking a look at factors like class imbalance if it's a classification problem.
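A quick way to spot class imbalance is simply to count the labels before training; here's a minimal sketch assuming an ImageFolder-style dataset on disk:

from collections import Counter
from pathlib import Path

# Assumes one sub-folder per class, e.g. data/train/cat, data/train/dog, ...
train_dir = Path('data/train')
counts = Counter(p.parent.name for p in train_dir.rglob('*.jpg'))
print(counts)   # a heavily skewed distribution hints at class imbalance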

Similarity between training data and inference-time inputs –

I've seen cases where the data a model receives at inference time is very different from the data used for training (for example, a model trained on cat images from cartoons that receives real-life cat images at inference time). Unless the model is specifically designed for domain adaptation, you should NEVER make this mistake.

03. Building from scratch? Is it necessary?

As I said previously, computer vision is one of the most widely researched areas in deep learning. Because of that, you have the privilege of using pre-built models as well as online services to perform your computer vision workloads.

Services such as Azure Cognitive Services, the Google Vision APIs etc. provide pre-built web APIs which you can use directly for many vision related tasks. Starting from an OCR task of reading text in a scanned document, there are even APIs which can identify human faces and their emotions. No need to build from scratch; you can just consume the service as a web API in your application.

Going a step beyond the pre-built services, Microsoft Azure Cognitive Services offers a Custom Vision service where you can train your own image classification models with your own data. This comes in handy in many practical applications where you don't need to spend time building the model or configuring the training environment.

04. Building from scratch? Is it REALLY necessary?

Yup! Again, a decision to take. If your problem cannot be addressed by the pre-built computer vision services available online, the option you have is to build a deep learning model and train it using your own data. When it comes to model development, one of the big mistakes we make is neglecting the existing models built by researchers for various purposes.

I'm pretty sure most of the computer vision tasks you have fall under well-known computer vision areas such as image classification, action recognition in videos, human pose detection, human/object tracking etc. There are many pre-built methods which have achieved state-of-the-art accuracy on these problems and have been benchmarked on most of the publicly available large datasets. For example, ResNet models are specifically designed for image classification and have shown excellent accuracy on the ImageNet dataset. You can easily use these models (most of them are available in the model zoos of popular deep learning frameworks), adapt their last layers to your needs and get higher accuracy than you would by building your own model from scratch.

Papers with Code is a great place to search for existing models for various computer vision tasks.

I recently came across the OpenMMLab repositories, which come in pretty handy for such tasks (mostly for video analysis work).

05. Use the correct method

When building the models, make sure you follow the correct path which matches your data input. For example, if you only have a few training images for your classification model, you may need to look at areas like few-shot learning. Tricks such as adding batch normalization, using the correct loss functions, adding more input modalities, using learning rate schedulers and transfer learning will surely increase your model accuracy.
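As one small example of those tricks, wiring a learning rate scheduler into a PyTorch training loop is only a few lines; a minimal sketch with illustrative values:

import torch
import torch.nn as nn

model = nn.Linear(128, 8)                                   # stand-in for your real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Drop the learning rate by 10x every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    # ... run the usual training steps for one epoch here ...
    optimizer.step()       # placeholder for the per-batch updates
    scheduler.step()       # advance the schedule once per epoch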

06. Data augmentation is a saviour!

The more data the better! Always take a look at sensible data augmentation methods to make sure your model doesn't overfit the training data. Always visualize your data inputs before using them for model training, to make sure your augmentations make sense.
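For image data, a typical torchvision augmentation pipeline might look like this (the specific transforms and their parameters are just illustrative; pick ones that make sense for your domain):

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    # Common ImageNet normalization values
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])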

07. Model training should not be a nightmare

This is the most time-consuming part of developing computer vision models. We all know training deep learning models needs a lot of computation power, so make sure you have enough to train your models. It'll be a nightmare to train an image classifier on 100,000 images using just your CPU! Make sure you have good enough GPUs for the computations and have configured them correctly for training models.

08. Model inference time should not be years!

Model inference is the least-considered part of model development, even though it is vital: this is where the outcome is shown to the outside world. Sometimes your trained model may take a lot of time for inference, which can make it useless in a real-world application. Think of a human detection system you implemented taking 1-2 minutes to identify a human who's accessing a secured location… there's no use for such a system, since it doesn't meet the need for real-time surveillance. Always try to develop the simplest model that gives the best accuracy. Sometimes you may have to give up a few digits of accuracy to increase the model's efficiency; that's totally fine in a real-world application. Before pushing the model into production, take a look at converting the model to ONNX or at model pruning. It'll help you deploy efficient models.
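Exporting a PyTorch model to ONNX, for instance, is a one-call affair; a minimal sketch assuming a ResNet-18 classifier and a 224x224 input:

import torch
from torchvision import models

model = models.resnet18(pretrained=True)
model.eval()

# A dummy input with the shape the model expects at inference time
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    'resnet18.onnx',
    input_names=['image'],
    output_names=['logits'],
)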

09. Take a look at your deployment target

This connects directly with what we discussed about model inference time. We don't have the luxury of high-end machines powered by GPUs, or high-powered cloud services, in every deployment location; sometimes our deployment target may be an IoT device. So make sure you design a lightweight model which still gives good performance while consuming fewer resources.

10. Privacy concerns

Last but not least, we have to look at privacy concerns. Since we are dealing with image and video data which may contain a lot of personal information about people, we need to make sure we follow the privacy guidelines and that the data we use for model training has enough security clearance for such tasks.

A bit lengthy… but I hope you got some clues before getting into your next computer vision project. Happy coding 😊

Transfer Learning in ConvNets – Part 2

We discussed the possibility of transferring the knowledge learned by one ConvNet to another. If you're new to the idea of transfer learning, please check out the previous post here.

Alright… let's see a practical scenario where we need to use transfer learning. We all know that deep neural networks are data hungry, and we may need a huge amount of data to build unbiased predictive models. Although that's the ideal scenario, in most cases there isn't that much data to train neural models with. So the 'to go' saviour for you may be transfer learning.

In this small demonstration, what I've done is build a multi-class classifier that has 8 classes and only 100-odd images in the training set for each class.

The dataset I'm using here is a derivation of the “Natural Images” dataset (https://www.kaggle.com/prasunroy/natural-images/version/1#_=_). I've randomly reduced the number of images in the original dataset to build the “Mini Natural Images” dataset, which consists of three splits for training, testing and validation. (The dataset is available in the GitHub repository.) Go ahead and feel free to pull it or fork it!

Here’s an overview of the “Mini Natural Images” dataset.

So, this is going to be an image classification task. We are going to take advantage of ImageNet and the state-of-the-art architectures pre-trained on the ImageNet dataset. Instead of random initialization, we initialize the network with a pre-trained network, and the ConvNet is fine-tuned with the training set.

I've used the PyTorch deep learning framework for the experiment, as it's super easy to adopt for deep learning. For this type of computer vision application you can use the models available in torchvision.models (https://pytorch.org/docs/stable/torchvision/models.html ).

The models available in the model zoo are pre-trained on the ImageNet dataset to classify 1000 classes, so there are 1000 nodes in the final layer. To adapt a model to our need, keep in mind to remove the final layer and replace it with the desired number of nodes for your task. (In this experiment, the final fc layer of ResNet-18 has been replaced by an 8-node fc layer.)

Here's the way to replace the final layer of the ResNet architecture and of the VGG architecture.

#Using a model pre-trained on ImageNet and replacing its final linear layer
import torch.nn as nn
from torchvision import models

#For resnet18
model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 8)

#for VGG16_BN: swap the last classifier layer for a fresh 8-class linear layer
#(just changing out_features on the existing layer would not resize its weights)
model_ft = models.vgg16_bn(pretrained=True)
model_ft.classifier[6] = nn.Linear(model_ft.classifier[6].in_features, 8)

The rest goes as usual for training and fine-tuning a CNN. Make sure to use a batch size that suits the GPU available in your rig. (You can use a DLVM for this task if you wish 😊)

The training and validation accuracies are plotted, and the confusion matrix is generated using torchnet (https://github.com/pytorch/tnt ), which is pretty good for visualization and logging in PyTorch.


Confusion matrix of the classification

The classifier achieves 97% accuracy on the test image set, which is not bad.

Now it's your turn to go ahead and get your hands dirty with this experiment. Leave a comment if you come across any issue. Happy coding!

Here’s the GitHub Repo for your reference!

Transfer Learning in ConvNets

The rise of deep learning methods in areas like computer vision and natural language processing has led to the need for massive datasets to train deep neural networks. In most cases, finding large enough datasets is not possible, and training a deep neural network from scratch for a particular task may be time consuming. To address this problem, transfer learning can be used as a learning framework, where the knowledge acquired from a previously learned, related task is transferred to improve learning on a new task.

In a simple form, transfer learning helps a machine learning model learn more easily by getting help from a pre-trained model whose domain is similar to some extent (not exactly the same).


The ways in which transfer might improve learning

There might be cases where the transfer actually decreases performance; we call this negative transfer. Normally we (humans) handle the task of deciding which knowledge can be transferred (the mapping) for particular tasks, but active research is going on into ways of doing this mapping automatically.

That's all for the theory! Let's discuss how we can apply transfer learning to a computer vision task. As you all know, Convolutional Neural Networks (CNNs) perform really well at image classification, image recognition and similar tasks. Training deep CNNs needs large amounts of image/video data, and the massive number of parameter-tuning operations takes a long time. In such cases, transfer learning is a great fit for training new models, and it is widely used in industry as well as in research.

There are three main approaches to using transfer learning in machine learning problems. To make it easier to understand, I'll take my examples from the context of training deep neural network models for computer vision related tasks (image classification, labeling etc.).

ConvNet as fixed feature extractor –

In this case, you take a ConvNet that has been pre-trained on a large image repository like ImageNet and remove its last fully connected layer. The rest is used as a fixed feature extractor for the dataset you have. Then a linear classifier (softmax or a linear SVM) is trained on top for the new dataset.


VGG16 as a feature extractor
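A minimal PyTorch sketch of that idea (using ResNet-18 here rather than VGG16, and assuming an 8-class target task) is to freeze the pre-trained weights and train only a new linear head:

import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)

# Freeze the pre-trained backbone so it acts as a fixed feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer; only this new layer will be trained
model.fc = nn.Linear(model.fc.in_features, 8)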

Fine-tuning the ConvNet –

Here we don't just stop at using the ConvNet as a feature extractor: we fine-tune the weights of the ConvNet with the data we have. Sometimes not the whole deep net is tuned but only the last set of layers, since the first layers represent the most generalized features.
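Continuing the sketch above, fine-tuning only the last block of the network (with a smaller learning rate than the fresh classifier head) could look like this; the layer names follow torchvision's ResNet-18 and the learning rates are illustrative:

import torch

# Unfreeze just the last residual block; earlier layers stay frozen
for param in model.layer4.parameters():
    param.requires_grad = True

# Gentler learning rate for the pre-trained block, default rate for the new head
optimizer = torch.optim.SGD([
    {'params': model.layer4.parameters(), 'lr': 1e-4},
    {'params': model.fc.parameters()},
], lr=1e-3, momentum=0.9)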

Using pretrained models –

Here we use the pre-trained models available in most deep learning frameworks and adjust them according to our needs. In the next post, we'll discuss how to do this using PyTorch.

One of the most important decisions to make in transfer learning is whether to fine-tune the network or to leave it as it is. The size of your dataset and its similarity to the dataset the model was originally trained on are the deciding factors. Here's a summary that will help you take that decision.

Let's discuss how to perform transfer learning with an example in the next post. 😊

Deep Learning Vs. Traditional Computer Vision


Traditional way of feature extraction

Computer vision can be succinctly described as finding features in images and using them to discriminate objects and/or classes of objects.

Computer vision has become one of the vital research areas, and commercial applications built on computer vision methodologies make up a huge portion of industry. The accuracy and speed of processing and identifying images captured from cameras have improved over the decades. As the well-known new kid in town, deep learning is playing a major role as a computer vision tool.

Is deep learning the only tool to perform computer vision?

A big no! Deep learning came onto the computer vision scene only a couple of years back. Before then, computer vision was mainly based on image processing algorithms and methods. The core of a computer vision pipeline was extracting features from the image: detecting colours, edges, corners and objects was the first step in performing a computer vision task. These features are human engineered, and the accuracy and reliability of the models depend directly on the extracted features and on the methods used for feature extraction. In the traditional vision scope, algorithms like SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features) and BRIEF (Binary Robust Independent Elementary Features) play the major role of extracting features from the raw image.
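As a taste of that traditional pipeline, detecting SIFT keypoints with OpenCV takes only a few lines (a minimal sketch, assuming an OpenCV build that includes SIFT and an image file on disk):

import cv2

image = cv2.imread('sample.jpg', cv2.IMREAD_GRAYSCALE)

# Hand-engineered feature extraction: SIFT keypoints and descriptors
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

print(f'Found {len(keypoints)} keypoints')
output = cv2.drawKeypoints(image, keypoints, None)
cv2.imwrite('sample_keypoints.jpg', output)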

The difficulty with this approach to feature extraction in image classification is that you have to choose which features to look for in each given image. When the number of classes grows or the image clarity goes down, it's really hard to cope with traditional computer vision algorithms.

The Deep Learning approach –

Deep learning, which is a subset of machine learning, has shown significant performance and accuracy gains in the field of computer vision. In arguably one of the most influential papers applying deep learning to computer vision, in 2012, a neural network containing over 60 million parameters significantly beat previous state-of-the-art approaches to image recognition in the popular ImageNet computer vision competition, ILSVRC-2012.

The boom started with convolutional neural networks and the modified architectures of ConvNets. By now, some ConvNet architectures are said to be close to 100% accuracy on image classification challenges, sometimes beating the human eye!

The main difference in the deep learning approach to computer vision is the concept of end-to-end learning. There's no longer a need to define the features and do feature engineering; the neural network does that for you. It can be put simply this way:

If you want to teach a [deep] neural network to recognize a cat, for instance, you don’t tell it to look for whiskers, ears, fur, and eyes. You simply show it thousands and thousands of photos of cats, and eventually it works things out. If it keeps misclassifying foxes as cats, you don’t rewrite the code. You just keep coaching it.

Wired | 2016

Though deep neural networks have major drawbacks, like the need for huge amounts of training data and large computation power, the field of computer vision has already been conquered by this amazing tool!