Perspective Transformation of Coordinate Points on Polygons

In previous blog posts I wrote about some of the challenges I've faced in my deep learning experiments and the approaches I used to overcome them. This post is another one of those: I'm going to explain a technique I used for calculating the perspective relationship between two different planes in a computer vision application.

Background:

Computer vision is widely used in surveillance, object detection and sports analytics. Mapping the imagery/video footage generated by a single camera, or a set of cameras, onto a reference space is one of the major tasks we have to deal with, and most often the need arises when mapping the locations of people or objects.

Use Case –

Imagine a sports analytics application where you capture a soccer game with a fixed camera and run a human detection algorithm on the frames to find the player positions. That part is quite straightforward (you can see it has been done in the following figure). The tricky part is mapping the player positions, which are in camera space, onto the actual soccer field coordinates and generating a graph of player positions relative to the field (or you may want to normalize the location coordinates). What we need is an output similar to the bottom-left one in the figure.

How to do that?

A soccer field is clearly rectangular in shape. So, if we know the frame-space coordinates of the 4 corners of the field, we can transform any point inside that polygon into a given coordinate space. In geometry this is called a “perspective transformation”. (This is a bit different from an affine transformation, which is the more common case.)

Perspective transformation

If you are interested in digging deeper and seeing how this mathematical transformation works, I strongly encourage you to follow this link and look at the matrix calculations behind the operation.
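
In a nutshell, the operation is a 3x3 homography applied to homogeneous coordinates. As a rough sketch (written in the usual column-vector convention; the Python snippet below works with the transposed, row-vector form of the same matrix), a point (x, y) maps to

$$X = \frac{a x + b y + c}{g x + h y + 1}, \qquad Y = \frac{d x + e y + f}{g x + h y + 1}$$

where the eight coefficients a … h are solved from the four corner correspondences.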

I found a pretty neat JavaScript snippet by Florian Segginger and ported the logic to a Python script.

import numpy as np
from matplotlib import pyplot as plt

# target polygon: the unit square, so transformed points come out
# normalized to the 0-1 range
resultRect = {
  'p1': {'x': 0, 'y': 0},
  'p2': {'x': 1, 'y': 0},
  'p3': {'x': 1, 'y': 1},
  'p4': {'x': 0, 'y': 1}
}

# First, find the 3x3 transformation matrix that maps the unit square
# onto our deformed inputPolygon:
# [a11 a12 a13]
# [a21 a22 a23]
# [a31 a32   1]
def perspective_transform(inputPolygon, point):
  # corners of the source polygon in image (camera) space, ordered p1..p4
  x0 = inputPolygon['p1']['x']
  y0 = inputPolygon['p1']['y']
  x1 = inputPolygon['p2']['x']
  y1 = inputPolygon['p2']['y']
  x2 = inputPolygon['p3']['x']
  y2 = inputPolygon['p3']['y']
  x3 = inputPolygon['p4']['x']
  y3 = inputPolygon['p4']['y']

  # intermediate differences used to solve for the projective terms
  dx1 = x1 - x2
  dx2 = x3 - x2
  dx3 = x0 - x1 + x2 - x3
  dy1 = y1 - y2
  dy2 = y3 - y2
  dy3 = y0 - y1 + y2 - y3

  # coefficients of the 3x3 matrix that maps the unit square onto inputPolygon
  a13 = (dx3 * dy2 - dy3 * dx2) / (dx1 * dy2 - dy1 * dx2)
  a23 = (dx1 * dy3 - dy1 * dx3) / (dx1 * dy2 - dy1 * dx2)
  a11 = x1 - x0 + a13 * x1
  a21 = x3 - x0 + a23 * x3
  a31 = x0
  a12 = y1 - y0 + a13 * y1
  a22 = y3 - y0 + a23 * y3
  a32 = y0

  transformMatrix = np.array([
    [a11, a12, a13],
    [a21, a22, a23],
    [a31, a32, 1]
  ])

  # invert the matrix so we can go from image space back to the unit square
  inv = np.linalg.inv(transformMatrix)

  # the point as a homogeneous row vector
  pointMatrix = np.array([point['x'], point['y'], 1])

  # row-vector matrix multiplication with the inverse
  resultMatrix = np.matmul(pointMatrix, inv)

  # divide by the homogeneous coordinate to get the transformed point
  return {
    'x': resultMatrix[0] / resultMatrix[2],
    'y': resultMatrix[1] / resultMatrix[2]
  }

########

# perform the transformation with an example

# corners of the field as they appear in the camera frame (pixel coordinates)
inputPolygon = {
  'p1': {'x': 158, 'y': 2044},
  'p2': {'x': 669, 'y': 573},
  'p3': {'x': 2797, 'y': 594},
  'p4': {'x': 3686, 'y': 2062}
}

# a point in the same frame (e.g. a detected player position) to map onto the field
point = {'x': 1800, 'y': 900}

resultPoint = perspective_transform(inputPolygon, point)
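
If you already have OpenCV in your project, you can sanity-check the script against it. This is just an optional sketch (it assumes the opencv-python package is installed, which the snippet above does not require): cv2.getPerspectiveTransform builds the same kind of 3x3 matrix from the 4 corner correspondences, and cv2.perspectiveTransform applies it to points.

import cv2
import numpy as np

src = np.float32([[158, 2044], [669, 573], [2797, 594], [3686, 2062]])
dst = np.float32([[0, 0], [1, 0], [1, 1], [0, 1]])
H = cv2.getPerspectiveTransform(src, dst)

# perspectiveTransform expects points shaped (N, 1, 2)
pts = np.float32([[[1800, 900]]])
mapped = cv2.perspectiveTransform(pts, H)
print(mapped)  # should numerically match the resultPoint computed above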

How to use this?

Pretty easy! You have to know 3 things.

1. The coordinates of the 4 corner points of the polygon to be transformed (the source polygon in image space)

2. The coordinates of the point (or points) to be transformed

3. The 4 corner points of the transformed polygon (this can be a rectangle or any 4-point polygon)

The perspective_transform function takes the input polygon coordinates and the point coordinates, and outputs the resultPoint relative to the resultRect we defined. (In this code I've used a 1x1 plane, i.e. the unit square, to map the points, so the output coordinates come out normalized to the 0-1 range; the sketch below shows one way to scale them up.)
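
Because the output is normalized to the unit square, turning it into real-world field coordinates is just a multiplication. Here's a minimal sketch, assuming a 105 m x 68 m pitch (those dimensions, and the to_field_coordinates helper, are illustrative assumptions, not part of the snippet above):

# hypothetical pitch dimensions in metres - substitute your own field size
FIELD_LENGTH_M = 105.0
FIELD_WIDTH_M = 68.0

def to_field_coordinates(normalized_point, length=FIELD_LENGTH_M, width=FIELD_WIDTH_M):
  # scale the 0-1 output of perspective_transform to field coordinates
  return {
    'x': normalized_point['x'] * length,
    'y': normalized_point['y'] * width
  }

player_on_field = to_field_coordinates(resultPoint)
print(player_on_field)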

Feel free to use this method in your applications and let me know your thoughts on this. Cheers!