OCR Fundamentals - Part 2

📌 Image Pre-processing

Why is pre-processing essential in OCR?

There are several reasons why OCR may produce poor results. One of the most common is noise in the original image: OCR output depends heavily on the quality of the input image, and if that quality is too poor, even the best OCR engines will not give good results. Image pre-processing improves the quality of the input image so that the OCR engine can produce more accurate output. This article introduces image pre-processing techniques that improve OCR accuracy.

Pre-processing Techniques

When the input image is speckled, too dark, or too light, the OCR algorithm cannot correctly recognize the patterns of dots that make up the characters. The following techniques address these issues.

Resolution and Scaling

The resolution of the original image, measured in DPI (dots per inch; for a digital image, the number of pixels per inch), is crucial for image recognition algorithms. Up to a point, a higher DPI gives higher OCR accuracy, and in many cases the recommended resolution for OCR is around 300 DPI. A DPI lower than 200 or greater than 600 may cause inaccurate results, although some OCR engines adjust the resolution automatically by scaling the original image based on the font size. Below are examples of how to scale an image using Python:

Original Image

Downscale

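import matplotlib.pyplot as plt
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# Halve the width and height; dsize=(0, 0) lets fx and fy set the output size.
img = cv2.resize(img, (0, 0), fx=0.5, fy=0.5)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()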


Upscale

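import matplotlib.pyplot as plt
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# Enlarge the width and height by 1.5x; dsize=(0, 0) lets fx and fy set the output size.
img = cv2.resize(img, (0, 0), fx=1.5, fy=1.5)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()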

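As a rough sketch of the arithmetic behind rescaling toward the recommended 300 DPI, the helper below derives a scale factor from a known source resolution. The function name `scale_to_target_dpi` and the example page size (1240 x 1754 pixels, roughly an A4 page at 150 DPI) are assumptions made for illustration; they do not come from any OCR library.

```python
def scale_to_target_dpi(width_px, height_px, current_dpi, target_dpi=300):
    """Return (factor, new_width, new_height) needed to reach target_dpi.

    Hypothetical helper: it only does the arithmetic and does not touch
    any image data or DPI metadata.
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    factor = target_dpi / current_dpi
    return factor, round(width_px * factor), round(height_px * factor)


# Example: a page scanned at 150 DPI needs to be doubled to reach 300 DPI.
factor, new_w, new_h = scale_to_target_dpi(1240, 1754, current_dpi=150)
print(factor, new_w, new_h)  # 2.0 2480 3508
```

The resulting factor can be passed as `fx` and `fy` to `cv2.resize`. The OpenCV documentation suggests `cv2.INTER_AREA` interpolation when shrinking and `cv2.INTER_CUBIC` or `cv2.INTER_LINEAR` when enlarging.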

Blurring

One of the most common kinds of noise is random variation in brightness and color information, which makes it difficult to detect the textual content in the image. Therefore, we usually denoise the original image before passing it to an OCR engine. In image processing, a common way to do this is blurring (smoothing), which can be done with image filtering techniques such as the following.

A Gaussian filter reduces Gaussian noise by convolving the image with a Gaussian kernel. Although it is faster than the other blurring techniques described here, it does not preserve edges, which may affect the OCR results.

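import matplotlib.pyplot as plt
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# 9x9 kernel; sigmaX=0 lets OpenCV derive sigma from the kernel size.
img = cv2.GaussianBlur(img, (9, 9), 0)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()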

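To make the edge-softening effect of Gaussian blurring concrete, here is a small NumPy-only sketch in one dimension. The helper `gaussian_kernel_1d` and the toy step signal are assumptions for illustration and are not how OpenCV implements its filter internally.

```python
import numpy as np


def gaussian_kernel_1d(size, sigma):
    """Normalized 1D Gaussian weights; size is expected to be odd."""
    half = size // 2
    x = np.arange(-half, half + 1, dtype=np.float64)
    weights = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()


kernel = gaussian_kernel_1d(size=5, sigma=1.0)

# A sharp dark-to-bright edge, like the border of a printed stroke.
edge = np.array([0, 0, 0, 0, 255, 255, 255, 255], dtype=np.float64)

# mode="same" keeps the signal length; the ends are affected by zero padding.
blurred = np.convolve(edge, kernel, mode="same")

# The abrupt 0 -> 255 jump becomes a gradual ramp around the edge.
print(np.round(blurred[2:6], 1))
```

The ramp that replaces the original jump is the loss of edge sharpness mentioned above; thin character strokes are affected in the same way.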

A median filter replaces the central pixel of each kernel region with the median of the pixels in that region. It is a suitable method for denoising images while preserving edges.

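import matplotlib.pyplot as plt
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# ksize must be an odd integer greater than 1.
img = cv2.medianBlur(img, 9)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()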

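The difference between averaging and taking the median can be seen on a single 3x3 neighborhood. This is a minimal sketch with made-up pixel values, assuming one bright outlier ("salt" noise) in an otherwise flat patch.

```python
import numpy as np

# A flat gray patch with one "salt" pixel (impulse noise) in the center.
patch = np.full((3, 3), 100, dtype=np.uint8)
patch[1, 1] = 255

mean_value = patch.mean()        # pulled upward by the outlier
median_value = np.median(patch)  # unaffected by a single outlier

print(round(float(mean_value), 1), float(median_value))  # 117.2 100.0
```

An averaging filter would leave a faint smudge where the speck was, while the median restores the background value exactly.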

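A bilateral filter is highly effective at blurring without losing sharp edges. It replaces each pixel with a weighted average of its neighbors, where the weights depend on both spatial distance and intensity difference, so pixels on the other side of an edge contribute very little.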

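import matplotlib.pyplot as plt
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# d=9 is the neighborhood diameter; the two 75s are sigmaColor and sigmaSpace.
img = cv2.bilateralFilter(img, 9, 75, 75)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()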

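The weighted average used by a bilateral filter can be sketched in one dimension with NumPy. The function `bilateral_1d` below is a naive, hypothetical illustration rather than OpenCV's implementation; its `sigma_range` parameter plays a role comparable to `sigmaColor`, and `sigma_space` to `sigmaSpace`.

```python
import numpy as np


def bilateral_1d(signal, radius, sigma_space, sigma_range):
    """Naive 1D bilateral filter (illustrative only)."""
    signal = np.asarray(signal, dtype=np.float64)
    out = np.empty_like(signal)
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        offsets = np.arange(lo, hi) - i
        # Nearby samples get more weight...
        spatial = np.exp(-(offsets ** 2) / (2.0 * sigma_space ** 2))
        # ...but only if their intensity is similar to the center sample.
        similarity = np.exp(-((window - signal[i]) ** 2) / (2.0 * sigma_range ** 2))
        weights = spatial * similarity
        out[i] = np.sum(weights * window) / np.sum(weights)
    return out


# A noisy step edge: small fluctuations on each side of a large jump.
signal = [10, 12, 9, 11, 200, 198, 202, 199]
smoothed = bilateral_1d(signal, radius=2, sigma_space=2.0, sigma_range=20.0)
print(np.round(smoothed, 1))
```

The small fluctuations on each side are averaged out, while the jump between the fourth and fifth samples stays sharp because samples across the jump receive almost zero weight.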

Morphological Operations

Morphological operations help counter the degradation of blurred images. The two basic morphological operations are erosion and dilation, and we usually apply them to binary images. Erosion shrinks bright regions, while dilation expands them. Morphological operations can also smooth images through the opening operation (erosion followed by dilation) and the closing operation (dilation followed by erosion).

Erosion

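import matplotlib.pyplot as plt
import numpy as np
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# 5x5 structuring element of ones.
kernel = np.ones((5,5), np.uint8)
img = cv2.erode(img, kernel, iterations=1)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()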


Dilation

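import matplotlib.pyplot as plt
import numpy as np
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# 2x2 structuring element of ones.
kernel = np.ones((2,2), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()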


Opening

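import matplotlib.pyplot as plt
import numpy as np
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# 5x5 structuring element of ones.
kernel = np.ones((5,5), np.uint8)
# Opening: erosion followed by dilation.
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()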


Closing

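import matplotlib.pyplot as plt
import numpy as np
import cv2

# image_to_path is assumed to hold the path to the input image.
img = cv2.imread(image_to_path)
# 2x2 structuring element of ones.
kernel = np.ones((2,2), np.uint8)
# Closing: dilation followed by erosion.
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)

# OpenCV loads images as BGR, while matplotlib expects RGB.
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()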

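The effect of the four operations is easier to see on a tiny binary image than on a photograph. The following NumPy-only sketch uses hypothetical helpers `erode` and `dilate` with a 3x3 square neighborhood and zero padding at the border; both choices are assumptions of this example and may differ from OpenCV's defaults.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode(binary, k=3):
    """A pixel stays 1 only if every pixel in its k x k neighborhood is 1."""
    padded = np.pad(binary, k // 2, mode="constant", constant_values=0)
    return sliding_window_view(padded, (k, k)).min(axis=(2, 3))


def dilate(binary, k=3):
    """A pixel becomes 1 if any pixel in its k x k neighborhood is 1."""
    padded = np.pad(binary, k // 2, mode="constant", constant_values=0)
    return sliding_window_view(padded, (k, k)).max(axis=(2, 3))


# Opening (erode, then dilate) removes an isolated speck next to a 4x4 block.
noisy = np.zeros((7, 7), dtype=np.uint8)
noisy[1:5, 1:5] = 1
noisy[6, 6] = 1  # speck
opened = dilate(erode(noisy))

# Closing (dilate, then erode) fills a one-pixel hole inside a 5x5 block.
holed = np.zeros((7, 7), dtype=np.uint8)
holed[1:6, 1:6] = 1
holed[3, 3] = 0  # hole
closed = erode(dilate(holed))

print(int(opened[6, 6]), int(closed[3, 3]))  # 0 1
```

In OCR terms, opening is a way to drop small specks without shrinking the strokes, and closing is a way to repair small gaps inside them.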

Skew Correction

Most images taken by a digital camera or scanned on a flatbed scanner are slightly skewed. Skew directly affects line segmentation in OCR, reducing its accuracy. Therefore, we detect the skew angle ourselves and rotate the page to restore its original orientation. We have omitted sample code for skew correction because preparing good sample images would take time. If you want to know more about skew correction, see "A Review of Skew Detection Techniques for Document".
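For readers who want a starting point anyway, here is a minimal sketch of one commonly described approach, the projection-profile method: try a range of candidate angles and keep the one for which the text pixels collapse into the fewest rows. This is an assumption of the example, not necessarily the technique the article had in mind. The function `estimate_skew_angle`, the angle range, the scoring rule, and the synthetic page are all hypothetical.

```python
import numpy as np


def estimate_skew_angle(binary, max_angle=10.0, step=0.5):
    """Estimate skew in degrees from a binary image (text pixels are nonzero).

    A positive result means text lines run downward to the right in image
    coordinates (row index grows with column index).
    """
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        theta = np.deg2rad(angle)
        # Row coordinate of every text pixel after undoing a skew of `angle`.
        rows = np.round(ys * np.cos(theta) - xs * np.sin(theta)).astype(int)
        counts = np.bincount(rows - rows.min()).astype(np.float64)
        # Sum of squares is largest when pixels pile up in a few rows.
        score = float(np.sum(counts ** 2))
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle


# Synthetic "page": three one-pixel-thick text lines skewed by 3 degrees.
height, width, true_angle = 100, 200, 3.0
page = np.zeros((height, width), dtype=np.uint8)
cols = np.arange(width)
for y0 in (20, 40, 60):
    line_rows = np.round(y0 + cols * np.tan(np.deg2rad(true_angle))).astype(int)
    page[line_rows, cols] = 1

estimated = estimate_skew_angle(page)
print(estimated)  # expected to be close to 3.0
```

Once an angle has been estimated, the page can be rotated by the opposite amount, for example with `cv2.getRotationMatrix2D` and `cv2.warpAffine`; the sign convention of the rotation function should be checked against a known sample first.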

Summary

As with most image recognition algorithms, knowing the basic image pre-processing techniques and understanding how they work is key to increasing OCR accuracy. In this article, we have presented commonly used image pre-processing techniques for that purpose. In our upcoming blog post, we will dive deeper into how to use these techniques, in combination with some open-source OCR libraries, to extract textual content from images.

