OCR Fundamentals - Part 2
📌 Image Pre-processing
Why is pre-processing essential in OCR?
There are several reasons why OCR may produce poor results. One of the most common is noise in the original image: OCR results depend heavily on the quality of the input image, and if that quality is too poor, even the best OCR engines will not give good results. Image pre-processing improves the quality of the input image so that the OCR engine can produce more accurate output. This article introduces image pre-processing techniques for improving OCR accuracy.
Pre-processing Techniques
When the input image is speckled, too dark, or too light, the OCR algorithm cannot correctly recognize the patterns of dots that make up the characters. The following techniques address these issues.
Resolution and Scaling
The DPI (Dots Per Inch) resolution of the original image, which indicates the number of pixels per inch, is crucial for image recognition algorithms. Up to a point, a higher DPI gives higher OCR accuracy, and in many cases the recommended resolution for OCR is around 300 DPI. A DPI lower than 200 or greater than 600 may cause inaccurate results, although some OCR engines can adjust the resolution automatically by scaling the original image based on the font size. Below are examples of how to scale an image using Python:
Original Image
Downscale
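import matplotlib.pyplot as plt
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# Halve both dimensions; INTER_AREA is generally preferred when shrinking
img = cv2.resize(img, (0, 0), fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()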
Upscale
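import matplotlib.pyplot as plt
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# Enlarge both dimensions by 1.5x; INTER_CUBIC is a common choice when enlarging
img = cv2.resize(img, (0, 0), fx=1.5, fy=1.5, interpolation=cv2.INTER_CUBIC)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()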
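To connect the scale factor to the DPI guidance above, here is a minimal sketch that computes the factor needed to bring a scan to a target resolution. The helper names `scale_factor_for_dpi` and `scaled_size` are hypothetical, and the sketch assumes you already know the DPI of the source image (for example, from the scanner settings), since `cv2.imread` returns only pixel data.

```python
def scale_factor_for_dpi(current_dpi, target_dpi=300):
    """Return the resize factor that maps current_dpi to target_dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return target_dpi / current_dpi


def scaled_size(width, height, current_dpi, target_dpi=300):
    """Return the (width, height) in pixels after rescaling to target_dpi."""
    factor = scale_factor_for_dpi(current_dpi, target_dpi)
    return round(width * factor), round(height * factor)


# A 150 DPI scan needs to be doubled in each dimension to reach 300 DPI.
factor = scale_factor_for_dpi(150)
print(factor, scaled_size(1000, 800, 150))
```

The resulting factor could then be passed as both `fx` and `fy` to `cv2.resize`, as in the snippets above.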
Blurring
One of the most common kinds of noise is random variation in brightness and color information, which makes it difficult to detect the textual content in the image. Therefore, we usually denoise the original image before passing it to an OCR engine. In image processing, this smoothing step is called blurring. You can do it with the following image filtering techniques.
- Gaussian filters reduce Gaussian noise by convolving the image with a Gaussian kernel. Although Gaussian blurring is faster than the other blurring techniques listed here, it does not preserve edges, which may affect the OCR results.
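import matplotlib.pyplot as plt
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# 9x9 kernel; a sigma of 0 lets OpenCV derive it from the kernel size
img = cv2.GaussianBlur(img, (9, 9), 0)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()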
- Median filtering replaces the central pixel of each kernel region with the median of the pixels in that region. It is a suitable method for denoising images, especially those with salt-and-pepper noise, while preserving edges.
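import matplotlib.pyplot as plt
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# medianBlur requires an odd aperture size greater than 1
img = cv2.medianBlur(img, 9)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()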
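- A bilateral filter is highly effective at blurring without losing sharp edges. It replaces each pixel with a weighted average of its neighbors, where the weights depend on both spatial distance and intensity difference, so pixels on the other side of an edge contribute little.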
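import matplotlib.pyplot as plt
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# Arguments: neighborhood diameter d=9, sigmaColor=75, sigmaSpace=75
img = cv2.bilateralFilter(img, 9, 75, 75)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()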
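To see why a median filter handles isolated noise pixels better than plain averaging, the following sketch applies both to a tiny grid by hand. It is an illustration in pure Python, not how OpenCV implements its filters, and the helper names `filter_3x3` and `mean_of` are hypothetical.

```python
from statistics import median


def filter_3x3(image, reducer):
    """Apply reducer to every 3x3 window of a 2D list; border pixels are copied as-is."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            result[y][x] = reducer(window)
    return result


def mean_of(window):
    """Integer mean of the window values."""
    return sum(window) // len(window)


# Flat gray patch (value 10) with one bright "salt" pixel in the middle.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 255

mean_result = filter_3x3(patch, mean_of)
median_result = filter_3x3(patch, median)

print(mean_result[2])    # middle row after averaging
print(median_result[2])  # middle row after median filtering
```

Averaging smears the bright pixel over its neighbors, while the median discards it entirely because it is an outlier within every window that contains it.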
Morphological Operations
Morphological operations can help repair character shapes that have been degraded by noise or blurring. The two basic morphological operations are erosion and dilation, and we usually apply them to binary images. Erosion shrinks bright regions, while dilation expands them. The opening (erosion followed by dilation) and closing (dilation followed by erosion) operations also help smooth images: opening removes small bright specks, and closing fills small dark holes.
Erosion
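import matplotlib.pyplot as plt
import numpy as np
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# 5x5 structuring element of ones
kernel = np.ones((5,5), np.uint8)
img = cv2.erode(img, kernel, iterations=1)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()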
Dilation
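import matplotlib.pyplot as plt
import numpy as np
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# 2x2 structuring element of ones
kernel = np.ones((2,2), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()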
Opening
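import matplotlib.pyplot as plt
import numpy as np
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# 5x5 structuring element of ones
kernel = np.ones((5,5), np.uint8)
# Opening = erosion followed by dilation
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()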
Closing
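import matplotlib.pyplot as plt
import numpy as np
import cv2
# image_to_path: path to the input image, defined by the caller
img = cv2.imread(image_to_path)
# 2x2 structuring element of ones
kernel = np.ones((2,2), np.uint8)
# Closing = dilation followed by erosion
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()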
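To make the effect of these four operations concrete, the sketch below re-implements them with NumPy on a tiny binary image. It is a simplified illustration, not OpenCV's implementation: the helper names (`_window_reduce`, `erode`, `dilate`, `opening`, `closing`) are hypothetical, the structuring element is assumed to be a square of ones, and the padding values are an assumption chosen so that the image border does not affect the result.

```python
import numpy as np


def _window_reduce(binary, size, reducer, pad_value):
    """Slide a size x size window over a 2D array and reduce each window."""
    pad = size // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=pad_value)
    out = np.empty_like(binary)
    for y in range(binary.shape[0]):
        for x in range(binary.shape[1]):
            out[y, x] = reducer(padded[y:y + size, x:x + size])
    return out


def erode(binary, size=3):
    # A pixel stays 1 only if every pixel under the window is 1.
    # Padding with 1 keeps the image border from eroding shapes by itself.
    return _window_reduce(binary, size, np.min, pad_value=1)


def dilate(binary, size=3):
    # A pixel becomes 1 if any pixel under the window is 1.
    return _window_reduce(binary, size, np.max, pad_value=0)


def opening(binary, size=3):
    # Erosion followed by dilation: removes small bright specks.
    return dilate(erode(binary, size), size)


def closing(binary, size=3):
    # Dilation followed by erosion: fills small dark holes.
    return erode(dilate(binary, size), size)


# 7x7 binary image containing a solid 3x3 square.
square = np.zeros((7, 7), dtype=np.uint8)
square[2:5, 2:5] = 1

noisy = square.copy()
noisy[0, 6] = 1   # isolated bright speck
holed = square.copy()
holed[3, 3] = 0   # dark hole inside the square

print(np.array_equal(opening(noisy), square))
print(np.array_equal(closing(holed), square))
```

In this toy case, opening removes the speck and closing fills the hole, and both restore the clean square. On real scans the kernel size has to be tuned so that thin character strokes are not removed along with the noise.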
Skew Correction
Most images taken with a digital camera or scanned on a flatbed scanner are slightly skewed. Skewed images directly affect line segmentation in OCR, reducing its accuracy. Therefore, we detect the skew angle and rotate the page back to its original orientation. We have omitted sample code for skew correction because preparing good sample images takes time. If you want to know more about skew correction, please see "A Review of Skew Detection Techniques for Document".
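Even without real sample images, the detection step can be sketched on synthetic data. The example below uses a projection-profile approach, which is one of several known skew detection techniques and not necessarily the one a given OCR engine uses. The function names `profile_score` and `estimate_skew` are hypothetical, small rotations are approximated by a vertical shear, and a positive angle here means that text lines run downward to the right in image coordinates.

```python
import math

import numpy as np


def profile_score(binary, angle_deg):
    """Score a candidate skew angle using the horizontal projection profile.

    A small rotation is approximated by a vertical shear: column x is moved up
    by round(x * tan(angle)) pixels. np.roll wraps around, so this sketch
    assumes blank margins at the top and bottom of the page.
    """
    height, width = binary.shape
    slope = math.tan(math.radians(angle_deg))
    row_sums = np.zeros(height)
    for x in range(width):
        shift = round(x * slope)
        row_sums += np.roll(binary[:, x], -shift)
    # Well-aligned text lines give sharp jumps between neighboring rows.
    return float(np.sum(np.diff(row_sums) ** 2))


def estimate_skew(binary, max_angle=5.0, step=0.25):
    """Return the candidate angle (in degrees) with the sharpest row profile."""
    candidates = np.arange(-max_angle, max_angle + step, step)
    return float(max(candidates, key=lambda angle: profile_score(binary, angle)))


# Synthetic page: four 2-pixel-thick "text lines" skewed by 2 degrees.
page = np.zeros((100, 200), dtype=np.uint8)
slope = math.tan(math.radians(2.0))
for x in range(page.shape[1]):
    shift = round(x * slope)
    for y0 in (20, 40, 60, 80):
        page[y0 + shift:y0 + shift + 2, x] = 1

estimated = estimate_skew(page)
print(estimated)
```

On this synthetic page the estimate should come out close to the 2 degrees used to build it. Once the angle is known, the page could be rotated back, for instance with `cv2.getRotationMatrix2D` and `cv2.warpAffine`; the sign convention should be checked against those functions, since this sketch measures the angle with the y axis pointing down.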
Summary
As with most image recognition algorithms, knowing the basic image pre-processing techniques and understanding how they work is the key to increasing OCR accuracy. In this article, we have presented commonly used image pre-processing techniques for improving OCR accuracy. In our upcoming blog post, we will dive deeper into how to use these techniques, in combination with some open-source OCR libraries, to extract textual content from images.