OCR Fundamentals - Part 1
📌 Basics and Workflow
Foreword
This blog series explains how to use OCR to extract textual data from images. It has three parts: Part 1 covers the basics and workflow of OCR, Part 2 introduces some commonly used image pre-processing techniques for OCR, and Part 3 demonstrates those techniques in combination with well-known open-source OCR libraries.
What is OCR?
OCR (Optical Character Recognition) identifies text in images, such as scanned documents and photos, and converts it into machine-readable characters. It is closely related to computer vision and pattern recognition and is generally considered part of artificial intelligence. Now let's look at how OCR algorithms recognize text in images.
Workflow of OCR
Most traditional OCR pipelines follow these five basic steps:
Step 1: Pre-processing
An original image may have quality issues, such as noise, low contrast, or skew, that lead to poor text recognition. Fortunately, image processing algorithms can often reduce these issues. Denoising, binarization, and skew correction are typical operations applied to an image before it is passed to OCR.
Some of the most commonly used pre-processing techniques are:
- Scaling
- Denoising
- Blurring
- Binarization
- Morphological Operations
- Skew correction
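As a rough illustration of one technique from the list, binarization, here is a minimal sketch in plain Python. It uses Otsu's method to pick a global threshold, which is one common choice and not necessarily the one a given OCR library uses. The function names and the tiny sample image are made up for illustration, and the sketch assumes dark text on a light background.

```python
def otsu_threshold(pixels):
    """Pick the gray level (0-255) that best separates dark from light pixels."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]           # number of pixels with value <= t
        sum_dark += t * hist[t]
        w_light = total - w_dark    # number of pixels with value > t
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor); larger is better.
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image, threshold):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if value <= threshold else 0 for value in row] for row in image]


# Hypothetical 2x3 grayscale image: dark text pixels on a light background.
image = [[230, 20, 225],
         [220, 30, 25]]
t = otsu_threshold([v for row in image for v in row])
binary = binarize(image, t)
```

In practice an image-processing library would normally handle this step. The point of the sketch is only that binarization reduces every pixel to "ink" or "background" before the later steps run.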
Step 2: Segmentation
Once the quality issues are addressed, the segmentation step divides the image into progressively smaller parts, in the following order:
Line level segmentation: The skew-corrected image is split into individual lines of text, each of which is passed on for further segmentation.
Word level segmentation: Each line retrieved in the previous step is split into parts, each containing a single word.
Character level segmentation: Each word retrieved in the previous step is split into individual characters.
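One classic way to implement this line, word, and character hierarchy is with projection profiles: count the ink pixels in each row to find lines, then count them in each column of a line to find characters, and treat wide gaps as word boundaries. The sketch below assumes a binarized image stored as rows of 0/1 values; the function names and the `min_gap` parameter are illustrative assumptions, not part of any specific library.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(binary):
    """Line level: rows that contain ink, grouped into vertical spans."""
    return runs([sum(row) for row in binary])


def segment_chars(binary, top, bottom):
    """Character level: columns that contain ink within one text line."""
    width = len(binary[0])
    profile = [sum(binary[r][c] for r in range(top, bottom)) for c in range(width)]
    return runs(profile)


def group_words(char_spans, min_gap):
    """Word level: start a new word when the gap between characters is >= min_gap."""
    words = []
    for span in char_spans:
        if words and span[0] - words[-1][-1][1] < min_gap:
            words[-1].append(span)
        else:
            words.append([span])
    return words


# Hypothetical binary image: two text lines; the first has three "characters".
binary = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
]
lines = segment_lines(binary)
chars = segment_chars(binary, *lines[0])
words = group_words(chars, min_gap=2)
```

Projection profiles only work well when the image is already deskewed and characters do not touch, which is one reason the pre-processing step comes first.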
Step 3: Feature extraction
Once the image is segmented, the text content can be identified, and feature extraction is the first stage of that. This step describes each character segment by a set of characteristics, such as closed loops, line directions, and line intersections, each with its own feature representation.
Some are statistical features, such as the density of foreground pixels in a specific area, and some are structural features, such as aspect ratios, cross points, branch points, strokes with their directions, and horizontal curves at the top or bottom.
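To make the two feature families concrete, here is a small sketch that computes one statistical feature (ink density per zone) and two structural ones (aspect ratio and number of closed loops) for a binary character image. The function names, the 2x2 zoning grid, and the sample glyphs are assumptions chosen for illustration.

```python
def zone_densities(glyph, rows=2, cols=2):
    """Statistical feature: fraction of ink pixels in each zone of a rows x cols grid."""
    h, w = len(glyph), len(glyph[0])
    features = []
    for zr in range(rows):
        for zc in range(cols):
            r0, r1 = zr * h // rows, (zr + 1) * h // rows
            c0, c1 = zc * w // cols, (zc + 1) * w // cols
            cells = [glyph[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cells) / len(cells) if cells else 0.0)
    return features


def aspect_ratio(glyph):
    """Structural feature: width divided by height of the character image."""
    return len(glyph[0]) / len(glyph)


def count_closed_loops(glyph):
    """Structural feature: background regions fully enclosed by ink."""
    h, w = len(glyph), len(glyph[0])
    # Pad with a background border so all outer background is one region.
    grid = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in glyph] + [[0] * (w + 2)]

    def flood(r, c):
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < h + 2 and 0 <= x < w + 2 and grid[y][x] == 0:
                grid[y][x] = 2  # mark as visited background
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])

    flood(0, 0)  # remove the outer background
    loops = 0
    for r in range(h + 2):
        for c in range(w + 2):
            if grid[r][c] == 0:  # unvisited background must be enclosed
                loops += 1
                flood(r, c)
    return loops


# Hypothetical glyphs: an "O"-like shape and an "8"-like shape.
glyph_o = [[1, 1, 1, 1],
           [1, 0, 0, 1],
           [1, 0, 0, 1],
           [1, 1, 1, 1]]
glyph_8 = [[1, 1, 1],
           [1, 0, 1],
           [1, 1, 1],
           [1, 0, 1],
           [1, 1, 1]]
```

For these samples, the "O"-like glyph has one closed loop and the "8"-like glyph has two, so this single structural feature already separates them.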
Step 4: Classification
In the classification step, we classify each character in the image segment using the features collected in the previous step. Classification methods can be statistical models, machine learning algorithms, or deep neural networks.
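As a minimal sketch of the idea, the simplest statistical classifier is nearest neighbor: compare a character's feature vector against one stored template per class and pick the closest. The template values below (closed-loop count, aspect ratio) are invented for illustration; a real system would learn them from labeled training data and use many more features.

```python
import math

# Hypothetical templates: label -> [closed loops, aspect ratio].
TEMPLATES = {
    "O": [1.0, 1.0],
    "8": [2.0, 0.6],
    "L": [0.0, 0.75],
}


def classify(features, templates=TEMPLATES):
    """Return the label whose template is closest in Euclidean distance."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))


print(classify([1.0, 0.9]))  # one loop, nearly square
print(classify([2.0, 0.5]))  # two loops, tall and narrow
```

Machine learning and deep learning classifiers replace the hand-made templates and distance function with a learned model, but the input and output stay the same: features in, character label out.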
Step 5: Post-processing
Most OCR algorithms carry out a post-processing phase that double-checks the initial output. It reduces the number of errors by checking the result against a word dictionary to determine the correct spelling. For example, the characters "l" (lowercase L) and "I" (uppercase i) can be barely distinguishable, especially in handwriting. That makes the post-processing phase crucial for better accuracy.
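The dictionary check can be sketched as follows: if a recognized word is not in the dictionary, try swapping visually similar characters until a known word appears. The confusion table and the tiny dictionary here are assumptions for illustration; production systems typically use larger dictionaries and language models.

```python
from itertools import product

# Hypothetical table of characters that OCR commonly confuses with each other.
CONFUSABLE = {
    "l": "lI1", "I": "Il1", "1": "1lI",
    "O": "O0", "0": "0O",
}


def correct_word(word, dictionary):
    """Return a dictionary word reachable by swapping confusable characters.

    Falls back to the original word when no candidate is found. The search is
    exponential in the number of confusable characters, so it suits short words.
    """
    if word in dictionary:
        return word
    options = [CONFUSABLE.get(ch, ch) for ch in word]
    for candidate in product(*options):
        candidate = "".join(candidate)
        if candidate in dictionary:
            return candidate
    return word


dictionary = {"hello", "line", "world"}
print(correct_word("heIlo", dictionary))
print(correct_word("1ine", dictionary))
```

Here "heIlo" (with an uppercase i) and "1ine" (with the digit one) are both mapped back to dictionary words, while a word with no valid substitution is returned unchanged.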
Summary
In this article, we introduced the basics of OCR and how its internal processing steps work. As mentioned earlier, OCR recognizes text in a scanned image and converts it into text data. It is very helpful, yet not perfect, which is why we need to understand the capabilities and limitations of OCR before using it. If you want to know more about OCR, we recommend reading our next article.