Field Extraction and Recognition
Field Extraction involves creating the image that contains the data from each field that was specified by the user. The extracted image contains no background text or form lines. This image can then be used to recognize the data in each field without recognizing data that is a part of the form. During the copy, any form text, extraneous lines (such as the tail of a signature) and form lines are removed. This leaves only the data that is to be captured and converted to text in the extracted_image.
Character Recognition involves extracting all the connected components from a field and performing either splitting or joining individual characters. This allows us to return a different number of characters for each of several split/join selections. For example, if there was a connected component Me in which the M and e were touching, the system might return 'M'-'e' , 'N'-'I'-'e' and 'I'-'V'-'I'-'e' as possible results of splitting the component. Another example, if there were two connected components '5','-' were the '-' is the top of 5, the system might return '5' and '5','-' as choices.
Once the characters are isolated, the characters are normalized, features extracted and classification is performed using neural networks depending upon the field data type.
Cursive Recognition
Cursive word recognition is a very challenging problem because of ambiguous character boundaries, ligatures, and a large variation of writing styles in cursive handwriting.
We have developed a cursive word recognition system. The system is based on a modified Hidden Markov Model (HMM) and Dynamic Programming techniques. A cursive word is first segmented into a sequence of pieces called graphemes. A letter can have one, two or three graphemes (thus, a word is usually over-segmented). The neighboring graphemes are then assembled in such a way that a probability measure under the HMM framework is optimized.
Contextual Processing
Basic OCR recognition engines recognize digits and letters with a reasonable accuracy. Still, the recognition of words or phrases remains a challenge. The longer the word or phrase, the less the probability of recognizing the value without error. Fortunately, the use of contextual knowledge enables us to improve the recognition quite substantially. The context is defined as a set of constraints that the recognized information must satisfy: the more constraints, the less room for errors. For example, the sequence (city, state, zip) must satisfy the following constraints: the zip is composed of 5 digits; its value must belong to the list of possible zips, and the city must exist and correspond to the zip.
In many of today's systems, the exploitation of context is left to the application program. On the contrary, we are building a general context analyzer that can be customized for a particular application. A language called Document Specification Language (DSL) is provided to specify the syntax of the information to be recognized as well as constraints stating that a recognized value must exist in a given dictionary or that a particular routine applied to recognized value(s) must return TRUE.
The context analyzer interprets the tokens returned by the recognition engine (with all its multiple choices) according to the constraints imposed by the DSL "program." It looks at the multiple combinations, picks up the ones that satisfy the constraints and chooses the most probable one.
Example:
The ICR returns the following choices:
K e a c A e v M O 6 4 0 6 0 r u r y y 2 8CheckMate finds the following solution,
K e a r n e y M O 6 4 0 6 0 (with confidence 70/100)over the second choice (with a much lesser confidence):
F a r l e y M O 6 4 0 2 8 (with confidence 54/100)