IBM Research - Almaden
Document Analysis and REcognition (DARE)

Past Project
The primary focus of DARE is analyzing scanned images of paper documents and automatically extracting information from them. This technology reduces the cost of capturing information from paper. Applications include automated forms processing, bank check processing, and mail sorting. It can also be used to analyze and index documents for digital library applications.

Automated forms processing reduces the cost of data entry for our customers and frees up their operators for less tedious tasks.
Capturing information from paper documents is expensive and labor intensive. It has been estimated that about $250 billion is spent worldwide each year, largely on operator salaries, to key in information from paper documents, and that this effort captures information from only 5% of the available documents. By automating this process, we can reduce the cost of data entry for our customers and free up their operators for less tedious tasks. We could also enable them to capture information from a larger portion of their documents, and to do so more quickly. The customers most likely to benefit from this technology are in government, finance (banks), insurance, and health care.

There are many difficult problems to solve. For example, many paper documents, such as forms, are filled out by hand, so we need handwriting recognition, which still requires substantial improvement and innovation. Deciding which content on a paper document is useful and which is not requires intelligence, and is an interesting research problem in its own right. Many paper documents are noisy, and scanning can introduce additional noise and other distortions such as skew. Image processing to remove that noise is another challenging research problem.
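As a rough illustration of the noise-removal problem, the sketch below applies a 3x3 majority filter to a small binary image, which removes isolated specks while leaving solid regions intact. This is a generic textbook technique, not the method DARE used; the function name `despeckle` and the list-of-rows image representation (1 for ink, 0 for background) are assumptions made for this example.

```python
def despeckle(image):
    """Apply a 3x3 majority filter to a binary image (list of rows of 0/1).

    Hypothetical helper for illustration only; not taken from DARE.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the pixel and its neighbours that fall inside the image.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # Keep ink only where ink pixels are a strict majority of the window.
            row.append(1 if 2 * sum(window) > len(window) else 0)
        cleaned.append(row)
    return cleaned


# A tiny "scan": one isolated speck above a solid band of ink.
noisy = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # isolated speck
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
clean = despeckle(noisy)
```

A filter this simple also erodes thin strokes, which is one reason noise removal on real handwritten forms is harder than the toy case suggests.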

We have developed address recognition software that has been used in two IBM solutions: a) an automated flat mail sorting system, and b) an automated parcel mail sorting system. IBM's mail sorting products have worldwide application, and we have already won important contracts at Swiss Post, Norway Post, Denmark Post, Luxembourg Post, and Finland Post. The technology is currently being evaluated by the US Postal Service and UPS, among others. Earlier, we developed recognition software for automated forms processing. That IBM solution is being used by several US states, including Maryland, Wisconsin, Vermont, and Georgia, for state income tax processing.
