Optical Character Recognition (OCR)

“…a system that provides a full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the form.” (UNESCAP, Pop-IT project, 1997-2001)

Optical character recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into a computer-processable format. A typical OCR system contains three logical components: an image scanner, OCR software and hardware, and an output interface. The image scanner optically captures the text images to be recognized. These images are then processed by the OCR software and hardware in three operations: document analysis (extracting individual character images), recognition of these images (based on shape), and contextual processing (either to correct misclassifications made by the recognition algorithm or to limit recognition choices). The output interface communicates the OCR system's results to the outside world.

Sourced from http://www.answers.com/topic/optical-character-recognition
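The first two operations named in the definition above, document analysis and shape-based recognition, can be illustrated with a toy sketch. It is not how any particular OCR product works: the 3x5 glyph templates, the '#'/'.' bitmap encoding, and the function names segment, recognize, and read_line are all assumptions made for this example, and real systems use far richer features than a pixel-by-pixel comparison.

```python
# Hypothetical 3x5 glyph templates: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

# A scanned text line (assumed already binarized) containing three glyphs.
LINE = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...#.",
    "###.###..#.",
]

def segment(rows):
    """Document analysis: split a line bitmap into glyphs at blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # The position past the last column counts as blank so the final glyph closes.
        blank = x == width or all(row[x] == "." for row in rows)
        if not blank and start is None:
            start = x
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in rows])
            start = None
    return glyphs

def recognize(glyph):
    """Recognition by shape: choose the template with the fewest differing pixels."""
    def distance(template):
        return sum(
            a != b
            for g_row, t_row in zip(glyph, template)
            for a, b in zip(g_row, t_row)
        )
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))

def read_line(rows):
    """Run both steps and return the recognized string."""
    return "".join(recognize(g) for g in segment(rows))

print(read_line(LINE))
```

Because recognition picks the nearest template rather than demanding an exact match, a glyph with a stray pixel can still be read correctly, although two templates that differ in only a pixel or two are easily confused. That weakness is what the third operation, contextual processing, is meant to offset.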

An OCR system for inkjet printing in action

Optical Character Recognition

Elizabeth Sheehan (Editor: Nancy Stubbs)

The concept of Optical Character Recognition (OCR) has been around, in one form or another, for a good 200 years. The process of transforming an image of printed text into a text code, thereby making it machine-readable, found its earliest incarnation in US patents for reading aids for the blind in the early 1800s (Schantz 1982). Subsequent technologists advanced OCR thinking, but it wasn’t until the 1950s and the burgeoning of the computer industry that OCR technology really took hold with its applications to data entry. In 1951, a young Department of Defense engineer named David Shepard developed a scanning device in his home that he nicknamed ‘Gismo.’ This device could read twenty-three letters of the alphabet, interpret Morse Code, and read aloud letter by letter. While crude by today’s standards, it nevertheless captured the imagination of scientists and businessmen alike with its potential as a practical business tool in the data entry field. ‘Gismo’ garnered quite a bit of publicity and consequently spurred the further development of OCR tools.

THE TECHNOLOGY

There are two essential elements to OCR technology: scanning and recognition. During the image capture (or scanning) process, an electronic version of the original document is produced in the form of a bitmap image and is saved as a TIFF (Tagged-Image File Format) file.

The subsequent OCR process of turning that image into computer-editable text involves five discrete steps: identification of text and image blocks in the image, character recognition, word identification/recognition, correction, and formatting of the output (Haigh 1996). In essence, the OCR software recognizes text as discernible patterns, compares the results to internal dictionaries, and then saves the final file in a format that the end user can edit.
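The dictionary comparison in the correction step can be sketched with Python's standard difflib module. This is only an illustration of the idea: the four-word LEXICON, the correct function, and the 0.8 similarity cutoff are assumptions chosen for the example, not details taken from Haigh or from any commercial product.

```python
import difflib

# Hypothetical internal dictionary; a real one would hold many thousands of words.
LEXICON = ["optical", "character", "recognition", "scanner"]

def correct(word, lexicon=LEXICON, cutoff=0.8):
    """Return the closest dictionary word, or the input unchanged if none is close."""
    lowered = word.lower()
    if lowered in lexicon:
        return lowered
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Typical recognition slips: the digit 0 read in place of the letter o.
raw_words = ["0ptical", "character", "recogniti0n"]
print(" ".join(correct(w) for w in raw_words))
```

The cutoff is the design choice that matters here. Set too low, the step silently "corrects" names and rare words into dictionary entries; set too high, genuine misreads pass through untouched.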

APPLICATIONS OF OCR

Since the 1950s, commercial and governmental applications of OCR technology have abounded. The Reader’s Digest company holds the distinction of being the first business to install an OCR reader for productivity purposes. It was used to convert typewritten sales reports into punched cards for downloading in the magazine’s subscription department (Schantz 1982).

The US government has utilized OCR technology in the US Post Office system for close to three decades to automate the mail handling process and improve its efficiency. Today there are 240 Post Office processing sites with optical character readers nationwide (Zahniser 1997). These OCR readers can capture an address anywhere on an envelope face and even correctly identify misspelled words, ensuring accurate routing.

In addition to the commercial applications of OCR technology, there is a compelling use of the technology in the social/educational arena. For blind and visually impaired users, OCR systems scan and recognize printed text in the customary way, then pass the recognized text to a synthesizer that reads it back as synthetic speech.

Lastly and inevitably, OCR technology has its applications to the individual PC user as a means of increasing productivity by allowing the modification and reuse of existing information. Business cards, magazine articles, formal documents — all can be scanned and recognized to eliminate the need for rekeying this information manually.

DRIVING FORCES OF OCR TECHNOLOGY

What drives the development of OCR technology? The requirements of accuracy and speed. OCR software must inevitably be judged against the alternative: manually rekeying a document. OCR software has become increasingly sophisticated in its ability to recognize text, thus ensuring greater accuracy. What were stumbling blocks in the past, such as typographical and formatting complexities (e.g., bold, italics, font size, tables), are being overcome by the powerful recognition features that most software now includes. However, accuracy rates are only in the high-90-percent range, so clean-up is still required in the post-OCR phase.
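A little arithmetic shows why an accuracy rate in the high 90s still leaves real clean-up work. The figures below are assumptions for illustration only (98 percent per-character accuracy, a 2,000-character page, five-letter words), and the word-level estimate further assumes that character errors are independent of one another.

```python
def expected_errors(char_accuracy, num_chars):
    """Expected number of misread characters on a page."""
    return (1 - char_accuracy) * num_chars

def word_accuracy(char_accuracy, word_length):
    """Chance that every character in a word is right, assuming independent errors."""
    return char_accuracy ** word_length

# Assumed figures, not measurements of any product.
errors = expected_errors(0.98, 2000)   # about 40 characters to fix per page
words_ok = word_accuracy(0.98, 5)      # about 0.904, so roughly 1 word in 10 has an error
print(round(errors), round(words_ok, 3))
```

Under these assumptions a proofreader would have to find and fix about forty characters on every page, which is why the comparison with manual rekeying remains a live question.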

Finally, scanning speed is also a factor. The faster the scanning speed, the more recognition errors. In the productivity battle of scanning versus rekeying, continued sophistication of scanning technology is essential to the growth of OCR technology.

POLICY THAT DRIVES OCR TECHNOLOGY

One of the ways industry sought to ensure accuracy in the OCR process was to standardize fonts. In 1966, a standard character set was adopted by most manufacturers of optical scanning equipment. Jacob Rabinow, an OCR pioneer, was a great advocate of standardizing everything from the type and size of the paper scanned to the quality of the printing, the format, and the font. All this, he believed, would make OCR devices simpler and less costly (Schantz 1982). While these ideas may be applicable to the commercial use of OCR, such standards are next to impossible to enforce in the free-wheeling PC environment, where an end-user may want to scan anything from a business card to a book. Indeed, as OCR developers and manufacturers move away from customized OCR systems for large corporations and towards smaller, standardized products, it will be the consumer who drives the evolution of the technology, making it a more widespread productivity tool.

THE BUSINESS OF OCR TECHNOLOGY

The field of OCR technology has had, at some point, some of the biggest names in technology working on the concept. In its infancy, when OCR was seen mainly as a means of streamlining data entry tasks, companies such as IBM, GE, and RCA were in the thick of manufacturing customized systems for the data processing industry.

Today, the focus of OCR technology is software: its ease of use, speed, and accuracy. Currently, the leading producers of OCR software are companies such as Caere Corporation (a Los Gatos, California firm that was founded in 1976 by Dr. Robert Noyce, co-inventor of the integrated circuit and co-founder of Intel), Xerox, and Hewlett-Packard, which bundles OCR software with its scanners.

The leading OCR programs are Caere’s ‘OmniPage Pro 7.0 for Windows 95’ and Xerox’s ‘TextBridge Pro 96.’ Caere’s program is considered a bit pricey at a software cost of $499, in addition to which a user requires Windows 95, 8MB of RAM, 15MB of hard disk space, and a scanning device (which can cost, on average, $500). Xerox’s ‘TextBridge’ can operate in a Windows 3.1 environment with 8MB of RAM, 10MB of hard disk space, and a scanner. It’s less costly at a $260 suggested retail price, but is considered less refined than ‘OmniPage.’ ‘OmniPage’ is considered very user-friendly, with tools and wizards that guide you through the OCR process. Both programs, however, are considered advancements in the speed and accuracy of OCR technology.

OPPORTUNITIES, PROBLEMS & PROSPECTS

OCR technology is not limited to the mundane task of replicating business cards and magazine articles. The growth of the global community represents a tremendous opportunity for OCR software developers. The ability to communicate across cultures is, and will increasingly be, essential to modern business.

Currently, development of ‘universal’ OCR software is a burgeoning field. Such software can scan a foreign-language document, convert it to a text document in that language, and then translate it into another language. There are programs currently on the market that can handle 25 to 30 languages.

The next hurdle for optical recognition is handwriting. Currently, OCR technology works at its optimum level with clean, standardized text documents (i.e., typewritten, first-generation). This is what allows the recognition mechanism to work best. Handwriting is another matter altogether — the individuality of handwriting makes it indecipherable by standard OCR software.

The next level of OCR technology will include the use of Intelligent Character Recognition (ICR). This technology involves pattern recognition, with a processing system based on the learning model of the human brain. Information is presented in a nonlinear form, and the network develops its own guidelines based on that information and adapts to it. In a nutshell, ICR technology ‘learns’ to recognize non-standard text, even something as distinctive as one’s own handwriting. It is an exciting theory, but ICR remains to a large extent unsuccessful. Increased computing power and more advanced “neural” or learning networks are necessary for advancement.
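The idea of a network that develops its own guidelines from examples can be shown with a perceptron, the simplest learning network. It is swapped in here purely for illustration and is far simpler than anything an ICR system would use; the 3x3 bitmaps, the labels, and the names to_vector, train, and classify are all assumptions made for this sketch.

```python
def to_vector(rows):
    """Flatten a bitmap of '#'/'.' rows into a list of 1/0 pixel values."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

# Hypothetical training samples: label 1 for X-like shapes, 0 for O-like shapes.
SAMPLES = [
    (["#.#", ".#.", "#.#"], 1),
    (["#.#", ".#.", "#.."], 1),   # an X with one stroke end missing
    (["###", "#.#", "###"], 0),
    (["###", "#.#", "##."], 0),   # an O with a gap in one corner
]

def predict(x, weights, bias):
    """Fire (return 1) when the weighted pixel sum plus bias is positive."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0

def train(samples, epochs=20, rate=1.0):
    """Perceptron rule: nudge the weights only when a sample is misclassified."""
    weights, bias = [0.0] * 9, 0.0
    for _ in range(epochs):
        for rows, label in samples:
            x = to_vector(rows)
            error = label - predict(x, weights, bias)
            weights = [w + rate * error * v for w, v in zip(weights, x)]
            bias += rate * error
    return weights, bias

def classify(rows, weights, bias):
    """Classify a bitmap with trained weights."""
    return predict(to_vector(rows), weights, bias)

weights, bias = train(SAMPLES)
# A shape not seen in training: an X missing a different stroke end.
print(classify(["..#", ".#.", "#.#"], weights, bias))
```

Nothing in the code states what separates an X from an O. The weights that make the distinction emerge from the examples alone, which is the sense in which such a network "learns". Handwriting is a vastly harder problem than this toy case, consistent with the limited success described above.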

As information becomes increasingly available from various media outlets, we seem to be inundated with nothing but paper, and the dream of a ‘paperless’ world in the computer age has fallen by the wayside. Though paper will not disappear from our lives, OCR can help us handle it better. Optical Character Recognition can make text accessible to the blind, can streamline data entry, and can make the individual user just that much more productive.

Sourced from http://www-bcf.usc.edu/~wdutton/comm533/OCR-SHEE.htm