Digital Learning

Friday, September 20, 2019

Optical Character Recognition or Reader:

Optical character recognition or optical character reader (OCR)

* Optical character recognition is the digital conversion of images of printed text into machine-encoded text, whether from a scanned document or a photograph of a document.
* OCR is usually a “hidden” technology, powering many well-known systems and services in everyday life. Use cases for OCR technology include data entry automation, indexing documents for search engines, automatic number plate recognition, and assisting blind and visually impaired persons.
* OCR technology has proven immensely useful in digitizing historic newspapers and texts, which have been converted into fully searchable formats, making those earlier texts easier and faster to access.
* OCR (Optical Character Recognition) is a technology that helps convert different types of document files or images into editable data.

The basic process of OCR involves analyzing the text of a document and converting the characters into code that can be used for data processing. OCR is also referred to as text recognition.
* OCR systems are made up of a combination of hardware and software that is used to change physical documents into machine-readable text.

* OCR is most commonly used to turn hard copies of historic documents into PDFs. Once a document is in this soft-copy form, users can edit, format and search it as if it had been created with a word processor.
How optical character recognition works:
* The first step of OCR is using a scanner to capture the physical form of a document or file. Once all pages are copied, OCR software converts the document into a two-color (black-and-white) version. The scanned image is analyzed for light and dark areas: the dark areas are selected as characters that need to be recognized, and the light areas are identified as background.
* The dark areas are then processed further to find alphabetic letters and numeric digits. OCR programs vary in their techniques, but they typically target one character, word or block of text at a time.
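The light and dark analysis described above can be sketched in a few lines of Python. This is only an illustration, not how any particular OCR product works: the function names `binarize` and `dark_bounding_box`, the tiny sample "scan" and the fixed threshold of 128 are all assumptions made for the example. Real systems often choose the threshold adaptively (for example with Otsu's method).

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = dark (ink), 0 = light (background).

    The fixed threshold of 128 is an assumption for this sketch.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def dark_bounding_box(binary):
    """Return (top, left, bottom, right) of the dark region, or None if the page is blank."""
    coords = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


# A tiny, made-up 4x5 "scan": low values are ink, high values are paper.
scan = [
    [250, 245, 240, 250, 255],
    [250,  30,  20, 250, 255],
    [250,  25, 240, 250, 255],
    [250, 250, 250, 250, 255],
]
binary = binarize(scan)        # two-color version of the scan
box = dark_bounding_box(binary)  # region that would be passed on for recognition
```

Here `binary` marks the three ink pixels with 1, and `box` gives the rectangle around them that a later stage would try to recognize as a character.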
Characters are then identified using one of two algorithms:
* Pattern recognition: OCR programs are fed examples of text in many fonts and formats, which are then used to compare and recognize characters in the scanned files.
* Feature detection: OCR programs follow rules regarding the features of a specific letter or number to recognize characters in the scanned document. Features could include the number of angled lines, crossed lines or curves in a character for comparison. For example, the capital letter “A” may be stored as two diagonal lines that meet with a horizontal line across the middle.
* Once a character is identified, it is converted into an ASCII code that computer systems can use for further processing. Users should correct basic errors, proofread, and make sure complex layouts were handled properly before saving the document for further use.
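As a rough illustration of the pattern recognition approach and the final conversion to an ASCII code, the sketch below compares a scanned glyph against stored examples and picks the closest one. The 3x3 templates, the `similarity` score and the `recognize` function are hypothetical and chosen only to keep the example small; real OCR engines use far larger templates and more robust matching.

```python
# Hypothetical 3x3 glyph templates; "1" = dark pixel, "0" = background.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def similarity(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    pairs = [
        (g, t)
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    ]
    return sum(g == t for g, t in pairs) / len(pairs)


def recognize(glyph):
    """Return the best-matching character and its ASCII code."""
    best = max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))
    return best, ord(best)


# A noisy "L": one pixel in the bottom row is missing.
noisy_l = ["100", "100", "110"]
char, code = recognize(noisy_l)
print(char, code)  # L 76
```

Even with one pixel missing, the glyph still agrees with the "L" template on 8 of 9 pixels, so it is recognized as "L" and stored as ASCII code 76. This tolerance for small defects is also why proofreading remains necessary: a badly damaged glyph can end up closest to the wrong template.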
Optical character recognition use cases:

OCR can be used for different applications:
* Scanning printed documents into versions that can be edited with word processors, like Microsoft Word.
* Indexing print material for search engines.
* Automating data entry, extraction and processing.
* Deciphering documents into text that can be read aloud to visually impaired or blind users.
* Archiving historic information, such as newspapers, magazines or phone books, into searchable formats.
* Electronically depositing checks without the need for a bank teller.
* Placing important, signed legal documents into an electronic database.
* Sorting letters for mail delivery.
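To make the indexing and archiving use cases more concrete, here is a minimal sketch of how text produced by OCR could be made searchable with a simple inverted index. The page ids, the sample text and the helper names `build_index` and `search` are invented for illustration; a real search engine does much more (ranking, stemming, handling OCR errors).

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical OCR output for three scanned newspaper pages.
pages = {
    "page-1": "City council approves new library",
    "page-2": "Library opens reading room",
    "page-3": "Council debates road repairs",
}
index = build_index(pages)
print(search(index, "library"))          # ['page-1', 'page-2']
print(search(index, "council library"))  # ['page-1']
```

Without OCR, the scanned pages would only be images and none of these queries could be answered.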

Benefits of optical character recognition:
* Some advantages of OCR technology are saved time, decreased errors and minimized effort.
* It also enables actions that are not possible with physical copies, such as compressing into ZIP files, incorporating into a website and attaching to an email.
* While taking pictures of documents enables them to be digitally archived, OCR provides the added functionality of being able to edit and search those documents.
