OCR Technology Features
Imaging module
Load and save images in formats such as BMP, PNG, TIFF, PDF and JPEG. JPEG2000 and JBIG2 compression are available with a separate extension.
Pre-processing
Optimize your OCR results and clean up original images with features such as adaptive binarization, despeckle filters, deskew and document rotation. Dark border removal, line removal and color dropout are available in a separate extension.
Text recognition
The standard version of IRIS OCR SDK is available in 137+ languages, with various add-ons: Asian, Hebrew, Arabic, Banking Fonts, ICR.
Barcode recognition
Our barcode recognition module recognizes popular 1D barcodes such as Code 39, Code 128, EAN and UPC. A separate add-on decodes 2D barcodes: PDF417, QR Code and Data Matrix.
Document output
The document output formats in the standard IRIS OCR SDK are: PDF, PDF/A, HTML, XML, RTF, TXT, ODT, WordML, SpreadsheetML, CSV, DOCX, XLSX and XPS. An additional compression module generates compressed PDF and XPS files using our iHQC technology.
Page processing
Zone Recognition
Automatic page orientation recognition
Automatic perspective correction of document images captured with a camera
Automatic punch hole removal capabilities
Add a separator like a blank page or a barcode between each document to tell the OCR software to create different output files from a single batch of documents.
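The separator idea can be illustrated with a minimal sketch. It is not the IRIS OCR SDK API: `split_batch` and the `is_separator` callback are hypothetical names, and the code assumes each scanned page has already been classified (for example as blank or as carrying a separator barcode) by some earlier step.

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def split_batch(
    pages: Iterable[Page],
    is_separator: Callable[[Page], bool],
) -> List[List[Page]]:
    """Group a scanned batch into documents, cutting at separator pages.

    `is_separator` is a hypothetical callback that returns True for a
    blank page or a separator barcode page. Separator pages themselves
    are dropped from the output.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # Close the document collected so far; ignore repeated separators.
            if current:
                documents.append(current)
                current = []
            continue
        current.append(page)
    # Keep the trailing document if the batch does not end with a separator.
    if current:
        documents.append(current)
    return documents


# Usage: an empty string stands in for a blank separator page.
batch = ["invoice-p1", "invoice-p2", "", "contract-p1", "", "", "letter-p1"]
documents = split_batch(batch, lambda page: page == "")
```

Each inner list would then be written to its own output file.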
Recognition languages
Recognition languages: Afaan Oromo, Afrikaans, Albanian, Arabic, Asturian, Aymara, Azeri (Latin), Balinese, Basque, Bemba, Bikol, Bislama, Bosnian (Cyrillic), Bosnian (Latin), Brazilian, Breton, Bulgarian, Bulgarian-English, Byelorussian, Byelorussian-English, Catalan, Cebuano, Chamorro, Chinese (Simplified), Chinese (Traditional), Corsican, Croatian, Czech, Danish, Dutch, English (UK), English (USA), Esperanto, Estonian, Faroese, Farsi, Fijian, Finnish, French, Frisian, Friulian, Galician, Ganda, German, German (Switzerland), Greek, Greek-English, Greenlandic, Haitian Creole, Hani, Hebrew, Hiligaynon, Hungarian, Icelandic, Ido, Ilocano, Indonesian, Interlingua, Irish (Gaelic), Italian, Japanese, Javanese, Kapampangan, Kazakh, Kikongo, Kinyarwanda, Korean, Kurdish, Latin, Latvian, Lithuanian, Luba, Luxembourgish, Macedonian, Macedonian-English, Madurese, Malagasy, Malay, Manx (Gaelic), Maori, Mayan, Mexican, Minangkabau, Moldovan, Mongolian (Cyrillic), Nahuatl, Norwegian, Numeric, Nyanja, Nynorsk, Occitan, Papiamento, Pidgin English (Nigeria), Polish, Portuguese, Quechua, Rhaeto-Romance, Romanian, Rundi, Russian, Russian-English, Samoan, Sardinian, Scottish (Gaelic), Serbian, Serbian (Latin), Serbian-English, Shona, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tahitian, Tatar (Latin), Tetum, Tok Pisin, Tonga, Tswana, Turkish, Turkmen (Latin), Ukrainian, Ukrainian-English, Uzbek, Waray, Welsh, Wolof, Xhosa, Zapotec, Zulu.
Handwriting Recognition
Cursive handwriting cannot be recognized with OCR technology, because “optical character recognition” is tuned only for printed text.
◾ Handwritten text can be recognized only if the characters are written separately (“handprinted text”). This recognition scenario is called ICR and is most often used for:
⦁ Zonal Recognition (OCR, ICR)
⦁ Forms Processing
Scanned image resolution
Which image resolution is best?
300 dpi resolution is recommended for scanning documents.
More precise rules are listed below:
⦁ For regular texts (font size 8-10 points) it is recommended to use 300 dpi resolution for OCR
⦁ A lower resolution degrades both recognition quality and speed
⦁ For font sizes smaller than 8 points, 400-600 dpi resolution is recommended
⦁ Font sizes from 12 to 20 points give the best quality and speed
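As a rough illustration, the rules above can be written as a small lookup. This is only a sketch: the function name `recommended_dpi` is hypothetical, and applying the general 300 dpi recommendation to all font sizes of 8 points and above is an assumption, since the list only names the 8-10 point range explicitly.

```python
from typing import Tuple


def recommended_dpi(font_size_pt: float) -> Tuple[int, int]:
    """Return the (minimum, maximum) scan resolution in dpi for a font size.

    Assumption: fonts of 8 points and larger all use the general
    300 dpi recommendation.
    """
    if font_size_pt <= 0:
        raise ValueError("font size must be positive")
    if font_size_pt < 8:
        # Small print needs a higher resolution to stay legible.
        return (400, 600)
    return (300, 300)


small_print = recommended_dpi(6)
regular_text = recommended_dpi(9)
```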
Color scan
Color, grayscale or black and white: which is best?
Correct character recognition depends on how clearly the characters stand out from the background they are printed on.
Characters printed on gray or colored backgrounds can lead to recognition errors, because the background makes the characters harder to read. However, thanks to our state-of-the-art technology, colors are interpreted separately and can be eliminated during recognition where they overlap characters. We therefore recommend color scanning if documents contain color areas. Even if the documents are only black and white, we still recommend color scanning to keep the workflow smooth, because the speed difference between color scanning and black and white scanning is minimal.