OCRMode Enumeration

ByteScout PDF Extractor SDK

OCR (Optical Character Recognition) usage mode.

Namespace: Bytescout.PDFExtractor
Assembly: Bytescout.PDFExtractor (in Bytescout.PDFExtractor.dll) Version: 13.4.1.4801-master

Syntax

C#
public enum OCRMode

VB
Public Enum OCRMode

C++
public enum class OCRMode

F#
type OCRMode

Members

Member name	Value	Description
Off	0	Off. No OCR is used.
Auto	1	Similar to TextFromImagesAndVectorsAndFonts, but first checks whether the page contains only raster images to decide whether OCR is needed. Runs OCR only if the page contains very little text and one or more raster images. The result contains text objects produced from images and vector drawings.
TextFromImagesAndVectorsAndFonts	2	Always runs OCR to extract text from images and vector drawings (if any). See also the TextFromImagesAndFonts mode, which reads from the same objects except vector drawings. The result contains text objects from the PDF and, if any, text objects produced from images and vector drawings using OCR.
TextFromImagesAndVectorsAndRepairedFonts	3	Special mode: extracts text from images and vector drawings and repairs text from fonts fixing the incorrect encoding. Some PDF files contain visible text which is damaged when copied (appears as ? or other incorrect symbols when extracted or copied). This mode repairs damaged text like that using the OCR functionality. The result contains text objects from PDF and text objects produced from images and vector drawings using OCR functionality if any.
TextFromRepairedFontsOnly	4	Special mode: repairs text objects with incorrect encoding using OCR. Images and vector drawings are not processed in this mode. Some PDF files contain visible text which is damaged when copied (appears as ? or other incorrect symbols when extracted or copied). This mode repairs such damaged text using OCR and returns repaired text objects only.
TextFromImagesAndRepairedFonts	5	Special mode: extracts text from raster images (but skips vector drawings) and repairs text objects with incorrect encoding. Some PDF files contain visible text which is damaged when copied (appears as ? or other incorrect symbols when extracted or copied). This mode repairs such damaged text using OCR. It returns repaired text objects and text objects produced from raster images (no vector drawings are processed).
TextFromImagesAndFonts	6	Runs OCR to extract text from images (but skips vector drawings) plus the text objects. The result contains text objects from PDF and text objects produced from images (but no vector drawings are processed) using OCR functionality.
TextFromImagesOnly	7	Runs OCR to extract text from images only (skips vector drawings and PDF text objects). The result contains text extracted from images only.
TextFromVectorsOnly	8	Runs OCR to extract text from vector drawings only. The result contains text objects from vector drawings only.
TextFromImagesAndVectorsOnly	9	Runs OCR to extract text from images and vector drawings only; no text from PDF text objects is included. The result contains text objects produced from images and vector drawings only.
TextFromVectorsAndRepairedFonts	10	Special mode: extracts text from vector drawings and repairs text from fonts fixing the incorrect encoding. Some PDF files contain visible text which is damaged when copied (appears as ? or other incorrect symbols when extracted or copied). This mode repairs damaged text like that using the OCR functionality.
TextFromVectorsAndFonts	11	Runs OCR to extract text from vector drawings (but skips images) plus the text objects. The result contains text objects from PDF and text objects produced from vector drawings using OCR functionality.
AutoRepairFonts	16	Automatically tries to detect PDF documents with corrupted text and, when such text is detected, forces OCR font repair. Warning: the detection does not work with non-English text or with a small amount of text on the page.
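
The member descriptions above combine a small set of sources: OCR on raster images, OCR on vector drawings, text objects taken from the PDF, and OCR-based repair of those text objects. The following is a minimal standalone C++ sketch that summarizes this reading of the table. It does not use the SDK: the enum here is a local mirror of the documented values, and the Sources struct and the sourcesFor and demo functions are hypothetical names introduced only for illustration. Off, Auto and AutoRepairFonts are deliberately left unclassified, because the table describes them as using no OCR or as deciding at run time.

```cpp
#include <cassert>

// Local mirror of the documented member values (NOT the SDK's own type).
enum class OCRMode : int {
    Off = 0,
    Auto = 1,
    TextFromImagesAndVectorsAndFonts = 2,
    TextFromImagesAndVectorsAndRepairedFonts = 3,
    TextFromRepairedFontsOnly = 4,
    TextFromImagesAndRepairedFonts = 5,
    TextFromImagesAndFonts = 6,
    TextFromImagesOnly = 7,
    TextFromVectorsOnly = 8,
    TextFromImagesAndVectorsOnly = 9,
    TextFromVectorsAndRepairedFonts = 10,
    TextFromVectorsAndFonts = 11,
    AutoRepairFonts = 16
};

// Hypothetical summary record, introduced only for this sketch.
struct Sources {
    bool images;   // OCR is run on raster images
    bool vectors;  // OCR is run on vector drawings
    bool fonts;    // text objects from the PDF are part of the result
    bool repair;   // those text objects are re-read with OCR to fix encoding
};

// Fills `out` and returns true for modes whose sources are fixed by the table.
// Returns false (with `out` cleared) for Off, Auto and AutoRepairFonts.
inline bool sourcesFor(OCRMode mode, Sources& out) {
    out = Sources{false, false, false, false};
    switch (mode) {
        case OCRMode::TextFromImagesAndVectorsAndFonts:
            out = Sources{true, true, true, false};  return true;
        case OCRMode::TextFromImagesAndVectorsAndRepairedFonts:
            out = Sources{true, true, true, true};   return true;
        case OCRMode::TextFromRepairedFontsOnly:
            out = Sources{false, false, true, true}; return true;
        case OCRMode::TextFromImagesAndRepairedFonts:
            out = Sources{true, false, true, true};  return true;
        case OCRMode::TextFromImagesAndFonts:
            out = Sources{true, false, true, false}; return true;
        case OCRMode::TextFromImagesOnly:
            out = Sources{true, false, false, false}; return true;
        case OCRMode::TextFromVectorsOnly:
            out = Sources{false, true, false, false}; return true;
        case OCRMode::TextFromImagesAndVectorsOnly:
            out = Sources{true, true, false, false};  return true;
        case OCRMode::TextFromVectorsAndRepairedFonts:
            out = Sources{false, true, true, true};   return true;
        case OCRMode::TextFromVectorsAndFonts:
            out = Sources{false, true, true, false};  return true;
        default:
            return false;  // Off, Auto, AutoRepairFonts: no fixed source set
    }
}

// Usage: look up what a mode reads before choosing it.
inline void demo() {
    Sources s;
    bool fixed = sourcesFor(OCRMode::TextFromImagesAndFonts, s);
    assert(fixed);
    assert(s.images && !s.vectors && s.fonts && !s.repair);
}
```

A lookup like this can help when choosing a mode: for a scanned document with no vector artwork, for example, a mode whose vectors field is false avoids OCR work on drawings.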

Reference

Bytescout.PDFExtractor Namespace