TextRecognizer Class

ByteScout Text Recognition SDK

Represents a text recognizer that can extract text from scanned PDF files and from PNG, JPEG, BMP and TIFF (single-page) images using Optical Character Recognition (OCR).

Inheritance Hierarchy

System.Object
  ByteScout.TextRecognition.BaseRecognizer
    ByteScout.TextRecognition.TextRecognizer

Namespace: ByteScout.TextRecognition
Assembly: ByteScout.TextRecognition (in ByteScout.TextRecognition.dll) Version: 2.6.1.323-master

Syntax

C#

public class TextRecognizer : BaseRecognizer

VB

Public Class TextRecognizer
	Inherits BaseRecognizer

C++

public ref class TextRecognizer : public BaseRecognizer

F#

type TextRecognizer =
    class
        inherit BaseRecognizer
    end

The TextRecognizer type exposes the following members.

Constructors

	Name	Description
	TextRecognizer()	Initializes a new instance of the TextRecognizer class.
	TextRecognizer(String, String)	Initializes a new instance of the TextRecognizer class.

Properties

	Name	Description
	AutoDetectPageRotation	Gets or sets a value indicating whether the TextRecognizer will try to automatically detect the rotation of a scanned page. Default is false.
	BlackList	A set of characters that must not be recognized in a scanned document. The resulting text will contain only characters that are not in this list. This helps improve uncertain recognition.
	ComHelpers	Set of helper methods for use from COM/ActiveX.
	Corrections	Collection of corrections automatically applied to recognized text to fix recurring recognition errors.
	ImagePreprocessingFilters	Collection of image preprocessing filters.
	IsDocumentLoaded	Gets whether a document is loaded. (Inherited from BaseRecognizer.)
	KeepTextFormatting	Gets or sets whether to try to keep the text formatting.
	LicenseInfo	Gets license information. (Inherited from BaseRecognizer.)
	MaximizeCPUUtilization	Gets or sets whether to maximize OCR performance using Intel OpenMP (if available), which accelerates recognition by approximately 30%. Default is false. (Inherited from BaseRecognizer.)
	OCRLanguage	Language for Optical Character Recognition (OCR). The valid values include: "eng" - English (default), "deu" - German, "fra" - French, "spa" - Spanish. Download more languages at https://github.com/bytescout/ocrdata. (Inherited from BaseRecognizer.)
	OCRLanguageDataFolder	Folder containing OCR language data files. (Inherited from BaseRecognizer.)
	PageSeparator	Gets or sets the page separator character or string. Default is "\r\n".
	PDFRenderingOptions	Gets or sets PDF rendering options. (Inherited from BaseRecognizer.)
	PDFRenderingResolution	Gets or sets PDF rendering resolution. Default is 300 DPI. (Inherited from BaseRecognizer.)
	RecognitionAreas	Collection of page areas intended for text recognition.
	RegistrationKey	Gets or sets the key number part of registration information. (Inherited from BaseRecognizer.)
	RegistrationName	Gets or sets the name part of the registration information. (Inherited from BaseRecognizer.)
	TrimLeadingSpaces	Gets or sets whether to trim redundant leading spaces. Default is false. Works only if KeepTextFormatting is true.
	UnwrapParagraphs	Gets or sets whether to unwrap paragraph text. Default is false. Works only if KeepTextFormatting is true.
	Version	Gets version of the component. (Inherited from BaseRecognizer.)
	WhiteList	A set of characters allowed to be recognized in a scanned document. Only characters from this list will appear in the resulting text. This helps improve uncertain recognition.
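
As a rough illustration of how several of the properties above might be combined, the following C# sketch configures a recognizer before any document is loaded. The property names come from the table. Their types are not stated there, so the bool, string, and numeric assignments are assumptions inferred from the documented defaults, and the language data folder path is a hypothetical placeholder.

```csharp
using ByteScout.TextRecognition;

class ConfigureRecognizerSketch
{
    static void Main()
    {
        // Parameterless constructor, as listed under Constructors.
        TextRecognizer recognizer = new TextRecognizer();
        try
        {
            // "eng" is the documented default; "deu", "fra", and "spa" are the other listed values.
            recognizer.OCRLanguage = "deu";

            // Hypothetical path: point this at the folder that holds the OCR language data files.
            recognizer.OCRLanguageDataFolder = @"C:\ocrdata";

            // Assumed to be bool properties (both are documented as "Default is false").
            recognizer.AutoDetectPageRotation = true;
            recognizer.MaximizeCPUUtilization = true;

            // Documented default is 300 DPI; the numeric type of the property is an assumption.
            recognizer.PDFRenderingResolution = 300;

            // TrimLeadingSpaces and UnwrapParagraphs are documented to work
            // only if KeepTextFormatting is true, so it is enabled first.
            recognizer.KeepTextFormatting = true;
            recognizer.TrimLeadingSpaces = true;
            recognizer.UnwrapParagraphs = true;

            // Documented default page separator, set explicitly for clarity.
            recognizer.PageSeparator = "\r\n";
        }
        finally
        {
            // Dispose is listed among the inherited methods.
            recognizer.Dispose();
        }
    }
}
```

The try/finally form is used instead of a using statement because the page lists a Dispose method but does not say whether the class implements IDisposable.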

Methods

	Name	Description
	CheckOCRComponents	(Inherited from BaseRecognizer.)
	Clear	Releases loaded document and allocated resources. (Inherited from BaseRecognizer.)
	Dispose	Releases managed resources of the component. (Inherited from BaseRecognizer.)
	Equals	(Inherited from Object.)
	Finalize	(Inherited from Object.)
	GetHashCode	(Inherited from Object.)
	GetOCRObjects	Performs the recognition and returns list of recognized text objects of specified level of discretization.
	GetOCRObjectsAsJSON	Performs the recognition and returns the list of recognized text objects of specified level of discretization as JSON string.
	GetOCRObjectsAsXML	Performs the recognition and returns the list of recognized text objects of specified level of discretization as XML string.
	GetPageCount	Returns number of pages in loaded document. (Inherited from BaseRecognizer.)
	GetPageHeight	Returns document page height in pixels. (Inherited from BaseRecognizer.)
	GetPageSize	Returns document page dimensions in pixels. (Inherited from BaseRecognizer.)
	GetPageWidth	Returns document page width in pixels. (Inherited from BaseRecognizer.)
	GetPreprocessedPageBitmap	Returns preview image of document page with preprocessing filters applied.
	GetText	Reads text from specified document page range.
	GetType	(Inherited from Object.)
	LoadDocument(Byte[])	Loads document from byte array. (Inherited from BaseRecognizer.)
	LoadDocument(Image)	Loads document from Image object. (Inherited from BaseRecognizer.)
	LoadDocument(Int64)	Loads document from Win32 HBITMAP handle. (Inherited from BaseRecognizer.)
	LoadDocument(Stream)	Loads document from stream. (Inherited from BaseRecognizer.)
	LoadDocument(String)	Loads document from file. (Inherited from BaseRecognizer.)
	LoadDocument(ScreenshotMaker)	Loads a screenshot of the main display. Use SetScreenshotArea(Int32, Int32, Int32, Int32) to capture only a portion of the screen. (Inherited from BaseRecognizer.)
	MemberwiseClone	(Inherited from Object.)
	OnPasswordRequired	(Inherited from BaseRecognizer.)
	SaveOCRObjectsAsJSON	Performs the recognition and saves the list of recognized text objects of specified level of discretization to JSON file.
	SaveOCRObjectsAsXML	Performs the recognition and saves the list of recognized text objects of specified level of discretization to XML file.
	SavePreprocessedPageBitmap	Saves bitmap of document page with preprocessing filters applied. The image is saved in PNG format.
	SaveText(Stream, Int32, Int32, Encoding)	Saves text from specified page range to Stream.
	SaveText(String, Int32, Int32, Encoding)	Saves text from specified page range to file.
	ToString	(Inherited from Object.)
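
A typical call sequence built from the methods above might look like the following C# sketch: load a file, run recognition over every page, and write the result to a text file. It uses only the overloads listed in the table, LoadDocument(String) and SaveText(String, Int32, Int32, Encoding). The file names are hypothetical, and two details are assumptions the table does not confirm: that GetPageCount takes no arguments, and that the two Int32 arguments of SaveText are zero-based, inclusive start and end page indexes.

```csharp
using System.Text;
using ByteScout.TextRecognition;

class RecognizeToFileSketch
{
    static void Main()
    {
        // Parameterless constructor, as listed under Constructors.
        TextRecognizer recognizer = new TextRecognizer();
        try
        {
            // "eng" is the documented default language, set explicitly here.
            recognizer.OCRLanguage = "eng";

            // Hypothetical input file; LoadDocument(String) loads a document from a file.
            recognizer.LoadDocument("scan.pdf");

            // Assumption: GetPageCount() has no parameters and
            // page indexes are zero-based and inclusive.
            int lastPage = recognizer.GetPageCount() - 1;

            // Recognize pages 0..lastPage and save the text as UTF-8 (hypothetical output file).
            recognizer.SaveText("scan.txt", 0, lastPage, Encoding.UTF8);
        }
        finally
        {
            // Release the component's resources (Dispose is listed as inherited from BaseRecognizer).
            recognizer.Dispose();
        }
    }
}
```

If the page indexes turn out to be one-based, the start and end arguments would need to be shifted accordingly.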

Events

	Name	Description
	PasswordRequired	Occurs when a password is required to open PDF document. (Inherited from BaseRecognizer.)

Reference

ByteScout.TextRecognition Namespace