IXMLExtractor Interface

ByteScout PDF Extractor SDK



Defines the PDF to XML extractor interface.

Namespace: Bytescout.PDFExtractor
Assembly: Bytescout.PDFExtractor (in Bytescout.PDFExtractor.dll) Version: 13.4.1.4801-master

Syntax

C#: public interface IXMLExtractor

VB: Public Interface IXMLExtractor

C++: public interface class IXMLExtractor

F#: type IXMLExtractor = interface end

The IXMLExtractor type exposes the following members.

Properties

	Name	Description
	AllowStandalonePunctuation	Gets or sets whether to allow standalone punctuation characters. If false, they are merged with the nearest text object.
	DetectStrikeoutTextStyle	Gets or sets whether to detect the "strikeout" text style. Default is false.
	DetectUnderlineTextStyle	Gets or sets whether to detect the "underline" text style. Default is false.
	ImageFolder	Gets or sets the folder where extracted images are saved when the SaveImages property is set to ImageHandling.OuterFile. Default is "images": the extractor creates an "images" subfolder in the same folder as the output XML file.
	ImageFormat	Gets or sets the image format for extracted images. Default is PNG.
	IndentedXML	Gets or sets whether to generate indented XML. Default is true.
	KeepOriginalFontNames	By default, XMLExtractor replaces the names of embedded fonts with standard (or "descendant") fonts that are similar in metrics and typeface, because embedded fonts may differ from the fonts installed on your system or be absent from it. Set this property to true to keep the original font names.
	SaveImages	Gets or sets how images are saved: not saved, saved to an external file, or embedded in the resulting XML as a Base64 string. Default is ImageHandling.None.
	SaveVectors	Gets or sets whether to save vector objects. Default is false.
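
As a rough illustration of how the properties above might be set together, here is a minimal C# sketch. It is written against the interface only and uses just the property names listed in the table. The class and method names (XmlExtractorSetup, Configure) are hypothetical, and the property types are assumptions: the "whether" properties are treated as bool and ImageFolder as string. How the extractor instance is created and how a PDF is loaded into it are not covered by this page, so the sketch leaves that to the caller.

```csharp
using Bytescout.PDFExtractor;

// Hypothetical helper, not part of the SDK.
public static class XmlExtractorSetup
{
    // Applies a few of the documented properties to any IXMLExtractor.
    public static void Configure(IXMLExtractor extractor)
    {
        // Both style detections are documented as off by default.
        extractor.DetectUnderlineTextStyle = true;
        extractor.DetectStrikeoutTextStyle = true;

        // Keep embedded font names instead of the substituted standard ones.
        extractor.KeepOriginalFontNames = true;

        // Include vector objects in the output (documented default is false).
        extractor.SaveVectors = true;

        // Write images to separate files in an "images" subfolder
        // next to the output XML file (assumes ImageFolder is a string).
        extractor.SaveImages = ImageHandling.OuterFile;
        extractor.ImageFolder = "images";

        // Indented XML is already the documented default; set here only to be explicit.
        extractor.IndentedXML = true;
    }
}
```

Taking the interface rather than a concrete class keeps the helper usable with whatever implementation of IXMLExtractor the caller has.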


Methods

	Name	Description
	GetPageXMLAsVariant	Returns extracted XML data as an array of bytes. This is the COM/ActiveX-compatible version of the SavePageXMLToStream(Int32, Stream) method for in-memory processing of PDF documents or images.
	GetXML	Extracts XML data from the entire document as string.
	GetXML(IList<Int32>)	Extracts XML data from specified pages as string.
	GetXML(String)	Extracts XML data from specified page ranges as string.
	GetXML(Int32, Int32)	Extracts XML data from specified page range as string.
	GetXMLAsVariant	Returns extracted XML data as an array of bytes. This is the COM/ActiveX-compatible version of the SaveXMLToStream(Stream) method for in-memory processing of PDF documents or images.
	GetXMLAsVariant(String)	Returns extracted XML data as an array of bytes. This is the COM/ActiveX-compatible version of the SaveXMLToStream(String, Stream) method for in-memory processing of PDF documents or images.
	GetXMLAsVariant(Int32, Int32)	Returns extracted XML data as an array of bytes. This is the COM/ActiveX-compatible version of the SaveXMLToStream(Int32, Int32, Stream) method for in-memory processing of PDF documents or images.
	GetXMLDocument	Extracts XML data from the entire document as XmlDocument.
	GetXMLDocument(IList<Int32>)	Extracts XML data from specified pages as XmlDocument.
	GetXMLDocument(String)	Extracts XML data from specified page ranges as XmlDocument.
	GetXMLDocument(Int32, Int32)	Extracts XML data from specified page range as XmlDocument.
	GetXMLDocumentFromPage	Extracts XML data from specified document page as XmlDocument.
	GetXMLFromPage	Extracts XML data from specified document page as string.
	SavePageXMLToFile	Saves page XML data to file.
	SavePageXMLToStream	Saves page XML data to stream.
	SaveXMLToFile(String)	Saves XML data from the entire document to file.
	SaveXMLToFile(IList<Int32>, String)	Saves XML data from specified pages to file.
	SaveXMLToFile(String, String)	Saves XML data from specified page ranges to file.
	SaveXMLToFile(Int32, Int32, String)	Saves XML data from specified page range to file.
	SaveXMLToStream(Stream)	Saves XML data to stream.
	SaveXMLToStream(IList<Int32>, Stream)	Saves XML data from specified pages to stream.
	SaveXMLToStream(String, Stream)	Saves XML data from specified page ranges to stream.
	SaveXMLToStream(Int32, Int32, Stream)	Saves XML data from specified page range to stream.
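
The methods above fall into four output styles: string, XmlDocument, file, and stream. The following C# sketch shows one call of each style and uses only overloads listed in the table. The class and method names (XmlExtractorUsage, Export) are hypothetical. It assumes the extractor passed in already has a document loaded, since loading is not part of this interface as documented here, and it assumes page indexes start at 0, which this page does not state.

```csharp
using System;
using System.IO;
using System.Xml;
using Bytescout.PDFExtractor;

// Hypothetical helper, not part of the SDK.
public static class XmlExtractorUsage
{
    // Assumes a document has already been loaded into the extractor.
    public static void Export(IXMLExtractor extractor, string outputPath)
    {
        // Entire document as a string.
        string xml = extractor.GetXML();
        Console.WriteLine("XML length: " + xml.Length);

        // Entire document as an XmlDocument, ready for DOM or XPath queries.
        XmlDocument document = extractor.GetXMLDocument();
        Console.WriteLine("Root element: " + document.DocumentElement.Name);

        // Entire document written straight to a file.
        extractor.SaveXMLToFile(outputPath);

        // Entire document written to a stream, here kept in memory.
        using (var stream = new MemoryStream())
        {
            extractor.SaveXMLToStream(stream);
            Console.WriteLine("Bytes written: " + stream.Length);
        }

        // A single page as a string.
        // Assumption: index 0 refers to the first page.
        string firstPage = extractor.GetXMLFromPage(0);
        Console.WriteLine("First page XML length: " + firstPage.Length);
    }
}
```

The string and XmlDocument overloads are convenient for small documents. For large ones, the file and stream overloads may be preferable because they do not require holding the whole result as a single string.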


Reference

Bytescout.PDFExtractor Namespace