Defines the PDF to XML extractor interface.
Namespace: Bytescout.PDFExtractor
Assembly: Bytescout.PDFExtractor (in Bytescout.PDFExtractor.dll) Version: 12.0.0.4062-master
The IXMLExtractor type exposes the following members.
Properties
Name | Description
---|---
AllowStandalonePunctuation | Gets or sets whether to allow standalone punctuation characters. If false, they are merged with the nearest text object.
DetectStrikeoutTextStyle | Gets or sets whether to detect the "strikeout" text style. Default is false.
DetectUnderlineTextStyle | Gets or sets whether to detect the "underline" text style. Default is false.
ImageFolder | Gets or sets the folder for extracted images when the SaveImages property is set to ImageHandling.OuterFile. Default is "images": the extractor creates an "images" sub-folder in the same folder as the output XML file.
ImageFormat | Gets or sets the image format for extracted images. Default is PNG.
KeepOriginalFontNames | By default, XMLExtractor replaces the names of embedded fonts with standard (or "descendant") fonts that are similar in metrics and typeface, because embedded fonts may differ from the fonts installed on your system or be absent from it altogether. Set this property to true to keep the original font names.
SaveImages | Gets or sets how images are saved: do not save, save to an outer file, or embed into the resulting XML as a Base64 string. Default is ImageHandling.None.
SaveVectors | Gets or sets whether to save vector objects. Default is false.
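
The properties above are plain settings that are applied before extraction. The following is a minimal sketch of configuring them, not an official sample. It assumes that `ImageHandling` lives in the `Bytescout.PDFExtractor` namespace, that `ImageFolder` is a string, and that the Boolean properties are ordinary `bool` values. The class and method names (`ExtractorSettingsSketch`, `Configure`) are hypothetical.

```csharp
using Bytescout.PDFExtractor;

static class ExtractorSettingsSketch
{
    // Hypothetical helper: applies a set of options to any IXMLExtractor implementation.
    public static void Configure(IXMLExtractor extractor)
    {
        // Text style detection is off by default, so enable it explicitly.
        extractor.DetectUnderlineTextStyle = true;
        extractor.DetectStrikeoutTextStyle = true;

        // Keep the font names exactly as they appear in the PDF.
        extractor.KeepOriginalFontNames = true;

        // Save images as separate files; ImageFolder is only used in this mode.
        // Assumption: ImageHandling is in the Bytescout.PDFExtractor namespace.
        extractor.SaveImages = ImageHandling.OuterFile;
        extractor.ImageFolder = "images";

        // Vector objects are skipped by default; stated here for clarity.
        extractor.SaveVectors = false;
    }
}
```

Writing the helper against the interface rather than a concrete class keeps it limited to the members documented on this page.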
Methods
Name | Description
---|---
GetXML | Extracts XML data from the whole document as a string.
GetXML(Int32, Int32) | Extracts XML data from the specified page range.
GetXMLDocument | Extracts XML data from the whole document as XmlDocument.
GetXMLDocument(Int32, Int32) | Extracts XML data from the specified page range as XmlDocument.
GetXMLDocumentFromPage | Extracts XML data from the specified document page as XmlDocument.
GetXMLFromPage | Extracts XML data from the specified document page as a string.
SavePageXMLToFile | Saves page XML data to a file.
SavePageXMLToStream | Saves page XML data to a stream.
SaveXMLToFile(String) | Saves XML data to a file.
SaveXMLToFile(Int32, Int32, String) | Saves XML data from the specified page range to a file.
SaveXMLToStream(Stream) | Saves XML data to a stream.
SaveXMLToStream(Int32, Int32, Stream) | Saves XML data from the specified page range to a stream.
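
The overloads above differ only in scope (whole document or page range) and in destination (string, XmlDocument, file, or stream). The sketch below illustrates how they might be combined. It makes several assumptions: the extractor already has a PDF loaded (loading is not among the members listed here), the two `Int32` arguments are a start page and an end page, and page indices are zero-based. Check the individual member pages before relying on any of these. The names `XmlExportSketch` and `Export` are hypothetical.

```csharp
using System.IO;
using System.Xml;
using Bytescout.PDFExtractor;

static class XmlExportSketch
{
    // Hypothetical helper. Assumes `extractor` already has a document loaded.
    public static long Export(IXMLExtractor extractor, string outputPath)
    {
        // Whole document, written directly to a file.
        extractor.SaveXMLToFile(outputPath);

        // Page range as a string.
        // Assumption: arguments are (startPage, endPage) with zero-based indices.
        string rangeXml = extractor.GetXML(0, 1);

        // Whole document as an XmlDocument, ready for XPath queries or further editing.
        XmlDocument document = extractor.GetXMLDocument();

        // Whole document written to an in-memory stream, e.g. for an HTTP response.
        using (var stream = new MemoryStream())
        {
            extractor.SaveXMLToStream(stream);
            return stream.Length; // number of bytes written
        }
    }
}
```

The stream overloads avoid a temporary file when the XML is passed straight to another component, while the XmlDocument overloads suit cases where the result is queried in memory.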