IHTMLExtractor Properties
ByteScout PDF To HTML SDK

The IHTMLExtractor type exposes the following members.


AddFontStyleHTMLTagsToText
Controls whether the HTML output adds font style information to text objects. True by default; set to False to output text objects as plain text, without font size and style defined.
ColumnDetectionMode
Column detection mode.
ControlsAsText
Controls whether form text controls are rendered as plain text objects. False by default; set to True to display controls as text.
DetectHyperLinks
Controls whether URLs are detected and output as clickable links. True by default.
DetectLinesInsteadOfParagraphs
Gets or sets a value indicating whether to detect single lines of text instead of multi-line paragraphs.
DetectNewColumnBySpacesRatio
Table column detection option.
ExtractAnnotations
Gets or sets a value indicating whether to extract text from annotation objects. Default is true.
ExtractColumnByColumn
Gets or sets a value indicating whether to extract text column by column or to follow the visual layout of the text while extracting. False by default. If you are processing PDF newspapers with text columns, set this property to True so that text is extracted column by column instead of line by line.
ExtractInvisibleText
Gets or sets a value indicating whether to extract invisible text from the PDF document.
ExtractionMode
Extraction mode: plain HTML or formatted HTML with CSS.
ExtractShadowLikeText
Gets or sets a value indicating whether to include characters used to create a "shadow" effect (when the same character appears again with some offset) in the PDF document. True by default (includes all encoded characters regardless of their real appearance).
FontSubstitutionMap
Map used to substitute fonts. You may add a new mapping to match a source font to a target font in the output HTML.
HighPrecisionTextPositioning
Gets or sets a value indicating whether to use high-precision text positioning.
KeepOriginalFontNames
By default, HTMLExtractor replaces the names of embedded fonts with standard (or "descendant") fonts that are similar in metrics and typeface. This is because embedded fonts may differ from the fonts installed on your system, or may be absent from it altogether. Set this property to True if you want to keep the original font names.
LineGroupingMode
Sets how lines are grouped into paragraphs. Default: no line grouping is performed.
OptimizeImages
Gets or sets a value indicating whether to optimize images. True by default.
OutputImageFormat
Defines the format of output images. Default is JPEG.
OutputPageWidth
Width of the output pages rendered into HTML.
PreserveFormattingOnTextExtraction
Gets or sets a value indicating whether to preserve text formatting during extraction.
RemoveHyphenation
Gets or sets a value indicating whether to automatically remove hyphenation at the ends of lines (works when Unwrap is True).
SaveImages
Gets or sets how images are handled (skip, embed, or save to an external file).
Unwrap
Gets or sets a value indicating whether to unwrap lines into single lines (can be especially useful in column layout mode; see the ExtractColumnByColumn property). Default is False.
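
Illustration: how Unwrap and RemoveHyphenation relate. The Python sketch below is not SDK code and does not call the SDK. It only illustrates the general idea behind joining wrapped lines and dropping line-end hyphens. The function name `unwrap_lines`, its `remove_hyphenation` parameter, and the rule "a trailing hyphen followed by a lowercase letter is a line-break hyphen" are assumptions made for this example; the SDK's actual algorithm is not described on this page and may differ.

```python
def unwrap_lines(lines, remove_hyphenation=False):
    """Join wrapped lines into a single line.

    Hypothetical helper for illustration only; not the SDK's algorithm.
    """
    result = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines
        if not result:
            result = line
        elif remove_hyphenation and result.endswith("-") and line[0].islower():
            # Assumed rule: treat the trailing hyphen as a line-break hyphen,
            # drop it, and glue the two halves of the word together.
            result = result[:-1] + line
        else:
            # Plain unwrapping: join lines with a single space.
            result = result + " " + line
    return result


wrapped = [
    "PDF text is often hyphen-",
    "ated at the end of a line",
    "inside narrow columns.",
]

print(unwrap_lines(wrapped))
# PDF text is often hyphen- ated at the end of a line inside narrow columns.
print(unwrap_lines(wrapped, remove_hyphenation=True))
# PDF text is often hyphenated at the end of a line inside narrow columns.
```

In this sketch, hyphen removal only makes sense while lines are being joined, which mirrors the note that RemoveHyphenation works when Unwrap is True.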
See Also