WordFinderConfig Class Documentation
class WordFinderConfig
Namespace: datalogics_interface
Detailed Description
Wraps the key fields from PDWordFinderConfigRec to configure word extraction.
Constructor & Destructor Documentation
WordFinderConfig
WordFinderConfig()
Creates a WordFinderConfig with all options set to false, which gives the word finder's default behavior.
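The block below is a minimal sketch of what default construction implies. The real datalogics_interface header is not shown in this reference, so the block defines a hypothetical stand-in class. The stand-in mirrors only the documented default constructor and three of the documented getters. The helper uses_default_behavior is also hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for datalogics_interface::WordFinderConfig. It mirrors
// only the documented default constructor and three of the documented getters;
// the real class wraps PDWordFinderConfigRec and is not reproduced here.
class WordFinderConfig {
public:
    // Every option starts out false, matching the documented default.
    WordFinderConfig() = default;

    bool get_ignore_char_gaps() { return ignore_char_gaps_; }
    bool get_no_annotations() { return no_annotations_; }
    bool get_preserve_spaces() { return preserve_spaces_; }

private:
    bool ignore_char_gaps_ = false;
    bool no_annotations_ = false;
    bool preserve_spaces_ = false;
};

// Hypothetical helper: returns true when none of the sampled options is
// switched on. The getters are documented without const, so this takes a
// non-const reference.
bool uses_default_behavior(WordFinderConfig& config) {
    return !config.get_ignore_char_gaps() && !config.get_no_annotations() &&
           !config.get_preserve_spaces();
}
```

For a freshly constructed instance, WordFinderConfig config; followed by assert(uses_default_behavior(config)); holds.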
WordFinderConfig
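WordFinderConfig(WordFinderConfig &&)
Move constructor. Creates a WordFinderConfig that takes over the settings of another instance.
Parameters:
WordFinderConfig &&: the instance to move from.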
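This reference documents a move constructor and a move assignment operator but no copy operations. The sketch below assumes, without confirmation from the reference, that copying is unavailable. It shows the pattern that follows from that assumption: build a configuration in a helper, return it by value, and move it into another instance. The class is a hypothetical stand-in. The names make_config_without_annotations and replace_config are illustrative only.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for datalogics_interface::WordFinderConfig. Only move
// operations are documented, so this sketch ASSUMES copying is unavailable
// and deletes it.
class WordFinderConfig {
public:
    WordFinderConfig() = default;
    WordFinderConfig(WordFinderConfig&&) = default;
    WordFinderConfig& operator=(WordFinderConfig&&) = default;
    WordFinderConfig(const WordFinderConfig&) = delete;
    WordFinderConfig& operator=(const WordFinderConfig&) = delete;
    ~WordFinderConfig() = default;

    bool get_no_annotations() { return no_annotations_; }
    void set_no_annotations(bool value) { no_annotations_ = value; }

private:
    bool no_annotations_ = false;
};

// Hypothetical factory: builds a configuration locally and returns it by
// value. The return moves the local (or elides the move); it never copies.
WordFinderConfig make_config_without_annotations() {
    WordFinderConfig config;
    config.set_no_annotations(true);
    return config;
}

// Hypothetical helper: overwrites target's settings via move assignment.
void replace_config(WordFinderConfig& target, WordFinderConfig&& source) {
    target = std::move(source);
}
```

If the real class turns out to be copyable, the same code still compiles. The moves then become an optimization rather than a requirement.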
~WordFinderConfig
~WordFinderConfig()
Destructor.
Member Function Documentation
get_disable_char_reordering
bool get_disable_char_reordering()
Returns:
bool: true if character reordering is disabled.
get_disable_tagged_pdf
bool get_disable_tagged_pdf()
Returns:
bool: true if tagged PDF support is disabled.
get_ignore_char_gaps
bool get_ignore_char_gaps()
Returns:
bool: true if large character gaps are not converted to spaces.
get_ignore_line_gaps
bool get_ignore_line_gaps()
Returns:
bool: true if vertical movements are not treated as line breaks.
get_no_annotations
bool get_no_annotations()
Returns:
bool: true if text is not extracted from annotations.
get_no_encoding_guess
bool get_no_encoding_guess()
Returns:
bool: true if the word finder does not guess the encoding of fonts whose encoding is unknown.
get_no_ext_char_offset
bool get_no_ext_char_offset()
Returns:
bool: true if extended character offsets are not generated.
get_no_hyphen_detection
bool get_no_hyphen_detection()
Returns:
bool: true if soft hyphen detection in untagged PDF is disabled.
get_no_ligature_expansion
bool get_no_ligature_expansion()
Returns:
bool: true if the default ligature expansion is disabled.
get_no_skewed_quads
bool get_no_skewed_quads()
Returns:
bool: true if the word finder does not generate one quad per character for skewed words.
get_no_style_info
bool get_no_style_info()
Returns:
bool: true if character style information is not generated.
get_no_text_render_mode_3
bool get_no_text_render_mode_3()
Returns:
bool: true if invisible text (text render mode 3) is not extracted.
get_no_xy_sort
bool get_no_xy_sort()
Returns:
bool: true if no XY-ordered word list is generated.
get_precise_quad
bool get_precise_quad()
Returns:
bool: true if word bounding boxes are based on the actual glyph bounding boxes.
get_preserve_redundant_chars
bool get_preserve_redundant_chars()
Returns:
bool: true if redundant (overlapping) characters are preserved.
get_preserve_spaces
bool get_preserve_spaces()
Returns:
bool: true if space characters are preserved during word breaking.
get_trust_nb_space
bool get_trust_nb_space()
Returns:
bool: true if non-breaking spaces are treated as regular spaces.
get_unknown_to_std_enc
bool get_unknown_to_std_enc()
Returns:
bool: true if fonts with an unknown encoding are assumed to use Standard Roman encoding.
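Every getter returns a plain bool, so a caller can report which options differ from the all-false default. This is useful, for example, when logging how a document was processed. The sketch below uses a hypothetical stand-in that exposes four of the documented getter and setter pairs. The helper enabled_options is illustrative and is not part of datalogics_interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in exposing four of the documented getter/setter pairs;
// the real datalogics_interface::WordFinderConfig has eighteen such flags.
class WordFinderConfig {
public:
    bool get_ignore_char_gaps() { return ignore_char_gaps_; }
    void set_ignore_char_gaps(bool value) { ignore_char_gaps_ = value; }
    bool get_no_annotations() { return no_annotations_; }
    void set_no_annotations(bool value) { no_annotations_ = value; }
    bool get_no_text_render_mode_3() { return no_text_render_mode_3_; }
    void set_no_text_render_mode_3(bool value) { no_text_render_mode_3_ = value; }
    bool get_precise_quad() { return precise_quad_; }
    void set_precise_quad(bool value) { precise_quad_ = value; }

private:
    bool ignore_char_gaps_ = false;
    bool no_annotations_ = false;
    bool no_text_render_mode_3_ = false;
    bool precise_quad_ = false;
};

// Hypothetical helper: collects, in a fixed order, the names of the options
// that are switched on (i.e. differ from the default of false).
std::vector<std::string> enabled_options(WordFinderConfig& config) {
    std::vector<std::string> names;
    if (config.get_ignore_char_gaps()) names.push_back("ignore_char_gaps");
    if (config.get_no_annotations()) names.push_back("no_annotations");
    if (config.get_no_text_render_mode_3()) names.push_back("no_text_render_mode_3");
    if (config.get_precise_quad()) names.push_back("precise_quad");
    return names;
}
```

To cover the remaining flags, add one line per documented getter.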
operator=
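WordFinderConfig &operator=(WordFinderConfig &&)
Move assignment operator. Replaces the settings of this instance with those of another instance.
Parameters:
WordFinderConfig &&: the instance to move from.
Returns:
WordFinderConfig &: a reference to this instance.
set_disable_char_reordering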
void set_disable_char_reordering(bool value)
Parameters:
value: bool. Set to true to disable character reordering.
Returns:
void
set_disable_tagged_pdf
void set_disable_tagged_pdf(bool value)
Parameters:
value: bool. Set to true to disable tagged PDF support.
Returns:
void
set_ignore_char_gaps
void set_ignore_char_gaps(bool value)
Parameters:
value: bool. Set to true to stop converting large character gaps to spaces.
Returns:
void
set_ignore_line_gaps
void set_ignore_line_gaps(bool value)
Parameters:
value: bool. Set to true to stop treating vertical movements as line breaks.
Returns:
void
set_no_annotations
void set_no_annotations(bool value)
Parameters:
value: bool. Set to true to skip extracting text from annotations.
Returns:
void
set_no_encoding_guess
void set_no_encoding_guess(bool value)
Parameters:
value: bool. Set to true to disable guessing the encoding of fonts whose encoding is unknown.
Returns:
void
set_no_ext_char_offset
void set_no_ext_char_offset(bool value)
Parameters:
value: bool. Set to true to disable generating extended character offsets.
Returns:
void
set_no_hyphen_detection
void set_no_hyphen_detection(bool value)
Parameters:
value: bool. Set to true to disable soft hyphen detection in untagged PDF.
Returns:
void
set_no_ligature_expansion
void set_no_ligature_expansion(bool value)
Parameters:
value: bool. Set to true to disable the default ligature expansion.
Returns:
void
set_no_skewed_quads
void set_no_skewed_quads(bool value)
Parameters:
value: bool. Set to true to stop generating one quad per character for skewed words.
Returns:
void
set_no_style_info
void set_no_style_info(bool value)
Parameters:
value: bool. Set to true to disable generating character style information.
Returns:
void
set_no_text_render_mode_3
void set_no_text_render_mode_3(bool value)
Parameters:
value: bool. Set to true to skip extracting invisible text (text render mode 3).
Returns:
void
set_no_xy_sort
void set_no_xy_sort(bool value)
Parameters:
value: bool. Set to true to disable generating an XY-ordered word list.
Returns:
void
set_precise_quad
void set_precise_quad(bool value)
Parameters:
value: bool. Set to true to base word bounding boxes on the actual glyph bounding boxes.
Returns:
void
set_preserve_redundant_chars
void set_preserve_redundant_chars(bool value)
Parameters:
value: bool. Set to true to preserve redundant (overlapping) characters.
Returns:
void
set_preserve_spaces
void set_preserve_spaces(bool value)
Parameters:
value: bool. Set to true to preserve space characters during word breaking.
Returns:
void
set_trust_nb_space
void set_trust_nb_space(bool value)
Parameters:
value: bool. Set to true to treat non-breaking spaces as regular spaces.
Returns:
void
set_unknown_to_std_enc
void set_unknown_to_std_enc(bool value)
Parameters:
value: bool. Set to true to assume that fonts with an unknown encoding use Standard Roman encoding.
Returns:
void
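Setters are often combined into a preset. The following is an illustration only, not a configuration recommended by this reference. The hypothetical helper below limits extraction to text drawn visibly on the page body by skipping annotation text and invisible text. The stand-in class mirrors only the three documented setter and getter pairs the sketch needs.

```cpp
#include <cassert>

// Hypothetical stand-in mirroring three documented getter/setter pairs of
// datalogics_interface::WordFinderConfig; all options default to false.
class WordFinderConfig {
public:
    bool get_no_annotations() { return no_annotations_; }
    void set_no_annotations(bool value) { no_annotations_ = value; }
    bool get_no_text_render_mode_3() { return no_text_render_mode_3_; }
    void set_no_text_render_mode_3(bool value) { no_text_render_mode_3_ = value; }
    bool get_preserve_spaces() { return preserve_spaces_; }
    void set_preserve_spaces(bool value) { preserve_spaces_ = value; }

private:
    bool no_annotations_ = false;
    bool no_text_render_mode_3_ = false;
    bool preserve_spaces_ = false;
};

// Hypothetical preset: skip annotation text and invisible (render mode 3)
// text, and leave every other option at its default.
void configure_visible_text_only(WordFinderConfig& config) {
    config.set_no_annotations(true);
    config.set_no_text_render_mode_3(true);
}
```

Invisible text (render mode 3) is often the OCR text layer of a scanned page. On scanned documents this preset may therefore yield little or no text.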