WordFinderConfig Class Documentation

class WordFinderConfig

Namespace: datalogics_interface

Detailed Description

Wraps the key fields from PDWordFinderConfigRec to configure word extraction.

Constructor & Destructor Documentation

WordFinderConfig

WordFinderConfig()

Creates a WordFinderConfig with all options set to false (default behavior).
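The default state described above can be sketched as follows. This is only an illustration: the `sketch` namespace, the stand-in class, and the `is_default` and `verify_fresh_config` helpers are hypothetical and are not part of `datalogics_interface`. The stand-in mirrors three of the documented getters (marked `const` here as an assumption) so that the block compiles on its own.

```cpp
#include <cassert>

// Hypothetical stand-in, NOT the real datalogics_interface::WordFinderConfig.
// It mirrors three of the documented getters so the sketch compiles alone.
namespace sketch {

class WordFinderConfig {
public:
    WordFinderConfig() = default;  // every flag starts as false, as documented

    bool get_ignore_char_gaps() const { return ignore_char_gaps_; }
    bool get_no_annotations() const { return no_annotations_; }
    bool get_preserve_spaces() const { return preserve_spaces_; }

private:
    bool ignore_char_gaps_ = false;
    bool no_annotations_ = false;
    bool preserve_spaces_ = false;
};

// Returns true when none of the sampled options has been switched on.
inline bool is_default(const WordFinderConfig& cfg) {
    return !cfg.get_ignore_char_gaps() &&
           !cfg.get_no_annotations() &&
           !cfg.get_preserve_spaces();
}

// Constructs a fresh object and checks that it reports default behavior.
inline void verify_fresh_config() {
    WordFinderConfig cfg;
    assert(is_default(cfg));
}

}  // namespace sketch
```

With the real class, the same check would call the documented getters on a default-constructed `WordFinderConfig`.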

~WordFinderConfig

~WordFinderConfig()

Member Function Documentation

get_disable_char_reordering

bool get_disable_char_reordering()

Returns:

bool

When true, disables character reordering.

get_disable_tagged_pdf

bool get_disable_tagged_pdf()

Returns:

bool

When true, disables tagged PDF support.

get_ignore_char_gaps

bool get_ignore_char_gaps()

Returns:

bool

When true, disables converting large character gaps to spaces.

get_ignore_line_gaps

bool get_ignore_line_gaps()

Returns:

bool

When true, disables treating vertical movements as line breaks.

get_no_annotations

bool get_no_annotations()

Returns:

bool

When true, disables extracting text from annotations.

get_no_encoding_guess

bool get_no_encoding_guess()

Returns:

bool

When true, disables guessing encoding of fonts with unknown encoding.

get_no_ext_char_offset

bool get_no_ext_char_offset()

Returns:

bool

When true, disables extended character offset generation.

get_no_hyphen_detection

bool get_no_hyphen_detection()

Returns:

bool

When true, disables soft hyphen detection in non-tagged PDF.

get_no_ligature_expansion

bool get_no_ligature_expansion()

Returns:

bool

When true, disables default ligature expansion.

get_no_skewed_quads

bool get_no_skewed_quads()

Returns:

bool

When true, disables quad-per-character for skewed words.

get_no_style_info

bool get_no_style_info()

Returns:

bool

When true, disables character style information generation.

get_no_text_render_mode_3

bool get_no_text_render_mode_3()

Returns:

bool

When true, disables extracting invisible text (render mode 3).

get_no_xy_sort

bool get_no_xy_sort()

Returns:

bool

When true, disables generating an XY-ordered word list.

get_precise_quad

bool get_precise_quad()

Returns:

bool

When true, bounding boxes are based on actual glyph bounding boxes.

get_preserve_redundant_chars

bool get_preserve_redundant_chars()

Returns:

bool

When true, preserves redundant (overlapping) characters.

get_preserve_spaces

bool get_preserve_spaces()

Returns:

bool

When true, preserves space characters during word breaking.

get_trust_nb_space

bool get_trust_nb_space()

Returns:

bool

When true, treats non-breaking spaces as regular spaces.

get_unknown_to_std_enc

bool get_unknown_to_std_enc()

Returns:

bool

When true, assumes unknown encoding fonts are Standard Roman.
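Because every getter returns a plain `bool`, a configuration can be inspected by polling the getters one at a time. The sketch below shows that pattern under stated assumptions: the `sketch` namespace, the stand-in class, and the `enabled_options` and `example` helpers are hypothetical, only three of the documented options are mirrored, and the getters are assumed to be callable on a `const` object.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in, NOT the real datalogics_interface::WordFinderConfig.
namespace sketch {

class WordFinderConfig {
public:
    bool get_no_xy_sort() const { return no_xy_sort_; }
    bool get_precise_quad() const { return precise_quad_; }
    bool get_trust_nb_space() const { return trust_nb_space_; }

    void set_no_xy_sort(bool value) { no_xy_sort_ = value; }
    void set_precise_quad(bool value) { precise_quad_ = value; }
    void set_trust_nb_space(bool value) { trust_nb_space_ = value; }

private:
    bool no_xy_sort_ = false;
    bool precise_quad_ = false;
    bool trust_nb_space_ = false;
};

// Collects the names of the sampled options that are currently true,
// in the fixed order in which they are polled.
inline std::vector<std::string> enabled_options(const WordFinderConfig& cfg) {
    std::vector<std::string> names;
    if (cfg.get_no_xy_sort()) names.push_back("no_xy_sort");
    if (cfg.get_precise_quad()) names.push_back("precise_quad");
    if (cfg.get_trust_nb_space()) names.push_back("trust_nb_space");
    return names;
}

// Usage: a fresh object reports nothing; one setter call adds one name.
inline void example() {
    WordFinderConfig cfg;
    assert(enabled_options(cfg).empty());
    cfg.set_precise_quad(true);
    assert(enabled_options(cfg).size() == 1);
}

}  // namespace sketch
```

A helper of this kind can be handy for logging which non-default options were active during an extraction run.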

set_disable_char_reordering

void set_disable_char_reordering(bool value)

Parameters

value: bool

Returns:

void

When true, disables character reordering.

set_disable_tagged_pdf

void set_disable_tagged_pdf(bool value)

Parameters

value: bool

Returns:

void

When true, disables tagged PDF support.

set_ignore_char_gaps

void set_ignore_char_gaps(bool value)

Parameters

value: bool

Returns:

void

When true, disables converting large character gaps to spaces.

set_ignore_line_gaps

void set_ignore_line_gaps(bool value)

Parameters

value: bool

Returns:

void

When true, disables treating vertical movements as line breaks.

set_no_annotations

void set_no_annotations(bool value)

Parameters

value: bool

Returns:

void

When true, disables extracting text from annotations.

set_no_encoding_guess

void set_no_encoding_guess(bool value)

Parameters

value: bool

Returns:

void

When true, disables guessing encoding of fonts with unknown encoding.

set_no_ext_char_offset

void set_no_ext_char_offset(bool value)

Parameters

value: bool

Returns:

void

When true, disables extended character offset generation.

set_no_hyphen_detection

void set_no_hyphen_detection(bool value)

Parameters

value: bool

Returns:

void

When true, disables soft hyphen detection in non-tagged PDF.

set_no_ligature_expansion

void set_no_ligature_expansion(bool value)

Parameters

value: bool

Returns:

void

When true, disables default ligature expansion.

set_no_skewed_quads

void set_no_skewed_quads(bool value)

Parameters

value: bool

Returns:

void

When true, disables quad-per-character for skewed words.

set_no_style_info

void set_no_style_info(bool value)

Parameters

value: bool

Returns:

void

When true, disables character style information generation.

set_no_text_render_mode_3

void set_no_text_render_mode_3(bool value)

Parameters

value: bool

Returns:

void

When true, disables extracting invisible text (render mode 3).

set_no_xy_sort

void set_no_xy_sort(bool value)

Parameters

value: bool

Returns:

void

When true, disables generating an XY-ordered word list.

set_precise_quad

void set_precise_quad(bool value)

Parameters

value: bool

Returns:

void

When true, bounding boxes are based on actual glyph bounding boxes.

set_preserve_redundant_chars

void set_preserve_redundant_chars(bool value)

Parameters

value: bool

Returns:

void

When true, preserves redundant (overlapping) characters.

set_preserve_spaces

void set_preserve_spaces(bool value)

Parameters

value: bool

Returns:

void

When true, preserves space characters during word breaking.

set_trust_nb_space

void set_trust_nb_space(bool value)

Parameters

value: bool

Returns:

void

When true, treats non-breaking spaces as regular spaces.

set_unknown_to_std_enc

void set_unknown_to_std_enc(bool value)

Parameters

value: bool

Returns:

void

When true, assumes unknown encoding fonts are Standard Roman.
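A typical use of the setters is to start from the default object and switch on only the options a task needs. The following is a minimal sketch of that pattern, not the library's own code: the `sketch` namespace, the stand-in class, and the `make_body_text_config` helper are hypothetical, and only three of the documented options are mirrored.

```cpp
#include <cassert>

// Hypothetical stand-in, NOT the real datalogics_interface::WordFinderConfig.
namespace sketch {

class WordFinderConfig {
public:
    bool get_preserve_spaces() const { return preserve_spaces_; }
    bool get_no_annotations() const { return no_annotations_; }
    bool get_no_hyphen_detection() const { return no_hyphen_detection_; }

    void set_preserve_spaces(bool value) { preserve_spaces_ = value; }
    void set_no_annotations(bool value) { no_annotations_ = value; }
    void set_no_hyphen_detection(bool value) { no_hyphen_detection_ = value; }

private:
    bool preserve_spaces_ = false;
    bool no_annotations_ = false;
    bool no_hyphen_detection_ = false;
};

// Builds a configuration that keeps space characters and skips annotation
// text; soft hyphen detection is left at its default (enabled).
inline WordFinderConfig make_body_text_config() {
    WordFinderConfig cfg;
    cfg.set_preserve_spaces(true);
    cfg.set_no_annotations(true);
    // Each setter should be observable through its matching getter.
    assert(cfg.get_preserve_spaces() && cfg.get_no_annotations());
    return cfg;
}

}  // namespace sketch
```

Wrapping the setter calls in a small factory function keeps each extraction profile in one place, so callers do not have to repeat the same sequence of flags.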