Word Class Documentation

class Word

Namespace: datalogics_interface

Detailed Description

Each word contains a sequence of characters in one or more styles (see Style).


Constructor & Destructor Documentation

~Word

~Word()

Member Function Documentation

get_attributes

WordAttributeFlags get_attributes()

Returns:

A set of WordAttributeFlags containing information on the types of characters in the word.

Gets a set of summary flags containing information on the types of characters in a word.
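This page does not show how WordAttributeFlags is represented. Assuming it behaves like a bitmask, checking a single attribute reduces to a bitwise AND, as in the minimal sketch below. The enum is a hypothetical stand-in: LastWordOnLine is the only member name taken from this documentation, and the other members, the numeric values, and the helpers operator| and has_flag are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bitmask stand-in for WordAttributeFlags. Only LastWordOnLine
// is named in this documentation; the other members and all values are invented.
enum class WordAttributeFlags : std::uint32_t {
    None = 0,
    LastWordOnLine = 1u << 0,
    HasDigits = 1u << 1,       // hypothetical
    HasPunctuation = 1u << 2,  // hypothetical
};

// Union of two flag sets.
inline WordAttributeFlags operator|(WordAttributeFlags a, WordAttributeFlags b) {
    return static_cast<WordAttributeFlags>(static_cast<std::uint32_t>(a) |
                                           static_cast<std::uint32_t>(b));
}

// True if every bit of `flag` is present in `set`.
inline bool has_flag(WordAttributeFlags set, WordAttributeFlags flag) {
    assert(flag != WordAttributeFlags::None);  // testing for None is trivially true
    const auto bits = static_cast<std::uint32_t>(flag);
    return (static_cast<std::uint32_t>(set) & bits) == bits;
}
```

With the real library, the set passed to has_flag would come from get_attributes().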

get_char_quads

std::vector<Quad> get_char_quads()

Returns:

A list containing the Quads found in the word, one for each individual character.

Gets a list of Quads occupied by the individual characters in a word.
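Because get_char_quads returns one Quad per character, it supports character-level hit testing, for example mapping a click position to a character index. The sketch below assumes hypothetical stand-in Point and Quad types (the corner names follow this page; the x/y fields are assumptions) and tests containment against each quad's axis-aligned bounding box, which is only an approximation for rotated text. It takes the minimum and maximum over all four corners because, as noted for get_quads, the corner names need not match their visual positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the library's Point and Quad types.
struct Point { double x; double y; };
struct Quad { Point TopLeft, TopRight, BottomLeft, BottomRight; };

// True if p lies inside the axis-aligned box spanned by q's four corners.
bool box_contains(const Quad& q, const Point& p) {
    const Point c[4] = {q.TopLeft, q.TopRight, q.BottomLeft, q.BottomRight};
    double xmin = c[0].x, xmax = c[0].x, ymin = c[0].y, ymax = c[0].y;
    for (const Point& pt : c) {
        xmin = std::min(xmin, pt.x); xmax = std::max(xmax, pt.x);
        ymin = std::min(ymin, pt.y); ymax = std::max(ymax, pt.y);
    }
    return p.x >= xmin && p.x <= xmax && p.y >= ymin && p.y <= ymax;
}

// Index of the first character whose quad contains p, or nullopt if none does.
std::optional<std::size_t> char_at(const std::vector<Quad>& char_quads, const Point& p) {
    for (std::size_t i = 0; i < char_quads.size(); ++i) {
        if (box_contains(char_quads[i], p)) return i;
    }
    return std::nullopt;
}
```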

get_is_last_word_in_region

bool get_is_last_word_in_region()

Returns:

true if the word is the last word in a region, false if it is not.

This can be useful for determining visual line breaks in tagged PDFs. In tagged PDF documents, WordAttributeFlags.LastWordOnLine is set according to the document's tags rather than its visual layout, so that flag cannot be used to determine visual line breaks.
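The sketch below groups a sequence of words into lines by starting a new line after each word for which get_is_last_word_in_region() returns true. It assumes that each region corresponds to one visual line, which this page suggests but does not guarantee. WordInfo is a hypothetical stand-in holding the two values the real Word accessors would return. Because get_text() already includes each word's trailing word-break characters, concatenating the texts reproduces the original spacing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the two Word accessors this sketch needs.
struct WordInfo {
    std::string text;             // what get_text() would return
    bool is_last_word_in_region;  // what get_is_last_word_in_region() would return
};

// Group words into lines, closing a line after each region-ending word.
std::vector<std::string> split_into_lines(const std::vector<WordInfo>& words) {
    std::vector<std::string> lines;
    std::string current;
    for (const WordInfo& w : words) {
        current += w.text;  // trailing break characters are part of the text
        if (w.is_last_word_in_region) {
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) lines.push_back(current);  // final, unterminated line
    return lines;
}
```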

get_quads

std::vector<Quad> get_quads()

Returns:

A list containing the Quads found in the word.

The quad's height is the height of the font's bounding box, not the height of the tallest character used in the word. The font's bounding box is determined by the glyphs in the font that extend farthest above and below the baseline; it often extends somewhat above the top of 'A' and below the bottom of 'y'.

The quad's width is determined from the characters actually present in the word.

For example, the quads for the words "AWAY" and "away" have the same height, but generally do not have the same width unless the font is monospaced (a font in which all characters have the same width).

Despite the names of the fields in a Quad (TopLeft for top left, BottomLeft for bottom left, and so forth), the corners of a Quad do not necessarily occupy those positions.
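Since the corner names cannot be trusted, a safe way to get a single enclosing rectangle for a word is to take the minimum and maximum over every corner of every quad that get_quads returns. The sketch below uses hypothetical stand-in Point, Quad, and Rect types and assumes y increases upward (as in PDF user space), so "bottom" is the minimum y and "top" the maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <vector>

// Hypothetical stand-ins for the library's geometry types.
struct Point { double x; double y; };
struct Quad { Point TopLeft, TopRight, BottomLeft, BottomRight; };
struct Rect { double left, bottom, right, top; };

// Axis-aligned rectangle enclosing every corner of every quad.
// All four corners are examined instead of trusting the field names.
Rect bounding_rect(const std::vector<Quad>& quads) {
    assert(!quads.empty());  // precondition of this sketch
    const Point& first = quads[0].TopLeft;
    Rect r{first.x, first.y, first.x, first.y};
    for (const Quad& q : quads) {
        for (const Point& p : {q.TopLeft, q.TopRight, q.BottomLeft, q.BottomRight}) {
            r.left = std::min(r.left, p.x);
            r.right = std::max(r.right, p.x);
            r.bottom = std::min(r.bottom, p.y);
            r.top = std::max(r.top, p.y);
        }
    }
    return r;
}
```

In the usage below, the first quad's TopLeft field deliberately holds the lower-left point; the result is the same rectangle either way.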

get_style_transitions

std::vector<StyleTransition> get_style_transitions()

Returns:

The list of StyleTransition objects.

Every word has at least one style transition, at character position zero in the word.
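Because every word has a transition at position zero, the transitions partition the word into contiguous runs of uniformly styled characters: each run starts at one transition and ends where the next begins (or at the end of the word). The sketch below assumes a hypothetical StyleTransition with a character index and a string placeholder for the Style, assumes the list is sorted by index, and expects the caller to supply the word's character count.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in; the real StyleTransition's members are not shown here.
struct StyleTransition {
    std::size_t char_index;  // position in the word where the style takes effect
    std::string style;       // placeholder for the Style
};

struct StyleRun {
    std::size_t begin;  // first character index (inclusive)
    std::size_t end;    // one past the last character index
    std::string style;
};

// Convert sorted transitions into runs covering [0, word_length).
std::vector<StyleRun> style_runs(const std::vector<StyleTransition>& transitions,
                                 std::size_t word_length) {
    // Documented guarantee: at least one transition, the first at position zero.
    assert(!transitions.empty() && transitions.front().char_index == 0);
    std::vector<StyleRun> runs;
    for (std::size_t i = 0; i < transitions.size(); ++i) {
        const std::size_t end = (i + 1 < transitions.size())
                                    ? transitions[i + 1].char_index
                                    : word_length;
        runs.push_back({transitions[i].char_index, end, transitions[i].style});
    }
    return runs;
}
```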

get_text

std::string get_text()

Returns:

The text of the Word.

The returned string includes any word-break characters (such as spaces) that follow the word, but not any that precede it.
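When only the bare word is needed, the trailing break characters have to be removed. The sketch below assumes those characters are ASCII whitespace; any other break characters the library might append would be left in place. The helper name strip_trailing_breaks is hypothetical.

```cpp
#include <cassert>
#include <string>

// Return `text` without trailing ASCII whitespace (assumed to be the
// word-break characters that get_text() appends).
std::string strip_trailing_breaks(const std::string& text) {
    const std::string::size_type last = text.find_last_not_of(" \t\r\n");
    return last == std::string::npos ? std::string() : text.substr(0, last + 1);
}
```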