Word Class Documentation

class Word

Namespace: com::datalogics::PDFL

Detailed Description

A word in a PDF file. Each word contains a sequence of characters in one or more styles (see Style).

Member Function Documentation

DisposeChildren

void DisposeChildren()

Returns:

void

[static initializer]

static void [static initializer]()

delete

synchronized void delete(Boolean disposing)

Parameters

disposing: Boolean

Returns:

void

delete

synchronized void delete()

Returns:

void

finalize

void finalize()

Returns:

void

getAttributes

java.util.EnumSet<WordAttributeFlags> getAttributes()

Returns:

A set of flags containing information on the types of characters in the word.

Gets a set of summary flags containing information on the types of characters in a word.
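Because the return value is a standard java.util.EnumSet, individual flags can be tested with contains. The following is a minimal sketch rather than library code: it uses a local stand-in enum so that it runs without the PDFL library. LastWordOnLine is the only constant name taken from this page; OtherFlag and the endsLine helper are hypothetical.

```java
import java.util.EnumSet;

class AttributeCheckSketch {
    // Local stand-in for WordAttributeFlags. Only LastWordOnLine is named in
    // this documentation; OtherFlag is a hypothetical placeholder.
    enum Flag { LastWordOnLine, OtherFlag }

    // Returns true if the flag set (in real code, the result of
    // getAttributes()) marks the word as the last word on its line.
    static boolean endsLine(EnumSet<Flag> attributes) {
        return attributes.contains(Flag.LastWordOnLine);
    }

    public static void main(String[] args) {
        System.out.println(endsLine(EnumSet.of(Flag.LastWordOnLine)));
        System.out.println(endsLine(EnumSet.noneOf(Flag.class)));
    }
}
```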

getCharQuads

Returns:

A list containing the Quads of the individual characters in the word.

Gets a list of Quads occupied by the individual characters in a word.

getIsLastWordInRegion

boolean getIsLastWordInRegion()

Returns:

true if the word is the last word in a region, false if it is not

Specifies whether this word is the last word in a region as determined by the WordFinder.

This can be useful for determining visual line breaks in tagged PDFs. In tagged PDF documents, WordAttributeFlags.LastWordOnLine is set according to the tags in the document, so that flag cannot be used to determine visual line breaks.
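One way to use this flag is to group consecutive words into regions, closing a region after each word that reports true. The sketch below assumes the words are already in reading order and uses a hypothetical WordLike interface that exposes only the two accessors needed, so it runs without the library; in real code the elements would be Word objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RegionSplitSketch {
    // Hypothetical stand-in exposing only the two Word accessors used here.
    interface WordLike {
        String getText();
        boolean getIsLastWordInRegion();
    }

    // Test helper that builds a WordLike from fixed values.
    static WordLike word(final String text, final boolean last) {
        return new WordLike() {
            public String getText() { return text; }
            public boolean getIsLastWordInRegion() { return last; }
        };
    }

    // Concatenates word texts and closes the current region after every
    // word flagged as the last word in its region.
    static List<String> splitIntoRegions(List<WordLike> words) {
        List<String> regions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (WordLike w : words) {
            current.append(w.getText());
            if (w.getIsLastWordInRegion()) {
                regions.add(current.toString().trim());
                current.setLength(0);
            }
        }
        // Keep any trailing words that were not closed by a flagged word.
        if (current.length() > 0) {
            regions.add(current.toString().trim());
        }
        return regions;
    }

    public static void main(String[] args) {
        List<WordLike> words = Arrays.asList(
                word("Hello ", false), word("world ", true), word("Next ", false));
        System.out.println(splitIntoRegions(words));
    }
}
```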

getQuads

Returns:

A list containing the Quads found in the word.

Gets the word's quads, in user space coordinates.

The quad's height is the height of the font's bounding box, not the height of the tallest character used in the word. The font's bounding box is determined by the glyphs in the font that extend farthest above and below the baseline; it often extends somewhat above the top of 'A' and below the bottom of 'y'.

The quad's width is determined from the characters actually present in the word.

For example, the quads for the words "AWAY" and "away" have the same height, but generally do not have the same width unless the font is a mono-spaced font (a font in which all characters have the same width).

Despite the names of the fields in a Quad (TopLeft for top left, BottomLeft for bottom left, and so forth), the corners of a Quad do not necessarily have these positions.
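Since the corner names do not guarantee corner positions, code that needs an axis-aligned box can take the minimum and maximum over all four corners instead of trusting the labels. The sketch below is an illustration under assumptions: corners are passed as plain {x, y} pairs, and the bounds helper is hypothetical. How coordinates are read from a Quad corner is not covered on this page and is left out.

```java
import java.util.Arrays;

class QuadBoundsSketch {
    // corners: four {x, y} points in any order.
    // Returns {minX, minY, maxX, maxY}.
    static double[] bounds(double[][] corners) {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : corners) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    public static void main(String[] args) {
        // The same rectangle, with corners listed in a scrambled order.
        double[][] corners = {{40, 5}, {10, 20}, {40, 20}, {10, 5}};
        System.out.println(Arrays.toString(bounds(corners)));
    }
}
```

The result is the same regardless of the order in which the four corners are supplied, which is the point of not relying on the field names.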

getStyleTransitions

Returns:

the list of StyleTransition objects.

Gets a list of style transitions for the word. Every word has at least one style transition, at character position zero in the word.
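Because the first transition is always at character position zero, the transitions partition the word into consecutive style runs. The sketch below shows that partitioning on plain values. It assumes the transition positions have already been extracted into an ascending int array and that they index into the given string; the accessor a StyleTransition uses to expose its position is not documented here, and splitByTransitions is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class StyleRunSketch {
    // Splits text into runs; each run starts at one transition position and
    // ends just before the next (or at the end of the text).
    // Assumes starts is ascending, begins with 0, and stays within bounds.
    static List<String> splitByTransitions(String text, int[] starts) {
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < starts.length; i++) {
            int from = starts[i];
            int to = (i + 1 < starts.length) ? starts[i + 1] : text.length();
            runs.add(text.substring(from, to));
        }
        return runs;
    }

    public static void main(String[] args) {
        // A single transition at 0 yields the whole word as one run.
        System.out.println(splitByTransitions("Hello", new int[] {0}));
        // A second transition at 2 splits the word into two runs.
        System.out.println(splitByTransitions("Hello", new int[] {0, 2}));
    }
}
```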

getText

String getText()

Returns:

The text of the Word.

Gets a word's text, converting ligatures to their constituent characters. The returned string includes any word break characters (such as space characters) that follow the word, but not any that precede it.
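Since each string carries its own trailing word break characters, consecutive word texts can be concatenated directly without inserting separators. The sketch below works on strings that have already been obtained from getText; both helpers are hypothetical, and stripTrailingBreaks assumes the break characters are whitespace, which may not hold for every document.

```java
import java.util.Arrays;
import java.util.List;

class WordTextJoinSketch {
    // Concatenates word texts as-is; the trailing break characters already
    // present in each string act as the separators.
    static String join(List<String> wordTexts) {
        StringBuilder sb = new StringBuilder();
        for (String t : wordTexts) {
            sb.append(t);
        }
        return sb.toString();
    }

    // Removes trailing whitespace from a single word text.
    // Assumption: the word break characters are whitespace.
    static String stripTrailingBreaks(String wordText) {
        int end = wordText.length();
        while (end > 0 && Character.isWhitespace(wordText.charAt(end - 1))) {
            end--;
        }
        return wordText.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("Hello ", "world")));
        System.out.println("[" + stripTrailingBreaks("Hello ") + "]");
    }
}
```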