Word Class Documentation
class Word
Namespace: com::datalogics::PDFL
Detailed Description
A word in a PDF file. Each word contains a sequence of characters in one or more styles (see Style).
Member Function Documentation
DisposeChildren
void DisposeChildren()
Returns:
void
[static initializer]
static void [static initializer]()
delete
synchronized void delete(Boolean disposing)
Parameters
disposing: Boolean
Returns:
synchronized void
delete
synchronized void delete()
Returns:
synchronized void
finalize
void finalize()
Returns:
void
getAttributes
java.util.EnumSet< WordAttributeFlags > getAttributes()
Gets a set of summary flags containing information on the types of characters in a word.
Returns:
A set of flags containing information on the types of characters in the word.
getCharQuads
java.util.List< Quad > getCharQuads()
Gets a list of Quads occupied by the individual characters in a word.
Returns:
A list containing one Quad for each individual character in the word.
getIsLastWordInRegion
boolean getIsLastWordInRegion()
Specifies whether this word is the last word in a region as determined by the WordFinder.
This can be useful for determining visual line breaks in tagged PDFs. In tagged PDF documents, WordAttributeFlags.LastWordOnLine is set according to the tags in the document, so that flag cannot be used to determine visual line breaks.
Returns:
true if the word is the last word in a region, false if it is not.
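As a rough illustration of how this flag might be used, the sketch below groups a sequence of words by closing a group after each word that reports itself as the last in its region. WordLike, RegionGrouper, and the word helper are hypothetical stand-ins written for this example so that it runs without the library. They are not part of the API, and whether a region corresponds to one visual line in a given document is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in exposing only the two Word accessors this sketch needs.
interface WordLike {
    String getText();
    boolean getIsLastWordInRegion();
}

final class RegionGrouper {
    // Splits a word sequence into groups, closing a group after each word
    // that reports itself as the last word in its region.
    static List<List<String>> groupByRegion(List<? extends WordLike> words) {
        List<List<String>> regions = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (WordLike w : words) {
            current.add(w.getText());
            if (w.getIsLastWordInRegion()) {
                regions.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            regions.add(current); // trailing words for which no region end was seen
        }
        return regions;
    }

    // Builds a stand-in word so the sketch can be exercised without a PDF.
    static WordLike word(final String text, final boolean last) {
        return new WordLike() {
            public String getText() { return text; }
            public boolean getIsLastWordInRegion() { return last; }
        };
    }
}
```

With real Word objects, the same loop would call getText() and getIsLastWordInRegion() on each word in the order the WordFinder returned them.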
getQuads
java.util.List< Quad > getQuads()
Gets this word's quads, specified in user space coordinates.
The quad's height is the height of the font's bounding box, not the height of the tallest character used in the word. The font's bounding box is determined by the glyphs in the font that extend farthest above and below the baseline; it often extends somewhat above the top of 'A' and below the bottom of 'y'.
The quad's width is determined from the characters actually present in the word.
For example, the quads for the words "AWAY" and "away" have the same height, but generally do not have the same width unless the font is a mono-spaced font (a font in which all characters have the same width).
Despite the names of the fields in a Quad (TopLeft for top left, BottomLeft for bottom left, and so forth), the corners of a Quad do not necessarily have these positions.
Returns:
A list containing the Quads found in the word.
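Because the corner names cannot be relied on for position, code that needs an axis-aligned box around a quad can take the minimum and maximum over all four corners instead of reading any single field. The sketch below shows that idea with hypothetical stand-in types (Pt, QuadLike, QuadBounds). The field names and layout are assumptions made for this example and do not reproduce the library's Quad class.

```java
// Hypothetical stand-in for a 2D point in user space.
final class Pt {
    final double x;
    final double y;
    Pt(double x, double y) { this.x = x; this.y = y; }
}

// Hypothetical stand-in for a quad with four named corners.
final class QuadLike {
    final Pt topLeft, topRight, bottomLeft, bottomRight;
    QuadLike(Pt topLeft, Pt topRight, Pt bottomLeft, Pt bottomRight) {
        this.topLeft = topLeft;
        this.topRight = topRight;
        this.bottomLeft = bottomLeft;
        this.bottomRight = bottomRight;
    }
}

final class QuadBounds {
    // Returns {minX, minY, maxX, maxY}, computed from all four corners so the
    // result does not depend on which named corner is actually where.
    static double[] boundsOf(QuadLike q) {
        Pt[] corners = { q.topLeft, q.topRight, q.bottomLeft, q.bottomRight };
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (Pt p : corners) {
            minX = Math.min(minX, p.x);
            minY = Math.min(minY, p.y);
            maxX = Math.max(maxX, p.x);
            maxY = Math.max(maxY, p.y);
        }
        return new double[] { minX, minY, maxX, maxY };
    }
}
```

For rotated text the resulting box is larger than the quad itself, so it suits hit-testing or rough layout rather than exact highlighting.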
getStyleTransitions
java.util.List< StyleTransition > getStyleTransitions()
Gets a list of style transitions for the word. Every word has at least one style transition, at character position zero in the word.
Returns:
The list of StyleTransition objects.
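One way to read a transition list is as a set of run boundaries: each transition starts a run that continues until the next transition or the end of the word. The sketch below splits a string that way. TransitionLike and its charIndex and styleName fields are hypothetical stand-ins, not the StyleTransition API. The sketch also assumes that transitions are sorted by position and that positions are valid offsets into the string passed in, which may not hold for text in which ligatures have been expanded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: a character position and a label for the style that
// takes effect from that position onward.
final class TransitionLike {
    final int charIndex;
    final String styleName;
    TransitionLike(int charIndex, String styleName) {
        this.charIndex = charIndex;
        this.styleName = styleName;
    }
}

final class StyleRuns {
    // Returns one "styleName:text" entry per run. Assumes transitions are
    // sorted by charIndex and every index lies within the text.
    static List<String> split(String text, List<TransitionLike> transitions) {
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < transitions.size(); i++) {
            int start = transitions.get(i).charIndex;
            // A run ends where the next transition begins, or at the end of the text.
            int end = (i + 1 < transitions.size())
                    ? transitions.get(i + 1).charIndex
                    : text.length();
            runs.add(transitions.get(i).styleName + ":" + text.substring(start, end));
        }
        return runs;
    }
}
```

Since every word has a transition at position zero, the first run always starts at the beginning of the word.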
getText
String getText()
Gets a word's text and also converts ligatures to their constituent characters. The returned string includes any word break characters (such as space characters) that follow the word, but not any that precede it.
Returns:
The text of the Word.
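Callers that compare or index words often want the text without the trailing break characters. The sketch below removes trailing whitespace from a string such as one returned by getText(). WordText is a hypothetical helper written for this example. It treats only whitespace as a break character, which is an assumption, so other break characters would need their own handling.

```java
final class WordText {
    // Removes trailing whitespace only. Leading characters are left untouched,
    // since break characters that precede the word are not included in the text.
    static String stripTrailingBreaks(String text) {
        int end = text.length();
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(0, end);
    }
}
```

Joining the unmodified getText() results of consecutive words preserves the original spacing, so stripping is best limited to comparison or lookup.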