WordFinder Class Documentation

class WordFinder

Namespace: com::datalogics::PDFL

Detailed Description

A class used to extract words from a document.

Constructor & Destructor Documentation

WordFinder

WordFinder(Document doc, WordFinderVersion algVersion, WordFinderConfig wbConfig)

Parameters

doc: Document

The document in which to find words.

algVersion: WordFinderVersion

The algorithm version to use.

wbConfig: WordFinderConfig

The configuration for this word finder.

Creates a word finder for the given document, using the specified algorithm version and configuration.

Member Function Documentation

DisposeChildren

void DisposeChildren()

Returns:

void

[static initializer]

static void [static initializer]()

delete

synchronized void delete(Boolean disposing)

Parameters

disposing: Boolean

Returns:

void

delete

synchronized void delete()

Returns:

void

enumWords

boolean enumWords(int pageNum, WordProc wordProc)

Parameters

pageNum: int

The page number from which to extract words. Pass AllPages to sequentially process all pages in the document.

wordProc: WordProc

A user-supplied callback to call once for each word found. Enumeration halts if wordProc returns false.

Returns:

true if enumeration was successfully completed, false if enumeration was terminated because wordProc returned false.

Extracts words, one at a time, from the specified page or the entire document. It calls a user-supplied procedure once for each word found.

Only words within or partially within the page's crop box (see Page.CropBox) are enumerated. Words outside the crop box are skipped.

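Word objects passed to wordProc become invalid once the call to enumWords returns, or when the page number changes between successive calls to wordProc. Copy any data you need from a Word inside the callback rather than keeping a reference to the Word itself.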
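The callback contract described above (one call per word, enumeration halts when the callback returns false, and the return value reports whether enumeration ran to completion) can be sketched without the library. In the sketch below, WordCallback, enumerate, and firstWords are hypothetical stand-ins written for illustration only; they are not the PDFL WordProc or enumWords, whose exact callback signature is not shown in this page. Words are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EnumWordsSketch {
    /** Hypothetical stand-in for the library's WordProc callback (an assumption). */
    interface WordCallback {
        boolean onWord(String word, int pageNum);
    }

    /**
     * Stand-in for the enumWords contract: calls proc once per word and
     * stops as soon as proc returns false.
     * Returns true if every word was visited, false if proc halted early.
     */
    static boolean enumerate(List<String> words, int pageNum, WordCallback proc) {
        for (String word : words) {
            if (!proc.onWord(word, pageNum)) {
                return false; // the callback asked to stop
            }
        }
        return true;
    }

    /**
     * Collects up to limit words (limit must be at least 1). Each word's data
     * is copied inside the callback, mirroring the rule that objects handed
     * to the callback should not be kept after enumeration returns.
     */
    static List<String> firstWords(List<String> words, int pageNum, int limit) {
        List<String> collected = new ArrayList<>();
        enumerate(words, pageNum, (word, page) -> {
            collected.add(word);
            return collected.size() < limit; // false halts the enumeration
        });
        return collected;
    }

    public static void main(String[] args) {
        List<String> page = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(firstWords(page, 0, 2));
        System.out.println(enumerate(page, 0, (word, pageNum) -> true));
    }
}
```

With the real class, the same pattern would apply: do the per-word work, or copy out what is needed, inside wordProc, and treat a false return from enumWords as "stopped early by the callback" rather than as an error.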

finalize

void finalize()

Returns:

void

getVisibleWordList

java.util.List<Word> getVisibleWordList(int pageNum)

Parameters

pageNum: int

The page number for which to get the list of words.

Returns:

A list of the visible words on the specified page, sorted by their x- and y-coordinates.

Finds all visible words on the specified page and returns a list containing the words. The words are sorted by their x- and y-coordinates on the page.

getWordList

java.util.List<Word> getWordList(int pageNum)

Parameters

pageNum: int

The page number for which to get the list of words.

Returns:

A list of the words on the specified page, sorted by their x- and y-coordinates.

Finds all words on the specified page and returns a list containing the words. The words are sorted by their x- and y-coordinates on the page.
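This page does not state which coordinate takes priority in the returned order. If a caller needs a specific order, the list can be copied and re-sorted. The sketch below shows one possible approach, assuming top-to-bottom then left-to-right order in default PDF user space, where y increases upward. PlacedWord and readingOrder are hypothetical names introduced for illustration; PlacedWord is a stand-in holding a word's text and one corner of its bounding box, not the PDFL Word class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class WordOrderSketch {
    /** Hypothetical stand-in for a word and the lower-left corner of its box. */
    static final class PlacedWord {
        final String text;
        final double x;
        final double y;

        PlacedWord(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Returns a sorted copy: larger y first (higher on the page), then
     * smaller x first (further left). The input list is left unchanged.
     * Words are compared on exact y values; real text on one line may need
     * a tolerance, which this sketch does not implement.
     */
    static List<PlacedWord> readingOrder(List<PlacedWord> words) {
        List<PlacedWord> sorted = new ArrayList<>(words);
        sorted.sort(Comparator
                .comparingDouble((PlacedWord w) -> w.y).reversed()
                .thenComparingDouble(w -> w.x));
        return sorted;
    }
}
```

Sorting a copy keeps the list returned by the word finder intact, so the library's original order remains available for comparison.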