WordFinder Class Documentation
class WordFinder
Namespace: datalogics_interface
Detailed Description
A class used to extract words from a document.
Uses types
Constructor & Destructor Documentation
WordFinder
WordFinder(Document &doc, WordFinderVersion version, const WordFinderConfig &config)
Parameters
doc: Document &, the document in which to find words
version: WordFinderVersion, the algorithm version to use
config: const WordFinderConfig &, the configuration for this word finder
Creates a word finder with a specific configuration.
WordFinder
WordFinder(Document &doc, WordFinderVersion version)
Parameters
doc: Document &, the document in which to find words
version: WordFinderVersion, the algorithm version to use
Creates a word finder.
WordFinder
WordFinder(WordFinder &&)
Move constructor.
~WordFinder
~WordFinder()
Member Function Documentation
enum_words
bool enum_words(int page_num, WordCallback callback)
Parameters
page_num: int, the page number from which to extract words. Pass AllPages to sequentially process all pages in the document.
callback: WordCallback, a user-supplied callback to call once for each word found. Enumeration halts if the callback returns false.
Returns:
true if enumeration completed successfully, false if enumeration was terminated because the callback returned false.
Extracts words, one at a time, from the specified page or the entire document, calling the user-supplied callback once for each word found.
Only words within or partially within the page's crop box (see Page::CropBox) are enumerated. Words outside the crop box are skipped.
Word objects passed to the callback become invalid once the call to enum_words returns, or when the page number changes between calls to the callback.
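The callback contract described above (one call per word, halt on false, return value reporting whether enumeration finished) can be illustrated with a small stand-alone sketch. It does not use the library: MockWord, MockWordCallback, mock_enum_words and collect_first_words are hypothetical stand-ins, since the actual Word and WordCallback definitions are not shown in this section. The callback copies the text it needs, on the assumption that a word object should not be kept once enumeration returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins: the real Word and WordCallback types are not
// documented in this section, so these only mimic their roles.
struct MockWord {
    std::string text;
};
using MockWordCallback = std::function<bool(const MockWord &)>;

// Mimics the documented contract of enum_words: call the callback once per
// word, stop as soon as it returns false, and report whether the enumeration
// ran to completion.
bool mock_enum_words(const std::vector<MockWord> &page_words,
                     const MockWordCallback &callback) {
    for (const MockWord &word : page_words) {
        if (!callback(word)) {
            return false;  // the callback asked to halt
        }
    }
    return true;  // every word was visited
}

// Collects at most `limit` word texts. The text is copied inside the
// callback rather than keeping a reference to the word object.
std::vector<std::string> collect_first_words(const std::vector<MockWord> &page_words,
                                             std::size_t limit,
                                             bool &completed) {
    std::vector<std::string> texts;
    completed = mock_enum_words(page_words, [&](const MockWord &word) {
        if (texts.size() >= limit) {
            return false;  // enough words collected, stop enumerating
        }
        texts.push_back(word.text);
        return true;
    });
    return texts;
}
```

With three words and a limit of two, collect_first_words gathers two texts and sets completed to false, mirroring the false return that signals early termination.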
get_visible_word_list
std::vector<Word> get_visible_word_list(int page_num)
Parameters
page_num: int, the page number for which to get the list of words.
Returns:
a list of the visible words on the specified page, sorted by their x- and y-coordinates.
Finds all visible words on the specified page and returns a list containing the words. The words are sorted by their x- and y-coordinates on the page.
NOTE: Only words visible according to the Optional Content are returned.
get_word_list
std::vector<Word> get_word_list(int page_num)
Parameters
page_num: int, the page number for which to get the list of words.
Returns:
a list of the words on the specified page, sorted by their x- and y-coordinates.
Finds all words on the specified page and returns a list containing the words. The words are sorted by their x- and y-coordinates on the page.
NOTE: All words found on the page are returned, even if they are currently invisible as specified by the Optional Content.
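The difference between get_word_list and get_visible_word_list can be pictured with a stand-alone sketch. It does not call the library: LayerWord, its visible flag, and the two mock functions are hypothetical, and the flag merely stands in for whatever visibility state the Optional Content assigns. Sorting is deliberately left out, because the exact ordering rule is not specified here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a word; `visible` models the visibility
// state that the Optional Content would determine.
struct LayerWord {
    std::string text;
    bool visible;
};

// Stand-in for get_word_list: every word is returned, visible or not.
std::vector<LayerWord> mock_get_word_list(const std::vector<LayerWord> &page_words) {
    return page_words;
}

// Stand-in for get_visible_word_list: only words flagged visible are kept.
std::vector<LayerWord> mock_get_visible_word_list(const std::vector<LayerWord> &page_words) {
    std::vector<LayerWord> visible_words;
    for (const LayerWord &word : page_words) {
        if (word.visible) {
            visible_words.push_back(word);
        }
    }
    return visible_words;
}
```

For a page whose words are "Title" (visible), "DRAFT" (hidden) and "Body" (visible), the first mock returns all three words and the second returns only "Title" and "Body".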
operator=
WordFinder &operator=(WordFinder &&)