
PDDocTextFinder Typedefs

PDDocTextFinder

Header: DLExtrasExpT.h:388

Description

Extracts words or phrases that match a regular expression (regex) on a given page range or on all of the pages in a document.

Syntax

typedef struct _t_PDDocTextFinder *PDDocTextFinder;

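Returned From

PDDocTextFinderCreate
PDDocTextFinderCreateEx

Used By

PDDocTextFinderAcquireMatchList
PDDocTextFinderDestroy
PDDocTextFinderReleaseMatchList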

PDDocTextFinder Structures

PDDocTextFinderMatchList

Header: DLExtrasExpT.h:420

Description

Structure containing the matches.

Syntax

struct PDDocTextFinderMatchList {
A pointer to an array that holds the matches.
ASUns32 numMatches;
The number of matches in the matches array.
} PDDocTextFinderMatchList;

Returned From

PDDocTextFinderAcquireMatchList

PDDocTextFinderMatchQuadRec

Header: DLExtrasExpT.h:400

Description

Structure representing a bounding quad of a matched phrase and the page on which it is located.

Syntax

struct PDDocTextFinderMatchQuadRec {
ASInt32 pageNum;
The page number on which the quad is located.
ASFixedQuad boundingQuad;
A bounding quad of the matched phrase.
} PDDocTextFinderMatchQuadRec;

Used In

PDDocTextFinderMatchRec
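The boundingQuad field is an ASFixedQuad. In the Acrobat SDK type system that APDFL builds on, ASFixed is a 16.16 fixed-point number and an ASFixedQuad holds four corner points (tl, tr, bl, br), each with h and v coordinates. The minimal sketch below converts such a quad into an axis-aligned rectangle in PDF user-space units. It uses local stand-in types (Fixed16_16, FixedPoint, FixedQuad, RectD) that only mirror that layout; in real code you would use the SDK's own types and conversion macros.

```c
#include <stdint.h>

/* Stand-in types mirroring the SDK layout (16.16 fixed point, four
 * corner points). These are local definitions, not the real headers. */
typedef int32_t Fixed16_16;
typedef struct { Fixed16_16 h, v; } FixedPoint;
typedef struct { FixedPoint tl, tr, bl, br; } FixedQuad;

typedef struct { double left, bottom, right, top; } RectD;

/* Convert a 16.16 fixed-point value to a double. */
static double fixed_to_double(Fixed16_16 x)
{
    return (double)x / 65536.0;
}

/* Axis-aligned bounding rectangle of a quad, which may be rotated. */
static RectD quad_bounds(FixedQuad q)
{
    const FixedPoint pts[4] = { q.tl, q.tr, q.bl, q.br };
    RectD r;
    r.left = r.right = fixed_to_double(pts[0].h);
    r.bottom = r.top = fixed_to_double(pts[0].v);
    for (int i = 1; i < 4; i++) {
        double h = fixed_to_double(pts[i].h);
        double v = fixed_to_double(pts[i].v);
        if (h < r.left) r.left = h;
        if (h > r.right) r.right = h;
        if (v < r.bottom) r.bottom = v;
        if (v > r.top) r.top = v;
    }
    return r;
}
```

Taking the minimum and maximum over all four corners keeps the result correct even when the text is rotated and the quad is not axis-aligned.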

PDDocTextFinderMatchRec

Header: DLExtrasExpT.h:408

Description

Structure representing the matched phrase and the array of quads that make up the phrase.

Syntax

struct PDDocTextFinderMatchRec {
char *phrase;
The matched phrase.
ASUns32 phraseLength;
The length of the matched phrase.
A pointer to an array that holds the match's quad information.
ASUns32 numQuads;
The number of quads in the quad array.
} PDDocTextFinderMatchRec;

Used In

PDDocTextFinderMatchList
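The match structures nest: a match list holds an array of matches, and each match holds an array of quads, each tagged with its own page number. The minimal sketch below walks that nesting to count how many matches touch a given page, and prints each phrase using phraseLength rather than assuming phrase is NUL-terminated (this reference does not say whether it is). It uses local stand-in types, not DLExtrasExpT.h. Because the declarations of the two array pointers are not shown here, the member names matches and quads are hypothetical; check them against the header before adapting the code.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the documented structures. The member names
 * `quads` and `matches` are hypothetical; verify them in the header. */
typedef struct {
    int32_t pageNum;              /* boundingQuad omitted for brevity */
} MatchQuadRec;

typedef struct {
    char *phrase;
    uint32_t phraseLength;
    MatchQuadRec *quads;          /* hypothetical member name */
    uint32_t numQuads;
} MatchRec;

typedef struct {
    MatchRec *matches;            /* hypothetical member name */
    uint32_t numMatches;
} MatchList;

/* Count matches with at least one quad on `page`. A match is counted
 * once even if several of its quads fall on that page. */
static uint32_t count_matches_on_page(const MatchList *list, int32_t page)
{
    uint32_t count = 0;
    for (uint32_t i = 0; i < list->numMatches; i++) {
        const MatchRec *m = &list->matches[i];
        for (uint32_t j = 0; j < m->numQuads; j++) {
            if (m->quads[j].pageNum == page) {
                count++;
                break;
            }
        }
    }
    return count;
}

/* Print each phrase, bounded by phraseLength. */
static void print_matches(const MatchList *list)
{
    for (uint32_t i = 0; i < list->numMatches; i++) {
        const MatchRec *m = &list->matches[i];
        printf("%.*s (%u quads)\n", (int)m->phraseLength, m->phrase,
               (unsigned)m->numQuads);
    }
}
```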

PDDocTextFinder Functions

PDDocTextFinderAcquireMatchList

Header: DLExtrasProcs.h:998

Description

Finds all regular expression (regex) matches for the given page range.

Only words within or partially within the page's crop box (see PDPageGetCropBox()) are included. Words outside the crop box are skipped.
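To make the crop-box rule concrete, here is a minimal sketch of a "within or partially within" test: a word is kept when its bounding box overlaps the crop box at all. The Rect type and both functions are hypothetical stand-ins, and treating a word that only touches the crop-box edge as included is an assumption; this reference does not specify the library's exact boundary rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in rectangle in PDF user space (y grows upward). */
typedef struct { double left, bottom, right, top; } Rect;

/* True if the word box lies wholly or partly inside the crop box.
 * Boxes that only share an edge count as overlapping (an assumption). */
static bool word_in_crop_box(Rect word, Rect crop)
{
    return word.left <= crop.right && word.right >= crop.left &&
           word.bottom <= crop.top && word.top >= crop.bottom;
}

/* Count how many of `n` word boxes would be kept for this crop box. */
static size_t count_kept_words(const Rect *words, size_t n, Rect crop)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (word_in_crop_box(words[i], crop))
            kept++;
    return kept;
}
```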

There can be only one match list in existence at a time; clients must release the previous match list, using PDDocTextFinderReleaseMatchList(), before creating a new one.

Available only on Windows, Mac, and Linux platforms.

Syntax

PDDocTextFinderMatchList PDDocTextFinderAcquireMatchList(PDDocTextFinder mObj, PDDoc pdDoc, ASInt32 beginPageNumber, ASInt32 endPageNumber, const char *regexstr);

Parameters

mObj
IN (Required) The document text finder used to acquire the match list.
pdDoc
IN (Required) The document to search for matches.
beginPageNumber
IN (Required) The page number at which to begin the search. Page numbers are zero-based: the first page is 0, not 1 as displayed in Acrobat. Pass PDAllPages (see PDExpT.h) to process all pages in the document sequentially.
endPageNumber
IN (Required) The page number at which to end the search. If beginPageNumber is set to PDAllPages, this parameter is ignored.
regexstr
IN (Required) The regular expression to search for.
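Because page numbers are zero-based, a range read off Acrobat's page display has to be shifted by one before it is passed in. The helper below is a hypothetical sketch of that conversion (to_zero_based_range is not part of the API). It assumes endPageNumber is inclusive, which this reference does not state, and clamps the end to the document's page count, which in real code would come from PDDocGetNumPages(). To search every page, pass PDAllPages as beginPageNumber instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a 1-based, inclusive page range as displayed in Acrobat into
 * zero-based beginPageNumber/endPageNumber values. Assumes the end is
 * inclusive and clamps it to the last page. Returns false if the range
 * is empty or starts outside the document. */
static bool to_zero_based_range(int32_t firstShown, int32_t lastShown,
                                int32_t pageCount,
                                int32_t *beginPageNumber,
                                int32_t *endPageNumber)
{
    if (pageCount <= 0 || firstShown < 1 || firstShown > pageCount ||
        lastShown < firstShown)
        return false;
    if (lastShown > pageCount)
        lastShown = pageCount;
    *beginPageNumber = firstShown - 1;
    *endPageNumber = lastShown - 1;
    return true;
}
```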

Returns

The PDDocTextFinderMatchList structure.

Exceptions

PDDocTextFinderCreate

Header: DLExtrasProcs.h:930

Description

Creates a document text finder, which extracts words or phrases that match regular expressions from a PDF file. The words are extracted using the given word finder configuration. Available only on Windows, Mac, and Linux platforms.

Syntax

PDDocTextFinder PDDocTextFinderCreate(PDWordFinderConfig wfConfig);

Parameters

wfConfig
IN (Required) The word finder configuration to be used to extract the words.

Returns

The newly created document text finder object that will be used to find matches.

PDDocTextFinderCreateEx

Header: DLExtrasProcs.h:948

Description

Creates a document text finder with additional configurable properties. Available only on Windows, Mac, and Linux platforms.

Syntax

PDDocTextFinder PDDocTextFinderCreateEx(PDWordFinderConfig wfConfig, PDDocTextFinderConfig dtfConfig);

Parameters

wfConfig
IN (Required) The word finder configuration to be used to extract the words.
dtfConfig
IN (Required) The document text finder configuration, which controls how the extracted text is processed.

Returns

The newly created document text finder object that will be used to find matches.

PDDocTextFinderDestroy

Header: DLExtrasProcs.h:962

Description

Destroys a document text finder. Call this when you have finished extracting phrases from a file. Available only on Windows, Mac, and Linux platforms.

Related Methods

PDDocTextFinderCreate
PDDocTextFinderCreateEx

Syntax

void PDDocTextFinderDestroy(PDDocTextFinder mObj);

Parameters

mObj
IN (Required) The document text finder to destroy.

PDDocTextFinderReleaseMatchList

Header: DLExtrasProcs.h:1014

Description

Releases the match list that PDDocTextFinderAcquireMatchList() created for this document text finder. Call it when you have finished using the list, and before acquiring another one. Available only on Windows, Mac, and Linux platforms.

Syntax

void PDDocTextFinderReleaseMatchList(PDDocTextFinder mObj);

Parameters

mObj
IN (Required) The document text finder whose match list is released.