OCRParams Class Documentation
class OCRParams
Namespace: com::datalogics::PDFL
Detailed Description
Parameters for configuring the OCR (Optical Character Recognition) engine. OCRParams controls settings such as the engine to use, the resolution of the input images, the language(s) to recognize, performance tuning, page segmentation mode, candidate font names for placing text, and image preprocessing options.
Constructor & Destructor Documentation
OCRParams
OCRParams()
Creates an OCRParams object with default settings.
Member Function Documentation
DisposeChildren
void DisposeChildren()
Returns: void
[static initializer]
static void [static initializer]()
delete
synchronized void delete(Boolean disposing)
Parameters
disposing: Boolean
Returns: void
delete
synchronized void delete()
Returns: void
finalize
void finalize()
Returns: void
getCandidateFontNames
java.util.List< String > getCandidateFontNames()
Returns: java.util.List< String >
The names of candidate fonts for placing text under an image. The default list should work well in most cases. If your text can't be represented by Latin fonts or by Chinese, Japanese, or Korean fonts, retrieve this list, add a font that can represent that text, and then set the new list on this object.
Supply enough font names to cover all of the languages and scripts you expect to encounter.
A font is selected for each word. If a word mixes scripts, for instance non-Latin letters and Arabic numerals, supply the name of a font family that can render both the letters and the numerals.
The quality of the results depends on the font choice. The list is searched in order until a font works for a particular word, so list proportional fonts before fixed-width fonts to make the text fit better. Decorative fonts with flourishes, like Zapf Chancery, deliver poor results. In general, supply a font suited to body text, such as the fonts used in newspapers and books (Times Roman, for example), or a font already in the list, like MinionPro.
If the PlaceTextUnder method in OCREngine can't find a font that covers the whole text of a word, an exception is thrown.
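To make the documented search rule concrete, here is a small, self-contained Java model of it. It is not PDFL code and does not call OCREngine: the class CandidateFontSearch, the exception type NoCoveringFontException, the font names, and the coverage data (which font can render which code points) are all hypothetical stand-ins. The model tries fonts in list order, returns the first one that covers every code point of a word, and throws when none does.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative model of the documented search rule; this is NOT PDFL code.
// All names here (class, exception, fonts, coverage data) are hypothetical.
class CandidateFontSearch {

    // Hypothetical stand-in for the exception PlaceTextUnder throws.
    static class NoCoveringFontException extends RuntimeException {
        NoCoveringFontException(String word) {
            super("No candidate font covers the whole word: " + word);
        }
    }

    // Tries fonts in list order and returns the first one that can render
    // every code point of the word; a font covering only part of it is skipped.
    static String pickFont(String word, List<String> candidateFonts,
                           Map<String, Set<Integer>> coverage) {
        for (String font : candidateFonts) {
            Set<Integer> glyphs = coverage.getOrDefault(font, Set.of());
            if (word.codePoints().allMatch(glyphs::contains)) {
                return font;
            }
        }
        throw new NoCoveringFontException(word);
    }

    // Coverage set for an inclusive range of code points.
    static Set<Integer> range(int first, int last) {
        return IntStream.rangeClosed(first, last).boxed().collect(Collectors.toSet());
    }

    // Coverage set that combines two others.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> digits = range('0', '9');
        Set<Integer> greek = range(0x03B1, 0x03C9); // lowercase Greek alpha through omega
        Map<String, Set<Integer>> coverage = Map.of(
                "LatinSerif", union(range('a', 'z'), digits),
                "GreekLetters", greek,
                "GreekWithDigits", union(greek, digits));
        List<String> fonts = List.of("LatinSerif", "GreekLetters", "GreekWithDigits");

        String greekWord = "\u03b1\u03b2\u03b3"; // Greek letters alpha, beta, gamma
        System.out.println(pickFont("invoice42", fonts, coverage));        // LatinSerif
        System.out.println(pickFont(greekWord, fonts, coverage));          // GreekLetters
        System.out.println(pickFont(greekWord + "2024", fonts, coverage)); // GreekWithDigits
    }
}
```

A Greek word followed by digits skips GreekLetters, which lacks the digits, and lands on GreekWithDigits. That mirrors the advice above about words that mix scripts: a font that covers only the letters is not enough.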
getConfigurationParameters
java.util.Map< String, String > getConfigurationParameters()
Returns: java.util.Map< String, String >
Gets the configuration parameters. Note: Reserved for internal use. Do not use unless directed to do so by Datalogics Support.
getEnableImagePreprocessing
boolean getEnableImagePreprocessing()
Returns: boolean
Gets whether image preprocessing is enabled. Returns true if any image preprocessing is on.
getEngine
String getEngine()
Returns: String
The name of the OCR engine to use. Use the Tesseract4Engine constant to select the Tesseract v4 engine.
getLanguages
java.util.List< LanguageSetting > getLanguages()
Returns: java.util.List< LanguageSetting >
The list of languages the OCR engine should recognize.
getPageSegmentationMode
PageSegmentationMode getPageSegmentationMode()
Returns: PageSegmentationMode
The page segmentation mode, which controls how the OCR engine detects text regions on the page.
getPerformance
Performance getPerformance()
Returns: Performance
The desired performance mode for the OCR engine, balancing speed against accuracy.
getResolution
double getResolution()
Returns: double
The resolution, in DPI, at which images are rendered for OCR processing.
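Because the resolution is given in DPI, the pixel dimensions of each rendered image grow linearly with the setting, and the pixel count grows with its square. The sketch below shows that arithmetic only. It assumes a full US Letter page (612 x 792 points, where a PDF point is 1/72 inch) is rendered, and the class name RenderSize is hypothetical, not part of PDFL.

```java
// Illustrative arithmetic only; RenderSize is a hypothetical helper, not part of PDFL.
// A PDF point is 1/72 inch, so pixels = points / 72 * DPI.
class RenderSize {

    static final double POINTS_PER_INCH = 72.0;

    // Pixel count along one page dimension, rounded to the nearest pixel.
    static long pixels(double sizeInPoints, double dpi) {
        return Math.round(sizeInPoints / POINTS_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        double width = 612, height = 792; // assumed US Letter page, 8.5 x 11 inches
        for (double dpi : new double[] {150, 300, 600}) {
            long w = pixels(width, dpi);
            long h = pixels(height, dpi);
            System.out.printf("%.0f DPI: %d x %d pixels (%.1f megapixels)%n",
                    dpi, w, h, w * h / 1e6);
        }
    }
}
```

For this page, 300 DPI gives a 2550 x 3300 pixel image, and 600 DPI gives four times as many pixels for the OCR engine to process.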
getTesseract4Engine
static String getTesseract4Engine()
Returns: String
The name of the Tesseract v4 engine. Pass this value to setEngine to select the Tesseract v4 engine.
setCandidateFontNames
void setCandidateFontNames(java.util.List< String > candidateFonts)
Parameters
candidateFonts: java.util.List< String >
Returns: void
Sets the names of candidate fonts for placing text under an image. The guidance under getCandidateFontNames applies: supply enough fonts to cover the expected languages and scripts, choose fonts that can render every script within a word, list proportional fonts before fixed-width fonts, and avoid decorative fonts.
If the PlaceTextUnder method in OCREngine can't find a font that covers the whole text of a word, an exception is thrown.
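The workflow described under getCandidateFontNames (retrieve the list, add a font, set the new list) can be sketched as a plain Java helper that works on a copy of the list. CandidateFontList and extend are hypothetical names, all font names except MinionPro and Courier are placeholders, and the set of fixed-width font names is something you supply yourself; this page does not say how PDFL classifies fonts. Besides appending the extra font, the helper moves the fixed-width fonts to the end, following the recommendation to list proportional fonts first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of PDFL) for preparing a candidate font list.
class CandidateFontList {

    // Returns a new list: the current fonts plus extraFont (if not already present),
    // with every font named in fixedWidth moved after the proportional fonts.
    // Relative order within each group is preserved.
    static List<String> extend(List<String> current, String extraFont, Set<String> fixedWidth) {
        List<String> combined = new ArrayList<>(current); // work on a copy
        if (!combined.contains(extraFont)) {
            combined.add(extraFont);
        }
        List<String> proportional = new ArrayList<>();
        List<String> monospaced = new ArrayList<>();
        for (String font : combined) {
            if (fixedWidth.contains(font)) {
                monospaced.add(font);
            } else {
                proportional.add(font);
            }
        }
        proportional.addAll(monospaced);
        return proportional;
    }

    public static void main(String[] args) {
        // Placeholder starting list; in PDFL it would come from getCandidateFontNames().
        List<String> current = List.of("MinionPro", "Courier", "ExampleCJK");
        List<String> updated = extend(current, "ExampleThaiSans", Set.of("Courier"));
        System.out.println(updated); // [MinionPro, ExampleCJK, ExampleThaiSans, Courier]
        // In PDFL, the updated list would then be passed to setCandidateFontNames(...).
    }
}
```

Keeping the reordering in a separate helper leaves the decision about which fonts count as fixed-width with the caller, since that classification is not described on this page.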
setConfigurationParameters
void setConfigurationParameters(java.util.Map< String, String > configurationParams)
Parameters
configurationParams: java.util.Map< String, String >
Returns: void
Sets the configuration parameters. Note: Reserved for internal use. Do not use unless directed to do so by Datalogics Support.
setEnableImagePreprocessing
void setEnableImagePreprocessing(boolean enable)
Parameters
enable: boolean
Returns: void
Enables or disables all image preprocessing. Image preprocessing is enabled by default. Note that once the OCREngine is initialized, this setting is permanent.
setEngine
void setEngine(String value)
Parameters
value: String
Returns: void
Sets the name of the OCR engine to use. Use the Tesseract4Engine constant to select the Tesseract v4 engine.
setLanguages
void setLanguages(java.util.List< LanguageSetting > languages)
Parameters
languages: java.util.List< LanguageSetting >
Returns: void
Sets the list of languages the OCR engine should recognize.
setPageSegmentationMode
void setPageSegmentationMode(PageSegmentationMode mode)
Parameters
mode: PageSegmentationMode
Returns: void
Sets the page segmentation mode, which controls how the OCR engine detects text regions on the page.
setPerformance
void setPerformance(Performance performanceSetting)
Parameters
performanceSetting: Performance
Returns: void
Sets the desired performance mode for the OCR engine, balancing speed against accuracy.
setResolution
void setResolution(double value)
Parameters
value: double
Returns: void
Sets the resolution, in DPI, at which images are rendered for OCR processing.