Apryse SDK Node Class: DataExtractionOptions

new DataExtractionOptions()

Options for PDFNet.DataExtractionModule.extractData, PDFNet.DataExtractionModule.extractToXLSX, and PDFNet.DataExtractionModule.extractToXLSXWithFilter.

Methods

addExclusionZonesForPage(regions, page_num)

Adds exclusion zones to the ExclusionZonesForPage array, an optional list of page areas to be excluded from analysis. Zones are provided as a collection of Rects paired with a page number, and the Rects are applied to that page. Rects are specified in User Space coordinates. If this is set, the specified areas will not be analyzed. If neither this nor InclusionZonesForPage is set, the entire page will be analyzed. This option only affects the GenericKeyValue, FormKeyValue, and FormField engines.

Parameters:

Name	Type	Description
`regions`	Array.<PDFNet.Rect>	List of page areas to be excluded from analysis.
`page_num`	number	The page number (1-indexed) to which the regions are applied.

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions
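Example (a minimal sketch, not taken from the SDK samples): excluding a header band and a footer band on a single page. It assumes the `PDFNet` namespace comes from `require('@pdftron/pdfnet-node')` and that `PDFNet.Rect` can be constructed as `new PDFNet.Rect(x1, y1, x2, y2)`. The helper name `excludeHeaderAndFooter` and the coordinates (an unrotated 612 x 792 pt page with the user space origin at the bottom-left corner) are illustrative only. `PDFNet` is passed in as a parameter so the snippet does not depend on how the SDK is loaded.

```javascript
// Hypothetical helper: builds an options object that excludes a header band
// and a footer band on one page.
// `PDFNet` is the SDK namespace, e.g. require('@pdftron/pdfnet-node').PDFNet
// (assumption about how the SDK is loaded).
function excludeHeaderAndFooter(PDFNet, pageNum) {
  const options = new PDFNet.DataExtractionModule.DataExtractionOptions();

  // Illustrative User Space coordinates for a 612 x 792 pt page.
  const header = new PDFNet.Rect(0, 742, 612, 792); // top 50 pt
  const footer = new PDFNet.Rect(0, 0, 612, 50);    // bottom 50 pt

  // One call registers every Rect for the given 1-indexed page and
  // returns the same options object, so the result can be chained.
  return options.addExclusionZonesForPage([header, footer], pageNum);
}
```

The returned object would then be passed as the options argument to one of the extraction calls listed at the top of this page.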

addInclusionZonesForPage(regions, page_num)

Adds inclusion zones to the InclusionZonesForPage array, an optional list of page areas to be included in analysis (to the exclusion of all other areas). Zones are provided as a collection of Rects paired with a page number, and the Rects are applied to that page. Rects are specified in User Space coordinates. If this is set, only the specified areas will be analyzed. If neither this nor ExclusionZonesForPage is set, the entire page will be analyzed. This option only affects the GenericKeyValue, FormKeyValue, and FormField engines.

Parameters:

Name	Type	Description
`regions`	Array.<PDFNet.Rect>	List of page areas to be included in analysis.
`page_num`	number	The page number (1-indexed) to which the regions are applied.

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions
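Example (a sketch under stated assumptions): because each call pairs a list of Rects with one page number, zones for several pages need one call per page. The helper below is hypothetical; it takes zones as plain `[x1, y1, x2, y2]` tuples keyed by page number and assumes `new PDFNet.Rect(x1, y1, x2, y2)` is available on the namespace passed in.

```javascript
// Hypothetical helper: applies inclusion zones given as plain coordinate tuples.
// `zonesByPage` maps a 1-indexed page number to an array of [x1, y1, x2, y2]
// tuples in User Space coordinates.
function addInclusionZones(PDFNet, options, zonesByPage) {
  for (const [page, tuples] of Object.entries(zonesByPage)) {
    const pageNum = Number(page);
    if (!Number.isInteger(pageNum) || pageNum < 1) {
      throw new RangeError(`Page numbers are 1-indexed, got: ${page}`);
    }
    // Assumption: Rect can be constructed directly from four coordinates.
    const rects = tuples.map(([x1, y1, x2, y2]) => new PDFNet.Rect(x1, y1, x2, y2));
    options.addInclusionZonesForPage(rects, pageNum);
  }
  return options;
}
```

A call such as `addInclusionZones(PDFNet, options, { 1: [[50, 400, 560, 700]], 3: [[50, 100, 560, 300]] })` would restrict analysis to one area on page 1 and one on page 3; the coordinates are made up for illustration.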

getDeepLearningAssist()

Gets the value of DeepLearningAssist from the options object. Specifies whether Deep Learning is used with table recognition in the DocStructure engine. The default is false. When true, table recognition accuracy improves at the cost of increased processing time. This only affects the DocStructure engine.

Returns:

the current value for DeepLearningAssist.

Type: boolean

getFormExtractionEngine()

Gets the value of FormExtractionEngine from the options object. Specifies the form extraction engine used in DetectAndAddFormFieldsToPDF, either 'Form' or 'FormKeyValue'. The default is 'Form'. Note: The 'FormKeyValue' engine is experimental and subject to change.

Returns:

the current value for FormExtractionEngine.

Type: string

getLanguage()

Gets the value of Language from the options object. Specifies the OCR language(s). Use 3-letter ISO 639-2 language codes, separated by spaces. Example: "eng deu spa fra". The default is English.

Returns:

the current value for Language.

Type: string

getOverlappingFormFieldBehavior()

Gets the value of OverlappingFormFieldBehavior from the options object. When a detected form field overlaps with an existing one, keep either the old field (value 'KeepOld') or the new one (value 'KeepNew', the default).

Returns:

the current value for OverlappingFormFieldBehavior.

Type: string

getPages()

Gets the value of Pages from the options object. Specifies a range of pages to be converted, such as "1-5". By default all pages are converted. The first page has the page number 1.

Returns:

the current value for Pages.

Type: string

getPDFPassword()

Gets the value of PDFPassword from the options object. Specifies the password if the PDF requires one. The default is no password.

Returns:

the current value for PDFPassword.

Type: string

setDeepLearningAssist(value)

Sets the value for DeepLearningAssist in the options object. Specifies whether Deep Learning is used with table recognition in the DocStructure engine. The default is false. When true, table recognition accuracy improves at the cost of increased processing time. This only affects the DocStructure engine.

Parameters:

Name	Type	Description
`value`	boolean	the new value for DeepLearningAssist

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions

setFormExtractionEngine(value)

Sets the value for FormExtractionEngine in the options object. Specifies the form extraction engine used in DetectAndAddFormFieldsToPDF, either 'Form' or 'FormKeyValue'. The default is 'Form'. Note: The 'FormKeyValue' engine is experimental and subject to change.

Parameters:

Name	Type	Description
`value`	string	the new value for FormExtractionEngine

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions

setLanguage(value)

Sets the value for Language in the options object. Specifies the OCR language(s). Use 3-letter ISO 639-2 language codes, separated by spaces. Example: "eng deu spa fra". The default is English.

Parameters:

Name	Type	Description
`value`	string	the new value for Language

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions
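Example (a sketch, not part of the SDK): since the Language value is a single space-separated string, a small helper can assemble it from an array of codes. `buildLanguageValue` is a hypothetical name. The check only verifies the three-letter shape, and lowercasing the input is an assumption based on the lowercase example above.

```javascript
// Hypothetical helper: joins language codes into the space-separated form
// described for setLanguage, e.g. ['eng', 'deu'] -> 'eng deu'.
function buildLanguageValue(codes) {
  if (!Array.isArray(codes) || codes.length === 0) {
    throw new TypeError('Expected a non-empty array of language codes');
  }
  return codes
    .map((code) => {
      // Assumption: codes are normalized to lowercase, as in the example above.
      const normalized = String(code).trim().toLowerCase();
      // Shape check only (three ASCII letters). It does not verify that the
      // code is a real ISO 639-2 code or that the OCR engine supports it.
      if (!/^[a-z]{3}$/.test(normalized)) {
        throw new RangeError(`Not a 3-letter language code: "${code}"`);
      }
      return normalized;
    })
    .join(' ');
}
```

Usage would look like `options.setLanguage(buildLanguageValue(['eng', 'deu']))`.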

setOverlappingFormFieldBehavior(value)

Sets the value for OverlappingFormFieldBehavior in the options object. When a detected form field overlaps with an existing one, keep either the old field (value 'KeepOld') or the new one (value 'KeepNew', the default).

Parameters:

Name	Type	Description
`value`	string	the new value for OverlappingFormFieldBehavior

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions
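Example (a sketch under stated assumptions): setFormExtractionEngine and setOverlappingFormFieldBehavior both take plain strings, so a typo would not be caught by the type alone. The hypothetical helpers below check a value against the strings named on this page before passing it on. Whether the SDK itself rejects or ignores other values is not stated here, and the exact-case comparison is an assumption.

```javascript
// Allowed values, copied from the descriptions on this page.
const FORM_EXTRACTION_ENGINES = ['Form', 'FormKeyValue'];
const OVERLAPPING_BEHAVIORS = ['KeepOld', 'KeepNew'];

// Hypothetical helper: returns `value` unchanged if it is one of `allowed`,
// otherwise throws. The comparison is case-sensitive (assumption).
function checkedValue(name, value, allowed) {
  if (!allowed.includes(value)) {
    throw new RangeError(`${name} must be one of ${allowed.join(', ')}; got "${value}"`);
  }
  return value;
}

// Hypothetical helper: applies both form-related settings to an existing
// DataExtractionOptions object and returns it for further chaining.
function configureFormDetection(options, engine, overlapBehavior) {
  return options
    .setFormExtractionEngine(
      checkedValue('FormExtractionEngine', engine, FORM_EXTRACTION_ENGINES))
    .setOverlappingFormFieldBehavior(
      checkedValue('OverlappingFormFieldBehavior', overlapBehavior, OVERLAPPING_BEHAVIORS));
}
```

For instance, `configureFormDetection(options, 'Form', 'KeepOld')` keeps existing fields where a detected field overlaps one.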

setPages(value)

Sets the value for Pages in the options object. Specifies a range of pages to be converted, such as "1-5". By default all pages are converted. The first page has the page number 1.

Parameters:

Name	Type	Description
`value`	string	the new value for Pages

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions
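Example (a minimal sketch): building the Pages string from numeric bounds, with a check that pages are 1-indexed. `buildPageRange` is a hypothetical helper. It only produces the "first-last" form shown above, because other forms (single pages, comma-separated lists) are not described on this page; whether a degenerate range such as "3-3" is accepted is likewise not stated.

```javascript
// Hypothetical helper: formats a 1-indexed, inclusive page range as "first-last".
function buildPageRange(first, last) {
  if (!Number.isInteger(first) || !Number.isInteger(last)) {
    throw new TypeError('Page numbers must be integers');
  }
  if (first < 1) {
    throw new RangeError('The first page has the page number 1');
  }
  if (last < first) {
    throw new RangeError('The last page must not precede the first page');
  }
  return `${first}-${last}`;
}
```

Usage would look like `options.setPages(buildPageRange(1, 5))`.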

setPDFPassword(value)

Sets the value for PDFPassword in the options object. Specifies the password if the PDF requires one. The default is no password.

Parameters:

Name	Type	Description
`value`	string	the new value for PDFPassword

Returns:

this object, for call chaining

Type: PDFNet.DataExtractionModule.DataExtractionOptions
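Example (a sketch, assuming the namespace is loaded as `require('@pdftron/pdfnet-node').PDFNet`): because every setter returns the options object, a configuration can be written as one chain and read back with the getters. The function names and the chosen values are illustrative only.

```javascript
// Hypothetical helper: builds an options object with chained setters.
// `PDFNet` is the SDK namespace passed in by the caller (assumption about
// how the SDK is loaded); `password` is optional.
function buildOptions(PDFNet, password) {
  const options = new PDFNet.DataExtractionModule.DataExtractionOptions()
    .setPages('1-5')              // pages 1 through 5 only
    .setLanguage('eng deu')       // OCR languages: English and German
    .setDeepLearningAssist(true); // better tables, slower (DocStructure only)

  // Only set a password when one was supplied; the default is no password.
  if (password) {
    options.setPDFPassword(password);
  }
  return options;
}

// Hypothetical helper: reads selected settings back through the getters,
// e.g. for logging. The password is deliberately left out.
function describeOptions(options) {
  return {
    pages: options.getPages(),
    language: options.getLanguage(),
    deepLearningAssist: options.getDeepLearningAssist(),
  };
}
```

The options object returned by `buildOptions` would be passed to one of the extraction calls listed at the top of this page.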
