Interface IndexingParametersConfiguration

A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.

Hierarchy

  • IndexingParametersConfiguration

Indexable

[property: string]: any

Describes unknown properties. The value of an unknown property can be of "any" type.
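As a rough illustration of how the named properties and the index signature combine, the sketch below builds a configuration object and lists the properties whose values are primitives. The interface `IndexingParametersConfigurationSketch` is a local stand-in that mirrors only a few of the documented properties, and `myCustomSetting` and `primitivePropertyNames` are hypothetical names that are not part of the package.

```typescript
// Local stand-in for the interface documented on this page (assumption:
// only a few of its properties are mirrored here).
interface IndexingParametersConfigurationSketch {
  failOnUnsupportedContentType?: boolean;
  queryTimeout?: string;
  // Indexable part: any other property name is accepted.
  [property: string]: any;
}

const configuration: IndexingParametersConfigurationSketch = {
  failOnUnsupportedContentType: false,
  queryTimeout: "00:10:00",
  // Hypothetical indexer-specific property, allowed by the index signature.
  myCustomSetting: 42,
};

// Hypothetical helper: names of the properties whose values are primitives,
// which is what the interface description asks for.
function primitivePropertyNames(config: { [property: string]: any }): string[] {
  return Object.keys(config).filter((name) => {
    const kind = typeof config[name];
    return kind === "string" || kind === "number" || kind === "boolean";
  });
}

console.log(primitivePropertyNames(configuration));
```

The index signature accepts values of any type, so a check like this is one way to catch non-primitive values before the configuration is sent anywhere.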

Properties

Optional allowSkillsetToReadFileData

allowSkillsetToReadFileData: undefined | false | true

If true, creates the path /document/file_data, an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill. Default value: false.

Optional dataToExtract

Possible values include: 'storageMetadata', 'allMetadata', 'contentAndMetadata'. Default value: 'contentAndMetadata'.

Optional delimitedTextDelimiter

delimitedTextDelimiter: undefined | string

For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").

Optional delimitedTextHeaders

delimitedTextHeaders: undefined | string

For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
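A minimal sketch of how the CSV-related properties might be set together for blobs that have no header row. The interface `CsvIndexingConfigurationSketch` is a local stand-in for the relevant part of the documented interface, the column names are hypothetical, and the camelCase spelling of the parsing mode is assumed. `headerNames` is an illustrative helper, not part of the package.

```typescript
// Local stand-in mirroring only the CSV-related properties (assumption).
interface CsvIndexingConfigurationSketch {
  parsingMode?: string;
  firstLineContainsHeaders?: boolean;
  delimitedTextHeaders?: string;
  delimitedTextDelimiter?: string;
}

// Configuration for CSV blobs whose first line is data, not headers.
const csvConfiguration: CsvIndexingConfigurationSketch = {
  parsingMode: "delimitedText", // camelCase spelling assumed
  firstLineContainsHeaders: false,
  delimitedTextHeaders: "id, name, price", // hypothetical column names
  delimitedTextDelimiter: "|",
};

// Hypothetical helper: split the comma-delimited header list into names,
// for example to check them against the fields of the target index.
function headerNames(config: CsvIndexingConfigurationSketch): string[] {
  return (config.delimitedTextHeaders || "")
    .split(",")
    .map((header) => header.trim())
    .filter((header) => header.length > 0);
}

console.log(headerNames(csvConfiguration));
```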

Optional documentRoot

documentRoot: undefined | string

For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
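To make the idea of a path to the array concrete, the sketch below resolves a slash-separated path against a sample document. The path syntax "/level1/level2", the sample blob, and the helper `resolveDocumentRoot` are assumptions for illustration. The helper only mimics the lookup locally and is not how the service is implemented.

```typescript
// Illustrative only: follow a slash-separated path down to a nested array.
function resolveDocumentRoot(blob: unknown, documentRoot: string): unknown[] {
  let current: any = blob;
  const segments = documentRoot.split("/").filter((segment) => segment.length > 0);
  for (const segment of segments) {
    if (current === null || typeof current !== "object") {
      return []; // the path leads nowhere
    }
    current = current[segment];
  }
  return Array.isArray(current) ? current : [];
}

// Hypothetical blob content where the array sits two levels down.
const sampleBlob = { level1: { level2: [{ id: "1" }, { id: "2" }] } };

// Configuration pointing at the nested array (parsing mode spelling assumed).
const jsonArrayConfiguration = {
  parsingMode: "jsonArray",
  documentRoot: "/level1/level2",
};

const items = resolveDocumentRoot(sampleBlob, jsonArrayConfiguration.documentRoot);
console.log(items.length); // 2
```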

Optional excludedFileNameExtensions

excludedFileNameExtensions: undefined | string

Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing. Default value: ''.

Optional executionEnvironment

executionEnvironment: IndexerExecutionEnvironment

Possible values include: 'standard', 'private'. Default value: 'standard'.

Optional failOnUnprocessableDocument

failOnUnprocessableDocument: undefined | false | true

For Azure blobs, set to false to continue indexing when a document fails indexing. Default value: false.

Optional failOnUnsupportedContentType

failOnUnsupportedContentType: undefined | false | true

For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance. Default value: false.

Optional firstLineContainsHeaders

firstLineContainsHeaders: undefined | false | true

For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. Default value: true.

Optional imageAction

Possible values include: 'none', 'generateNormalizedImages', 'generateNormalizedImagePerPage'. Default value: 'none'.

Optional indexStorageMetadataOnlyForOversizedDocuments

indexStorageMetadataOnlyForOversizedDocuments: undefined | false | true

For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://docs.microsoft.com/azure/search/search-limits-quotas-capacity. Default value: false.

Optional indexedFileNameExtensions

indexedFileNameExtensions: undefined | string

Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types. Default value: ''.
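The two extension lists can be reasoned about with a small local filter. The sketch below parses both comma-delimited strings and decides whether a file name would be picked up. It assumes that an extension present in both lists is excluded and that an empty include list means all extensions are allowed. `ExtensionFilterSketch`, `parseExtensions`, and `wouldIndex` are hypothetical names, and the real selection happens inside the service.

```typescript
// Local stand-in for the two extension-related properties (assumption).
interface ExtensionFilterSketch {
  indexedFileNameExtensions?: string;
  excludedFileNameExtensions?: string;
}

// Split a comma-delimited list such as ".png, .mp4" into [".png", ".mp4"].
function parseExtensions(list: string | undefined): string[] {
  return (list || "")
    .split(",")
    .map((extension) => extension.trim().toLowerCase())
    .filter((extension) => extension.length > 0);
}

// Illustrative only: would a blob with this file name be selected?
function wouldIndex(fileName: string, config: ExtensionFilterSketch): boolean {
  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  const included = parseExtensions(config.indexedFileNameExtensions);
  const excluded = parseExtensions(config.excludedFileNameExtensions);
  if (excluded.indexOf(extension) !== -1) {
    return false; // assumption: exclusion wins over inclusion
  }
  return included.length === 0 || included.indexOf(extension) !== -1;
}

const filterConfiguration: ExtensionFilterSketch = {
  indexedFileNameExtensions: ".docx, .pptx, .msg",
  excludedFileNameExtensions: ".png, .mp4",
};

console.log(wouldIndex("report.docx", filterConfiguration)); // true
console.log(wouldIndex("photo.png", filterConfiguration)); // false
```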

Optional parsingMode

Possible values include: 'default', 'text', 'delimitedText', 'json', 'jsonArray', 'jsonLines'. Default value: 'default'.

Optional pdfTextRotationAlgorithm

pdfTextRotationAlgorithm: BlobIndexerPDFTextRotationAlgorithm

Possible values include: 'none', 'detectAngles'. Default value: 'none'.

Optional queryTimeout

queryTimeout: undefined | string

Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". Default value: '00:05:00'.
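Because the timeout is a string in "hh:mm:ss" form, it can be convenient to derive it from a number of seconds. The helper `toQueryTimeout` below is a hypothetical convenience function, not part of the package. It assumes a non-negative whole number of seconds below 100 hours.

```typescript
// Hypothetical helper: format a duration in seconds as "hh:mm:ss".
function toQueryTimeout(totalSeconds: number): string {
  const pad = (value: number): string => (value < 10 ? "0" + value : String(value));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = Math.floor(totalSeconds % 60);
  return pad(hours) + ":" + pad(minutes) + ":" + pad(seconds);
}

// Raise the timeout for an Azure SQL data source from 5 to 15 minutes.
const sqlConfiguration = { queryTimeout: toQueryTimeout(15 * 60) };

console.log(sqlConfiguration.queryTimeout); // "00:15:00"
```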
