Defines values for Answers.
KnownAnswers can be used interchangeably with Answers,
this enum contains the known values that the service supports.
none: Do not return answers for the query.
extractive: Extracts answer candidates from the contents of the documents returned in response to a query expressed as a question in natural language.
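As context for these values, a minimal sketch of requesting extractive answers through the JavaScript SDK under the 2021-04-30-Preview API; the answers, queryType, and queryLanguage options and the document shape are assumptions made for illustration, not a confirmed surface.

import { SearchClient, AzureKeyCredential } from "@azure/search-documents";

// Hypothetical document shape used only for this sketch.
interface Hotel {
  hotelId: string;
  description: string;
}

async function searchWithAnswers(): Promise<void> {
  const client = new SearchClient<Hotel>(
    "https://<service-name>.search.windows.net",
    "hotels-index",
    new AzureKeyCredential("<api-key>")
  );

  // Ask a natural-language question and request extractive answers (preview).
  const searchResults = await client.search("which hotels are closest to the beach?", {
    queryType: "semantic",    // assumed preview value
    queryLanguage: "en-us",   // assumed preview option
    answers: "extractive",    // as defined by KnownAnswers above
  });

  for await (const result of searchResults.results) {
    console.log(result.document.description);
  }
}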
Defines values for ApiVersion20210430Preview.
KnownApiVersion20210430Preview can be used interchangeably with ApiVersion20210430Preview,
this enum contains the known values that the service supports.
2021-04-30-Preview: API version '2021-04-30-Preview'
Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. This token filter is implemented using Apache Lucene.
Defines values for AutocompleteMode.
Ranking function based on the Okapi BM25 similarity algorithm. BM25 is a TF-IDF-like algorithm that includes length normalization (controlled by the 'b' parameter) as well as term frequency saturation (controlled by the 'k1' parameter).
Defines values for BlobIndexerDataToExtract.
KnownBlobIndexerDataToExtract can be used interchangeably with BlobIndexerDataToExtract,
this enum contains the known values that the service supports.
storageMetadata: Indexes just the standard blob properties and user-specified metadata.
allMetadata: Extracts metadata provided by the Azure blob storage subsystem and the content-type specific metadata (for example, metadata unique to just .png files is indexed).
contentAndMetadata: Extracts all metadata and textual content from each blob.
Defines values for BlobIndexerImageAction.
KnownBlobIndexerImageAction can be used interchangeably with BlobIndexerImageAction,
this enum contains the known values that the service supports.
none: Ignores embedded images or image files in the data set. This is the default.
generateNormalizedImages: Extracts text from images (for example, the word "STOP" from a traffic stop sign), and embeds it into the content field. This action requires that "dataToExtract" is set to "contentAndMetadata". A normalized image refers to additional processing resulting in uniform image output, sized and rotated to promote consistent rendering when you include images in visual search results. This information is generated for each image when you use this option.
generateNormalizedImagePerPage: Extracts text from images (for example, the word "STOP" from a traffic stop sign), and embeds it into the content field, but treats PDF files differently in that each page will be rendered as an image and normalized accordingly, instead of extracting embedded images. Non-PDF file types will be treated the same as if "generateNormalizedImages" was set.
Defines values for BlobIndexerPDFTextRotationAlgorithm.
KnownBlobIndexerPDFTextRotationAlgorithm can be used interchangeably with BlobIndexerPDFTextRotationAlgorithm,
this enum contains the known values that the service supports.
none: Leverages normal text extraction. This is the default.
detectAngles: May produce better and more readable text extraction from PDF files that have rotated text within them. Note that there may be a small performance impact when this parameter is used. This parameter only applies to PDF files, and only to PDFs with embedded text. If the rotated text appears within an embedded image in the PDF, this parameter does not apply.
Defines values for BlobIndexerParsingMode.
KnownBlobIndexerParsingMode can be used interchangeably with BlobIndexerParsingMode,
this enum contains the known values that the service supports.
default: Set to default for normal file processing.
text: Set to text to improve indexing performance on plain text files in blob storage.
delimitedText: Set to delimitedText when blobs are plain CSV files.
json: Set to json to extract structured content from JSON files.
jsonArray: Set to jsonArray to extract individual elements of a JSON array as separate documents in Azure Cognitive Search.
jsonLines: Set to jsonLines to extract individual JSON entities, separated by a new line, as separate documents in Azure Cognitive Search.
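These parsing, extraction, and image-action settings are normally supplied together through an indexer's parameters. A hedged sketch follows; the configuration keys shown (parsingMode, dataToExtract, imageAction) mirror the enums above, but the exact indexer shape used here is an assumption.

import { SearchIndexerClient } from "@azure/search-documents";

// Sketch: a blob indexer that treats each blob as a JSON array, extracts
// content plus metadata, and OCRs embedded images. Names are placeholders.
async function createBlobIndexer(indexerClient: SearchIndexerClient): Promise<void> {
  await indexerClient.createIndexer({
    name: "blob-indexer",
    dataSourceName: "blob-datasource",
    targetIndexName: "docs-index",
    parameters: {
      configuration: {
        parsingMode: "jsonArray",
        dataToExtract: "contentAndMetadata",
        imageAction: "generateNormalizedImages",
      },
    },
  });
}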
Defines values for Captions.
KnownCaptions can be used interchangeably with Captions,
this enum contains the known values that the service supports.
none: Do not return captions for the query.
extractive: Extracts captions from the matching documents that contain passages relevant to the search query.
Defines values for CharFilterName.
KnownCharFilterName can be used interchangeably with CharFilterName,
this enum contains the known values that the service supports.
html_strip: A character filter that attempts to strip out HTML constructs. See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/charfilter/HTMLStripCharFilter.html
Forms bigrams of CJK terms that are generated from the standard tokenizer. This token filter is implemented using Apache Lucene.
Defines values for CjkBigramTokenFilterScripts.
Legacy similarity algorithm which uses the Lucene TFIDFSimilarity implementation of TF-IDF. This variation of TF-IDF introduces static document length normalization as well as coordinating factors that penalize documents that only partially match the searched queries.
Grammar-based tokenizer that is suitable for processing most European-language documents. This tokenizer is implemented using Apache Lucene.
A cognitive service resource provisioned with a key that is attached to a skillset.
Construct bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. This token filter is implemented using Apache Lucene.
Defines values for ComplexDataType. Possible values include: 'Edm.ComplexType', 'Collection(Edm.ComplexType)'
A skill that enables scenarios that require a Boolean operation to determine the data to assign to an output.
Options for performing the count operation on the index.
Options for create datasource operation.
Options for create index operation.
Options for create indexer operation.
Options for create skillset operation.
Options for create synonymmap operation.
Allows you to take control over the process of converting text into indexable/searchable tokens. It's a user-defined configuration consisting of a single predefined tokenizer and one or more filters. The tokenizer is responsible for breaking text into tokens, and the filters for modifying tokens emitted by the tokenizer.
A list of character filters used to prepare input text before it is processed by the tokenizer. For instance, they can replace certain characters or symbols. The filters are run in the order in which they are listed.
The name of the analyzer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
A list of token filters used to filter out or modify the tokens generated by a tokenizer. For example, you can specify a lowercase filter that converts all characters to lowercase. The filters are run in the order in which they are listed.
The name of the tokenizer to use to divide continuous text into a sequence of tokens, such as breaking a sentence into words. KnownTokenizerNames is an enum containing known values.
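A hedged sketch of wiring these pieces into an index definition; the property names follow the descriptions above (tokenizerName, tokenFilters, charFilters), while the odatatype string, field list, and analyzer name are illustrative assumptions.

import { SearchIndexClient } from "@azure/search-documents";

// Sketch: a custom analyzer built from a predefined tokenizer plus filters,
// attached to a searchable field by name.
async function createIndexWithCustomAnalyzer(indexClient: SearchIndexClient): Promise<void> {
  await indexClient.createIndex({
    name: "docs-index",
    fields: [
      { name: "id", type: "Edm.String", key: true },
      { name: "body", type: "Edm.String", searchable: true, analyzerName: "my_custom_analyzer" },
    ],
    analyzers: [
      {
        odatatype: "#Microsoft.Azure.Search.CustomAnalyzer",
        name: "my_custom_analyzer",
        tokenizerName: "standard_v2",
        tokenFilters: ["lowercase", "asciifolding"],
        charFilters: ["html_strip"],
      },
    ],
  });
}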
A skill that looks for text from a custom, user-defined list of words and phrases.
Defines values for CustomEntityLookupSkillLanguage.
KnownCustomEntityLookupSkillLanguage can be used interchangeably with CustomEntityLookupSkillLanguage,
this enum contains the known values that the service supports.
da: Danish
de: German
en: English
es: Spanish
fi: Finnish
fr: French
it: Italian
ko: Korean
pt: Portuguese
Allows you to configure normalization for filterable, sortable, and facetable fields, which by default operate with strict matching. This is a user-defined configuration consisting of one or more filters, which modify the token that is stored.
Contains response data for the createOrUpdate operation.
Contains response data for the create operation.
Contains response data for the get operation.
Contains response data for the list operation.
An empty object that represents the default cognitive service resource for a skillset.
Options for the delete documents operation.
Decomposes compound words found in many Germanic languages. This token filter is implemented using Apache Lucene.
Defines a function that boosts scores based on distance from a geographic location.
A skill that extracts content from a file within the enrichment pipeline.
Contains response data for the autocompleteGet operation.
Contains response data for the autocompletePost operation.
Contains response data for the count operation.
The underlying HTTP response.
The parsed response body.
Contains response data for the get operation.
The underlying HTTP response.
The parsed response body.
Contains response data for the index operation.
Contains response data for the searchGet operation.
Contains response data for the searchPost operation.
Contains response data for the suggestGet operation.
Contains response data for the suggestPost operation.
Generates n-grams of the given size(s) starting from the front or the back of an input token. This token filter is implemented using Apache Lucene.
The maximum n-gram length. Default is 2. Maximum is 300. Default value: 2.
The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram. Default value: 1.
The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
Specifies which side of the input the n-gram should be generated from. Default is "front". Possible values include: 'Front', 'Back'
Defines values for EdgeNGramTokenFilterSide.
Generates n-grams of the given size(s) starting from the front or the back of an input token. This token filter is implemented using Apache Lucene.
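A minimal sketch of an edge n-gram token filter definition using the properties described above (minGram, maxGram, side); the odatatype string is an assumption.

import type { EdgeNGramTokenFilter } from "@azure/search-documents";

// Sketch: emit 2- to 10-character prefixes of each token, e.g. for
// autocomplete-style matching. Added to an index's tokenFilters list.
const edgeNGramFilter: EdgeNGramTokenFilter = {
  odatatype: "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
  name: "my_edge_ngram",
  minGram: 2,
  maxGram: 10,
  side: "front",
};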
Tokenizes the input from an edge into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.
Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). This token filter is implemented using Apache Lucene.
Defines values for EntityCategory.
KnownEntityCategory can be used interchangeably with EntityCategory,
this enum contains the known values that the service supports.
location: Entities describing a physical location.
organization: Entities describing an organization.
person: Entities describing a person.
quantity: Entities describing a quantity.
datetime: Entities describing a date and time.
url: Entities describing a URL.
email: Entities describing an email address.
Using the Text Analytics API, extracts linked entities from text.
Text analytics entity recognition.
Defines values for EntityRecognitionSkillLanguage.
KnownEntityRecognitionSkillLanguage can be used interchangeably with EntityRecognitionSkillLanguage,
this enum contains the known values that the service supports.
ar: Arabic
cs: Czech
zh-Hans: Chinese-Simplified
zh-Hant: Chinese-Traditional
da: Danish
nl: Dutch
en: English
fi: Finnish
fr: French
de: German
el: Greek
hu: Hungarian
it: Italian
ja: Japanese
ko: Korean
no: Norwegian (Bokmaal)
pl: Polish
pt-PT: Portuguese (Portugal)
pt-BR: Portuguese (Brazil)
ru: Russian
es: Spanish
sv: Swedish
tr: Turkish
Using the Text Analytics API, extracts entities of different types from text.
Defines a function that boosts scores based on the value of a date-time field.
Options for get datasource operation.
Options for get index operation.
Options for get index statistics operation.
Options for get indexer operation.
Options for get indexer status operation.
Options for get service statistics operation.
Options for get skillset operation.
Options for get synonymmaps operation.
Defines a data change detection policy that captures changes based on the value of a high water mark column.
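A hedged sketch of attaching this policy to a data source connection; the method name createDataSourceConnection, the odatatype string, and the column name are assumptions made for illustration.

import { SearchIndexerClient } from "@azure/search-documents";

// Sketch: an Azure SQL data source that tracks changes via a rowversion-style
// high water mark column.
async function createSqlDataSource(indexerClient: SearchIndexerClient): Promise<void> {
  await indexerClient.createDataSourceConnection({
    name: "sql-datasource",
    type: "azuresql",
    connectionString: "<connection-string>",
    container: { name: "dbo.Hotels" },
    dataChangeDetectionPolicy: {
      odatatype: "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
      highWaterMarkColumnName: "RowVersion",
    },
  });
}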
A skill that analyzes image files. It extracts a rich set of visual features based on the image content.
Defines values for ImageAnalysisSkillLanguage.
KnownImageAnalysisSkillLanguage can be used interchangeably with ImageAnalysisSkillLanguage,
this enum contains the known values that the service supports.
en: English
es: Spanish
ja: Japanese
pt: Portuguese
zh: Chinese
Defines values for ImageDetail.
KnownImageDetail can be used interchangeably with ImageDetail,
this enum contains the known values that the service supports.
celebrities: Details recognized as celebrities.
landmarks: Details recognized as landmarks.
Defines values for IndexActionType.
Represents an index action that operates on a document.
An iterator for listing the indexes that exist in the Search service. Will make requests as needed during iteration. Use .byPage() to make one request to the server per iteration.
Defines values for IndexerExecutionEnvironment.
KnownIndexerExecutionEnvironment can be used interchangeably with IndexerExecutionEnvironment,
this enum contains the known values that the service supports.
standard: Indicates that Azure Cognitive Search can determine where the indexer should execute. This is the default environment when nothing is specified and is the recommended value.
private: Indicates that the indexer should run with the environment provisioned specifically for the search service. This should only be specified as the execution environment if the indexer needs to access resources securely over shared private link resources.
Defines values for IndexerExecutionStatus.
Defines values for IndexerStatus.
Contains response data for the createOrUpdate operation.
Contains response data for the create operation.
Contains response data for the get operation.
Contains response data for the getStatus operation.
Contains response data for the list operation.
Contains response data for the analyze operation.
Contains response data for the createOrUpdate operation.
Contains response data for the create operation.
Contains response data for the get operation.
Contains response data for the getStatistics operation.
Contains response data for the list operation.
A token filter that only keeps tokens with text contained in a specified list of words. This token filter is implemented using Apache Lucene.
A skill that uses text analytics for key phrase extraction.
Defines values for KeyPhraseExtractionSkillLanguage.
KnownKeyPhraseExtractionSkillLanguage can be used interchangeably with KeyPhraseExtractionSkillLanguage,
this enum contains the known values that the service supports.
da: Danish
nl: Dutch
en: English
fi: Finnish
fr: French
de: German
it: Italian
ja: Japanese
ko: Korean
no: Norwegian (Bokmaal)
pl: Polish
pt-PT: Portuguese (Portugal)
pt-BR: Portuguese (Brazil)
ru: Russian
es: Spanish
sv: Swedish
Marks terms as keywords. This token filter is implemented using Apache Lucene.
Emits the entire input as a single token. This tokenizer is implemented using Apache Lucene.
The maximum token length. Default is 256. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters. Default value: 256.
The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
Emits the entire input as a single token. This tokenizer is implemented using Apache Lucene.
A skill that detects the language of input text and reports a single language code for every document submitted on the request. The language code is paired with a score indicating the confidence of the analysis.
Removes words that are too long or too short. This token filter is implemented using Apache Lucene.
Defines values for LexicalAnalyzerName.
KnownLexicalAnalyzerName can be used interchangeably with LexicalAnalyzerName,
this enum contains the known values that the service supports.
ar.microsoft: Microsoft analyzer for Arabic.
ar.lucene: Lucene analyzer for Arabic.
hy.lucene: Lucene analyzer for Armenian.
bn.microsoft: Microsoft analyzer for Bangla.
eu.lucene: Lucene analyzer for Basque.
bg.microsoft: Microsoft analyzer for Bulgarian.
bg.lucene: Lucene analyzer for Bulgarian.
ca.microsoft: Microsoft analyzer for Catalan.
ca.lucene: Lucene analyzer for Catalan.
zh-Hans.microsoft: Microsoft analyzer for Chinese (Simplified).
zh-Hans.lucene: Lucene analyzer for Chinese (Simplified).
zh-Hant.microsoft: Microsoft analyzer for Chinese (Traditional).
zh-Hant.lucene: Lucene analyzer for Chinese (Traditional).
hr.microsoft: Microsoft analyzer for Croatian.
cs.microsoft: Microsoft analyzer for Czech.
cs.lucene: Lucene analyzer for Czech.
da.microsoft: Microsoft analyzer for Danish.
da.lucene: Lucene analyzer for Danish.
nl.microsoft: Microsoft analyzer for Dutch.
nl.lucene: Lucene analyzer for Dutch.
en.microsoft: Microsoft analyzer for English.
en.lucene: Lucene analyzer for English.
et.microsoft: Microsoft analyzer for Estonian.
fi.microsoft: Microsoft analyzer for Finnish.
fi.lucene: Lucene analyzer for Finnish.
fr.microsoft: Microsoft analyzer for French.
fr.lucene: Lucene analyzer for French.
gl.lucene: Lucene analyzer for Galician.
de.microsoft: Microsoft analyzer for German.
de.lucene: Lucene analyzer for German.
el.microsoft: Microsoft analyzer for Greek.
el.lucene: Lucene analyzer for Greek.
gu.microsoft: Microsoft analyzer for Gujarati.
he.microsoft: Microsoft analyzer for Hebrew.
hi.microsoft: Microsoft analyzer for Hindi.
hi.lucene: Lucene analyzer for Hindi.
hu.microsoft: Microsoft analyzer for Hungarian.
hu.lucene: Lucene analyzer for Hungarian.
is.microsoft: Microsoft analyzer for Icelandic.
id.microsoft: Microsoft analyzer for Indonesian (Bahasa).
id.lucene: Lucene analyzer for Indonesian.
ga.lucene: Lucene analyzer for Irish.
it.microsoft: Microsoft analyzer for Italian.
it.lucene: Lucene analyzer for Italian.
ja.microsoft: Microsoft analyzer for Japanese.
ja.lucene: Lucene analyzer for Japanese.
kn.microsoft: Microsoft analyzer for Kannada.
ko.microsoft: Microsoft analyzer for Korean.
ko.lucene: Lucene analyzer for Korean.
lv.microsoft: Microsoft analyzer for Latvian.
lv.lucene: Lucene analyzer for Latvian.
lt.microsoft: Microsoft analyzer for Lithuanian.
ml.microsoft: Microsoft analyzer for Malayalam.
ms.microsoft: Microsoft analyzer for Malay (Latin).
mr.microsoft: Microsoft analyzer for Marathi.
nb.microsoft: Microsoft analyzer for Norwegian (Bokmål).
no.lucene: Lucene analyzer for Norwegian.
fa.lucene: Lucene analyzer for Persian.
pl.microsoft: Microsoft analyzer for Polish.
pl.lucene: Lucene analyzer for Polish.
pt-BR.microsoft: Microsoft analyzer for Portuguese (Brazil).
pt-BR.lucene: Lucene analyzer for Portuguese (Brazil).
pt-PT.microsoft: Microsoft analyzer for Portuguese (Portugal).
pt-PT.lucene: Lucene analyzer for Portuguese (Portugal).
pa.microsoft: Microsoft analyzer for Punjabi.
ro.microsoft: Microsoft analyzer for Romanian.
ro.lucene: Lucene analyzer for Romanian.
ru.microsoft: Microsoft analyzer for Russian.
ru.lucene: Lucene analyzer for Russian.
sr-cyrillic.microsoft: Microsoft analyzer for Serbian (Cyrillic).
sr-latin.microsoft: Microsoft analyzer for Serbian (Latin).
sk.microsoft: Microsoft analyzer for Slovak.
sl.microsoft: Microsoft analyzer for Slovenian.
es.microsoft: Microsoft analyzer for Spanish.
es.lucene: Lucene analyzer for Spanish.
sv.microsoft: Microsoft analyzer for Swedish.
sv.lucene: Lucene analyzer for Swedish.
ta.microsoft: Microsoft analyzer for Tamil.
te.microsoft: Microsoft analyzer for Telugu.
th.microsoft: Microsoft analyzer for Thai.
th.lucene: Lucene analyzer for Thai.
tr.microsoft: Microsoft analyzer for Turkish.
tr.lucene: Lucene analyzer for Turkish.
uk.microsoft: Microsoft analyzer for Ukrainian.
ur.microsoft: Microsoft analyzer for Urdu.
vi.microsoft: Microsoft analyzer for Vietnamese.
standard.lucene: Standard Lucene analyzer.
standardasciifolding.lucene: Standard ASCII Folding Lucene analyzer. See https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#Analyzers
keyword: Treats the entire content of a field as a single token. This is useful for data like zip codes, ids, and some product names. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html
pattern: Flexibly separates text into terms via a regular expression pattern. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/PatternAnalyzer.html
simple: Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/SimpleAnalyzer.html
stop: Divides text at non-letters; Applies the lowercase and stopword token filters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopAnalyzer.html
whitespace: An analyzer that uses the whitespace tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceAnalyzer.html
Defines values for LexicalNormalizerName.
KnownLexicalNormalizerName can be used interchangeably with LexicalNormalizerName,
this enum contains the known values that the service supports.
asciifolding: Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
elision: Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html
lowercase: Normalizes token text to lowercase. See https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html
standard: Standard normalizer, which consists of lowercase and asciifolding.
uppercase: Normalizes token text to uppercase. See https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html
Defines values for LexicalTokenizerName.
KnownLexicalTokenizerName can be used interchangeably with LexicalTokenizerName,
this enum contains the known values that the service supports.
classic: Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html
edgeNGram: Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html
keyword_v2: Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html
letter: Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html
lowercase: Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html
microsoft_language_tokenizer: Divides text using language-specific rules.
microsoft_language_stemming_tokenizer: Divides text using language-specific rules and reduces words to their base forms.
nGram: Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html
path_hierarchy_v2: Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html
pattern: Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html
standard_v2: Standard Lucene analyzer; Composed of the standard tokenizer, lowercase filter and stop filter. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html
uax_url_email: Tokenizes urls and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html
whitespace: Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html
Limits the number of tokens while indexing. This token filter is implemented using Apache Lucene.
Defines values for LineEnding.
KnownLineEnding can be used interchangeably with LineEnding,
this enum contains the known values that the service supports.
space: Lines are separated by a single space character.
carriageReturn: Lines are separated by a carriage return ('\r') character.
lineFeed: Lines are separated by a single line feed ('\n') character.
carriageReturnLineFeed: Lines are separated by a carriage return and a line feed ('\r\n') character.
Options for a list data sources operation.
Options for a list indexers operation.
Options for a list indexes operation.
Options for a list skillsets operation.
Options for a list synonymMaps operation.
Standard Apache Lucene analyzer; Composed of the standard tokenizer, lowercase filter and stop filter.
Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.
The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters. Default value: 255.
The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.
Defines a function that boosts scores based on the magnitude of a numeric field.
A character filter that applies mappings defined with the mappings option. Matching is greedy (longest pattern matching at a given point wins). Replacement is allowed to be the empty string. This character filter is implemented using Apache Lucene.
Options for the merge documents operation.
Options for the merge or upload documents operation.
A skill for merging two or more strings into a single unified string, with an optional user-defined delimiter separating each component part.
Divides text using language-specific rules and reduces words to their base forms.
Divides text using language-specific rules.
Defines values for MicrosoftStemmingTokenizerLanguage.
Defines values for MicrosoftTokenizerLanguage.
Generates n-grams of the given size(s). This token filter is implemented using Apache Lucene.
The maximum n-gram length. Default is 2. Maximum is 300. Default value: 2.
The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram. Default value: 1.
The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
Generates n-grams of the given size(s). This token filter is implemented using Apache Lucene.
Tokenizes the input into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.
A skill that extracts text from image files.
Defines values for OcrSkillLanguage.
KnownOcrSkillLanguage can be used interchangeably with OcrSkillLanguage,
this enum contains the known values that the service supports.
zh-Hans: Chinese-Simplified
zh-Hant: Chinese-Traditional
cs: Czech
da: Danish
nl: Dutch
en: English
fi: Finnish
fr: French
de: German
el: Greek
hu: Hungarian
it: Italian
ja: Japanese
ko: Korean
nb: Norwegian (Bokmaal)
pl: Polish
pt: Portuguese
ru: Russian
es: Spanish
sv: Swedish
tr: Turkish
ar: Arabic
ro: Romanian
sr-Cyrl: Serbian (Cyrillic, Serbia)
sr-Latn: Serbian (Latin, Serbia)
sk: Slovak
Using the Text Analytics API, extracts personal information from an input text and gives you the option of masking it.
Defines values for PIIDetectionSkillMaskingMode.
KnownPIIDetectionSkillMaskingMode can be used interchangeably with PIIDetectionSkillMaskingMode,
this enum contains the known values that the service supports.
none: No masking occurs and the maskedText output will not be returned.
replace: Replaces the detected entities with the character given in the maskingCharacter parameter. The character will be repeated to the length of the detected entity so that the offsets will correctly correspond to both the input text as well as the output maskedText.
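A hedged sketch of a skill definition using these masking values; the odatatype string, input/output names, and parameter names are assumptions based on the descriptions above.

// Sketch: redact detected PII by repeating '*' over each detected entity and
// write the result to a 'redactedText' enrichment. Part of a skillset's skills array.
const piiDetectionSkill = {
  odatatype: "#Microsoft.Skills.Text.PIIDetectionSkill",
  context: "/document",
  defaultLanguageCode: "en",
  maskingMode: "replace",
  maskingCharacter: "*",
  inputs: [{ name: "text", source: "/document/content" }],
  outputs: [{ name: "maskedText", targetName: "redactedText" }],
};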
Tokenizer for path-like hierarchies. This tokenizer is implemented using Apache Lucene.
Flexibly separates text into terms via a regular expression pattern. This analyzer is implemented using Apache Lucene.
Regular expression flags. Possible values include: 'CANON_EQ', 'CASE_INSENSITIVE', 'COMMENTS', 'DOTALL', 'LITERAL', 'MULTILINE', 'UNICODE_CASE', 'UNIX_LINES'
A value indicating whether terms should be lower-cased. Default is true. Default value: true.
The name of the analyzer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
A regular expression pattern to match token separators. Default is an expression that matches one or more whitespace characters. Default value: \W+.
A list of stopwords.
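A hedged sketch of a pattern analyzer definition using the properties described above; the odatatype string and the lowerCaseTerms property name are assumptions.

// Sketch: split incoming text on commas, lower-case the resulting terms, and
// apply no stopwords. Added to an index's analyzers list.
const commaSeparatedAnalyzer = {
  odatatype: "#Microsoft.Azure.Search.PatternAnalyzer",
  name: "comma_analyzer",
  lowerCaseTerms: true,
  pattern: ",",
  flags: "CASE_INSENSITIVE",
  stopwords: [],
};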
Uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns. This token filter is implemented using Apache Lucene.
A character filter that replaces characters in the input string. It uses a regular expression to identify character sequences to preserve and a replacement pattern to identify characters to replace. For example, given the input text "aa bb aa bb", pattern "(aa)\s+(bb)", and replacement "$1#$2", the result would be "aa#bb aa#bb". This character filter is implemented using Apache Lucene.
A token filter that replaces characters in the input string. It uses a regular expression to identify character sequences to preserve and a replacement pattern to identify characters to replace. For example, given the input text "aa bb aa bb", pattern "(aa)\s+(bb)", and replacement "$1#$2", the result would be "aa#bb aa#bb". This token filter is implemented using Apache Lucene.
Tokenizer that uses regex pattern matching to construct distinct tokens. This tokenizer is implemented using Apache Lucene.
Regular expression flags. Possible values include: 'CANON_EQ', 'CASE_INSENSITIVE', 'COMMENTS', 'DOTALL', 'LITERAL', 'MULTILINE', 'UNICODE_CASE', 'UNIX_LINES'
The zero-based ordinal of the matching group in the regular expression pattern to extract into tokens. Use -1 if you want to use the entire pattern to split the input into tokens, irrespective of matching groups. Default is -1. Default value: -1.
The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
Polymorphic Discriminator
A regular expression pattern to match token separators. Default is an expression that matches one or more whitespace characters. Default value: \W+.
Defines values for PhoneticEncoder.
Create tokens for phonetic matches. This token filter is implemented using Apache Lucene.
Defines values for QueryLanguage.
KnownQueryLanguage can be used interchangeably with QueryLanguage,
this enum contains the known values that the service supports.
none: Query language not specified.
en-us: English
Defines values for QueryType.
Defines values for RegexFlags.
KnownRegexFlags can be used interchangeably with RegexFlags,
this enum contains the known values that the service supports.
CANON_EQ: Enables canonical equivalence.
CASE_INSENSITIVE: Enables case-insensitive matching.
COMMENTS: Permits whitespace and comments in the pattern.
DOTALL: Enables dotall mode.
LITERAL: Enables literal parsing of the pattern.
MULTILINE: Enables multiline mode.
UNICODE_CASE: Enables Unicode-aware case folding.
UNIX_LINES: Enables Unix lines mode.
Options for reset indexer operation.
Options for run indexer operation.
Defines values for ScoringFunctionAggregation.
Defines values for ScoringFunctionInterpolation.
Defines values for ScoringStatistics.
Defines values for SearchFieldDataType.
KnownSearchFieldDataType can be used interchangeably with SearchFieldDataType,
this enum contains the known values that the service supports.
Edm.String: Indicates that a field contains a string.
Edm.Int32: Indicates that a field contains a 32-bit signed integer.
Edm.Int64: Indicates that a field contains a 64-bit signed integer.
Edm.Double: Indicates that a field contains an IEEE double-precision floating point number.
Edm.Boolean: Indicates that a field contains a Boolean value (true or false).
Edm.DateTimeOffset: Indicates that a field contains a date/time value, including timezone information.
Edm.GeographyPoint: Indicates that a field contains a geo-location in terms of longitude and latitude.
Edm.ComplexType: Indicates that a field contains one or more complex objects that in turn have sub-fields of other types.
Collection(Edm.String)
Collection(Edm.Int32)
Collection(Edm.Int64)
Collection(Edm.Double)
Collection(Edm.Boolean)
Collection(Edm.DateTimeOffset)
Collection(Edm.GeographyPoint)
Collection(Edm.ComplexType)
Defines values for SearchFieldDataType.
Possible values include: 'Edm.String', 'Edm.Int32', 'Edm.Int64', 'Edm.Double', 'Edm.Boolean', 'Edm.DateTimeOffset', 'Edm.GeographyPoint', 'Collection(Edm.String)', 'Collection(Edm.Int32)', 'Collection(Edm.Int64)', 'Collection(Edm.Double)', 'Collection(Edm.Boolean)', 'Collection(Edm.DateTimeOffset)', 'Collection(Edm.GeographyPoint)'
Clears the identity property of a datasource.
Defines values for SearchIndexerDataSourceType.
KnownSearchIndexerDataSourceType can be used interchangeably with SearchIndexerDataSourceType,
this enum contains the known values that the service supports.
azuresql: Indicates an Azure SQL datasource.
cosmosdb: Indicates a CosmosDB datasource.
azureblob: Indicates an Azure Blob datasource.
azuretable: Indicates an Azure Table datasource.
mysql: Indicates a MySql datasource.
adlsgen2: Indicates an ADLS Gen2 datasource.
Specifies the identity for a datasource to use.
Abstract class to share properties between concrete selectors.
Projection definition for what data to store in Azure Files.
Projection definition for what data to store in Azure Blob.
Description for what data to store in Azure Tables.
Options for SearchIndexingBufferedSenderDeleteDocuments.
Options for SearchIndexingBufferedSenderFlushDocuments.
Options for SearchIndexingBufferedSenderMergeDocuments.
Options for SearchIndexingBufferedSenderMergeOrUploadDocuments.
Options for SearchIndexingBufferedSenderUploadDocuments.
An iterator for search results of a particular query. Will make requests as needed during iteration. Use .byPage() to make one request to the server per iteration.
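A minimal sketch of consuming this iterator; the document shape is an assumption.

import { SearchClient } from "@azure/search-documents";

// Sketch: iterate results item-by-item; further requests are issued as needed.
// Use searchResults.results.byPage() instead to control paging explicitly.
async function listHotelNames(client: SearchClient<{ hotelName: string }>): Promise<void> {
  const searchResults = await client.search("wifi");
  for await (const result of searchResults.results) {
    console.log(result.document.hotelName);
  }
}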
Defines values for SearchMode.
Contains response data for the getServiceStatistics operation.
Text analytics positive-negative sentiment analysis, scored as a floating point value in a range of zero to 1.
Defines values for SentimentSkillLanguage.
KnownSentimentSkillLanguage can be used interchangeably with SentimentSkillLanguage,
this enum contains the known values that the service supports.
da: Danish
nl: Dutch
en: English
fi: Finnish
fr: French
de: German
el: Greek
it: Italian
no: Norwegian (Bokmaal)
pl: Polish
pt-PT: Portuguese (Portugal)
ru: Russian
es: Spanish
sv: Swedish
tr: Turkish
Using the Text Analytics API, evaluates unstructured text and, for each record, provides sentiment labels (such as "negative", "neutral", and "positive") based on the highest confidence score found by the service at the sentence and document level.
A skill for reshaping the outputs. It creates a complex type to support composite fields (also known as multipart fields).
Creates combinations of tokens as a single token. This token filter is implemented using Apache Lucene.
Contains the possible cases for Similarity.
Contains response data for the createOrUpdate operation.
Contains response data for the create operation.
Contains response data for the get operation.
Contains response data for the list operation.
A filter that stems words using a Snowball-generated stemmer. This token filter is implemented using Apache Lucene.
Defines values for SnowballTokenFilterLanguage.
Defines a data deletion detection policy that implements a soft-deletion strategy. It determines whether an item should be deleted based on the value of a designated 'soft delete' column.
Defines values for Speller.
KnownSpeller can be used interchangeably with Speller,
this enum contains the known values that the service supports.
none: Speller not enabled.
lexicon: Speller corrects individual query terms using a static lexicon for the language specified by the queryLanguage parameter.
A skill to split a string into chunks of text.
Defines values for SplitSkillLanguage.
KnownSplitSkillLanguage can be used interchangeably with SplitSkillLanguage,
this enum contains the known values that the service supports.
da: Danish
de: German
en: English
es: Spanish
fi: Finnish
fr: French
it: Italian
ko: Korean
pt: Portuguese
Defines a data change detection policy that captures changes using the Integrated Change Tracking feature of Azure SQL Database.
Provides the ability to override other stemming filters with custom dictionary-based stemming. Any dictionary-stemmed terms will be marked as keywords so that they will not be stemmed with stemmers down the chain. Must be placed before any stemming filters. This token filter is implemented using Apache Lucene.
Language specific stemming filter. This token filter is implemented using Apache Lucene.
Defines values for StemmerTokenFilterLanguage.
Divides text at non-letters; Applies the lowercase and stopword token filters. This analyzer is implemented using Apache Lucene.
Defines values for StopwordsList.
Removes stop words from a token stream. This token filter is implemented using Apache Lucene.
Contains response data for the createOrUpdate operation.
Contains response data for the create operation.
Contains response data for the get operation.
Contains response data for the list operation.
Matches single or multi-word synonyms in a token stream. This token filter is implemented using Apache Lucene.
Defines a function that boosts scores of documents with string values matching a given list of tags.
Defines values for TextSplitMode.
KnownTextSplitMode can be used interchangeably with TextSplitMode,
this enum contains the known values that the service supports.
pages: Split the text into individual pages.
sentences: Split the text into individual sentences.
A skill to translate text from one language to another.
Defines values for TextTranslationSkillLanguage.
KnownTextTranslationSkillLanguage can be used interchangeably with TextTranslationSkillLanguage,
this enum contains the known values that the service supports.
af: Afrikaans
ar: Arabic
bn: Bangla
bs: Bosnian (Latin)
bg: Bulgarian
yue: Cantonese (Traditional)
ca: Catalan
zh-Hans: Chinese Simplified
zh-Hant: Chinese Traditional
hr: Croatian
cs: Czech
da: Danish
nl: Dutch
en: English
et: Estonian
fj: Fijian
fil: Filipino
fi: Finnish
fr: French
de: German
el: Greek
ht: Haitian Creole
he: Hebrew
hi: Hindi
mww: Hmong Daw
hu: Hungarian
is: Icelandic
id: Indonesian
it: Italian
ja: Japanese
sw: Kiswahili
tlh: Klingon
tlh-Latn: Klingon (Latin script)
tlh-Piqd: Klingon (Klingon script)
ko: Korean
lv: Latvian
lt: Lithuanian
mg: Malagasy
ms: Malay
mt: Maltese
nb: Norwegian
fa: Persian
pl: Polish
pt: Portuguese
pt-br: Portuguese (Brazil)
pt-PT: Portuguese (Portugal)
otq: Queretaro Otomi
ro: Romanian
ru: Russian
sm: Samoan
sr-Cyrl: Serbian (Cyrillic)
sr-Latn: Serbian (Latin)
sk: Slovak
sl: Slovenian
es: Spanish
sv: Swedish
ty: Tahitian
ta: Tamil
te: Telugu
th: Thai
to: Tongan
tr: Turkish
uk: Ukrainian
ur: Urdu
vi: Vietnamese
cy: Welsh
yua: Yucatec Maya
ga: Irish
kn: Kannada
mi: Maori
ml: Malayalam
pa: Punjabi
Defines values for TokenCharacterKind.
Defines values for TokenFilterName.
KnownTokenFilterName can be used interchangeably with TokenFilterName,
this enum contains the known values that the service supports.
arabic_normalization: A token filter that applies the Arabic normalizer to normalize the orthography. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ar/ArabicNormalizationFilter.html
apostrophe: Strips all characters after an apostrophe (including the apostrophe itself). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html
asciifolding: Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if such equivalents exist. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
cjk_bigram: Forms bigrams of CJK terms that are generated from the standard tokenizer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKBigramFilter.html
cjk_width: Normalizes CJK width differences. Folds fullwidth ASCII variants into the equivalent basic Latin, and half-width Katakana variants into the equivalent Kana. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/cjk/CJKWidthFilter.html
classic: Removes English possessives, and dots from acronyms. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicFilter.html
common_grams: Construct bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with bigrams overlaid. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsFilter.html
edgeNGram_v2: Generates n-grams of the given size(s) starting from the front or the back of an input token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
elision: Removes elisions. For example, "l'avion" (the plane) will be converted to "avion" (plane). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/util/ElisionFilter.html
german_normalization: Normalizes German characters according to the heuristics of the German2 snowball algorithm. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
hindi_normalization: Normalizes text in Hindi to remove some differences in spelling variations. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/hi/HindiNormalizationFilter.html
indic_normalization: Normalizes the Unicode representation of text in Indian languages. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/in/IndicNormalizationFilter.html
keyword_repeat: Emits each incoming token twice, once as keyword and once as non-keyword. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/KeywordRepeatFilter.html
kstem: A high-performance kstem filter for English. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/en/KStemFilter.html
length: Removes words that are too long or too short. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LengthFilter.html
limit: Limits the number of tokens while indexing. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html
lowercase: Normalizes token text to lower case. See https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html
nGram_v2: Generates n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenFilter.html
persian_normalization: Applies normalization for Persian. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/fa/PersianNormalizationFilter.html
phonetic: Create tokens for phonetic matches. See https://lucene.apache.org/core/4_10_3/analyzers-phonetic/org/apache/lucene/analysis/phonetic/package-tree.html
porter_stem: Uses the Porter stemming algorithm to transform the token stream. See http://tartarus.org/~martin/PorterStemmer
reverse: Reverses the token string. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/reverse/ReverseStringFilter.html
scandinavian_normalization: Normalizes use of the interchangeable Scandinavian characters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianNormalizationFilter.html
scandinavian_folding: Folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o. It also discriminates against use of double vowels aa, ae, ao, oe and oo, leaving just the first one. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/ScandinavianFoldingFilter.html
shingle: Creates combinations of tokens as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilter.html
snowball: A filter that stems words using a Snowball-generated stemmer. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html
sorani_normalization: Normalizes the Unicode representation of Sorani text. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ckb/SoraniNormalizationFilter.html
stemmer: Language specific stemming filter. See https://docs.microsoft.com/rest/api/searchservice/Custom-analyzers-in-Azure-Search#TokenFilters
stopwords: Removes stop words from a token stream. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html
trim: Trims leading and trailing whitespace from tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TrimFilter.html
truncate: Truncates the terms to a specific length. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/TruncateTokenFilter.html
unique: Filters out tokens with same text as the previous token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/RemoveDuplicatesTokenFilter.html
uppercase: Normalizes token text to upper case. See https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html
word_delimiter: Splits words into subwords and performs optional transformations on subword groups.
Truncates the terms to a specific length. This token filter is implemented using Apache Lucene.
Tokenizes urls and emails as one token. This tokenizer is implemented using Apache Lucene.
Filters out tokens with same text as the previous token. This token filter is implemented using Apache Lucene.
Options for the upload documents operation.
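A minimal sketch of the upload operation these options apply to; the document shape and key field are assumptions.

import { SearchClient } from "@azure/search-documents";

// Sketch: upload a small batch and report per-document success. The merge,
// mergeOrUpload, and delete operations follow the same pattern.
async function uploadHotels(client: SearchClient<{ hotelId: string; hotelName: string }>): Promise<void> {
  const response = await client.uploadDocuments([
    { hotelId: "1", hotelName: "Fancy Stay" },
    { hotelId: "2", hotelName: "Budget Lodge" },
  ]);
  for (const result of response.results) {
    console.log(`${result.key}: ${result.succeeded}`);
  }
}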
Defines values for VisualFeature.
KnownVisualFeature can be used interchangeably with VisualFeature,
this enum contains the known values that the service supports.
adult: Visual features recognized as adult persons.
brands: Visual features recognized as commercial brands.
categories: Categories.
description: Description.
faces: Visual features recognized as people's faces.
objects: Visual features recognized as objects.
tags: Tags.
A skill that can call a Web API endpoint, allowing you to extend a skillset by having it call your custom code.
Splits words into subwords and performs optional transformations on subword groups. This token filter is implemented using Apache Lucene.
Default Batch Size
Default window flush interval
Default Max Delay between retries.
Default number of times to retry.
Default retry delay.
The @azure/logger configuration for this package.
Create an HTTP pipeline policy to authenticate a request using an AzureKeyCredential for Azure Cognitive Search.
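A minimal sketch of the typical way this policy comes into play: passing an AzureKeyCredential when constructing a client. Endpoint, index name, and key are placeholders.

import { SearchClient, AzureKeyCredential } from "@azure/search-documents";

// The credential is wired into the client's HTTP pipeline so every request
// carries the api-key header.
const client = new SearchClient(
  "https://<service-name>.search.windows.net",
  "hotels-index",
  new AzureKeyCredential("<api-key>")
);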
Helper method to create a SynonymMap object. This is a NodeJS only method.
Name of the SynonymMap.
Path of the file that contains the synonyms (separated by new lines)
SynonymMap object
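A hedged usage sketch (Node.js only); the helper and client method names follow the descriptions above and are assumed to be createSynonymMapFromFile and createSynonymMap.

import { SearchIndexClient, createSynonymMapFromFile } from "@azure/search-documents";

// Sketch: build a SynonymMap from a newline-separated synonyms file, then
// create it on the service.
async function uploadSynonymMap(indexClient: SearchIndexClient): Promise<void> {
  const synonymMap = await createSynonymMapFromFile("hotel-synonyms", "./synonyms.txt");
  await indexClient.createSynonymMap(synonymMap);
}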
Decodes a base64 string into a regular string.
The base64 string to decode.
Encodes a string in base64 format.
The string to encode.
Escapes an odata filter expression to avoid errors with quoting string literals. Example usage:
const baseRateMax = 200;
const ratingMin = 4;
const filter = odata`Rooms/any(room: room/BaseRate lt ${baseRateMax}) and Rating ge ${ratingMin}`;
For more information on supported syntax see: https://docs.microsoft.com/en-us/azure/search/search-query-odata-filter
Array of strings for the expression
Array of values for the expression
Options for analyze text operation.