
Enumeration KnownTokenizerNames

Defines values for TokenizerName.
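
As a usage sketch, these values are typically supplied as the tokenizerName of a custom analyzer in an index definition. The snippet below assumes the enum ships with the @azure/search-documents package; the index and analyzer names are hypothetical.

    import { KnownTokenizerNames, SearchIndex } from "@azure/search-documents";

    // A custom analyzer that pairs a built-in tokenizer name with a lowercase filter.
    const index: SearchIndex = {
      name: "hotels", // hypothetical index name
      fields: [{ name: "id", type: "Edm.String", key: true }],
      analyzers: [
        {
          odatatype: "#Microsoft.Azure.Search.CustomAnalyzer",
          name: "whitespace-lowercase", // hypothetical analyzer name
          tokenizerName: KnownTokenizerNames.Whitespace, // a value from this enum
          tokenFilters: ["lowercase"],
        },
      ],
    };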

Enumeration members

Classic

Classic = "classic"

Grammar-based tokenizer that is suitable for processing most European-language documents. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html

EdgeNGram

EdgeNGram = "edgeNGram"

Tokenizes the input from an edge into n-grams of the given size(s). See https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html
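
Gram sizes are configured on a tokenizer instance rather than through this name alone. A minimal sketch of such an instance, assuming the EdgeNGramTokenizer type from the same SDK (the instance name is hypothetical):

    import { EdgeNGramTokenizer } from "@azure/search-documents";

    // Splits "search" into the prefixes "se", "sea", "sear", ... up to 10 characters.
    const prefixTokenizer: EdgeNGramTokenizer = {
      odatatype: "#Microsoft.Azure.Search.EdgeNGramTokenizer",
      name: "prefix-edge-ngram", // hypothetical name
      minGram: 2,
      maxGram: 10,
      tokenChars: ["letter", "digit"], // keep only letters and digits in grams
    };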

Keyword

Keyword = "keyword_v2"

Emits the entire input as a single token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/KeywordTokenizer.html

Letter

Letter = "letter"

Divides text at non-letters. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LetterTokenizer.html

Lowercase

Lowercase = "lowercase"

Divides text at non-letters and converts them to lower case. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/LowerCaseTokenizer.html

MicrosoftLanguageStemmingTokenizer

MicrosoftLanguageStemmingTokenizer = "microsoft_language_stemming_tokenizer"

Divides text using language-specific rules and reduces words to their base forms.
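
The language is configured on a tokenizer instance rather than through this name. A minimal sketch, assuming the MicrosoftLanguageStemmingTokenizer type from the same SDK (the instance name is hypothetical):

    import { MicrosoftLanguageStemmingTokenizer } from "@azure/search-documents";

    // English-specific word breaking that reduces words to their base forms.
    const stemmingTokenizer: MicrosoftLanguageStemmingTokenizer = {
      odatatype: "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
      name: "english-stemming", // hypothetical name
      language: "english",
      isSearchTokenizer: false, // false = used at indexing time
    };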

MicrosoftLanguageTokenizer

MicrosoftLanguageTokenizer = "microsoft_language_tokenizer"

Divides text using language-specific rules.

NGram

NGram = "nGram"

Tokenizes the input into n-grams of the given size(s). See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html

PathHierarchy

PathHierarchy = "path_hierarchy_v2"

Tokenizer for path-like hierarchies. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html

Pattern

Pattern = "pattern"

Tokenizer that uses regex pattern matching to construct distinct tokens. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/pattern/PatternTokenizer.html
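
The regex itself is configured on a tokenizer instance. A minimal sketch, assuming the PatternTokenizer type from the same SDK (the instance name and pattern are illustrative):

    import { PatternTokenizer } from "@azure/search-documents";

    // Splits on commas plus surrounding whitespace: "a, b,c" -> "a", "b", "c".
    const commaTokenizer: PatternTokenizer = {
      odatatype: "#Microsoft.Azure.Search.PatternTokenizer",
      name: "comma-split",  // hypothetical name
      pattern: "\\s*,\\s*", // the regex acts as a delimiter by default
    };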

Standard

Standard = "standard_v2"

Breaks text following the Unicode Text Segmentation rules. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html

UaxUrlEmail

UaxUrlEmail = "uax_url_email"

Tokenizes URLs and emails as one token. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizer.html

Whitespace

Whitespace = "whitespace"

Divides text at whitespace. See http://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/core/WhitespaceTokenizer.html
