Class LuceneStandardTokenizer

java.lang.Object
    com.azure.search.documents.indexes.models.LexicalTokenizer
        com.azure.search.documents.indexes.models.LuceneStandardTokenizer

public final class LuceneStandardTokenizer extends LexicalTokenizer
Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.
  • Constructor Details

    • LuceneStandardTokenizer

      public LuceneStandardTokenizer(String name)
      Creates an instance of LuceneStandardTokenizer.
      Parameters:
      name - The name of the tokenizer. It must contain only letters, digits, spaces, dashes, or underscores; it may start and end only with alphanumeric characters; and it is limited to 128 characters.
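
      A minimal usage sketch, assuming the hypothetical tokenizer name "my-standard-tokenizer", which satisfies the constraints above:

      import com.azure.search.documents.indexes.models.LuceneStandardTokenizer;

      public final class CreateTokenizerExample {
          public static void main(String[] args) {
              // "my-standard-tokenizer" is a hypothetical name: only letters and
              // dashes, alphanumeric at both ends, well under 128 characters.
              LuceneStandardTokenizer tokenizer =
                      new LuceneStandardTokenizer("my-standard-tokenizer");

              // getName() is inherited from LexicalTokenizer.
              System.out.println(tokenizer.getName());
          }
      }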
  • Method Details

    • getMaxTokenLength

      public Integer getMaxTokenLength()
      Get the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split.
      Returns:
      the maxTokenLength value.
    • setMaxTokenLength

      public LuceneStandardTokenizer setMaxTokenLength(Integer maxTokenLength)
      Set the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split.
      Parameters:
      maxTokenLength - the maxTokenLength value to set.
      Returns:
      the LuceneStandardTokenizer object itself.
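
      A short sketch combining both accessors, using the same hypothetical tokenizer name as above:

      import com.azure.search.documents.indexes.models.LuceneStandardTokenizer;

      public final class MaxTokenLengthExample {
          public static void main(String[] args) {
              // setMaxTokenLength returns the tokenizer itself, so the call can
              // be chained directly onto the constructor.
              LuceneStandardTokenizer tokenizer =
                      new LuceneStandardTokenizer("my-standard-tokenizer")
                              .setMaxTokenLength(100); // split tokens longer than 100 characters

              // getMaxTokenLength returns the boxed Integer that was set, or null
              // if the property was never set (the default of 255 then applies).
              Integer maxTokenLength = tokenizer.getMaxTokenLength();
              System.out.println("maxTokenLength = " + maxTokenLength); // prints 100
          }
      }

      Because the setter returns the object itself, multiple configuration calls can be chained in the fluent style used throughout the Azure SDK model classes.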