Class LuceneStandardTokenizer


  • public final class LuceneStandardTokenizer
    extends LexicalTokenizer
    Breaks text following the Unicode Text Segmentation rules. This tokenizer is implemented using Apache Lucene.
    • Constructor Detail

      • LuceneStandardTokenizer

        public LuceneStandardTokenizer(String name)
        Constructor of LuceneStandardTokenizer.
        Parameters:
        name - The name of the tokenizer. It must contain only letters, digits, spaces, dashes, or underscores; it must start and end with an alphanumeric character; and it is limited to 128 characters.
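The naming rules above can be checked on the client before the tokenizer is constructed. The following is a minimal sketch only: `TokenizerNameCheck` and `isValidName` are hypothetical names that are not part of the SDK, and the pattern assumes that "letters" and "digits" mean ASCII characters. The service performs its own validation and remains the authority on which names are accepted.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the SDK) that mirrors the documented name rules. */
final class TokenizerNameCheck {

    // Assumption: ASCII letters and digits only; the service may accept a wider range.
    // First char alphanumeric, then optionally up to 126 allowed chars plus a final
    // alphanumeric char, which caps the total length at 128.
    private static final Pattern VALID_NAME =
        Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9 _-]{0,126}[A-Za-z0-9])?");

    /** Returns true if the name satisfies the documented constraints. */
    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my-standard_tokenizer 1")); // prints true
        System.out.println(isValidName("-leading-dash"));           // prints false
    }
}
```

A check like this only gives earlier feedback; a name that passes it could still be rejected by the service for reasons not covered here.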
    • Method Detail

      • getMaxTokenLength

        public Integer getMaxTokenLength()
        Get the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split.
        Returns:
        the maxTokenLength value.
      • setMaxTokenLength

        public LuceneStandardTokenizer setMaxTokenLength(Integer maxTokenLength)
        Set the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split.
        Parameters:
        maxTokenLength - the maxTokenLength value to set.
        Returns:
        the LuceneStandardTokenizer object itself.
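To illustrate what "tokens longer than the maximum length are split" can mean in practice, here is a simplified model that cuts an over-long token into consecutive chunks of at most `maxTokenLength` characters. It is a sketch under stated assumptions, not the Lucene implementation: `MaxTokenLengthSketch` and `splitLongToken` are hypothetical names, and the code counts UTF-16 `char` units, which may differ from how the tokenizer itself measures length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration (not part of the SDK or Lucene) of splitting long tokens. */
final class MaxTokenLengthSketch {

    /** Splits a token into consecutive pieces of at most maxTokenLength chars. */
    static List<String> splitLongToken(String token, int maxTokenLength) {
        if (maxTokenLength < 1) {
            throw new IllegalArgumentException("maxTokenLength must be at least 1");
        }
        List<String> parts = new ArrayList<>();
        // Simplification: walks UTF-16 units, so a surrogate pair could be cut in half.
        for (int start = 0; start < token.length(); start += maxTokenLength) {
            int end = Math.min(token.length(), start + maxTokenLength);
            parts.add(token.substring(start, end));
        }
        return parts;
    }

    public static void main(String[] args) {
        // A small limit is used here so the effect is visible; the documented default is 255.
        System.out.println(splitLongToken("abcdefgh", 3)); // prints [abc, def, gh]
    }
}
```

Under this model, a token at or below the limit comes back as a single piece, so the documented default of 255 leaves most ordinary words untouched.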