Class ClassicTokenizer


  • public final class ClassicTokenizer
    extends LexicalTokenizer
    Grammar-based tokenizer that is suitable for processing most European-language documents. This tokenizer is implemented using Apache Lucene.
    • Constructor Detail

      • ClassicTokenizer

        public ClassicTokenizer​(String name)
        Creates an instance of ClassicTokenizer.
        Parameters:
        name - The name of the tokenizer. It must contain only letters, digits, spaces, dashes, or underscores; it can start and end only with alphanumeric characters; and it is limited to 128 characters.
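        The name constraints above can be sketched as a small validation helper. This is an illustrative stand-in, not part of the SDK (the service enforces these rules server-side); the class and method names here are hypothetical:

        ```java
        import java.util.regex.Pattern;

        public class TokenizerNameCheck {
            // Mirrors the documented rules: only letters, digits, spaces,
            // dashes, or underscores; must start and end with an
            // alphanumeric character; at most 128 characters.
            private static final Pattern NAME_PATTERN =
                Pattern.compile("^[A-Za-z0-9](?:[A-Za-z0-9 _-]*[A-Za-z0-9])?$");

            static boolean isValidName(String name) {
                return name != null
                    && name.length() <= 128
                    && NAME_PATTERN.matcher(name).matches();
            }

            public static void main(String[] args) {
                System.out.println(isValidName("my-classic-tokenizer")); // true
                System.out.println(isValidName("-starts-with-dash"));    // false
            }
        }
        ```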
    • Method Detail

      • getMaxTokenLength

        public Integer getMaxTokenLength()
        Get the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters.
        Returns:
        the maxTokenLength value.
      • setMaxTokenLength

        public ClassicTokenizer setMaxTokenLength​(Integer maxTokenLength)
        Set the maxTokenLength property: The maximum token length. Default is 255. Tokens longer than the maximum length are split. The maximum token length that can be used is 300 characters.
        Parameters:
        maxTokenLength - the maxTokenLength value to set.
        Returns:
        the ClassicTokenizer object itself.
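        The splitting behavior described above (tokens longer than maxTokenLength are split) can be sketched as follows. This is a simplified illustration of the documented contract, not the actual Lucene implementation, which applies the limit during grammar-based scanning:

        ```java
        import java.util.ArrayList;
        import java.util.List;

        public class TokenSplitSketch {
            // Split a single token into pieces no longer than maxTokenLength.
            static List<String> splitToken(String token, int maxTokenLength) {
                List<String> pieces = new ArrayList<>();
                for (int i = 0; i < token.length(); i += maxTokenLength) {
                    pieces.add(token.substring(i,
                        Math.min(i + maxTokenLength, token.length())));
                }
                return pieces;
            }

            public static void main(String[] args) {
                // A 12-character token with maxTokenLength = 5
                // yields pieces of length 5, 5, and 2.
                System.out.println(splitToken("abcdefghijkl", 5)); // [abcde, fghij, kl]
            }
        }
        ```

        With the default maxTokenLength of 255, only unusually long tokens (for example, long identifiers or URLs) are affected.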