Class NGramTokenizer


  • public final class NGramTokenizer
    extends LexicalTokenizer
    Tokenizes the input into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.
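To make "n-grams of the given size(s)" concrete, here is a minimal, self-contained sketch of n-gram generation. It is an illustration only, not the Lucene implementation: the class name `NGramSketch` and the method `ngrams` are hypothetical, it works on UTF-16 `char` units, and the order in which the real tokenizer emits grams may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of the SDK or of Lucene.
final class NGramSketch {

    // Returns every substring of text whose length lies between
    // minGram and maxGram (inclusive), grouped by start position.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram;
                 len <= maxGram && start + len <= text.length();
                 len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // With the documented defaults (minGram = 1, maxGram = 2):
        System.out.println(ngrams("fox", 1, 2)); // prints [f, fo, o, ox, x]
    }
}
```

Inputs shorter than `minGram` yield no grams in this sketch.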
    • Constructor Detail

      • NGramTokenizer

        public NGramTokenizer(String name)
        Creates an NGramTokenizer with the given name.
        Parameters:
        name - The name of the tokenizer. It may contain only letters, digits, spaces, dashes, or underscores; it must start and end with an alphanumeric character; and it is limited to 128 characters.
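The naming rules for the name parameter can be checked on the client before a request is sent. The following is a rough sketch, assuming that "letters" and "digits" mean ASCII characters; the service may accept a wider set. The class `TokenizerNameCheck` and the method `isValidName` are hypothetical helpers, not SDK members.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the SDK does not expose this class.
final class TokenizerNameCheck {

    // Assumption: ASCII letters and digits only. The first and last
    // characters are alphanumeric; spaces, dashes, and underscores
    // are allowed in between.
    private static final Pattern NAME =
            Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9 _-]*[A-Za-z0-9])?");

    static boolean isValidName(String name) {
        return name != null
                && name.length() <= 128
                && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my_ngram-tokenizer")); // prints true
        System.out.println(isValidName("-starts-with-dash"));  // prints false
    }
}
```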
    • Method Detail

      • getMinGram

        public Integer getMinGram()
        Get the minGram property: The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.
        Returns:
        the minGram value.
      • setMinGram

        public NGramTokenizer setMinGram(Integer minGram)
        Set the minGram property: The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.
        Parameters:
        minGram - the minGram value to set.
        Returns:
        the NGramTokenizer object itself.
      • getMaxGram

        public Integer getMaxGram()
        Get the maxGram property: The maximum n-gram length. Default is 2. Maximum is 300.
        Returns:
        the maxGram value.
      • setMaxGram

        public NGramTokenizer setMaxGram(Integer maxGram)
        Set the maxGram property: The maximum n-gram length. Default is 2. Maximum is 300.
        Parameters:
        maxGram - the maxGram value to set.
        Returns:
        the NGramTokenizer object itself.
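The documented constraints on minGram and maxGram (defaults of 1 and 2, a maximum of 300, and minGram strictly less than maxGram) can be summarized in a small check. This is a sketch under stated assumptions: `GramBounds` and `isValid` are hypothetical names, null is treated as "use the default", and the lower bound of 1 for minGram is an assumption because the documentation does not state a minimum.

```java
// Hypothetical helper that mirrors the documented limits; not an SDK class.
final class GramBounds {
    static final int DEFAULT_MIN_GRAM = 1;
    static final int DEFAULT_MAX_GRAM = 2;
    static final int MAX_ALLOWED = 300;

    // Null arguments fall back to the documented defaults.
    static boolean isValid(Integer minGram, Integer maxGram) {
        int min = (minGram != null) ? minGram : DEFAULT_MIN_GRAM;
        int max = (maxGram != null) ? maxGram : DEFAULT_MAX_GRAM;
        // Assumption: minGram must be at least 1 (not stated in the docs).
        return min >= 1 && max <= MAX_ALLOWED && min < max;
    }

    public static void main(String[] args) {
        System.out.println(isValid(null, null)); // prints true
        System.out.println(isValid(3, 3));       // prints false
    }
}
```

Note that in this sketch, setting only minGram to 2 or more fails the check, because the default maxGram of 2 would no longer be greater than it.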
      • getTokenChars

        public List<TokenCharacterKind> getTokenChars()
        Get the tokenChars property: Character classes to keep in the tokens.
        Returns:
        the tokenChars value.
      • setTokenChars

        public NGramTokenizer setTokenChars(TokenCharacterKind... tokenChars)
        Set the tokenChars property: Character classes to keep in the tokens.
        Parameters:
        tokenChars - the tokenChars value to set.
        Returns:
        the NGramTokenizer object itself.
      • setTokenChars

        public NGramTokenizer setTokenChars(List<TokenCharacterKind> tokenChars)
        Set the tokenChars property: Character classes to keep in the tokens.
        Parameters:
        tokenChars - the tokenChars value to set.
        Returns:
        the NGramTokenizer object itself.
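One way to picture the tokenChars property is as a filter applied before n-grams are formed. The sketch below assumes that characters outside the kept classes act as split points, which is how Lucene-based n-gram tokenizers are commonly described; the documentation above does not confirm this. The enum `CharKind` is a hypothetical stand-in for TokenCharacterKind and models only three classes.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the SDK's or Lucene's implementation.
final class TokenCharsSketch {

    // Stand-in for the character classes; punctuation and symbols are not modeled.
    enum CharKind { LETTER, DIGIT, WHITESPACE }

    static boolean isKept(char c, Set<CharKind> kept) {
        if (Character.isLetter(c)) return kept.contains(CharKind.LETTER);
        if (Character.isDigit(c)) return kept.contains(CharKind.DIGIT);
        if (Character.isWhitespace(c)) return kept.contains(CharKind.WHITESPACE);
        return false; // anything else is treated as a split point
    }

    // Splits text into maximal runs of kept characters. N-grams would
    // then be generated within each run, never across a split point.
    static List<String> keptRuns(String text, Set<CharKind> kept) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isKept(c, kept)) {
                current.append(c);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        Set<CharKind> kept = EnumSet.of(CharKind.LETTER, CharKind.DIGIT);
        System.out.println(keptRuns("ab-12 c", kept)); // prints [ab, 12, c]
    }
}
```

Under this assumption, keeping only letters and digits means no n-gram spans a dash or a space.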