Class EdgeNGramTokenizer


  • public final class EdgeNGramTokenizer
    extends LexicalTokenizer
    Tokenizes the input from an edge into n-grams of the given size(s). This tokenizer is implemented using Apache Lucene.
    • Constructor Detail

      • EdgeNGramTokenizer

        public EdgeNGramTokenizer(String name)
        Creates an EdgeNGramTokenizer with the given name.
        Parameters:
        name - The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
    • Method Detail

      • getMinGram

        public Integer getMinGram()
        Get the minGram property: The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.
        Returns:
        the minGram value.
      • setMinGram

        public EdgeNGramTokenizer setMinGram(Integer minGram)
        Set the minGram property: The minimum n-gram length. Default is 1. Maximum is 300. Must be less than the value of maxGram.
        Parameters:
        minGram - the minGram value to set.
        Returns:
        the EdgeNGramTokenizer object itself.
      • getMaxGram

        public Integer getMaxGram()
        Get the maxGram property: The maximum n-gram length. Default is 2. Maximum is 300.
        Returns:
        the maxGram value.
      • setMaxGram

        public EdgeNGramTokenizer setMaxGram(Integer maxGram)
        Set the maxGram property: The maximum n-gram length. Default is 2. Maximum is 300.
        Parameters:
        maxGram - the maxGram value to set.
        Returns:
        the EdgeNGramTokenizer object itself.
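
To make the effect of minGram and maxGram concrete, the following is a minimal sketch of front-edge n-gram generation in plain Java. It is not the Lucene or SDK implementation: the class `EdgeNGramSketch` and the method `edgeNGrams` are hypothetical names introduced here, the sketch assumes grams are taken from the front edge of a single token, and it applies the documented limits literally (minimum 1, maximum 300, minGram less than maxGram).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of the SDK or Lucene. */
final class EdgeNGramSketch {
    private EdgeNGramSketch() { }

    /**
     * Returns the prefixes of {@code token} whose lengths lie in
     * [minGram, maxGram], shortest first.
     * Simplification: lengths are counted in Java chars, not code points.
     */
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        // Limits as worded in the documentation above (assumed to be strict).
        if (minGram < 1 || maxGram > 300 || minGram >= maxGram) {
            throw new IllegalArgumentException(
                "expected 1 <= minGram < maxGram <= 300");
        }
        List<String> grams = new ArrayList<>();
        // A token shorter than maxGram simply yields fewer grams.
        int upper = Math.min(maxGram, token.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

Under these assumptions, `EdgeNGramSketch.edgeNGrams("search", 1, 3)` yields the prefixes `s`, `se`, and `sea`. The corresponding tokenizer configuration, using only the methods documented here, would be `new EdgeNGramTokenizer("prefixes").setMinGram(1).setMaxGram(3)`, where the name `prefixes` is an arbitrary example.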
      • getTokenChars

        public List<TokenCharacterKind> getTokenChars()
        Get the tokenChars property: Character classes to keep in the tokens.
        Returns:
        the tokenChars value.
      • setTokenChars

        public EdgeNGramTokenizer setTokenChars(TokenCharacterKind... tokenChars)
        Set the tokenChars property: Character classes to keep in the tokens.
        Parameters:
        tokenChars - the tokenChars value to set.
        Returns:
        the EdgeNGramTokenizer object itself.
      • setTokenChars

        public EdgeNGramTokenizer setTokenChars(List<TokenCharacterKind> tokenChars)
        Set the tokenChars property: Character classes to keep in the tokens.
        Parameters:
        tokenChars - the tokenChars value to set.
        Returns:
        the EdgeNGramTokenizer object itself.
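
One plausible reading of "character classes to keep in the tokens" is that characters outside the listed classes act as token separators. The sketch below illustrates that reading in plain Java. It is an assumption about the behavior, not a description of the actual implementation: `TokenCharsSketch`, `CharKind`, and `split` are hypothetical names, and `CharKind` models only two classes as a stand-in for `TokenCharacterKind`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: a hypothetical helper, not part of the SDK or Lucene. */
final class TokenCharsSketch {
    private TokenCharsSketch() { }

    /** Hypothetical stand-in for TokenCharacterKind; only two kinds are modeled. */
    enum CharKind { LETTER, DIGIT }

    /** True if {@code c} belongs to at least one of the requested classes. */
    static boolean isKept(char c, Set<CharKind> kinds) {
        return (kinds.contains(CharKind.LETTER) && Character.isLetter(c))
            || (kinds.contains(CharKind.DIGIT) && Character.isDigit(c));
    }

    /**
     * Splits {@code input} into runs of kept characters. Any character
     * outside the requested classes ends the current token and is dropped
     * (assumed behavior).
     */
    static List<String> split(String input, Set<CharKind> kinds) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isKept(c, kinds)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With `CharKind.LETTER` and `CharKind.DIGIT` both requested, the input `wi-fi 6e` splits into `wi`, `fi`, and `6e`; with `CharKind.LETTER` alone, the digit becomes a separator and the last token is `e`. Each resulting token would then be expanded into edge n-grams. The sketch does not model what the tokenizer does when tokenChars is left unset, since the documentation above does not say; an empty set here produces no tokens.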