Class PatternTokenizer


  • public final class PatternTokenizer
    extends LexicalTokenizer
    Tokenizer that uses regex pattern matching to construct distinct tokens. This tokenizer is implemented using Apache Lucene.
    • Constructor Detail

      • PatternTokenizer

        public PatternTokenizer(String name)
        Creates a PatternTokenizer with the given name.
        Parameters:
        name - The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
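        The naming rules above can be checked on the client before a tokenizer is created. The following is a minimal sketch, not part of the SDK: the class TokenizerNameCheck and the method isValidName are hypothetical, and it assumes "letters" and "alphanumeric" mean ASCII characters, which this page does not state.

```java
import java.util.regex.Pattern;

public class TokenizerNameCheck {
    // Hypothetical helper mirroring the documented rules. Assumption: ASCII letters and digits only.
    // The first and last characters must be alphanumeric; spaces, dashes and underscores may appear in between.
    private static final Pattern NAME =
        Pattern.compile("^[A-Za-z0-9](?:[A-Za-z0-9 _-]*[A-Za-z0-9])?$");

    public static boolean isValidName(String name) {
        // The documented length limit is 128 characters.
        return name != null && name.length() <= 128 && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my-pattern_tokenizer")); // true
        System.out.println(isValidName("-leading-dash"));        // false
    }
}
```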
    • Method Detail

      • getPattern

        public String getPattern()
        Get the pattern property: A regular expression pattern to match token separators. Default is \W+, an expression that matches one or more non-word characters.
        Returns:
        the pattern value.
      • setPattern

        public PatternTokenizer setPattern(String pattern)
        Set the pattern property: A regular expression pattern to match token separators. Default is \W+, an expression that matches one or more non-word characters.
        Parameters:
        pattern - the pattern value to set.
        Returns:
        the PatternTokenizer object itself.
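        To illustrate what "match token separators" means, here is a rough local sketch built on java.util.regex rather than on the search service. The class SeparatorSplitSketch and its split method are hypothetical, and dropping empty pieces is an assumption made for readability; the service's actual output may differ in details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SeparatorSplitSketch {
    // Hypothetical helper: treats every match of separatorPattern as a boundary between tokens.
    public static List<String> split(String input, String separatorPattern) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorPattern).split(input)) {
            // Assumption: empty pieces (for example before a leading separator) are not emitted.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // One or more non-word characters act as the separator.
        System.out.println(split("red, green;blue", "\\W+")); // [red, green, blue]
        // A custom separator: split on commas only, keeping inner spaces.
        System.out.println(split("new york,los angeles", ","));  // [new york, los angeles]
    }
}
```

        With the class documented here, the comma case would be configured as new PatternTokenizer("comma-tokenizer").setPattern(","), where the tokenizer name is only an example.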
      • getFlags

        public List<RegexFlags> getFlags()
        Get the flags property: Regular expression flags.
        Returns:
        the flags value.
      • setFlags

        public PatternTokenizer setFlags(RegexFlags... flags)
        Set the flags property: Regular expression flags.
        Parameters:
        flags - the flags value to set.
        Returns:
        the PatternTokenizer object itself.
      • setFlags

        public PatternTokenizer setFlags(List<RegexFlags> flags)
        Set the flags property: Regular expression flags.
        Parameters:
        flags - the flags value to set.
        Returns:
        the PatternTokenizer object itself.
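        This page does not list the available RegexFlags values or describe their effect. As a hedged illustration only, the sketch below uses the flag constants of java.util.regex.Pattern to show how a flag can change which separators match; it assumes the service-side flags behave similarly, which is not confirmed here. The class RegexFlagSketch and its split method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexFlagSketch {
    // Hypothetical helper: splits input on separatorPattern compiled with the given java.util.regex flags.
    public static List<String> split(String input, String separatorPattern, int flags) {
        return Arrays.asList(Pattern.compile(separatorPattern, flags).split(input));
    }

    public static void main(String[] args) {
        // Without flags, only the lowercase "x" is treated as a separator.
        System.out.println(split("1x2X3", "x", 0));                        // [1, 2X3]
        // With case-insensitive matching, "X" separates tokens as well.
        System.out.println(split("1x2X3", "x", Pattern.CASE_INSENSITIVE)); // [1, 2, 3]
    }
}
```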
      • getGroup

        public Integer getGroup()
        Get the group property: The zero-based ordinal of the matching group in the regular expression pattern to extract into tokens. Use -1 if you want to use the entire pattern to split the input into tokens, irrespective of matching groups. Default is -1.
        Returns:
        the group value.
      • setGroup

        public PatternTokenizer setGroup(Integer group)
        Set the group property: The zero-based ordinal of the matching group in the regular expression pattern to extract into tokens. Use -1 if you want to use the entire pattern to split the input into tokens, irrespective of matching groups. Default is -1.
        Parameters:
        group - the group value to set.
        Returns:
        the PatternTokenizer object itself.
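        When group is not -1, the pattern describes the tokens themselves rather than the separators between them. The following local sketch uses java.util.regex to show the idea; the class GroupExtractSketch and its extract method are hypothetical. It assumes the group value maps onto Java's group numbering, where 0 is the whole match and 1 is the first parenthesized group, which this page does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtractSketch {
    // Hypothetical helper: emits the chosen group of every match as a token.
    public static List<String> extract(String input, String pattern, int group) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(input);
        while (matcher.find()) {
            String token = matcher.group(group);
            // Assumption: groups that did not participate or matched nothing yield no token.
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        // Group 1 keeps only the text inside the quotes.
        System.out.println(extract(input, "'([^']+)'", 1)); // [bbb, ccc]
        // Group 0 keeps each whole match, quotes included.
        System.out.println(extract(input, "'([^']+)'", 0)); // ['bbb', 'ccc']
    }
}
```

        With the class documented here, the first case would be configured as new PatternTokenizer("quoted-text").setPattern("'([^']+)'").setGroup(1), where the tokenizer name is only an example.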