Class PatternTokenizer

java.lang.Object
com.azure.search.documents.indexes.models.LexicalTokenizer
com.azure.search.documents.indexes.models.PatternTokenizer

public final class PatternTokenizer extends LexicalTokenizer
Tokenizer that uses regex pattern matching to construct distinct tokens. This tokenizer is implemented using Apache Lucene.
  • Constructor Details

    • PatternTokenizer

      public PatternTokenizer(String name)
      Creates an instance of PatternTokenizer.
      Parameters:
      name - The name of the tokenizer. It may contain only letters, digits, spaces, dashes, or underscores; it must start and end with an alphanumeric character; and it is limited to 128 characters.
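
The naming rule above can be checked locally before an index definition is sent to the service. The following is a minimal sketch, not part of the SDK: the class TokenizerNameCheck and the method isValidTokenizerName are hypothetical, and the regular expression assumes that "letters" means ASCII letters. The service remains the authority on which names it accepts.

```java
import java.util.regex.Pattern;

// Hypothetical client-side helper; not part of the Azure SDK.
final class TokenizerNameCheck {
    // One alphanumeric character, optionally followed by up to 126 allowed
    // characters and a final alphanumeric character (128 characters at most).
    // Assumption: "letters" is read as ASCII letters only.
    private static final Pattern NAME =
        Pattern.compile("^[A-Za-z0-9](?:[A-Za-z0-9 _-]{0,126}[A-Za-z0-9])?$");

    static boolean isValidTokenizerName(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidTokenizerName("my-pattern_tokenizer")); // true
        System.out.println(isValidTokenizerName("-leading-dash"));        // false
    }
}
```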
  • Method Details

    • getPattern

      public String getPattern()
      Get the pattern property: A regular expression pattern to match token separators. Default is \W+, an expression that matches one or more non-word characters.
      Returns:
      the pattern value.
    • setPattern

      public PatternTokenizer setPattern(String pattern)
      Set the pattern property: A regular expression pattern to match token separators. Default is \W+, an expression that matches one or more non-word characters.
      Parameters:
      pattern - the pattern value to set.
      Returns:
      the PatternTokenizer object itself.
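
To make "a pattern that matches token separators" concrete, the sketch below approximates the splitting behavior with java.util.regex from the standard library. It does not call the SDK or the service. The class SeparatorSplitSketch and the method splitOnSeparators are hypothetical names, the default separator is assumed to be equivalent to \W+, and dropping empty tokens is an assumption about how the underlying Lucene tokenizer behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Local approximation only; the service-side tokenizer may differ in details.
final class SeparatorSplitSketch {
    // Splits the text wherever the separator pattern matches and
    // keeps only the non-empty pieces as tokens.
    static List<String> splitOnSeparators(String text, String separatorPattern) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorPattern).split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumed default: one or more non-word characters.
        // The underscore is a word character, so "baz_qux" stays whole.
        System.out.println(splitOnSeparators("foo-bar, baz_qux!", "\\W+")); // [foo, bar, baz_qux]
        // A custom separator pattern: commas or semicolons.
        System.out.println(splitOnSeparators("a,b;c", "[,;]"));             // [a, b, c]
    }
}
```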
    • getFlags

      public List<RegexFlags> getFlags()
      Get the flags property: Regular expression flags.
      Returns:
      the flags value.
    • setFlags

      public PatternTokenizer setFlags(RegexFlags... flags)
      Set the flags property: Regular expression flags.
      Parameters:
      flags - the flags value to set.
      Returns:
      the PatternTokenizer object itself.
    • setFlags

      public PatternTokenizer setFlags(List<RegexFlags> flags)
      Set the flags property: Regular expression flags.
      Parameters:
      flags - the flags value to set.
      Returns:
      the PatternTokenizer object itself.
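
The effect of a regular expression flag on tokenization can be previewed locally. The sketch below uses java.util.regex.Pattern flag constants as stand-ins, on the assumption that RegexFlags values such as CASE_INSENSITIVE correspond to the Pattern constants of the same name. The class FlagEffectSketch and the method split are hypothetical and do not call the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Local approximation only; flag handling in the service is assumed
// to follow java.util.regex.Pattern semantics.
final class FlagEffectSketch {
    // Splits the text on the separator compiled with the given flags
    // and keeps the non-empty pieces.
    static List<String> split(String text, String separator, int flags) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separator, flags).split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "oneANDtwoandthree";
        // Without flags, only the lowercase "and" acts as a separator.
        System.out.println(split(text, "and", 0));                        // [oneANDtwo, three]
        // With CASE_INSENSITIVE, "AND" is a separator as well.
        System.out.println(split(text, "and", Pattern.CASE_INSENSITIVE)); // [one, two, three]
    }
}
```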
    • getGroup

      public Integer getGroup()
      Get the group property: The zero-based ordinal of the matching group in the regular expression pattern to extract into tokens. Use -1 to split the input on the entire pattern, irrespective of matching groups. Default is -1.
      Returns:
      the group value.
    • setGroup

      public PatternTokenizer setGroup(Integer group)
      Set the group property: The zero-based ordinal of the matching group in the regular expression pattern to extract into tokens. Use -1 to split the input on the entire pattern, irrespective of matching groups. Default is -1.
      Parameters:
      group - the group value to set.
      Returns:
      the PatternTokenizer object itself.
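
The difference between splitting (group -1) and extracting a matching group can be sketched with java.util.regex. This is a local approximation under stated assumptions, not the SDK or the Lucene implementation: the class GroupModeSketch and the method tokenize are hypothetical, group 0 is assumed to mean the entire match as in java.util.regex, and empty tokens are assumed to be dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local approximation only; the service-side tokenizer may differ in details.
final class GroupModeSketch {
    static List<String> tokenize(String text, String pattern, int group) {
        Pattern compiled = Pattern.compile(pattern);
        List<String> tokens = new ArrayList<>();
        if (group < 0) {
            // Split mode: the pattern marks separators between tokens.
            for (String piece : compiled.split(text)) {
                if (!piece.isEmpty()) {
                    tokens.add(piece);
                }
            }
        } else {
            // Extract mode: each match contributes the chosen group as a token.
            Matcher matcher = compiled.matcher(text);
            while (matcher.find()) {
                String value = matcher.group(group);
                if (value != null && !value.isEmpty()) {
                    tokens.add(value);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String quoted = "'([^']+)'";
        System.out.println(tokenize("aaa 'bbb' 'ccc'", quoted, 1)); // [bbb, ccc]
        System.out.println(tokenize("aaa 'bbb' 'ccc'", quoted, 0)); // ['bbb', 'ccc']
        System.out.println(tokenize("a,b,,c", ",", -1));            // [a, b, c]
    }
}
```

In extract mode, text that the pattern does not match (here "aaa") produces no token, which is why a group index of 0 or higher suits inputs where tokens are easier to describe than separators.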