Class WordDelimiterTokenFilter


  • public final class WordDelimiterTokenFilter
    extends TokenFilter
    Splits words into subwords and performs optional transformations on subword groups. This token filter is implemented using Apache Lucene.
    • Constructor Detail

      • WordDelimiterTokenFilter

        public WordDelimiterTokenFilter(String name)
        Creates a WordDelimiterTokenFilter with the given name.
        Parameters:
        name - The name of the token filter. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters.
    • Method Detail

      • generateWordParts

        public Boolean generateWordParts()
        Get the generateWordParts property: A value indicating whether to generate word parts. If set, causes parts of words to be generated; for example "AzureSearch" becomes "Azure" "Search". Default is true.
        Returns:
        the generateWordParts value.
      • setGenerateWordParts

        public WordDelimiterTokenFilter setGenerateWordParts(Boolean generateWordParts)
        Set the generateWordParts property: A value indicating whether to generate word parts. If set, causes parts of words to be generated; for example "AzureSearch" becomes "Azure" "Search". Default is true.
        Parameters:
        generateWordParts - the generateWordParts value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
      • generateNumberParts

        public Boolean generateNumberParts()
        Get the generateNumberParts property: A value indicating whether to generate number subwords. Default is true.
        Returns:
        the generateNumberParts value.
      • setGenerateNumberParts

        public WordDelimiterTokenFilter setGenerateNumberParts(Boolean generateNumberParts)
        Set the generateNumberParts property: A value indicating whether to generate number subwords. Default is true.
        Parameters:
        generateNumberParts - the generateNumberParts value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
      • areWordsCatenated

        public Boolean areWordsCatenated()
        Get the catenateWords property: A value indicating whether maximum runs of word parts will be catenated. For example, if this is set to true, "Azure-Search" becomes "AzureSearch". Default is false.
        Returns:
        the catenateWords value.
      • setWordsCatenated

        public WordDelimiterTokenFilter setWordsCatenated(Boolean wordsCatenated)
        Set the catenateWords property: A value indicating whether maximum runs of word parts will be catenated. For example, if this is set to true, "Azure-Search" becomes "AzureSearch". Default is false.
        Parameters:
        wordsCatenated - the catenateWords value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
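        The catenation described above can be illustrated with a small plain-Java sketch. It does not call the filter or Lucene; the class CatenateWordsSketch and the method catenateWordRuns are hypothetical names, and the sketch assumes that any non-alphanumeric character is a delimiter and that a run is broken by any part that is not purely alphabetic.

```java
import java.util.ArrayList;
import java.util.List;

final class CatenateWordsSketch {
    // Hypothetical helper, not part of the SDK: joins each maximal run of
    // purely alphabetic parts into one token. Non-word parts end the current
    // run and are not emitted by this sketch.
    static List<String> catenateWordRuns(String token) {
        List<String> runs = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        // Assumption: every non-alphanumeric character acts as a delimiter.
        for (String part : token.split("[^A-Za-z0-9]+")) {
            boolean isWord = !part.isEmpty() && part.chars().allMatch(Character::isLetter);
            if (isWord) {
                run.append(part);
            } else if (run.length() > 0) {
                runs.add(run.toString());
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            runs.add(run.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(catenateWordRuns("Azure-Search")); // prints [AzureSearch]
    }
}
```

        The sketch only mirrors the documented "Azure-Search" example; the real filter may treat mixed or numeric parts differently.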
      • areNumbersCatenated

        public Boolean areNumbersCatenated()
        Get the catenateNumbers property: A value indicating whether maximum runs of number parts will be catenated. For example, if this is set to true, "1-2" becomes "12". Default is false.
        Returns:
        the catenateNumbers value.
      • setNumbersCatenated

        public WordDelimiterTokenFilter setNumbersCatenated(Boolean numbersCatenated)
        Set the catenateNumbers property: A value indicating whether maximum runs of number parts will be catenated. For example, if this is set to true, "1-2" becomes "12". Default is false.
        Parameters:
        numbersCatenated - the catenateNumbers value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
      • catenateAll

        public Boolean catenateAll()
        Get the catenateAll property: A value indicating whether all subword parts will be catenated. For example, if this is set to true, "Azure-Search-1" becomes "AzureSearch1". Default is false.
        Returns:
        the catenateAll value.
      • setCatenateAll

        public WordDelimiterTokenFilter setCatenateAll(Boolean catenateAll)
        Set the catenateAll property: A value indicating whether all subword parts will be catenated. For example, if this is set to true, "Azure-Search-1" becomes "AzureSearch1". Default is false.
        Parameters:
        catenateAll - the catenateAll value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
      • splitOnCaseChange

        public Boolean splitOnCaseChange()
        Get the splitOnCaseChange property: A value indicating whether to split words on case changes. For example, if this is set to true, "AzureSearch" becomes "Azure" "Search". Default is true.
        Returns:
        the splitOnCaseChange value.
      • setSplitOnCaseChange

        public WordDelimiterTokenFilter setSplitOnCaseChange(Boolean splitOnCaseChange)
        Set the splitOnCaseChange property: A value indicating whether to split words on case changes. For example, if this is set to true, "AzureSearch" becomes "Azure" "Search". Default is true.
        Parameters:
        splitOnCaseChange - the splitOnCaseChange value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
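        As a rough illustration of splitting on case changes, the following plain-Java sketch reproduces the documented "AzureSearch" example. It is not the Lucene implementation; CaseChangeSplitSketch and its splitOnCaseChange method are hypothetical names, and the sketch assumes that a boundary is any lowercase letter followed directly by an uppercase letter.

```java
import java.util.ArrayList;
import java.util.List;

final class CaseChangeSplitSketch {
    // Hypothetical helper, not part of the SDK: starts a new part wherever a
    // lowercase character is immediately followed by an uppercase character.
    static List<String> splitOnCaseChange(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            boolean lowerToUpper = Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(token.charAt(i));
            if (lowerToUpper) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        // Emit the trailing part, if any characters remain.
        if (start < token.length()) {
            parts.add(token.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnCaseChange("AzureSearch")); // prints [Azure, Search]
    }
}
```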
      • isPreserveOriginal

        public Boolean isPreserveOriginal()
        Get the preserveOriginal property: A value indicating whether original words will be preserved and added to the subword list. Default is false.
        Returns:
        the preserveOriginal value.
      • setPreserveOriginal

        public WordDelimiterTokenFilter setPreserveOriginal(Boolean preserveOriginal)
        Set the preserveOriginal property: A value indicating whether original words will be preserved and added to the subword list. Default is false.
        Parameters:
        preserveOriginal - the preserveOriginal value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
      • splitOnNumerics

        public Boolean splitOnNumerics()
        Get the splitOnNumerics property: A value indicating whether to split on numbers. For example, if this is set to true, "Azure1Search" becomes "Azure" "1" "Search". Default is true.
        Returns:
        the splitOnNumerics value.
      • setSplitOnNumerics

        public WordDelimiterTokenFilter setSplitOnNumerics(Boolean splitOnNumerics)
        Set the splitOnNumerics property: A value indicating whether to split on numbers. For example, if this is set to true, "Azure1Search" becomes "Azure" "1" "Search". Default is true.
        Parameters:
        splitOnNumerics - the splitOnNumerics value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
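        The effect of splitting on numbers can be sketched in plain Java as follows. This is only an illustration of the documented "Azure1Search" example, not the filter's actual code; NumericSplitSketch and its splitOnNumerics method are hypothetical names, and the sketch assumes a boundary at every transition between a digit and a non-digit character.

```java
import java.util.ArrayList;
import java.util.List;

final class NumericSplitSketch {
    // Hypothetical helper, not part of the SDK: starts a new part at every
    // digit/non-digit transition inside the token.
    static List<String> splitOnNumerics(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            boolean prevDigit = Character.isDigit(token.charAt(i - 1));
            boolean currDigit = Character.isDigit(token.charAt(i));
            if (prevDigit != currDigit) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        // Emit the trailing part, if any characters remain.
        if (start < token.length()) {
            parts.add(token.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnNumerics("Azure1Search")); // prints [Azure, 1, Search]
    }
}
```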
      • isStemEnglishPossessive

        public Boolean isStemEnglishPossessive()
        Get the stemEnglishPossessive property: A value indicating whether to remove trailing "'s" for each subword. Default is true.
        Returns:
        the stemEnglishPossessive value.
      • setStemEnglishPossessive

        public WordDelimiterTokenFilter setStemEnglishPossessive(Boolean stemEnglishPossessive)
        Set the stemEnglishPossessive property: A value indicating whether to remove trailing "'s" for each subword. Default is true.
        Parameters:
        stemEnglishPossessive - the stemEnglishPossessive value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
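        Removing a trailing "'s" can be pictured with the minimal plain-Java sketch below. It is an assumption-laden illustration rather than the filter's implementation; PossessiveStemSketch and its stripPossessive method are hypothetical names, and only the lowercase "'s" suffix with a straight apostrophe is handled.

```java
final class PossessiveStemSketch {
    // Hypothetical helper, not part of the SDK: drops a trailing "'s" from a
    // subword and returns every other input unchanged.
    static String stripPossessive(String subword) {
        if (subword.endsWith("'s")) {
            return subword.substring(0, subword.length() - 2);
        }
        return subword;
    }

    public static void main(String[] args) {
        System.out.println(stripPossessive("Azure's")); // prints Azure
        System.out.println(stripPossessive("Search"));  // prints Search
    }
}
```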
      • getProtectedWords

        public List<String> getProtectedWords()
        Get the protectedWords property: A list of tokens to protect from being delimited.
        Returns:
        the protectedWords value.
      • setProtectedWords

        public WordDelimiterTokenFilter setProtectedWords(String... protectedWords)
        Set the protectedWords property: A list of tokens to protect from being delimited.
        Parameters:
        protectedWords - the protectedWords value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
      • setProtectedWords

        public WordDelimiterTokenFilter setProtectedWords(List<String> protectedWords)
        Set the protectedWords property: A list of tokens to protect from being delimited.
        Parameters:
        protectedWords - the protectedWords value to set.
        Returns:
        the WordDelimiterTokenFilter object itself.
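        One way to picture protected words is the plain-Java sketch below, in which a token found in the protected set is passed through whole and any other token is split. It does not use the filter itself; ProtectedWordsSketch and delimitUnlessProtected are hypothetical names, and the sketch assumes exact, case-sensitive matching and a single "-" delimiter, neither of which is stated in this reference.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ProtectedWordsSketch {
    // Hypothetical helper, not part of the SDK: returns a protected token
    // unchanged and splits any other token on "-".
    static List<String> delimitUnlessProtected(String token, Set<String> protectedWords) {
        // Assumption: protection requires an exact, case-sensitive match.
        if (protectedWords.contains(token)) {
            return Collections.singletonList(token);
        }
        return Arrays.asList(token.split("-"));
    }

    public static void main(String[] args) {
        Set<String> protectedWords = new HashSet<>(Arrays.asList("Wi-Fi"));
        System.out.println(delimitUnlessProtected("Wi-Fi", protectedWords));        // prints [Wi-Fi]
        System.out.println(delimitUnlessProtected("Azure-Search", protectedWords)); // prints [Azure, Search]
    }
}
```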