Class ShingleTokenFilter


  • public final class ShingleTokenFilter
    extends TokenFilter
    Combines adjacent tokens into single tokens (shingles). This token filter is implemented using Apache Lucene.
    • Constructor Detail

      • ShingleTokenFilter

        public ShingleTokenFilter(String name)
        Creates a ShingleTokenFilter with the given name.
        Parameters:
        name - The name of the token filter. It may contain only letters, digits, spaces, dashes, or underscores; it must start and end with an alphanumeric character; and it is limited to 128 characters.
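The naming rule above can be checked on the client side before the filter is created. The following is a minimal sketch, not part of this class: the helper name `isValidName` is hypothetical, and it assumes that "letters" and "digits" mean ASCII characters, which the service may not require.

```java
import java.util.regex.Pattern;

final class TokenFilterNameCheck {
    // Assumption: letters and digits are treated as ASCII only.
    // First and last characters must be alphanumeric; spaces, dashes and
    // underscores are allowed in between.
    private static final Pattern NAME =
        Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9 _-]*[A-Za-z0-9])?");

    /** Hypothetical helper: returns true if the name satisfies the documented rule. */
    public static boolean isValidName(String name) {
        return name != null
            && name.length() <= 128
            && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my_shingle_filter")); // true
        System.out.println(isValidName("-leading-dash"));     // false
    }
}
```

The service performs its own validation, so a check like this only serves to fail early with a clearer message.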
    • Method Detail

      • getMaxShingleSize

        public Integer getMaxShingleSize()
        Get the maxShingleSize property: The maximum shingle size. The default and minimum value is 2.
        Returns:
        the maxShingleSize value.
      • setMaxShingleSize

        public ShingleTokenFilter setMaxShingleSize(Integer maxShingleSize)
        Set the maxShingleSize property: The maximum shingle size. The default and minimum value is 2.
        Parameters:
        maxShingleSize - the maxShingleSize value to set.
        Returns:
        the ShingleTokenFilter object itself.
      • getMinShingleSize

        public Integer getMinShingleSize()
        Get the minShingleSize property: The minimum shingle size. The default and minimum value is 2. Must be less than the value of maxShingleSize.
        Returns:
        the minShingleSize value.
      • setMinShingleSize

        public ShingleTokenFilter setMinShingleSize(Integer minShingleSize)
        Set the minShingleSize property: The minimum shingle size. The default and minimum value is 2. Must be less than the value of maxShingleSize.
        Parameters:
        minShingleSize - the minShingleSize value to set.
        Returns:
        the ShingleTokenFilter object itself.
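To illustrate how minShingleSize and maxShingleSize bound the output, here is a minimal, self-contained sketch of shingling over a token list. It is not the Lucene implementation: the class and method names are hypothetical, unigrams are left out, tokens are joined with a single space, and the output order (grouped by start position) is an assumption made for readability.

```java
import java.util.ArrayList;
import java.util.List;

final class ShingleSizeSketch {
    /** Joins every run of minSize..maxSize adjacent tokens, grouped by start position. */
    public static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            // Grow the shingle from minSize up to maxSize while tokens remain.
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [please divide, please divide this, divide this, divide this sentence, this sentence]
        System.out.println(shingles(List.of("please", "divide", "this", "sentence"), 2, 3));
    }
}
```

With both sizes set to 2, the same input yields only the three bigrams.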
      • areOutputUnigrams

        public Boolean areOutputUnigrams()
        Get the outputUnigrams property: A value indicating whether the output stream will contain the input tokens (unigrams) as well as shingles. Default is true.
        Returns:
        the outputUnigrams value.
      • setOutputUnigrams

        public ShingleTokenFilter setOutputUnigrams(Boolean outputUnigrams)
        Set the outputUnigrams property: A value indicating whether the output stream will contain the input tokens (unigrams) as well as shingles. Default is true.
        Parameters:
        outputUnigrams - the outputUnigrams value to set.
        Returns:
        the ShingleTokenFilter object itself.
      • areOutputUnigramsIfNoShingles

        public Boolean areOutputUnigramsIfNoShingles()
        Get the outputUnigramsIfNoShingles property: A value indicating whether to output unigrams when no shingles are available. This property takes precedence when outputUnigrams is set to false. Default is false.
        Returns:
        the outputUnigramsIfNoShingles value.
      • setOutputUnigramsIfNoShingles

        public ShingleTokenFilter setOutputUnigramsIfNoShingles(Boolean outputUnigramsIfNoShingles)
        Set the outputUnigramsIfNoShingles property: A value indicating whether to output unigrams when no shingles are available. This property takes precedence when outputUnigrams is set to false. Default is false.
        Parameters:
        outputUnigramsIfNoShingles - the outputUnigramsIfNoShingles value to set.
        Returns:
        the ShingleTokenFilter object itself.
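The interaction between outputUnigrams and outputUnigramsIfNoShingles can be sketched as follows. This is a simplified model under stated assumptions, not the Lucene code: the names are hypothetical, a single shingle size is used, and unigrams are listed before shingles rather than interleaved by position.

```java
import java.util.ArrayList;
import java.util.List;

final class UnigramOptionsSketch {
    public static List<String> filter(List<String> tokens, int shingleSize,
                                      boolean outputUnigrams,
                                      boolean outputUnigramsIfNoShingles) {
        // Build all shingles of exactly shingleSize adjacent tokens.
        List<String> shingles = new ArrayList<>();
        for (int start = 0; start + shingleSize <= tokens.size(); start++) {
            shingles.add(String.join(" ", tokens.subList(start, start + shingleSize)));
        }
        // Unigrams are emitted when explicitly requested, or as a fallback
        // when the input is too short to form any shingle.
        boolean emitUnigrams =
            outputUnigrams || (outputUnigramsIfNoShingles && shingles.isEmpty());

        List<String> out = new ArrayList<>();
        if (emitUnigrams) {
            out.addAll(tokens);
        }
        out.addAll(shingles);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of("hello"), 2, false, false)); // prints []
        System.out.println(filter(List.of("hello"), 2, false, true));  // prints [hello]
    }
}
```

In this model the fallback only matters when outputUnigrams is false, which matches the description of the property above.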
      • getTokenSeparator

        public String getTokenSeparator()
        Get the tokenSeparator property: The string to use when joining adjacent tokens to form a shingle. Default is a single space (" ").
        Returns:
        the tokenSeparator value.
      • setTokenSeparator

        public ShingleTokenFilter setTokenSeparator(String tokenSeparator)
        Set the tokenSeparator property: The string to use when joining adjacent tokens to form a shingle. Default is a single space (" ").
        Parameters:
        tokenSeparator - the tokenSeparator value to set.
        Returns:
        the ShingleTokenFilter object itself.
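As a small illustration of the tokenSeparator property, the sketch below joins adjacent token pairs with a caller-supplied separator. The class and method names are hypothetical and the code only models the joining step.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenSeparatorSketch {
    /** Builds two-token shingles, joining each pair with the given separator. */
    public static List<String> bigrams(List<String> tokens, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + separator + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams(List.of("wi", "fi", "router"), " ")); // prints [wi fi, fi router]
        System.out.println(bigrams(List.of("wi", "fi", "router"), ""));  // prints [wifi, firouter]
    }
}
```

An empty separator concatenates the tokens directly, which can help queries match compound words written without a space.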
      • getFilterToken

        public String getFilterToken()
        Get the filterToken property: The string to insert for each position at which there is no token. Default is an underscore ("_").
        Returns:
        the filterToken value.
      • setFilterToken

        public ShingleTokenFilter setFilterToken(String filterToken)
        Set the filterToken property: The string to insert for each position at which there is no token. Default is an underscore ("_").
        Parameters:
        filterToken - the filterToken value to set.
        Returns:
        the ShingleTokenFilter object itself.
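A position with no token typically arises when an earlier filter, such as a stopword filter, removes a token but keeps its position. The sketch below models that case with a hypothetical helper in which `null` marks an empty position. It is a simplification: the real filter's handling of leading, trailing, or consecutive gaps may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FillerTokenSketch {
    /** Builds two-token shingles; a null entry stands for a position with no token. */
    public static List<String> bigrams(List<String> positions, String filler) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            // Substitute the filler wherever a position is empty.
            String left = positions.get(i) == null ? filler : positions.get(i);
            String right = positions.get(i + 1) == null ? filler : positions.get(i + 1);
            out.add(left + " " + right);
        }
        return out;
    }

    public static void main(String[] args) {
        // "the" was removed upstream, leaving a gap at the third position.
        List<String> positions = Arrays.asList("please", "divide", null, "sentence");
        System.out.println(bigrams(positions, "_")); // prints [please divide, divide _, _ sentence]
    }
}
```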