Class CjkBigramTokenFilter


  • public final class CjkBigramTokenFilter
    extends TokenFilter
    Forms bigrams of CJK terms generated by the standard tokenizer. This token filter is implemented using Apache Lucene.
    • Constructor Detail

      • CjkBigramTokenFilter

        public CjkBigramTokenFilter(String name)
        Constructor of CjkBigramTokenFilter.
        Parameters:
        name - The name of the token filter. It may contain only letters, digits, spaces, dashes, or underscores, must start and end with an alphanumeric character, and is limited to 128 characters.
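The naming rules for the name parameter can be checked on the client before a request is sent. The following is a minimal sketch, not part of the SDK: FilterNameCheck and isValidName are hypothetical names, and the pattern assumes that "letters" means ASCII letters. The service remains the authority on which names it accepts.

```java
import java.util.regex.Pattern;

public class FilterNameCheck {
    // Hypothetical client-side check. Assumption: "letters" means ASCII letters only.
    // The first and last characters must be alphanumeric; characters in between may
    // also be spaces, dashes, or underscores. Total length: 1 to 128 characters
    // (1 leading + up to 126 middle + 1 trailing).
    private static final Pattern NAME =
        Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9 _-]{0,126}[A-Za-z0-9])?");

    static boolean isValidName(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("my-cjk_filter 1")); // true
        System.out.println(isValidName("-leading-dash"));   // false: starts with a dash
        System.out.println(isValidName("a".repeat(129)));   // false: longer than 128
    }
}
```

Anchoring a single regular expression with matches() keeps all three rules (allowed characters, alphanumeric ends, length limit) in one place.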
    • Method Detail

      • getIgnoreScripts

        public List<CjkBigramTokenFilterScripts> getIgnoreScripts()
        Get the ignoreScripts property: The scripts to ignore.
        Returns:
        the ignoreScripts value.
      • setIgnoreScripts

        public CjkBigramTokenFilter setIgnoreScripts(CjkBigramTokenFilterScripts... ignoreScripts)
        Set the ignoreScripts property: The scripts to ignore.
        Parameters:
        ignoreScripts - the ignoreScripts value to set.
        Returns:
        the CjkBigramTokenFilter object itself.
      • setIgnoreScripts

        public CjkBigramTokenFilter setIgnoreScripts(List<CjkBigramTokenFilterScripts> ignoreScripts)
        Set the ignoreScripts property: The scripts to ignore.
        Parameters:
        ignoreScripts - the ignoreScripts value to set.
        Returns:
        the CjkBigramTokenFilter object itself.
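The effect of ignoreScripts can be illustrated with a simplified, character-level model. This sketch is not the Lucene implementation. It assumes the filter's CJK scripts correspond to Java's Character.UnicodeScript values HAN, HIRAGANA, KATAKANA, and HANGUL; it treats each run of adjacent, non-ignored CJK characters as the input to bigramming; and it passes characters from ignored scripts through as single tokens. IgnoreScriptsSketch and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class IgnoreScriptsSketch {
    // Assumption: these Java scripts stand in for the filter's CJK script values.
    static final Set<Character.UnicodeScript> CJK = EnumSet.of(
        Character.UnicodeScript.HAN, Character.UnicodeScript.HIRAGANA,
        Character.UnicodeScript.KATAKANA, Character.UnicodeScript.HANGUL);

    // Simplified model: characters from non-ignored CJK scripts accumulate into a run
    // that is bigrammed; any other non-whitespace character ends the run and is emitted
    // as its own token. Whitespace ends the run and is dropped.
    static List<String> tokenize(String text, Set<Character.UnicodeScript> ignore) {
        List<String> out = new ArrayList<>();
        List<Integer> run = new ArrayList<>();
        for (int cp : text.codePoints().toArray()) {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (CJK.contains(script) && !ignore.contains(script)) {
                run.add(cp);
            } else {
                flush(run, out);
                if (!Character.isWhitespace(cp)) {
                    out.add(new String(Character.toChars(cp)));
                }
            }
        }
        flush(run, out);
        return out;
    }

    // Emits overlapping bigrams for the run; a one-character run has no neighbour,
    // so it is emitted as a unigram. The run is then cleared.
    private static void flush(List<Integer> run, List<String> out) {
        if (run.size() == 1) {
            out.add(new String(Character.toChars(run.get(0))));
        }
        for (int i = 0; i + 1 < run.size(); i++) {
            out.add(new String(Character.toChars(run.get(i)))
                + new String(Character.toChars(run.get(i + 1))));
        }
        run.clear();
    }

    public static void main(String[] args) {
        Set<Character.UnicodeScript> none = EnumSet.noneOf(Character.UnicodeScript.class);
        System.out.println(tokenize("漢字かな", none)); // [漢字, 字か, かな]
        System.out.println(tokenize("漢字かな",
            EnumSet.of(Character.UnicodeScript.HIRAGANA))); // [漢字, か, な]
    }
}
```

With no ignored scripts, the whole string 漢字かな forms one run and yields three bigrams. Ignoring HIRAGANA leaves only the Han run 漢字 to be bigrammed, while か and な pass through individually.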
      • areOutputUnigrams

        public Boolean areOutputUnigrams()
        Get the outputUnigrams property: A value indicating whether to output both unigrams and bigrams (if true), or just bigrams (if false). Default is false.
        Returns:
        the outputUnigrams value.
      • setOutputUnigrams

        public CjkBigramTokenFilter setOutputUnigrams(Boolean outputUnigrams)
        Set the outputUnigrams property: A value indicating whether to output both unigrams and bigrams (if true), or just bigrams (if false). Default is false.
        Parameters:
        outputUnigrams - the outputUnigrams value to set.
        Returns:
        the CjkBigramTokenFilter object itself.
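The following minimal sketch shows how outputUnigrams changes the tokens produced for a single run of CJK characters. OutputUnigramsSketch and tokens are hypothetical names. For readability, the sketch lists unigrams before bigrams; the real filter's token order and positions may differ. It follows Lucene's documented default of emitting a lone CJK character as a unigram even when outputUnigrams is false.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputUnigramsSketch {
    // Tokens for one run of CJK characters.
    // outputUnigrams == false: bigrams only (a lone character still yields a unigram).
    // outputUnigrams == true: every character as a unigram, plus every bigram.
    static List<String> tokens(String run, boolean outputUnigrams) {
        int[] cps = run.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (outputUnigrams || cps.length == 1) {
            for (int i = 0; i < cps.length; i++) {
                out.add(new String(cps, i, 1));
            }
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            out.add(new String(cps, i, 2)); // overlapping pair starting at i
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("東京都", false)); // [東京, 京都]
        System.out.println(tokens("東京都", true));  // [東, 京, 都, 東京, 京都]
    }
}
```

In the SDK, the flag is set through the fluent setter documented above, for example `new CjkBigramTokenFilter("my-cjk-filter").setOutputUnigrams(true)`. Emitting unigrams as well lets single-character queries match, at the cost of a larger index.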