class |
AsciiFoldingTokenFilter |
Converts alphabetic, numeric, and symbolic Unicode characters which are not
in the first 127 ASCII characters (the "Basic Latin" Unicode block) into
their ASCII equivalents, if such equivalents exist.
|
class |
CjkBigramTokenFilter |
Forms bigrams of CJK terms that are generated from the standard tokenizer.
|
class |
CommonGramTokenFilter |
Constructs bigrams for frequently occurring terms while indexing.
|
class |
DictionaryDecompounderTokenFilter |
Decomposes compound words found in many Germanic languages.
|
class |
EdgeNGramTokenFilter |
Generates n-grams of the given size(s) starting from the front or the back
of an input token.
|
class |
ElisionTokenFilter |
Removes elisions. For example, "l'avion" (the plane) is converted to "avion"
(plane).
|
class |
KeepTokenFilter |
A token filter that only keeps tokens with text contained in a specified
list of words.
|
class |
KeywordMarkerTokenFilter |
Marks terms as keywords.
|
class |
LengthTokenFilter |
Removes words that are too long or too short.
|
class |
LimitTokenFilter |
Limits the number of tokens while indexing.
|
class |
NGramTokenFilter |
Generates n-grams of the given size(s).
|
class |
PatternCaptureTokenFilter |
Uses Java regexes to emit multiple tokens, one for each capture group in
one or more patterns.
|
class |
PatternReplaceTokenFilter |
A token filter that replaces characters in each token, using a regular
expression to identify the character sequences to replace.
|
class |
PhoneticTokenFilter |
Creates tokens for phonetic matches.
|
class |
ShingleTokenFilter |
Creates combinations of tokens as a single token.
|
class |
SnowballTokenFilter |
A filter that stems words using a Snowball-generated stemmer.
|
class |
StemmerOverrideTokenFilter |
Provides the ability to override other stemming filters with custom
dictionary-based stemming.
|
class |
StemmerTokenFilter |
Language-specific stemming filter.
|
class |
StopwordsTokenFilter |
Removes stop words from a token stream.
|
class |
SynonymTokenFilter |
Matches single or multi-word synonyms in a token stream.
|
class |
TruncateTokenFilter |
Truncates the terms to a specific length.
|
class |
UniqueTokenFilter |
Filters out tokens with the same text as the previous token.
|
class |
WordDelimiterTokenFilter |
Splits words into subwords and performs optional transformations on subword
groups.
|
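
The descriptions above are brief, so a rough illustration of what some of these filters produce may help. The following is a minimal sketch in plain Python, not the SDK's API: the function names (`edge_ngrams`, `ngrams`, `shingles`, `drop_adjacent_duplicates`) and their parameters are hypothetical and only approximate the behavior described for EdgeNGramTokenFilter, NGramTokenFilter, ShingleTokenFilter, and UniqueTokenFilter. The real filters have additional options and defaults that are not modeled here.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int, max_gram: int,
                from_back: bool = False) -> List[str]:
    """Return n-grams anchored at the front (or back) of a token.

    Assumes min_gram >= 1 (hypothetical parameter names).
    """
    upper = min(max_gram, len(token))
    if from_back:
        return [token[-n:] for n in range(min_gram, upper + 1)]
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token: str, min_gram: int, max_gram: int) -> List[str]:
    """Return every substring whose length lies in [min_gram, max_gram]."""
    grams = []
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


def shingles(tokens: List[str], size: int = 2,
             separator: str = " ") -> List[str]:
    """Join each run of `size` adjacent tokens into a single token.

    This sketch emits only the shingles, not the original tokens.
    """
    return [separator.join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]


def drop_adjacent_duplicates(tokens: List[str]) -> List[str]:
    """Drop a token when its text equals the previous token's text."""
    kept: List[str] = []
    for token in tokens:
        if not kept or kept[-1] != token:
            kept.append(token)
    return kept


if __name__ == "__main__":
    print(edge_ngrams("search", 2, 4))                   # ['se', 'sea', 'sear']
    print(edge_ngrams("search", 2, 4, from_back=True))   # ['ch', 'rch', 'arch']
    print(ngrams("abc", 1, 2))                           # ['a', 'b', 'c', 'ab', 'bc']
    print(shingles(["new", "york", "city"]))             # ['new york', 'york city']
    print(drop_adjacent_duplicates(["a", "a", "b", "a"]))  # ['a', 'b', 'a']
```

Note that `drop_adjacent_duplicates` compares each token only with its immediate predecessor, matching the "same text as the previous token" wording above, so a repeated token that is not adjacent to its earlier occurrence is kept.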