org.apache.lucene.analysis

Class LetterTokenizer

public class LetterTokenizer extends CharTokenizer

A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as defined by the java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
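The splitting rule described above can be illustrated with a minimal, Lucene-free sketch. The class name LetterSplitDemo and its tokenize method are hypothetical and exist only for illustration. The sketch simply collects maximal runs of characters for which Character.isLetter(char) returns true. It also shows why Japanese text fares badly: kana and CJK ideographs are letters too, so a space-free phrase comes out as one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the LetterTokenizer rule; it does not use Lucene.
// A token is a maximal run of chars for which Character.isLetter(char) is true.
class LetterSplitDemo {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);              // extend the current run of letters
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the run
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // flush a run that reaches end of input
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Digits, apostrophes, spaces and punctuation all act as separators.
        System.out.println(tokenize("Don't stop at 42 km!"));
        // prints [Don, t, stop, at, km]

        // Japanese has no spaces between words, so the whole phrase is one token.
        System.out.println(tokenize("日本語のテキスト"));
    }
}
```

Note how the apostrophe splits "Don't" into two tokens; this is a direct consequence of accepting letters only.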
Constructor Summary
LetterTokenizer(Reader in)
Construct a new LetterTokenizer.
Method Summary
protected boolean isTokenChar(char c)
Collects only characters which satisfy Character#isLetter(char).

Constructor Detail

LetterTokenizer

public LetterTokenizer(Reader in)
Construct a new LetterTokenizer.

Method Detail

isTokenChar

protected boolean isTokenChar(char c)
Collects only characters which satisfy Character#isLetter(char).
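The way a CharTokenizer subclass plugs in its character rule through isTokenChar can be sketched as a template method. MiniCharTokenizer and MiniLetterTokenizer below are hypothetical, simplified stand-ins, not Lucene classes. They only mirror the idea that the base class drives the reading loop over a Reader while the subclass decides which characters belong in a token. The real CharTokenizer also handles buffering and token offset bookkeeping, which this sketch omits.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical model of the CharTokenizer / LetterTokenizer relationship.
class MiniTokenizerDemo {

    abstract static class MiniCharTokenizer {
        private final Reader input;

        protected MiniCharTokenizer(Reader input) {
            this.input = input;
        }

        // Subclasses decide which characters belong inside a token.
        protected abstract boolean isTokenChar(char c);

        // Returns the next token, or null once the Reader is exhausted.
        public String next() throws IOException {
            StringBuilder token = new StringBuilder();
            int ch;
            while ((ch = input.read()) != -1) {
                char c = (char) ch;
                if (isTokenChar(c)) {
                    token.append(c);
                } else if (token.length() > 0) {
                    break; // the first rejected char after a run ends the token
                }
            }
            return token.length() > 0 ? token.toString() : null;
        }
    }

    // Same rule as LetterTokenizer.isTokenChar: accept letters only.
    static class MiniLetterTokenizer extends MiniCharTokenizer {
        MiniLetterTokenizer(Reader in) {
            super(in);
        }

        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }
    }

    public static void main(String[] args) throws IOException {
        MiniCharTokenizer tok = new MiniLetterTokenizer(new StringReader("foo-bar 2baz"));
        for (String t = tok.next(); t != null; t = tok.next()) {
            System.out.println(t); // prints foo, bar and baz on separate lines
        }
    }
}
```

Swapping in a different rule only requires another isTokenChar override, for example Character.isLetterOrDigit(c) to keep "2baz" as a single token.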
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.