org.apache.lucene.analysis
Class WordlistLoader
public class WordlistLoader extends Object
Loader for text files that represent a list of stopwords.
Version: $Id: WordlistLoader.java 472959 2006-11-09 16:21:50Z yonik $
Author: Gerhard Schwarz
Method Summary
static HashMap | getStemDict(File wordstemfile) | Reads a stem dictionary.
static HashSet | getWordSet(File wordfile) | Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet | getWordSet(Reader reader) | Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
public static HashMap getStemDict(File wordstemfile)
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)
Returns: Stem dictionary that overrides the stemming algorithm
Throws: IOException
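The line format described above can be illustrated with a small standalone sketch. It does not call Lucene. `StemDictSketch` and `parseStemDict` are hypothetical names, and skipping lines that contain no tab is an assumption of this sketch, not documented behavior of getStemDict.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: parses "word<TAB>stem" lines into a map.
class StemDictSketch {

    static Map<String, String> parseStemDict(Reader reader) {
        Map<String, String> dict = new HashMap<>();
        new BufferedReader(reader).lines().forEach(line -> {
            // Split on the first tab only: left side is the word, right side is the stem.
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) { // assumption: ignore lines without a tab
                dict.put(parts[0], parts[1]);
            }
        });
        return dict;
    }

    public static void main(String[] args) {
        Map<String, String> dict =
                parseStemDict(new StringReader("mice\tmouse\ngeese\tgoose\n"));
        System.out.println(dict.get("mice")); // prints "mouse"
    }
}
```

A map built this way lets a stemmer look up irregular forms before falling back to its algorithm, which appears to be the purpose of the returned dictionary.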
public static HashSet getWordSet(File wordfile)
Loads a text file and adds every line, with leading and trailing whitespace
removed, as an entry to a HashSet. Every line of the file should contain only
one word. The words must be lowercase if your Analyzer applies
LowerCaseFilter (as StandardAnalyzer does).
Parameters: wordfile File containing the wordlist
Returns: A HashSet with the file's words
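The lowercase requirement follows from set membership being case-sensitive. The standalone sketch below illustrates this without Lucene. `StopwordCaseSketch` and `isStopword` are hypothetical stand-ins for a stop filter's lookup, assuming the filter does an exact match against the loaded set.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not Lucene code: an exact, case-sensitive stopword lookup.
class StopwordCaseSketch {

    static boolean isStopword(Set<String> stopwords, String token) {
        return stopwords.contains(token);
    }

    public static void main(String[] args) {
        Set<String> asWritten = new HashSet<>();
        asWritten.add("The"); // capitalized entry, as it might appear in a word file

        Set<String> lowercased = new HashSet<>();
        lowercased.add("the"); // the same entry stored in lowercase

        // A lowercasing analyzer would hand the lookup a lowercase token.
        String token = "The".toLowerCase(Locale.ROOT);

        System.out.println(isStopword(asWritten, token));  // prints "false"
        System.out.println(isStopword(lowercased, token)); // prints "true"
    }
}
```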
public static HashSet getWordSet(Reader reader)
Reads lines from a Reader and adds every line, with leading and trailing
whitespace removed, as an entry to a HashSet. Every line of the Reader should
contain only one word. The words must be lowercase if your Analyzer applies
LowerCaseFilter (as StandardAnalyzer does).
Parameters: reader Reader containing the wordlist
Returns: A HashSet with the reader's words
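The line-per-entry behavior described above can be approximated with plain JDK classes. This is a minimal sketch, not the Lucene implementation. `WordSetSketch` and `readWordSet` are hypothetical names, and the sketch uses generics and streams that postdate this API.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: one set entry per line, trimmed.
class WordSetSketch {

    static Set<String> readWordSet(Reader reader) {
        Set<String> words = new HashSet<>();
        // trim() drops leading and trailing whitespace, including tabs.
        new BufferedReader(reader).lines().forEach(line -> words.add(line.trim()));
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = readWordSet(new StringReader("  the \nand\n\tof\t\n"));
        System.out.println(words.contains("the")); // prints "true"
        System.out.println(words.size());          // prints "3"
    }
}
```

Because every line becomes an entry in this sketch, a blank line would add an empty string to the set, so word files are best kept free of blank lines.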
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.