org.apache.lucene.analysis

Class WordlistLoader

public class WordlistLoader extends Object

Loader for text files that represent a list of stopwords.

Version: $Id: WordlistLoader.java 472959 2006-11-09 16:21:50Z yonik $

Author: Gerhard Schwarz

Method Summary
static HashMap getStemDict(File wordstemfile)
Reads a stem dictionary.
static HashSet getWordSet(File wordfile)
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).
static HashSet getWordSet(Reader reader)
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace).

Method Detail

getStemDict

public static HashMap getStemDict(File wordstemfile)
Reads a stem dictionary. Each line contains:
word\tstem
(i.e. two tab-separated words)

Returns: stem dictionary that overrules the stemming algorithm

Throws: IOException
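
The line format described above can be illustrated with a small, self-contained sketch. This is not the Lucene implementation: the class `StemDictSketch` and the method `parseStemDict` are hypothetical names, and reading from a `Reader` rather than a `File` is an assumption made so the example runs without a file on disk. Skipping lines that do not have exactly two tab-separated fields is also a choice of this sketch; the documentation does not say how malformed lines are handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "word\tstem" format; not the Lucene source.
class StemDictSketch {

    // Parses lines of the form word<TAB>stem into a map from word to stem.
    static Map<String, String> parseStemDict(Reader reader) throws IOException {
        Map<String, String> dict = new HashMap<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split("\t");
                // Assumption: ignore lines that are not exactly word<TAB>stem.
                if (parts.length == 2) {
                    dict.put(parts[0], parts[1]);
                }
            }
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> dict =
                parseStemDict(new StringReader("houses\thouse\nmice\tmouse\n"));
        System.out.println(dict.get("mice"));
    }
}
```

With the real class, the equivalent call would be `WordlistLoader.getStemDict(new File(...))`, which returns the `HashMap` directly.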

getWordSet

public static HashSet getWordSet(File wordfile)
Loads a text file and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Each line of the file should contain exactly one word. The words must be lowercase if the Analyzer you use applies LowerCaseFilter (as StandardAnalyzer does).

Parameters: wordfile File containing the wordlist

Returns: A HashSet with the file's words
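
If a wordlist may contain mixed-case entries, one way to meet the lowercase requirement is to normalize the set after loading it. The sketch below is an illustration only: `LowercaseWordsSketch` and `toLowerCase` are hypothetical names, not part of Lucene, and the use of `Locale.ROOT` is an assumption that may not match the lowercasing rules of the Analyzer in use.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of WordlistLoader.
class LowercaseWordsSketch {

    // Returns a new set holding the lowercase form of every word.
    static Set<String> toLowerCase(Set<String> words) {
        Set<String> lowered = new HashSet<>();
        for (String word : words) {
            // Assumption: locale-independent lowercasing is acceptable.
            lowered.add(word.toLowerCase(Locale.ROOT));
        }
        return lowered;
    }

    public static void main(String[] args) {
        Set<String> raw = new HashSet<>(Arrays.asList("The", "AND", "the"));
        System.out.println(toLowerCase(raw));
    }
}
```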

getWordSet

public static HashSet getWordSet(Reader reader)
Reads lines from a Reader and adds every line as an entry to a HashSet (omitting leading and trailing whitespace). Each line of the Reader should contain exactly one word. The words must be lowercase if the Analyzer you use applies LowerCaseFilter (as StandardAnalyzer does).

Parameters: reader Reader containing the wordlist

Returns: A HashSet with the reader's words
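
The behavior described above (one entry per line, with surrounding whitespace removed) can be sketched with the JDK alone. This is an illustration under stated assumptions, not the Lucene source: `WordSetSketch` and `readWordSet` are hypothetical names. The sketch follows the description literally, so a blank line becomes an empty-string entry; whether the real method does the same is not stated here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the described behavior; not the Lucene source.
class WordSetSketch {

    // Adds every line of the reader to a set, trimming surrounding whitespace.
    static Set<String> readWordSet(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                words.add(line.trim());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        Set<String> words = readWordSet(new StringReader("  the \nand\nthe\n"));
        System.out.println(words);
    }
}
```

With the real class, the equivalent call would be `WordlistLoader.getWordSet(reader)`, and the returned `HashSet` would typically be handed to an Analyzer as its stopword set.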

Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.