org.apache.lucene.benchmark.byTask.feeds

Class BasicDocMaker

public abstract class BasicDocMaker extends Object implements DocMaker

Creates documents for the test. Maintains counters (chars etc.) so that subclasses only need to provide textual content; creating documents by size is handled here.

Config params (default is in caps):
doc.stored=true|FALSE
doc.tokenized=TRUE|false
doc.term.vector=true|FALSE
doc.store.body.bytes=true|FALSE // store the body contents as raw UTF-8 bytes in a field
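
As a rough illustration of how the defaults above resolve, the following stand-alone sketch reads the four flags from a java.util.Properties object. The class DocFlagsSketch and its field names are hypothetical and are not part of the benchmark package; the real class reads its settings through Config, whose API is not shown on this page.

```java
import java.util.Properties;

/** Hypothetical holder for the four doc.* flags; not part of the benchmark package. */
class DocFlagsSketch {
    final boolean stored;
    final boolean tokenized;
    final boolean termVector;
    final boolean storeBodyBytes;

    DocFlagsSketch(Properties props) {
        // The second argument to getProperty is the documented default (shown in caps above).
        stored = Boolean.parseBoolean(props.getProperty("doc.stored", "false"));
        tokenized = Boolean.parseBoolean(props.getProperty("doc.tokenized", "true"));
        termVector = Boolean.parseBoolean(props.getProperty("doc.term.vector", "false"));
        storeBodyBytes = Boolean.parseBoolean(props.getProperty("doc.store.body.bytes", "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("doc.stored", "true"); // override one default, leave the rest
        DocFlagsSketch flags = new DocFlagsSketch(props);
        System.out.println("stored=" + flags.stored + " tokenized=" + flags.tokenized
                + " termVector=" + flags.termVector + " storeBodyBytes=" + flags.storeBodyBytes);
    }
}
```

Only doc.tokenized is on by default, so a configuration that sets none of these parameters yields tokenized, unstored fields without term vectors.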

Field Summary
protected Config config
protected boolean forever
protected Field.Index indexVal
protected Field.Store storeVal
protected Field.TermVector termVecVal

Method Summary
protected void addBytes(long n)
protected void addUniqueBytes(long n)
protected void collectFiles(File f, ArrayList inputFiles)
long getByteCount()
int getCount()
HTMLParser getHtmlParser()
protected abstract DocData getNextDocData()
Return the data of the next document.
Document makeDocument()
Document makeDocument(int size)
long numUniqueBytes()
void printDocStatistics()
void resetInputs()
void setConfig(Config config)
void setHTMLParser(HTMLParser htmlParser)

Field Detail

config

protected Config config

forever

protected boolean forever

indexVal

protected Field.Index indexVal

storeVal

protected Field.Store storeVal

termVecVal

protected Field.TermVector termVecVal

Method Detail

addBytes

protected void addBytes(long n)

addUniqueBytes

protected void addUniqueBytes(long n)

collectFiles

protected void collectFiles(File f, ArrayList inputFiles)

getByteCount

public long getByteCount()

getCount

public int getCount()

getHtmlParser

public HTMLParser getHtmlParser()

getNextDocData

protected abstract DocData getNextDocData()
Returns the data of the next document. All current implementations can create docs forever: when the input data is exhausted, the input files are iterated over again from the start. This re-iteration can be avoided by setting doc.maker.forever to false (default is true).

Returns: data of the next document.

Throws: an exception if the next doc data cannot be created; NoMoreDataException if the data is exhausted (and 'forever' is set to false).
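
The re-iteration behavior described above can be sketched in isolation. The class below is a hypothetical stand-in, not a subclass of BasicDocMaker: it returns plain strings instead of DocData, and it declares its own local NoMoreDataException so that it compiles without the benchmark package on the classpath.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical feed that mimics the documented 'forever' contract. */
class ForeverFeedSketch {
    /** Local stand-in for the benchmark's NoMoreDataException. */
    static class NoMoreDataException extends Exception {}

    private final List<String> inputs;
    private final boolean forever; // plays the role of doc.maker.forever
    private int next = 0;

    ForeverFeedSketch(List<String> inputs, boolean forever) {
        this.inputs = inputs;
        this.forever = forever;
    }

    String getNextDocData() throws NoMoreDataException {
        if (next >= inputs.size()) {
            // Inputs exhausted: stop, or wrap around and iterate them again.
            if (!forever || inputs.isEmpty()) {
                throw new NoMoreDataException();
            }
            next = 0;
        }
        return inputs.get(next++);
    }

    public static void main(String[] args) {
        ForeverFeedSketch feed = new ForeverFeedSketch(Arrays.asList("doc-a", "doc-b"), false);
        try {
            while (true) {
                System.out.println(feed.getNextDocData());
            }
        } catch (NoMoreDataException e) {
            System.out.println("inputs exhausted");
        }
    }
}
```

With forever set to true the loop in main would never terminate, which is why a caller that relies on the default needs its own stopping condition, such as a document count.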

makeDocument

public Document makeDocument()

makeDocument

public Document makeDocument(int size)
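
The class description states that creating documents by size is handled by this class, but this page does not describe how. One plausible approach, shown here purely as an assumption, is to concatenate successive bodies until the requested number of characters is reached and to carry any surplus over to the next call. SizedDocSketch and makeBody are hypothetical names; the sketch works on plain strings rather than Document objects.

```java
import java.util.List;

/** Hypothetical illustration of building fixed-size bodies from variable-size inputs. */
class SizedDocSketch {
    private final List<String> bodies;
    private int next = 0;
    private String leftover = ""; // text cut off the end of the previous result

    SizedDocSketch(List<String> bodies) {
        // At least one non-empty body is required, otherwise makeBody could never fill a request.
        boolean hasText = false;
        for (String b : bodies) {
            if (!b.isEmpty()) {
                hasText = true;
            }
        }
        if (!hasText) {
            throw new IllegalArgumentException("need at least one non-empty body");
        }
        this.bodies = bodies;
    }

    /** Returns exactly size characters, cycling through the inputs as needed. */
    String makeBody(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        StringBuilder sb = new StringBuilder(leftover);
        while (sb.length() < size) {
            sb.append(bodies.get(next));
            next = (next + 1) % bodies.size();
        }
        leftover = sb.substring(size); // keep the surplus for the next call
        return sb.substring(0, size);
    }

    public static void main(String[] args) {
        SizedDocSketch maker = new SizedDocSketch(List.of("abc", "de"));
        System.out.println(maker.makeBody(4));
        System.out.println(maker.makeBody(4));
    }
}
```

Carrying the surplus forward means no input text is silently dropped between calls, at the cost of keeping a small amount of state per maker.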

numUniqueBytes

public long numUniqueBytes()

printDocStatistics

public void printDocStatistics()

resetInputs

public void resetInputs()

setConfig

public void setConfig(Config config)

setHTMLParser

public void setHTMLParser(HTMLParser htmlParser)
Copyright © 2000-2007 Apache Software Foundation. All Rights Reserved.