org.apache.lucene.ant
public class HtmlDocument extends Object
The HtmlDocument class creates a Lucene Document from an HTML document. It does this by using the JTidy package. It can take input from a java.io.File or a java.io.InputStream.
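The title and body extraction described here can be illustrated with a minimal sketch. It does not use JTidy or Lucene and is not the class's actual implementation: it assumes the input is already well-formed XHTML and parses it with the JDK's built-in XML parser as a stand-in for JTidy. The class name `XhtmlTextSketch` and the helper `textOfFirst` are hypothetical; only the accessor names `getTitle()` and `getBody()` mirror the API documented on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical stand-in for illustration only; not part of Lucene or JTidy.
class XhtmlTextSketch {
    private final Element root;

    // Assumption: the stream holds well-formed XHTML. Real-world HTML
    // usually is not, which is the gap a tolerant parser such as JTidy fills.
    XhtmlTextSketch(InputStream is) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(is);
            root = dom.getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Text of the first <title> element, or "" if there is none.
    String getTitle() {
        return textOfFirst("title");
    }

    // Concatenated text of the first <body> element, or "" if there is none.
    String getBody() {
        return textOfFirst("body");
    }

    private String textOfFirst(String tag) {
        NodeList nodes = root.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return "";
        }
        return nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Hello</title></head>"
                + "<body><p>Some <b>bold</b> text.</p></body></html>";
        XhtmlTextSketch doc = new XhtmlTextSketch(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getTitle());
        System.out.println(doc.getBody());
    }
}
```

In a Lucene setting, the two extracted strings would typically be stored as fields of a Lucene Document; which field names HtmlDocument uses is not stated on this page.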
Constructor Summary | |
---|---|
HtmlDocument(File file) | Constructs an HtmlDocument from a java.io.File.
HtmlDocument(InputStream is) | Constructs an HtmlDocument from a java.io.InputStream.
Method Summary | |
---|---|
static Document | Document(File file): Creates a Lucene Document from a java.io.File.
String | getBody(): Gets the bodyText attribute of the HtmlDocument object.
static Document | getDocument(InputStream is): Creates a Lucene Document from a java.io.InputStream.
String | getTitle(): Gets the title attribute of the HtmlDocument object.
static void | main(String[] args): Runs HtmlDocument on the files specified on the command line.
HtmlDocument(File file)
Constructs an HtmlDocument from a java.io.File.
Parameters: file the File containing the HTML to parse
Throws: IOException if an I/O exception occurs
HtmlDocument(InputStream is)
Constructs an HtmlDocument from a java.io.InputStream.
Parameters: is the InputStream containing the HTML
static Document Document(File file)
Creates a Lucene Document from a java.io.File.
Parameters: file
Throws: IOException
String getBody()
Gets the bodyText attribute of the HtmlDocument object.
Returns: the bodyText value
static Document getDocument(InputStream is)
Creates a Lucene Document from a java.io.InputStream.
Parameters: is
String getTitle()
Gets the title attribute of the HtmlDocument object.
Returns: the title value
static void main(String[] args)
Runs HtmlDocument on the files specified on the command line.
Parameters: args command line arguments
Throws: Exception