Hi,
On Sat, 2009-07-25 at 00:49 +0200, Martin Grotzke wrote:
> Hi all,
>
> I'm just starting with tika and try to extract the text content of some
> html. Unfortunately, I get no content at all.
>
> This is my test method (in scala):
>
> def testHtml() {
>   val html = "<html><body>my content</body></html>"
>   val input = new ByteArrayInputStream(html.getBytes)
>   val metadata = new Metadata
>   val textHandler = new BodyContentHandler
>   val parser = new HtmlParser
>   parser.parse(input, textHandler, metadata);
>   input.close();
>   println("HTML Input: " + html)
>   println("Title: " + metadata.get("title"))
>   println("Author: " + metadata.get("Author"))
>   println("content: " + textHandler.toString)
> }
If the above was not explicit enough: textHandler.toString was empty.

Any help?

Thx && cheers,
Martin

> Is there anything wrong here?
>
> Thanx && cheers,
> Martin
