I know I missed the Friday deadline but...
Does anyone have any recommendations for parsing HTML? I use Lucene, and its example ships with its own HTML parser, but I was wondering whether anyone has used an existing project, or whether there is some built-in functionality in an Apache library, to convert

<p>Hello <i>World</i></p>

to

Hello World

Your thoughts are appreciated.
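
To make the conversion concrete, here is a minimal sketch of the kind of thing I mean. It assumes the HTML parser bundled with the JDK (javax.swing.text.html) is acceptable; that is not an Apache library, and the class name HtmlToText and method toText are placeholders of my own. It also puts a space between text chunks, which suits indexing but would split a word whose letters are wrapped in separate tags.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: strips markup and keeps only the text content.
final class HtmlToText {

    static String toText(String html) {
        final StringBuilder text = new StringBuilder();

        // The callback is invoked for each run of character data the parser finds;
        // tags themselves are reported through other callbacks that are ignored here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Assumption: separating chunks with a space is fine for indexing.
                text.append(data).append(' ');
            }
        };

        try {
            // The third argument tells the parser to ignore charset declarations in the
            // markup, which is safe because the input is already a decoded String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Collapse runs of whitespace and drop the trailing separator.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(toText("<p>Hello <i>World</i></p>"));
    }
}
```

Something along these lines could feed the resulting string into a Lucene field, but I would still prefer a maintained library if one exists.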