http://java-source.net/open-source/html-parsers
"Mark Benussi" <[EMAIL PROTECTED]>
09/03/2005 04:24 AM
Please respond to "Struts Users Mailing List" <[email protected]>

To: "'Struts Users Mailing List'" <[email protected]>, "'Tomcat Users List'" <[email protected]>
cc:
Subject: [OT Friday] Parse HTML file to underlying text

I know I missed the Friday deadline but...

Has anyone any recommendations for parsing HTML? I use Lucene, and its example has its own HTML parser, but I was wondering if anyone has used an existing project, or whether there is some built-in functionality in an Apache lib to convert

<p>Hello <i>World</i></p>

to

Hello World

Your thoughts are appreciated.
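For the conversion asked about in the quoted message, one option that needs no extra library is the HTML parser that ships with the JDK in javax.swing.text.html. The following is only a rough sketch, not a recommendation over the parsers at the link above. The class name HtmlToText and the method name extractText are made up for illustration, and joining text chunks with a single space is an assumption that suits indexing but will split a word whose tag boundary falls mid-word (for example "Hel<b>lo</b>").

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (name chosen for illustration): strips markup from an
// HTML string using the parser bundled with the JDK.
final class HtmlToText {

    private HtmlToText() {
    }

    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The callback receives each run of character data found between tags.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Assumption: separate chunks with a space so that words on
                // either side of a tag do not run together.
                text.append(' ').append(data);
            }
        };

        try {
            // The third argument (ignoreCharSet = true) tells the parser not
            // to fail when the document declares its own charset.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Collapse runs of whitespace and trim the ends.
        return text.toString().replaceAll("\\s+", " ").trim();
    }
}
```

Calling HtmlToText.extractText("<p>Hello <i>World</i></p>") is intended to return Hello World. The result could then be handed to Lucene as the field text. For malformed or very large pages, a dedicated parser from the list linked above is probably the more robust choice.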

