Mushfiqur Rahman wrote:
I want to parse an HTML document (which may not be an XHTML document) using org.apache.xerces.parsers.DOMParser and get an org.w3c.dom.Document after parsing. Can anyone tell me how I can do it?

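If you just need a DOM document, there are a few options. Check out JTidy[1] and NekoHTML[2]. JTidy has been around longer, but NekoHTML has the advantage of using less memory, and it is built on top of Xerces: its DOM parser extends org.apache.xerces.parsers.DOMParser, so it fits code already written against that class.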

But, as with everything, check out all of your options and pick the one that works for you.
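
As a rough illustration of the NekoHTML route, here is a minimal sketch. It assumes the NekoHTML and Xerces jars are on the run-time classpath. The class name HtmlToDom and the helper rootName are made up for this example. The parser is loaded by name only so that the sketch compiles without the NekoHTML jar; with the jar available you would normally write new org.cyberneko.html.parsers.DOMParser() and call parse() and getDocument() on it directly.

```java
import java.io.StringReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of NekoHTML or Xerces.
class HtmlToDom {
    // NekoHTML's DOM parser; it extends org.apache.xerces.parsers.DOMParser.
    static final String NEKO_PARSER = "org.cyberneko.html.parsers.DOMParser";

    /**
     * Parses HTML (not necessarily well-formed XHTML) into a DOM tree.
     * Throws ClassNotFoundException if the NekoHTML jar is missing at run time.
     */
    static Document parseHtml(String html) throws Exception {
        Object parser = Class.forName(NEKO_PARSER).getDeclaredConstructor().newInstance();
        // Equivalent to parser.parse(new InputSource(...)) on the Xerces DOMParser API.
        parser.getClass().getMethod("parse", InputSource.class)
              .invoke(parser, new InputSource(new StringReader(html)));
        // Equivalent to parser.getDocument().
        return (Document) parser.getClass().getMethod("getDocument").invoke(parser);
    }

    /** Returns the root element's name, a quick check that a tree was built. */
    static String rootName(Document doc) {
        return doc.getDocumentElement().getNodeName();
    }
}
```

Calling HtmlToDom.parseHtml("<p>unclosed paragraph") should then hand back an ordinary org.w3c.dom.Document that can be walked with the standard DOM API. As far as I recall, NekoHTML reports element names in upper case unless configured otherwise, so do not rely on the case of the name that rootName returns.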

[1] http://sourceforge.net/projects/jtidy/
[2] http://www.apache.org/~andyc/neko/doc/html/

--
Andy Clark * [EMAIL PROTECTED]

