As Andy mentioned, parsing HTML, especially non-well-formed HTML, is not something Xerces was designed to do. XML imposes certain restrictions, such as requiring every element to be properly closed and nested, precisely to make it easier to process.
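To make the point about XML's restrictions concrete, here is a minimal sketch in Java. It assumes the standard JAXP API (javax.xml.parsers), for which Xerces is one implementation; the class and method names (WellFormedCheck, isWellFormed) are made up for illustration. It only shows that an XML parser rejects typical HTML with an unclosed tag and accepts the XHTML equivalent.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Xerces: reports whether markup is well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so parse errors are not printed to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser found a well-formedness violation.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML: <br> is never closed, so an XML parser rejects it.
        System.out.println(isWellFormed("<p>Hello<br>world</p>"));   // false
        // XHTML form: <br/> is self-closing, so the document parses.
        System.out.println(isWellFormed("<p>Hello<br/>world</p>"));  // true
    }
}
```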
However, you could look into using Tidy, found at http://www.w3.org/People/Raggett/tidy/, to parse your document, and then write a program that takes Tidy's output and creates a well-formed XHTML document, which you could then process with Xerces.

- Shane

--- atta ur-rehman wrote:
> What I'm trying to do is very simple. I have an HTML document,
> HTML not XHTML, which may or may not be well formed and I need
