I once had good success parsing sloppy HTML straight off the web,
through an HTTP proxy server, with a parser called Neko. I can
provide code samples off-list if you need them.
It is also an Apache offering.
Timothy Jones
Sr. Systems Engineer
Syniverse Technologies
Development, Tampa, FL
Work: (813) 637-5366
Cell: (813) 857-7650
________________________________
From: Dave Brosius [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 21, 2007 9:37 AM
To: Michael Bauer
Cc: [email protected]
Subject: Re: Ignoring errors
No, but there are various HTML 'tidying' tools that you could use to
pre-parse the HTML before passing it to the transformer.
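To illustrate the pre-parse-then-transform idea, here is a minimal sketch
that uses only the JDK. The class TidyThenTransform, its quoteAttributes
helper, the sample page and the stylesheet are all made up for this
example. The helper is a hypothetical stand-in for a real tidying tool: it
only adds missing quotes around attribute values, assumes the input is
otherwise well-formed, and will not cope with '>' inside attribute values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TidyThenTransform {

    // One opening tag, e.g. <a href=x>. Assumes no '<' or '>' inside the tag.
    private static final Pattern TAG = Pattern.compile("<[A-Za-z][^<>]*>");

    // name=value inside a tag; group 2 is a quoted or an unquoted value.
    private static final Pattern ATTR = Pattern.compile(
        "(\\s[\\w:-]+\\s*=\\s*)(\"[^\"]*\"|'[^']*'|[^\\s\"'>]+)");

    // Hypothetical stand-in for a tidying tool: quotes unquoted attribute
    // values and leaves everything else untouched.
    static String quoteAttributes(String html) {
        Matcher tags = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tags.find()) {
            Matcher attrs = ATTR.matcher(tags.group());
            StringBuffer tag = new StringBuffer();
            while (attrs.find()) {
                String value = attrs.group(2);
                boolean quoted = value.startsWith("\"") || value.startsWith("'");
                String fixed = attrs.group(1) + (quoted ? value : "\"" + value + "\"");
                attrs.appendReplacement(tag, Matcher.quoteReplacement(fixed));
            }
            attrs.appendTail(tag);
            tags.appendReplacement(out, Matcher.quoteReplacement(tag.toString()));
        }
        tags.appendTail(out);
        return out.toString();
    }

    // Made-up stylesheet: prints the href of every <a>, one per line.
    private static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//a/@href'>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template></xsl:stylesheet>";

    // Runs the stylesheet over the given markup with the default JAXP Transformer.
    static String extractLinks(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String sloppy = "<html><body><a href=page.html>link</a></body></html>";
        try {
            extractLinks(sloppy); // not well-formed, so the parser gives up
        } catch (TransformerException e) {
            System.out.println("raw input rejected: " + e.getMessage());
        }
        // After the tidy step the same stylesheet runs normally.
        System.out.print(extractLinks(quoteAttributes(sloppy)));
    }
}
```

In practice a real tidying tool or a lenient HTML parser would take the
place of quoteAttributes, since real pages tend to have more problems
than missing quotes (unclosed tags, stray ampersands and so on).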
Michael Bauer <[EMAIL PROTECTED]>
08/21/2007 09:33 AM
To: [email protected]
cc:
Subject: Ignoring errors
I am using Xalan/Xerces to parse out some data from a web page. The
problem is that the web page is not well-formed, and running the
Transformer on it produces:
ERROR: 'Open quote is expected for attribute "href|".'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException:
Open quote is expected for attribute "href|".'
Is there any way to instruct the Parser/Transformer to ignore such errors?