Hello Everyone! I am trying to write an application that has to parse a sequence of XML documents (thousands of them) from a file or stream. Every document in the sequence should be well-formed XML, but they are not necessarily all in the same encoding. The stream will look something like this:
:BEGIN EXAMPLE:
<?xml version="1.0" encoding="utf-8"?>
<document id="1">
... content ...
</document>
<?xml version="1.0" encoding="iso-8859-1"?>
<document id="2">
... content ...
</document>
...
:END EXAMPLE:

The problem is that if there is a well-formedness error in any of the documents, I don't want to discard the whole stream, since there may be thousands of good, well-formed XML documents in it. I want to discard just the one bad document, then recover and continue parsing with the next one.

Does anyone have any suggestions on how to do this "the right way"?

I was thinking of deriving my own InputSource class, similar to LocalFileInputSource, that keeps reusing the same BinFileInputStream object for every makeStream() call. I would then supply this InputSource to SAX2XMLReader::parse(), reset the SAX2XMLReader after each document is complete, and call parse() again and again ...

This should work fine (I haven't tried it yet, though) if all documents in the stream are well-formed. If not, the parser will die halfway through the document. At that point I will have to recover by searching for the closing </document> tag, so that I can start parsing the next document right after it. But in order to do that I need to know what encoding the malformed document was in. Is there any way to get access to that information?

I can see other problems with this approach too (e.g. what if the well-formedness error occurs even before the opening <document> tag?), and therefore I am wondering whether I am on the right path at all.

Any advice on this is really appreciated.

Thanks a lot,
-- Matt
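
P.S. To make the recovery idea a bit more concrete, here is a rough sketch of an alternative to hunting for </document> after a failure: split the raw bytes at each XML declaration before parsing, and read the encoding straight out of the declaration. This is plain standard C++ and does not use Xerces at all. The names RawDocument, declaredEncoding and splitStream are made up for illustration. It assumes that every document starts with its own <?xml ...?> declaration, that all encodings are ASCII-compatible (as utf-8 and iso-8859-1 are), and that the text "<?xml " never occurs inside a document's content.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type; nothing in this sketch is part of Xerces-C++.
struct RawDocument {
    std::string bytes;     // raw bytes of one document, declaration included
    std::string encoding;  // value of encoding="..." or empty if not declared
};

// Extract the encoding pseudo-attribute from the XML declaration at the
// start of `doc`. Assumes an ASCII-compatible encoding, so the declaration
// itself can be read byte by byte.
std::string declaredEncoding(const std::string& doc) {
    const std::size_t declEnd = doc.find("?>");
    if (declEnd == std::string::npos) return "";
    const std::string decl = doc.substr(0, declEnd);

    const std::size_t key = decl.find("encoding");
    if (key == std::string::npos) return "";

    // The value may be quoted with either " or '.
    const std::size_t open = decl.find_first_of("\"'", key);
    if (open == std::string::npos) return "";
    const std::size_t close = decl.find(decl[open], open + 1);
    if (close == std::string::npos) return "";

    return decl.substr(open + 1, close - open - 1);
}

// Cut the stream into one chunk per XML declaration. Assumes "<?xml "
// (with a trailing space) only ever appears at the start of a document;
// the space keeps processing instructions such as <?xml-stylesheet ...?>
// from being mistaken for a document boundary.
std::vector<RawDocument> splitStream(const std::string& stream) {
    static const std::string marker = "<?xml ";
    std::vector<RawDocument> docs;

    std::size_t start = stream.find(marker);
    while (start != std::string::npos) {
        const std::size_t next = stream.find(marker, start + marker.size());
        const std::size_t length =
            (next == std::string::npos) ? std::string::npos : next - start;

        RawDocument doc;
        doc.bytes = stream.substr(start, length);
        doc.encoding = declaredEncoding(doc.bytes);
        docs.push_back(doc);

        start = next;
    }
    return docs;
}
```

Each chunk could then presumably be handed to the parser on its own (for example through a MemBufInputSource), so a well-formedness error would only cost that one chunk, no matter where in the document it occurs. The obvious drawbacks are that the data has to be buffered, or at least scanned incrementally, and that the boundary test is a heuristic rather than real XML parsing.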