Re: [xml] Recovering from errors in an XML "stream"

Webb Scales Tue, 24 Sep 2019 14:30:05 -0700

Thanks, Eric -- that's an interesting suggestion.

Does this work for you because the '<' character is not permitted in the stream except as the opening of a tag (which makes it very straightforward to locate each tag) and the root tag is not permitted to appear inside the document (or are you doing a nesting count?)? I'm trying to ensure that my code doesn't have to know too much about XML or maintain too much state (that's what I'm using LibXML2 for! :-) ).
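The nesting-count variant asked about here can be sketched in plain C without an XML library. This is a minimal sketch, not code from this thread: `find_root_end` and its helpers are hypothetical names, and it assumes there is no DOCTYPE internal subset. It skips comments, CDATA sections and processing instructions, ignores '>' inside quoted attribute values (where XML allows it), and returns the offset just past the root element's end tag, or 0 while the buffer does not yet hold a complete root element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/* Does buf[at..len) begin with pat? */
static int starts_with(const char *buf, size_t len, size_t at, const char *pat)
{
    size_t plen = strlen(pat);
    return at + plen <= len && memcmp(buf + at, pat, plen) == 0;
}

/* Index of the first occurrence of pat in buf[from..len), or NOT_FOUND. */
static size_t find_seq(const char *buf, size_t len, size_t from, const char *pat)
{
    size_t plen = strlen(pat);
    for (size_t i = from; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0)
            return i;
    return NOT_FOUND;
}

/* Index of the '>' that closes a tag, skipping quoted attribute values
 * (which may legally contain '>'), or NOT_FOUND if it has not arrived yet. */
static size_t tag_close(const char *buf, size_t len, size_t from)
{
    char quote = 0;
    for (size_t i = from; i < len; i++) {
        if (quote) {
            if (buf[i] == quote)
                quote = 0;
        } else if (buf[i] == '"' || buf[i] == '\'') {
            quote = buf[i];
        } else if (buf[i] == '>') {
            return i;
        }
    }
    return NOT_FOUND;
}

/* Hypothetical helper: offset just past the end of the first root element in
 * buf[0..len), or 0 if the root element is not complete yet. */
size_t find_root_end(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    int depth = 0;
    size_t i = 0;

    while (i < len) {
        if (buf[i] != '<') {                 /* character data or whitespace */
            i++;
            continue;
        }
        size_t end;
        if (starts_with(buf, len, i, "<!--")) {              /* comment */
            if ((end = find_seq(buf, len, i + 4, "-->")) == NOT_FOUND)
                return 0;
            i = end + 3;
        } else if (starts_with(buf, len, i, "<![CDATA[")) {  /* CDATA section */
            if ((end = find_seq(buf, len, i + 9, "]]>")) == NOT_FOUND)
                return 0;
            i = end + 3;
        } else if (starts_with(buf, len, i, "<?")) {         /* PI or XML decl */
            if ((end = find_seq(buf, len, i + 2, "?>")) == NOT_FOUND)
                return 0;
            i = end + 2;
        } else {                 /* start tag, end tag, empty tag, <!DOCTYPE> */
            if ((end = tag_close(buf, len, i + 1)) == NOT_FOUND)
                return 0;
            int is_element = buf[i + 1] != '!';
            if (buf[i + 1] == '/')
                depth--;                     /* end tag */
            else if (is_element && buf[end - 1] != '/')
                depth++;                     /* start tag; <x/> leaves depth as is */
            i = end + 1;
            if (is_element && depth <= 0)
                return i;                    /* root element just closed */
        }
    }
    return 0;
}
```

Bytes after the returned offset belong to the next document, so a caller would keep them buffered instead of handing them to the parser along with the current document.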



            Thanks,

                Webb



On 9/24/19 5:14 PM, Eric Eberhard wrote:

You can easily read the XML using TCP/IP yourself and find the ending tag, process, read the next document, process, etc. We always do that (much easier than other ideas). You know the ending tag from the starting tag, and there are issues with blocking vs. non-blocking reads. We read one byte blocking, and as soon as we get something we read until the ending tag and pause for processing.

Eric
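Eric's "ending tag from the starting tag" idea might look roughly like the following, applied to whatever bytes have been read so far. This is a hedged sketch under stated assumptions: `end_of_document` is a hypothetical helper, the prolog has no DOCTYPE internal subset, and the root start tag has no '>' inside its attribute values. It leans on the fact that a literal '<' cannot appear in character data or attribute values, but it stops at the first matching `</name>`, so an element nested inside the root with the same name as the root (which XML permits), or such an end tag inside a comment or CDATA section, would end the document too early; that case needs a nesting count.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)

/* Index of the first occurrence of pat in buf[from..len), or NO_MATCH. */
static size_t scan_for(const char *buf, size_t len, size_t from, const char *pat)
{
    size_t plen = strlen(pat);
    for (size_t j = from; j + plen <= len; j++)
        if (memcmp(buf + j, pat, plen) == 0)
            return j;
    return NO_MATCH;
}

/* Hypothetical helper: offset just past the root element's end tag, found by
 * taking the root name from its start tag and searching for "</name>".
 * Returns 0 while the buffer does not yet contain that end tag. */
size_t end_of_document(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;

    /* 1. Skip whitespace, the XML declaration, PIs, comments and a DOCTYPE
     *    (assumed to have no internal subset). */
    for (;;) {
        while (i < len && isspace((unsigned char)buf[i]))
            i++;
        if (i + 1 >= len || buf[i] != '<')
            return 0;                       /* need more data, or not XML */
        const char *stop = ">";
        size_t body = i + 2;
        if (buf[i + 1] == '?') {
            stop = "?>";                    /* XML declaration or PI */
        } else if (i + 4 <= len && memcmp(buf + i, "<!--", 4) == 0) {
            stop = "-->";                   /* comment */
            body = i + 4;
        } else if (buf[i + 1] != '!') {
            break;                          /* reached the root start tag */
        }
        size_t hit = scan_for(buf, len, body, stop);
        if (hit == NO_MATCH)
            return 0;
        i = hit + strlen(stop);
    }

    /* 2. Read the root element name and find the end of its start tag. */
    size_t name = i + 1, name_end = name;
    while (name_end < len && !isspace((unsigned char)buf[name_end]) &&
           buf[name_end] != '/' && buf[name_end] != '>')
        name_end++;
    size_t name_len = name_end - name;
    size_t gt = scan_for(buf, len, name_end, ">");
    if (name_len == 0 || gt == NO_MATCH)
        return 0;
    if (buf[gt - 1] == '/')
        return gt + 1;                      /* empty root element, e.g. <ping/> */

    /* 3. Search for "</name", optional whitespace, then '>'. */
    for (size_t j = gt + 1; j + 2 + name_len <= len; j++) {
        if (buf[j] != '<' || buf[j + 1] != '/' ||
            memcmp(buf + j + 2, buf + name, name_len) != 0)
            continue;
        size_t k = j + 2 + name_len;
        while (k < len && isspace((unsigned char)buf[k]))
            k++;
        if (k < len && buf[k] == '>')
            return k + 1;
    }
    return 0;
}
```

A caller would append each read() to a buffer, call this after every read, hand the first n bytes to the parser once it returns a nonzero n, and keep the remaining bytes as the start of the next document.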
*From:*xml [mailto:[email protected]] *On Behalf Of *Webb Scales
*Sent:* Monday, September 09, 2019 9:30 PM
*To:* Liam R E Quin <[email protected]>; [email protected]
*Subject:* Re: [xml] Recovering from errors in an XML "stream"
I'm OK with making small on-the-fly "edits" to the input (such as removing the initial comment, or removing all comments), but trying to make my code discern the overall structure (such as picking out the boundaries between the documents) is starting to step over into actually parsing it, which defeats the purpose of using LibXML2.
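A small on-the-fly edit of the kind described here (dropping comments before the text reaches the parser) could be as simple as this sketch. `strip_comments` is a hypothetical name; it edits the buffer in place, copies CDATA sections through untouched so that a "<!--" inside them is not mistaken for a comment, and assumes each comment is complete within the buffer (an unterminated one is passed through unchanged). It does not look inside processing instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NPOS ((size_t)-1)

/* Index of the first occurrence of pat in buf[from..len), or NPOS. */
static size_t locate(const char *buf, size_t len, size_t from, const char *pat)
{
    size_t plen = strlen(pat);
    for (size_t j = from; j + plen <= len; j++)
        if (memcmp(buf + j, pat, plen) == 0)
            return j;
    return NPOS;
}

/* Hypothetical pre-filter: remove every complete "<!-- ... -->" from
 * buf[0..len) in place and return the new length. */
size_t strip_comments(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t in = 0, out = 0;

    while (in < len) {
        if (len - in >= 4 && memcmp(buf + in, "<!--", 4) == 0) {
            size_t end = locate(buf, len, in + 4, "-->");
            if (end != NPOS) {
                in = end + 3;               /* drop the whole comment */
                continue;
            }
            /* unterminated comment: fall through and copy it byte by byte */
        } else if (len - in >= 9 && memcmp(buf + in, "<![CDATA[", 9) == 0) {
            size_t end = locate(buf, len, in + 9, "]]>");
            size_t stop = (end == NPOS) ? len : end + 3;
            memmove(buf + out, buf + in, stop - in);  /* keep CDATA verbatim */
            out += stop - in;
            in = stop;
            continue;
        }
        buf[out++] = buf[in++];             /* ordinary byte */
    }
    return out;
}
```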
If the TextReader didn't insist upon reading beyond the root end-tag, that would enable me to solve my problem, I think. (I don't understand why it does that.) In the absence of any other options, I'm going to experiment with the SAX interface and see if that will allow me to stop the parse at the right spot.
Anyway, thanks for your replies, Liam.


            Webb


On 9/10/19 12:19 AM, Liam R E Quin wrote:

    On Mon, 2019-09-09 at 22:41 -0400, Webb Scales wrote:

        the fact remains that I don't control the text that I'm trying to parse,
        and I still need to parse it, even though it's not "well-formed".

    You may need to write some form of pre-processor that fixes the
    problems. As you say, that may reduce the need for an XML parser.

    I haven't investigated error recovery with libxml, so someone else
    might have better ideas.

    Liam


--

Webb Scales
Principal Software Architect
603-673-2306
www.ursasecure.com <https://www.ursasecure.com>
[email protected] <mailto:[email protected]>

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
https://mail.gnome.org/mailman/listinfo/xml
