Re: [Discuss] Pull Parsing, JSR-173, and Xerces

David M Williams Sun, 05 Oct 2003 23:08:29 -0700

Andy, I'll publicly express my ignorance about this important topic, and make a few comments anyway.

You say you favor an "event" approach, but I thought the lack of events was the very definition of a "pull parser" and event driven approaches were the "push parsers"?
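
For anyone else sorting out the terminology, here is a minimal sketch of the push/pull distinction. It assumes the javax.xml.stream package names under which JSR-173 was eventually published (the draft under discussion here may differ), and the class and method names are made up for illustration. In both styles the parser reports "events"; the difference is who drives the loop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PushVersusPull {

    // Push (SAX): the parser owns the loop and calls back into the handler.
    static List<String> namesViaPush(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // Pull (cursor style): the application owns the loop and asks for
    // the next event whenever it is ready for one.
    static List<String> namesViaPull(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c>text</c></a>";
        System.out.println(namesViaPush(xml));
        System.out.println(namesViaPull(xml));
    }
}
```

Both methods visit the same start tags; in the pull version the "event" is simply the value returned by next(), not a callback.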

I guess I have a fear of events that (try to) represent a basically linear process, out of concern that they make multithreaded apps harder to write, though I have no projects to prove it.

I haven't worked with or even studied the API ... but the spec itself seems to have a higher than usual number of sections that say they are "optional". That always makes me think some people in the inner circle want it, no one can work out the details in time for agreement, so it's left optional, and those that implement/gain acceptance first then support that part of the standard de facto, without proper public review. (just my intuition, no data that this is the case here)

My own interest in this API is in the parts that allow another parser to be written on the "output" of the "pull" operation. So, things like 'skip' and 'backup' are important. I did see that 'skip' would be supported, but never heard about 'backup' (there was a section saying that "random access" was part of this spec, which I think is ok).
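
To illustrate what I mean by 'skip', here is a rough sketch of a skip written on top of a pull cursor. It assumes the javax.xml.stream names under which JSR-173 was eventually published; skipElement and textOutside are hypothetical helpers of my own, not methods of the API.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SkipSketch {

    // Hypothetical helper: with the reader positioned on a START_ELEMENT,
    // consume events up to and including the matching END_ELEMENT.
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("reader is not on a start tag");
        }
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Collect character data, skipping every subtree rooted at skippedName.
    static String textOutside(String xml, String skippedName)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && reader.getLocalName().equals(skippedName)) {
                skipElement(reader);
            } else if (event == XMLStreamConstants.CHARACTERS) {
                text.append(reader.getText());
            }
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<doc><keep>one</keep><skip>two<deep>three</deep></skip>"
                + "<keep>four</keep></doc>";
        System.out.println(textOutside(xml, "skip"));
    }
}
```

The cursor in this sketch only moves forward, so a 'backup' would have to be built by the layer on top, for example by buffering the events it may want to revisit.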

Hope these comments spur further discussion.

Thanks for the education,

David

Andy Clark <[EMAIL PROTECTED]>

10/06/2003 12:48 AM
Please respond to xerces-j-user

To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: [Discuss] Pull Parsing, JSR-173, and Xerces

With the recent public review of JSR-173, the Streaming API for XML (StAX), I've been hoping for more of a discussion among Apache users and developers regarding this API and pull parsing in general. But there seems to have been an amazing amount of apathy in this regard. So I would like to kickstart the discussion.

I am concerned that the API, as it stands, will not adequately meet the needs of XML developers. Moreover, I have concerns about implementing it efficiently in the Xerces parser. But I'll let others comment on the technical (de)merits of the API because I want to take this opportunity to discuss what I would like to see in a pull parser design.

There are two camps of thought in JSR-173: one that wants a single interface iterator model and another that wants discrete event objects to represent the various parts of the document. The first is designed with small footprint in mind while the second is more OO and allows apps to conveniently save document content. To appease both camps, JSR-173 includes both approaches in the API. This is wrong. Users would be better served by a single, simpler, more integrated approach.

I favor the event approach with the fundamental change that the event objects returned are singletons owned by the parser. If the application wants the information stored within the object, the app must copy the info out of the singleton and save it. This approach would appease those developers concerned with memory (e.g. people targeting J2ME) while providing a straightforward OO model for everyone else.

The counter argument is that users of the API would be confused about who owns the memory and try to keep references to objects whose content is transient. But I disagree. While it may cause some people trouble the first time they sit down to write an app, they quickly learn the paradigm and move on. As we all know, DOM has "live" node lists. That's the model. You may trip over it the first time but then you learn it and move on.
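
The ownership model described above can be sketched roughly as follows. Everything here is hypothetical: Event, ToyPullParser, and kindsSeen are illustrative names, not part of JSR-173 or Xerces, and the "parser" only walks pre-tokenized input so that the reuse of a single event object is easy to see.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedEventSketch {

    // Hypothetical event type whose content is overwritten on every step.
    static final class Event implements Cloneable {
        String kind;
        String value;

        @Override
        public Event clone() {
            try {
                // A shallow copy is enough: both fields are immutable Strings.
                return (Event) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    // Hypothetical toy parser: owns one Event and refills it for each token.
    static final class ToyPullParser {
        private final String[][] tokens;
        private final Event shared = new Event();
        private int position = 0;

        ToyPullParser(String[][] tokens) {
            this.tokens = tokens;
        }

        boolean hasNext() {
            return position < tokens.length;
        }

        Event next() {
            shared.kind = tokens[position][0];
            shared.value = tokens[position][1];
            position++;
            return shared; // always the same object
        }
    }

    // Reads all events, keeping either raw references or copies, and
    // reports what the kept objects contain once parsing has finished.
    static List<String> kindsSeen(String[][] tokens, boolean copy) {
        ToyPullParser parser = new ToyPullParser(tokens);
        List<Event> kept = new ArrayList<>();
        while (parser.hasNext()) {
            Event event = parser.next();
            kept.add(copy ? event.clone() : event);
        }
        List<String> kinds = new ArrayList<>();
        for (Event event : kept) {
            kinds.add(event.kind);
        }
        return kinds;
    }

    public static void main(String[] args) {
        String[][] tokens = { {"start", "a"}, {"text", "hello"}, {"end", "a"} };
        System.out.println(kindsSeen(tokens, false)); // references to the shared object
        System.out.println(kindsSeen(tokens, true));  // independent copies
    }
}
```

Holding on to the returned reference only ever shows the most recent event, which is the "trip over it the first time" case; copying the object preserves each event at the cost of one allocation per event kept.
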
And providing a clone method allows applications to keep references to event objects if they choose. So this would be a way to provide that functionality as well while maintaining a single, integrated model which I think is paramount.

I'll provide more details as the discussion develops but now I'd like to see what other people think. If you need to catch up with what I'm talking about, you can check out the following URLs regarding JSR-173:

http://www.jcp.org/en/jsr/detail?id=173
http://jcp.org/aboutJava/communityprocess/first/jsr173/index.html

One last thing: even though I'm cc'ing xerces-j-user, I would like to keep this discussion on the xerces-j-dev list. So if you'd like to contribute your two cents and you're not already subscribed to the xerces-j-dev list, do that now.

--
Andy Clark * [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
