Andy, I'll publicly express my ignorance about this important topic, and make a few comments anyway.

You say you favor an "event" approach, but I thought the lack of events was the very definition of a "pull parser" and event driven approaches were the "push parsers"?
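
For anyone with the same question, a minimal sketch of the distinction may help. It assumes the javax.xml.stream API as it later shipped with the JDK (which may differ from the public review draft under discussion) and uses SAX as the push example; the names PushVsPull, pushNames, and pullNames are made up for illustration. Both styles report "events" in a loose sense; the difference shown here is only who drives the loop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not taken from the JSR-173 spec or from Xerces.
class PushVsPull {

    // Push: the parser owns the loop and calls back into the handler.
    static List<String> pushNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("push parse failed", e);
        }
        return names;
    }

    // Pull: the application owns the loop and asks for the next item.
    static List<String> pullNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("pull parse failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<order><item/><item/></order>";
        System.out.println(pushNames(xml));
        System.out.println(pullNames(xml));
    }
}
```

Both methods collect the same element names; in the pull version the application can stop, skip ahead, or hand the reader to other code between calls to next().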

I guess I have a fear of using events to (try to) represent a basically linear process, out of concern that it makes multithreaded apps harder to write, though I have no projects to prove it.

I haven't worked with or even studied the API ... but the spec itself seems to have a higher than usual number of sections that say they are "optional". That always makes me think some people in the inner circle want it, no one can work out the details in time for agreement, so it's left optional, and those that implement/gain acceptance first then make that part of the standard de facto, without proper public review. (just my intuition, no data that this is the case here)

My own interest in this API is in the features that allow another parser to be written on the "output" of the "pull" operation. So, things like 'skip' and 'backup' are important. I did see that 'skip' would be supported, but never heard about 'backup' (there was a section saying that "random access" was part of this spec, which I think is ok).

Hope these comments spur further discussion.

Thanks for the education,

David






Andy Clark <[EMAIL PROTECTED]>

10/06/2003 12:48 AM
Please respond to xerces-j-user

       
        To:        [EMAIL PROTECTED]
        cc:        [EMAIL PROTECTED]
        Subject:        [Discuss] Pull Parsing, JSR-173, and Xerces



With the recent public review of JSR-173, the Streaming
API for XML, I've been hoping for more of a discussion
among Apache users and developers regarding this API
and pull parsing in general. But there seems to have
been an amazing amount of apathy in this regard. So I
would like to kickstart the discussion.

I am concerned that the API, as it stands, will not
adequately meet the needs of XML developers. Moreover,
I have concerns about implementing it efficiently in
the Xerces parser. But I'll let others comment on the
technical (de)merits of the API because I want to take
this opportunity to discuss what I would like to see
in a pull parser design.

There are two camps of thought in JSR-173: one that
wants a single interface iterator model and another
that wants discrete event objects to represent the
various parts of the document. The first is designed
with small footprint in mind while the second is more
OO and allows apps to conveniently save document
content.
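
As a rough illustration of the two camps, the sketch below assumes the
javax.xml.stream API as it later shipped with the JDK, which may differ
from the public review draft: XMLStreamReader stands in for the single
interface iterator model and XMLEventReader for the discrete event
objects. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class; not taken from the JSR-173 spec or from Xerces.
class TwoCamps {

    // Iterator model: one reader object. Data must be copied out while
    // the reader is positioned on the item of interest.
    static List<String> namesViaCursor(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("cursor parse failed", e);
        }
        return names;
    }

    // Event model: each call returns a separate object that the
    // application can save and inspect after parsing has finished.
    static List<String> namesViaEvents(String xml) {
        List<XMLEvent> saved = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                saved.add(reader.nextEvent());
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("event parse failed", e);
        }
        List<String> names = new ArrayList<>();
        for (XMLEvent event : saved) {
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<a><b/><c/></a>";
        System.out.println(namesViaCursor(xml));
        System.out.println(namesViaEvents(xml));
    }
}
```

The second method allocates one object per event, which is the
footprint cost the first camp wants to avoid.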

To appease both camps, JSR-173 includes both approaches
in the API. This is wrong. Users would be better served
by a single, simpler, more integrated approach.

I favor the event approach with the fundamental change
that the event objects returned are singletons owned
by the parser. If the application wants the information
stored within the object, the app must copy the info out
of the singleton and save it.

This approach would appease those developers concerned
with memory (e.g. people targeting J2ME) while providing
a straightforward OO model for everyone else. The counter
argument is that users of the API would be confused about
who owns the memory and try to keep references to objects
whose content is transient. But I disagree.

While it may cause some people trouble the first time
they sit down to write an app, they quickly learn the
paradigm and move on. As we all know, DOM has "live" node
lists. That's the model. You may trip over it the first
time but then you learn it and move on.

And providing a clone method allows applications to keep
references to event objects if they choose. So this would
be a way to provide that functionality as well while
maintaining a single, integrated model which I think is
paramount.
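
A minimal sketch of the parser-owned event object and the clone
method described above might look like the following. Everything in
it is hypothetical: ParseEvent, ReusingParser, and the pre-tokenized
input are stand-ins for illustration and are not part of JSR-173 or
Xerces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo of a parser that reuses a single event object.
class SharedEventDemo {

    static List<String[]> sample() {
        return Arrays.asList(
                new String[] {"start", "a"},
                new String[] {"start", "b"},
                new String[] {"end", "b"},
                new String[] {"end", "a"});
    }

    // Pitfall: every saved reference points at the same parser-owned object.
    static List<String> keepReferences(List<String[]> tokens) {
        ReusingParser parser = new ReusingParser(tokens);
        List<ParseEvent> saved = new ArrayList<>();
        while (parser.hasNext()) {
            saved.add(parser.next());
        }
        return describe(saved);
    }

    // Fix: clone the event before saving it.
    static List<String> keepClones(List<String[]> tokens) {
        ReusingParser parser = new ReusingParser(tokens);
        List<ParseEvent> saved = new ArrayList<>();
        while (parser.hasNext()) {
            saved.add(parser.next().clone());
        }
        return describe(saved);
    }

    private static List<String> describe(List<ParseEvent> events) {
        List<String> out = new ArrayList<>();
        for (ParseEvent event : events) {
            out.add(event.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keepReferences(sample()));
        System.out.println(keepClones(sample()));
    }
}

// Hypothetical event type with mutable fields the parser overwrites.
final class ParseEvent implements Cloneable {
    String kind;
    String name;

    @Override
    public ParseEvent clone() {
        try {
            // Fields are immutable Strings, so a shallow copy is enough.
            return (ParseEvent) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }

    @Override
    public String toString() {
        return kind + ":" + name;
    }
}

// Hypothetical parser over pre-tokenized input; it allocates one event total.
final class ReusingParser {
    private final Iterator<String[]> tokens;
    private final ParseEvent shared = new ParseEvent();

    ReusingParser(List<String[]> tokens) {
        this.tokens = tokens.iterator();
    }

    boolean hasNext() {
        return tokens.hasNext();
    }

    // Returns the same object on every call; its contents are valid
    // only until the next call.
    ParseEvent next() {
        String[] token = tokens.next();
        shared.kind = token[0];
        shared.name = token[1];
        return shared;
    }
}
```

With the sample input, keepReferences returns four copies of "end:a",
which is the first-time stumble mentioned earlier, while keepClones
returns the four distinct events at the cost of one allocation per
saved event.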

I'll provide more details as the discussion develops but
now I'd like to see what other people think. If you need
to catch up with what I'm talking about, you can check
out the following URLs regarding JSR-173:

  http://www.jcp.org/en/jsr/detail?id=173
  http://jcp.org/aboutJava/communityprocess/first/jsr173/index.html

One last thing: even though I'm cc'ing xerces-j-user, I
would like to keep this discussion on the xerces-j-dev
list. So if you'd like to contribute your two cents and
you're not already subscribed to the xerces-j-dev list,
do that now.

--
Andy Clark * [EMAIL PROTECTED]



