Re: Application Driven Parsing

Andy Clark 26 Sep 2002 03:12:15 -0000

Nicholas Barratt wrote:
> Joe Kesselman wrote:
>> I don't think there's any way to run Xerces purely as
>> a pipeline at this time, unless I missed an announcement.
>
> Thanks Joe, that's what I thought.


Unless the implementation has changed, Xerces2 does
not have to buffer a large chunk of data (or the
entire document, if the document is small) before
parsing the content. We had a problem with the
previous version of Xerces because it would wait
until it could fill a large buffer (32K? 64K?) or
until the stream ended before it started parsing
the document.

However, while the parser can work with whatever
amount of data is available at that time, you may
still run into a problem with the Java socket I/O
stream blocking. I think you could get around this
by writing a special input stream that checks to
see how much data is available without blocking
by calling the "available" method. Then, you would
only return up to the amount of data available.
Does this make sense?

NOTE: This assumes that the socket input stream
      implements the "available" method.
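A minimal sketch of such a stream, in case it helps.
The class name AvailableOnlyInputStream is made up for
this example and is not part of Xerces. It assumes the
wrapped stream reports a meaningful value from
available(); when nothing is buffered it blocks for a
single byte, because read() is not allowed to return 0
for a non-empty request.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: never asks the underlying stream for more
// bytes than it says it can deliver without blocking.
class AvailableOnlyInputStream extends FilterInputStream {

    AvailableOnlyInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int ready = in.available();
        if (ready <= 0) {
            // Nothing buffered: block for exactly one byte so that
            // the caller sees either data or end of stream.
            int b = in.read();
            if (b < 0) {
                return -1;
            }
            buf[off] = (byte) b;
            return 1;
        }
        // Return only what is already there, capped by the request.
        return in.read(buf, off, Math.min(len, ready));
    }
}
```

You would wrap the socket's input stream in this class
and hand the wrapper to the parser (for example through
an org.xml.sax.InputSource). Whether this is enough to
avoid the blocking depends on how the socket stream
implements available().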

> If I wrote an asynchronous xml scanner, would I be able to plug this
> into the Xerces framework to avoid having to duplicate the different
> validation steps?  It doesn't look like I want to implement
> XMLDocumentSource, as it assumes synchronous or pull parsing.  Where
> would be a good place to start?


This would be another solution but would require
more work. But, if you can assume that the XML
stream coming from the Jabber server is both
well-formed and valid then it gets much easier.
And if it doesn't have DOCTYPE declarations, it's
easier still.

In order to plug into the existing Xerces2 parser
configurations and work with our standard components,
you would need to implement the XMLDocumentSource
interface. Then, you can simply replace the standard
XML document scanner in the configuration with your
own and continue to use the existing parser classes
(e.g. DOMParser, SAXParser).

However, there are requirements when using the
standard components. These should be listed within
the documentation. Please refer to the "XNI Manual"
for an overview of XNI and how to use it:

  http://xml.apache.org/xerces2-j/xni.html

That should get you going in the right direction.
And you can use the Xerces2 source code as a basis
as well. For more examples of how to write to XNI,
check out the CyberNeko Tools for XNI. There are
several parser configurations within that set of
tools that are useful examples (e.g. NekoDTD and
NekoHTML). You can find the tools at the following
URL:

  http://www.apache.org/~andyc/neko/doc/index.html

Good luck! And let us know if you have any more
questions.

--
Andy Clark * [EMAIL PROTECTED]

