Re: [Xerces 2] accessing and controling entity parsing in XNI

Andy Clark Thu, 18 Oct 2001 20:13:56 -0700

Aleksander Slominski wrote:
> i would like to extend to allow application to access and control through
> XNI behavior of input buffer handling and to be able to retrieve
> information about position of events (such as where is start end and end
> position of start tag <...> in input buffer).
> 
> i do not think it is possible now and i am not even sure if it can be
> added?


Currently, we only provide the location via the XMLLocator
passed to the startDocument/startDTD methods in the handlers.
Can you use this between callbacks in order to determine the
boundaries of the markup or content returned?
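
To make the idea concrete, here is a rough sketch of reading a locator inside each callback. It is an illustration only: it uses the standard SAX `org.xml.sax.Locator` that ships with the JDK as a stand-in, not the XNI `XMLLocator`, so that it runs without Xerces on the classpath. The class name `LocatorBoundaries` and the method `trace` are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: SAX is used here as an assumption-laden stand-in for XNI.
class LocatorBoundaries {

    /** Parses xml and returns one "event@line:column" entry per callback. */
    static List<String> trace(String xml) {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l; // handed over once, before the first event
            }

            // Record where the parser currently is when an event is reported.
            private void record(String event) {
                out.add(event + "@" + locator.getLineNumber()
                        + ":" + locator.getColumnNumber());
            }

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                record("start " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                record("chars");
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                record("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String entry : trace("<doc>\n  <item>text</item>\n</doc>")) {
            System.out.println(entry);
        }
    }
}
```

Comparing the position recorded in one callback with the position recorded in the next gives an approximate extent for whatever was scanned in between.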

Please note, however, that the locations reported by the
locator object are the row and column numbers of the position
in the *transcoded* stream immediately following the last
scanned markup or content. So this information does not
reflect the actual position in the original byte stream,
because of issues such as character encoding.
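
A small illustration of why a character position and a byte position can disagree. This is plain Java, not Xerces code; the class `TranscodedOffsets` and the method `byteOffset` are made up for the example, and UTF-8 is assumed as the encoding of the original stream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, assuming the original stream is UTF-8 encoded.
class TranscodedOffsets {

    /** Returns the UTF-8 byte offset that corresponds to a char offset. */
    static int byteOffset(String text, int charOffset) {
        // Encode only the prefix and count how many bytes it needs.
        return text.substring(0, charOffset)
                .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String xml = "<p>caf\u00e9</p>";        // contains one non-ASCII character
        int charOffset = xml.indexOf("</p>");   // position in decoded characters
        System.out.println("char offset: " + charOffset);
        System.out.println("byte offset: " + byteOffset(xml, charOffset));
    }
}
```

Every character outside ASCII before a given point widens the gap, so a position counted in decoded characters cannot be used directly as an index into the original bytes.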

> i would like to expose to application fCurrentEntity.position and allow to
> control peekChar() and load() behavior (load is now private final function
> ...).

Why do you want to control the entity scanner? Other people
(e.g. Xalan folks) have also asked about being able to control 
the input buffer in the parser. So it would be useful to know 
why you want this feature.

> finally i would like to be able to pinpoint input buffer so it is always
> growing but never shrunk with System.arraycopy() - it is very useful if i
> want to keep in memory representation of unparsed XML in memory that can
> be used similarly to DOM as persistent representation of XML doc  ( to
> reconstruct DOM *when* it is needed...).

This is a much more difficult request and I'll explain why.

The scanner is implemented to be as efficient as possible.
So it re-uses the underlying character buffer over and over
again. We've been asked to add a feature to orphan the
character array instead of re-using it so that people can
keep a reference to the character array passed to the
characters() method and know that the data won't be
changed later. This could very easily be done.
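
To show what goes wrong today if a handler keeps the array reference, here is a toy model. It is not the Xerces scanner: `BufferReuseDemo`, `CharHandler`, and `scan` are hypothetical names, and the "scanner" is just a loop that writes each chunk into one shared buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a scanner that re-uses one character buffer (an assumption,
// not the real Xerces implementation).
class BufferReuseDemo {

    interface CharHandler {
        void characters(char[] ch, int offset, int length);
    }

    /** Writes each chunk (at most 16 chars) into the same buffer and reports it. */
    static void scan(String[] chunks, CharHandler handler) {
        char[] buffer = new char[16];
        for (String chunk : chunks) {
            chunk.getChars(0, chunk.length(), buffer, 0); // overwrites old data
            handler.characters(buffer, 0, chunk.length());
        }
    }

    /** Keeps the array references and reads them only after scanning. */
    static List<String> keepReferences(String[] chunks) {
        final List<char[]> arrays = new ArrayList<>();
        final List<int[]> ranges = new ArrayList<>();
        scan(chunks, (ch, off, len) -> {
            arrays.add(ch);
            ranges.add(new int[] {off, len});
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < arrays.size(); i++) {
            result.add(new String(arrays.get(i), ranges.get(i)[0], ranges.get(i)[1]));
        }
        return result;
    }

    /** Copies the reported range inside the callback, before it can change. */
    static List<String> copyImmediately(String[] chunks) {
        final List<String> result = new ArrayList<>();
        scan(chunks, (ch, off, len) -> result.add(new String(ch, off, len)));
        return result;
    }

    public static void main(String[] args) {
        String[] chunks = {"first", "later"};
        System.out.println(keepReferences(chunks));  // both entries show the last chunk
        System.out.println(copyImmediately(chunks)); // each entry keeps its own text
    }
}
```

Until an "orphan the array" feature exists, copying inside the callback is the safe pattern for handlers that need the data later.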

However, growing the underlying character array is much
more difficult. Do you want the array to contain the decoded
but non-normalized contents of the document? Or do you
want the array to contain the "flattened" contents of
the document, with all entities inlined, etc.? And once
you grow the array, all of the array references
and position information that you've collected during
the parse are no longer correct.
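
The array-reference half of that problem can be seen in a few lines of plain Java. This is only a sketch under assumptions: `GrowingBufferDemo` is a made-up class, and it grows by allocating a larger array and copying, which is the usual way to grow a Java array but not necessarily what a modified scanner would do.

```java
import java.util.Arrays;

// Hypothetical append-only buffer that never shrinks (not Xerces code).
class GrowingBufferDemo {
    private char[] buffer = new char[4];
    private int length = 0;

    /** Appends s, growing the array if needed, and returns its start offset. */
    int append(String s) {
        if (length + s.length() > buffer.length) {
            // Growing allocates a NEW array; the old one is left behind.
            int newSize = Math.max(buffer.length * 2, length + s.length());
            buffer = Arrays.copyOf(buffer, newSize);
        }
        int start = length;
        s.getChars(0, s.length(), buffer, length);
        length += s.length();
        return start;
    }

    /** The current backing array; may be replaced by a later append. */
    char[] array() {
        return buffer;
    }

    /** Reads text through the buffer object rather than a saved array. */
    String text(int offset, int len) {
        return new String(buffer, offset, len);
    }

    public static void main(String[] args) {
        GrowingBufferDemo doc = new GrowingBufferDemo();
        doc.append("<a>");
        char[] saved = doc.array();       // reference handed out earlier
        doc.append("text</a>");           // forces the buffer to grow
        System.out.println(saved == doc.array()); // the saved reference is stale
        System.out.println(doc.text(0, 11));
    }
}
```

Anything that captured the old array now looks at data that will never be updated. Positions recorded against the re-used scanner buffer would need similar re-basing, which this sketch does not attempt.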

So I would advise not to go down that path.

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
