Re: [Xerces2] InfoSet Augmentations in XNI

Andy Clark Thu, 20 Sep 2001 22:50:07 -0700
I'm inlining a response from Elena that was off of the mailing
list. Since it pertains directly to this conversation, I hope
she doesn't mind me including it here. :)

Elena Litani wrote:
> I don't really like your idea. First of all, we can only gather PSVI
> information during the validation process, meaning that this information
> does not have to be carried over through the parser pipeline (the Scanner
> does not have to implement this interface).

You're taking a PSVI-only standpoint. My goal is to provide 
a generic facility to propagate all kinds of infoset 
augmentations within XNI. Such a facility would let us 
implement PSVI and accommodate any other infoset
augmentations in the future without having to invent new 
interfaces or change the XNI framework. It's for this very 
reason that SAX2 included the generic feature and property 
mechanism.
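To make the idea of a generic facility a bit more concrete,
here is a minimal sketch. It assumes augmentation items are
stored in a bag keyed by strings, in the same spirit as SAX2
naming its features and properties by URI. The class
`InfosetAugmentations`, its methods, and the example key are
hypothetical illustrations, not existing XNI API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical generic container for infoset augmentations.
// Items are keyed by a string (e.g. a URI), much like SAX2 identifies
// features and properties, so a new kind of augmentation needs no new
// interface in the core framework.
final class InfosetAugmentations {
    private final Map<String, Object> items = new HashMap<>();

    /** Adds or replaces an item; returns the previous item, or null. */
    Object putItem(String key, Object item) {
        return items.put(key, item);
    }

    /** Returns the item stored under key, or null if absent. */
    Object getItem(String key) {
        return items.get(key);
    }

    /** Removes and returns the item stored under key, or null. */
    Object removeItem(String key) {
        return items.remove(key);
    }

    /** Read-only view of the keys currently present. */
    Set<String> keys() {
        return Collections.unmodifiableSet(items.keySet());
    }
}
```

A schema validator could put its type information under its
own key while an unrelated stage uses a different key;
neither needs to know about the other.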

But let's take PSVI as a use-case to see what is needed 
by this particular instance of an infoset augmentation so
that we can learn more about what a generic infoset
mechanism would need. In this scenario, the XML Schema
validator produces "extra" document information called the
Post-Schema Validation Infoset (PSVI) which contains
additional information beyond just the document's structure
and textual content (e.g. an attribute or element's data
type and value). So far so good. 

This PSVI information will be exposed to the application 
in a variety of ways, the most obvious being the
upcoming DOM Level 3 Content Model API. But I'm sure that
there will be others.

Okay, so far we have the XML Schema validator producing
these information items and the DOM parser consuming
them to augment the DOM tree with the content model and 
datatype information. (Notice that I'm not making any 
effort here to define what this information is.) So this 
information must be communicated in some fashion -- the 
open question is how.

> Thus, I believe that for PSVI we need to introduce a new XNI interface,
> similar in a way to the XMLDTDHandler interface: something like
> PSVIHandler.

I've stated before (and I'll state again ;) that I don't
think XNI should include PSVI interfaces because PSVI is
specific to XML Schema. We went to a lot of effort to come 
up with a set of interfaces that are independent of specific
APIs and implementations. Adding PSVI specific interfaces 
would be taking a step backward, in my opinion.

Anytime that I think about what should be in XNI, I ask
myself the question: "does *everyone* need this?" or "is
this a fundamental part of XML?". Clearly, not everyone
needs XML Schema and all of its infoset augmentations. In 
addition, XML Schema sits on top of XML and therefore PSVI 
should sit on top of XNI and not be a direct part of it. 
I'm being firm on this because I feel that it's important.

But let me play devil's advocate: say we introduced an
infoset augmentation interface. To keep it generic and not
tied to XML Schema, let's call it "XMLInfoSetHandler". Now
this handler has some methods to allow individual stages
in the document pipeline to add or remove infoset items,
thus augmenting the information set of the document. How
does the application associate these information items
to the data that is actually going through the pipeline?

This is the crux of the problem; we need some way to
associate the infoset augmentations to the actual XML
data flowing through the pipeline. You can never assume
that there is a one-to-one correspondence between the
number of events emitted from a stage (e.g. from the XML
Schema validator) and the number of events received at
the end of the pipeline. Why? Because subsequent stages
may add or remove events in the process (e.g. the
namespace binder or an XInclude processor).
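Here is a small sketch of that failure mode. It assumes a
hypothetical model in which events are just element names
and the validator's items travel in a separate list that
the consumer matches up by position. `OutOfBandDesync` is
an illustrative name only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: events are element names; the validator reports one
// infoset item per event through a separate, out-of-band list.
final class OutOfBandDesync {

    // Pairs each event reaching the end of the pipeline with the
    // out-of-band item at the same position, as a naive consumer would.
    static List<String> pairByPosition(List<String> events, List<String> items) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            String item = i < items.size() ? items.get(i) : "<none>";
            pairs.add(events.get(i) + "=" + item);
        }
        return pairs;
    }

    static List<String> run() {
        // Events emitted by the validator and the items it associates with them.
        List<String> events = new ArrayList<>(List.of("root", "price", "qty"));
        List<String> items = List.of("rootType", "decimal", "int");

        // A later stage (think XInclude) splices in one extra event.
        events.add(1, "included");

        return pairByPosition(events, items);
    }
}
```

After a single inserted event, every later pairing is off by
one: "included" picks up the "decimal" item meant for
"price", and "qty" ends up with nothing.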

Continuing my advocacy of the devil, how would we solve
the problem where the actual data flowing through the
pipeline becomes out of sync with the associated infoset
items passed via the XMLInfoSetHandler interface? A
unique identifier would work. So let's follow that train
of thought as it slowly derails... ;) 

If we add a key to the infoset items passed to the infoset 
handler, then there must be a key passed along with the 
actual data. This implies that we would need to add a 
parameter to each and every callback in the existing XNI 
handlers so that components "upstream" from the infoset 
augmentation can correctly associate the information. 
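A rough sketch of the key-based variant, assuming
hypothetical `KeyedDocumentHandler` and `KeyedInfosetHandler`
interfaces (neither exists in XNI). The matching now survives
an inserted event, but only because the key parameter has
been threaded through every document callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: every document callback gains an extra "key" argument
// so that items sent through a separate infoset handler can be matched up.
interface KeyedDocumentHandler {
    void startElement(String name, int key);   // key parameter added to
    void endElement(String name, int key);     // each and every callback
}

// Hypothetical out-of-band handler: receives items tagged with the same key.
interface KeyedInfosetHandler {
    void infosetItem(int key, String item);
}

final class KeyedConsumer implements KeyedDocumentHandler, KeyedInfosetHandler {
    private final Map<Integer, String> itemsByKey = new HashMap<>();
    final List<String> seen = new ArrayList<>();

    @Override public void infosetItem(int key, String item) {
        itemsByKey.put(key, item);
    }

    @Override public void startElement(String name, int key) {
        seen.add(name + "=" + itemsByKey.getOrDefault(key, "<none>"));
    }

    @Override public void endElement(String name, int key) {
        // Nothing to record for end tags in this sketch.
    }

    static List<String> run() {
        KeyedConsumer c = new KeyedConsumer();
        // The validator sends its items out of band...
        c.infosetItem(1, "rootType");
        c.infosetItem(2, "decimal");
        // ...then the events; the inserted event uses a key (0) with no item.
        c.startElement("root", 1);
        c.startElement("included", 0);
        c.endElement("included", 0);
        c.startElement("price", 2);
        c.endElement("price", 2);
        c.endElement("root", 1);
        return c.seen;
    }
}
```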

However, if you are willing to add a key parameter, then 
there is no need for the out-of-band infoset handler 
interface. Instead of passing the key, we can just pass 
the infoset itself. And now we arrive back at what I had 
proposed in the first place.
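And a sketch of passing the augmentations in-band, assuming
each callback takes a string-keyed map of items.
`AugmentedDocumentHandler`, `InBandPipeline`, and the key
URI are hypothetical names chosen for illustration. The
inserted event simply carries its own empty augmentations,
so nothing can get out of sync and no separate handler is
needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-band approach: instead of a key, each callback
// carries the augmentations themselves.
interface AugmentedDocumentHandler {
    void startElement(String name, Map<String, Object> augs);
}

final class InBandPipeline {
    static final String TYPE_KEY = "http://example.org/hypothetical/type";

    // Middle stage (think XInclude): forwards events untouched and splices
    // in one extra element, which carries its own, empty augmentations.
    static final class Inserter implements AugmentedDocumentHandler {
        private final AugmentedDocumentHandler next;
        private boolean inserted;

        Inserter(AugmentedDocumentHandler next) { this.next = next; }

        @Override public void startElement(String name, Map<String, Object> augs) {
            next.startElement(name, augs);
            if (!inserted) {
                inserted = true;
                next.startElement("included", new HashMap<>());
            }
        }
    }

    // Final stage (think DOM builder): reads the item straight off the event.
    static final class Consumer implements AugmentedDocumentHandler {
        final List<String> seen = new ArrayList<>();

        @Override public void startElement(String name, Map<String, Object> augs) {
            seen.add(name + "=" + augs.getOrDefault(TYPE_KEY, "<none>"));
        }
    }

    // First stage (think schema validator): attaches type info to each event.
    static void validate(AugmentedDocumentHandler next, String name, String type) {
        Map<String, Object> augs = new HashMap<>();
        augs.put(TYPE_KEY, type);
        next.startElement(name, augs);
    }

    static List<String> run() {
        Consumer consumer = new Consumer();
        AugmentedDocumentHandler pipeline = new Inserter(consumer);
        validate(pipeline, "root", "rootType");
        validate(pipeline, "price", "decimal");
        return consumer.seen;
    }
}
```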

> The DOM parser as well as the SAX parser will implement this
> interface and the information will be passed from the XML validator to the
> APIs (DOM .. XNI..). This handler might eventually be contributed to the
> SAX API package (as an extension).

I can see how the DOM would use this information to
build Level 3 Content Model nodes but I don't see the
use in the current version of SAX. The SAX interfaces
are already set and it's very difficult to solve the
problem of associating the document information with
the infoset augmentations. Do you have an idea how
you would solve this problem?

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
