Hi folks,

To motivate these thoughts, here's the use case I'm imagining. It seems to me most likely that what people will want to cache is external DTDs; probably large DTDs that XML documents to be validated should simply reference and conform to. I don't think we should entertain the idea of doing anything with the internal subset, or with parameter entities whose contents lie outside the document (unless the parameter entity decl also lies outside the document, of course).

The first question is whether, given the current infrastructure, it's possible to create grammar objects that correspond to external DTDs. I think the answer is yes: if a document simply refers to an external DTD and has no internal subset, then the grammar produced is just what we'd want.

Assuming we can do this, we should decide whether to read the external subset if there is a grammar for it. I think the answer is that we should not, although we will want a feature to control this. The whole point of grammar caching is to reduce disk access and the number of method calls, so it would seem counterproductive to me to step through the external subset if a grammar for it is known. On the other hand, a validating processor is supposed to read all external decls; there's no provision in the specs for a processor to skip this if it somehow already "knows" what it's doing. So in the spirit of being 100% conformant, we'd probably want to read external decls by default, even if it might not make much sense in this application. I'd love to hear perspectives on this one!

What should be done with internal-subset declarations in the document is another tough question. My sense is that, in general, they shouldn't be ignored (even were it possible in the current framework to ignore them) and that their presence should be reflected in the grammar that the document is validated against. But this implies modifying a cached grammar, which is a very problematic idea. I can see two ways around this:

1. State up front that our DTD implementation will modify cached external subsets, and that if the grammar pool wants to preserve a pristine grammar then it must provide a clone to the DTD validator. This implies a rather considerable loss in flexibility, and also means we'll have to implement a clone() method on the grammar class; cloning a grammar will also be a significant performance hit (though better than rebuilding from scratch, of course).

2. Alternatively, we could state that, if a grammar comes from a cache, our DTD implementation will not modify it (i.e., internal decls will have no effect, although we could still send them off to the handlers). This is similarly inflexible, and means that when DTD grammar caching is employed, we are no longer an XML 1.0-compliant parser.

I guess a feature could be used to select the behaviour. I'm just afraid of the number of times we'll have to check the status of this feature when grammar caching's enabled. There are a lot of cases involved, and this might not be trivial performance-wise (or implementation-wise either).

Lots of questions here; thoughts, comments, answers etc. greatly appreciated!

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]
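
P.S. To make the two options for internal-subset declarations a bit more concrete, here is a rough Java sketch. Everything in it (GrammarCacheSketch, DtdGrammar, CachePolicy, grammarForDocument) is hypothetical and only stands in for whatever we'd really build; none of it is existing parser API. It models a grammar as nothing more than a map of general entity declarations, and it makes the copy in a helper rather than in the grammar pool, purely to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types are real parser classes.
final class GrammarCacheSketch {

    /** What the validator may do with a grammar that came from the pool. */
    enum CachePolicy { CLONE_BEFORE_MODIFY, READ_ONLY }

    /** Toy stand-in for a DTD grammar: general entity name -> replacement text. */
    static final class DtdGrammar {
        private final Map<String, String> entities = new LinkedHashMap<>();

        void declareEntity(String name, String value) { entities.put(name, value); }

        String entity(String name) { return entities.get(name); }

        /** Copy used by option 1 so the pooled instance is never touched. */
        DtdGrammar copy() {
            DtdGrammar g = new DtdGrammar();
            g.entities.putAll(entities);
            return g;
        }
    }

    /**
     * Returns the grammar to validate one document against, given the cached
     * external-subset grammar and that document's internal-subset entity decls.
     */
    static DtdGrammar grammarForDocument(DtdGrammar cached,
                                         Map<String, String> internalDecls,
                                         CachePolicy policy) {
        if (policy == CachePolicy.READ_ONLY || internalDecls.isEmpty()) {
            // Option 2 (or nothing to merge): use the cached grammar as-is.
            return cached;
        }
        // Option 1: pay for a copy, then apply the internal decls to the copy.
        // put() lets the internal decl win, since XML 1.0 gives internal-subset
        // entity declarations precedence over those in the external subset.
        DtdGrammar working = cached.copy();
        internalDecls.forEach(working::declareEntity);
        return working;
    }

    public static void main(String[] args) {
        DtdGrammar cached = new DtdGrammar();
        cached.declareEntity("copyright", "external text");

        Map<String, String> internal = Map.of("copyright", "internal text");

        DtdGrammar cloned = grammarForDocument(cached, internal, CachePolicy.CLONE_BEFORE_MODIFY);
        DtdGrammar shared = grammarForDocument(cached, internal, CachePolicy.READ_ONLY);

        System.out.println(cloned.entity("copyright")); // internal text
        System.out.println(shared.entity("copyright")); // external text
        System.out.println(cached.entity("copyright")); // external text (pool copy untouched)
    }
}
```

Even in this toy form, the trade-off shows up: the clone path honours the internal subset but copies the whole grammar for every document that has one, while the read-only path is free but silently drops the internal decls. The single policy check here would of course be many checks in a real validator, which is the cost I'm worried about above.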
