> > >Then I'm also left with the getExternalSubset() function, which again, I
> > >don't know where or how to implement.
> >
> > It should probably be handled like the setExternalSchemaLocation is
> > handled; when isRoot is true and the setting has been set, the resolver is
> > invoked and the DTD is parsed like it was specified in the prolog.
> >
> I'm not entirely clear about that, but I'll have another look around the
> code and see if it makes any more sense.

> I've just implemented this in Xerces-J [2]. For documents which have no
> DOCTYPE declaration, the name of the root element needs to be scanned
> before invoking getExternalSubset on the EntityResolver. Values of
> attributes in the root element may contain references to entities which
> are defined in the external subset, so if the scanner expands entity
> references as they are encountered, the external subset needs to be read
> immediately after scanning the name of the root element.

This looks immensely complicated to me.  You've listed three cases:

1) Neither an external nor an internal subset exists.
2) Only an internal subset exists.
3) No DOCTYPE declaration in the document.

And AFAICS:

1) e.g. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN">

This seems to correspond to IGXMLScanner.cpp #1403:

    //  And now if we are looking at a >, then we are done. It is not
    //  required to have an internal or external subset, though why you
    //  would not escapes me.
    if (fReaderMgr.skippedChar(chCloseAngle)) {

At which point we note this fact and call getExternalSubset.  We have to
do this even with validation turned off, because in order to call startDTD
with the correct systemId you need to get hold of the input source (see
http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html for
the example).  You also can't cache the grammar, as caching is based on
systemId.  So you would end up unable to cache the user-provided DTD with
validation turned on, and still fetching it with validation turned off.
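
To make the systemId point concrete, here is a rough sketch of what I have
in mind.  Every name in it (SubsetSource, getExternalSubsetSketch,
systemIdForStartDTD) is made up for illustration and is not part of
Xerces-C; the hook is only modelled on the getExternalSubset(name, baseURI)
method of SAX2's EntityResolver2, and the XHTML identifiers are example
values.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an input source; NOT the Xerces-C InputSource.
struct SubsetSource {
    bool found;            // false means the resolver declined to supply a DTD
    std::string publicId;
    std::string systemId;
};

// Hypothetical application hook, modelled on EntityResolver2.getExternalSubset.
// It is keyed on the root element name; baseURI is unused in this sketch.
SubsetSource getExternalSubsetSketch(const std::string& rootName,
                                     const std::string& /*baseURI*/) {
    if (rootName == "html") {
        return SubsetSource{true,
                            "-//W3C//DTD XHTML 1.0 Strict//EN",
                            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"};
    }
    return SubsetSource{false, "", ""};
}

// Chooses the systemId that would be reported in startDTD. A systemId
// declared in the document wins; otherwise the resolver is consulted,
// whether or not validation is turned on.
std::string systemIdForStartDTD(const std::string& declaredSystemId,
                                const std::string& rootName,
                                const std::string& baseURI) {
    if (!declaredSystemId.empty()) {
        return declaredSystemId;
    }
    SubsetSource src = getExternalSubsetSketch(rootName, baseURI);
    return src.found ? src.systemId : std::string();
}
```

The point is only that the systemId comes out of the resolver's answer, so
the resolver has to be asked before startDTD can be reported.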

2) Would be in the same place.  Presumably you then resolve and process the
user-provided external subset as well, and pass systemId() from the
InputSource object provided to startDTD (via doctypeDecl)?

3) IGXMLScanner.cpp #2291 has:

    if (isRoot
        && fDoSchema
        && (fExternalSchemaLocation || fExternalNoNamespaceSchemaLocation)) {

so presumably this is the appropriate place to attempt to call
getExternalSubset().  Doing something with the result looks a lot harder, as
none of the underlying support there expects the user to have provided
anything more than a systemId.
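
Putting the three cases together, this is roughly the decision I think the
scanner has to make.  It is only a sketch of my understanding: DoctypeState,
SubsetAction and chooseSubsetAction are invented names, not anything in
IGXMLScanner, and "has a systemId" is my assumption for what counts as the
document declaring its own external subset.

```cpp
#include <cassert>

// What the scanner knows about the DOCTYPE by the time it has to decide.
struct DoctypeState {
    bool sawDoctype;         // a DOCTYPE declaration was present
    bool hasSystemId;        // the DOCTYPE names an external subset
    bool hasInternalSubset;  // the DOCTYPE has a [ ... ] internal subset
};

enum class SubsetAction {
    UseDeclared,         // the document supplies its own external subset
    AskAtDoctypeEnd,     // cases 1 and 2: ask when the DOCTYPE closes
    AskAfterRootName     // case 3: ask right after scanning the root name
};

SubsetAction chooseSubsetAction(const DoctypeState& s) {
    if (!s.sawDoctype) {
        // Case 3: no DOCTYPE. The resolver is asked once the root element
        // name is known and before its attribute values are scanned, since
        // those may reference entities defined in the external subset.
        return SubsetAction::AskAfterRootName;
    }
    if (s.hasSystemId) {
        return SubsetAction::UseDeclared;
    }
    // Cases 1 and 2: with or without an internal subset, the DOCTYPE
    // declares no external subset, so the resolver gets a chance to add one.
    return SubsetAction::AskAtDoctypeEnd;
}
```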

If this seems right, I'll give it a go, but it looks totally over my head.  I've never 
even looked at a DTD before!

