RE: exposing setXMLEntityResolver to the SAX parser

Alberto Massari Thu, 08 Apr 2004 05:44:09 -0700

At 13.13 08/04/2004 +0100, Mark Weaver wrote:

> Hi Mark,
> I guess the reason these methods have not been added to SAX2XMLReader is
> that this interface is derived from the SAX2 specs, where they
> are missing.
> If you can provide a patch using the EntityResolver2 extension, that would
> be a better fix for this problem.
>
I'm not really sure how to do this.  Last time, I got as far as:

- Adding an EntityResolver2 class defined as per SAX2 spec
- Adding a getEntityResolverVersion function to EntityResolver so as to be
able to distinguish between the two at runtime

Maybe we could avoid this by adding an overloaded setEntityResolver(EntityResolver2*) that would automatically detect the new interface (and set the http://xml.org/sax/features/use-entity-resolver2 feature)

- Changing resolveEntity everywhere to take a name and baseURI parameter
- Calling the appropriate resolver (via getEntityResolverVersion and
static_cast)

The appropriate resolver should be invoked by looking at the use-entity-resolver2 feature (that the user could decide to turn off even if she provides an EntityResolver2 interface)
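
A minimal sketch of that dispatch, assuming simplified stand-in classes rather than the real Xerces-C++ headers. ResolverHolder, setUseEntityResolver2, DemoResolver and the string-based signatures are hypothetical and exist only for illustration; the overloaded setter turns the feature on, and the user can still turn it off afterwards:

```cpp
#include <string>

// Hypothetical stand-ins; NOT the real Xerces-C++ or SAX2 declarations.
struct EntityResolver {
    virtual ~EntityResolver() {}
    // Classic callback: no entity name, no base URI.
    virtual std::string resolveEntity(const std::string& publicId,
                                      const std::string& systemId) = 0;
};

struct EntityResolver2 : EntityResolver {
    using EntityResolver::resolveEntity;
    // Extended callback: also receives the entity name and the base URI.
    virtual std::string resolveEntity(const std::string& name,
                                      const std::string& publicId,
                                      const std::string& baseURI,
                                      const std::string& systemId) = 0;
};

// Hypothetical holder for what the parser would store internally.
class ResolverHolder {
public:
    ResolverHolder() : fResolver(0), fResolver2(0), fUseER2(false) {}

    void setEntityResolver(EntityResolver* r) {
        fResolver = r; fResolver2 = 0; fUseER2 = false;
    }
    // Overload: installing an EntityResolver2 switches the feature on.
    void setEntityResolver(EntityResolver2* r) {
        fResolver = r; fResolver2 = r; fUseER2 = (r != 0);
    }
    // Mirrors http://xml.org/sax/features/use-entity-resolver2.
    void setUseEntityResolver2(bool on) { fUseER2 = on; }

    std::string resolve(const std::string& name, const std::string& publicId,
                        const std::string& baseURI, const std::string& systemId) {
        if (fUseER2 && fResolver2)
            return fResolver2->resolveEntity(name, publicId, baseURI, systemId);
        if (fResolver)
            return fResolver->resolveEntity(publicId, systemId);
        return systemId;  // no resolver installed: leave the id untouched
    }

private:
    EntityResolver*  fResolver;
    EntityResolver2* fResolver2;
    bool             fUseER2;
};

// Demo resolver that tags its answer with the callback that produced it.
struct DemoResolver : EntityResolver2 {
    std::string resolveEntity(const std::string&, const std::string& systemId) {
        return "ER:" + systemId;
    }
    std::string resolveEntity(const std::string& name, const std::string&,
                              const std::string& baseURI,
                              const std::string& systemId) {
        return "ER2:" + name + ":" + baseURI + ":" + systemId;
    }
};
```

Passing a DemoResolver* selects the EntityResolver2 overload, because the conversion to the nearer base class is preferred. No static_cast or version query is needed at the call site.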

Where I came unstuck was the name parameter:

name - Identifies the external entity being resolved. Either "[dtd]" for the
external subset, or a name starting with "%" to indicate a parameter entity,
or else the name of a general entity. This is never null when invoked by a
SAX2 parser.

I haven't much of a clue how to do that.

It looks like the specs don't take XML Schema into consideration; what should "name" contain in that case? If this resolver should be invoked only when a DTD entity is being resolved, the information is almost there: the LastExtEntityInfo structure needs to be extended with a "name" field, which getLastExtEntityInfo would fill in using the XMLEntityDecl currently in scope.
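
For the DTD case, the rule quoted above for "name" could be sketched as follows. ExternalEntityKind and entityResolver2Name are hypothetical names, not Xerces-C++ types; the sketch assumes the scanner already knows which kind of entity it is resolving and leaves the XML Schema question open:

```cpp
#include <string>

// Hypothetical classification of what is being resolved; not a Xerces-C++ type.
enum ExternalEntityKind { ExternalSubset, ParameterEntity, GeneralEntity };

// Builds the "name" argument following the quoted description:
// "[dtd]" for the external subset, "%" + name for a parameter entity,
// and the plain name for a general entity.
std::string entityResolver2Name(ExternalEntityKind kind,
                                const std::string& declName) {
    switch (kind) {
        case ExternalSubset:  return "[dtd]";
        case ParameterEntity: return "%" + declName;
        case GeneralEntity:   return declName;
    }
    return declName;  // unreachable for valid enum values
}
```

With the XHTML example from earlier in the thread, the HTMLlat1 parameter entity would be reported as "%HTMLlat1".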

Now I see that we have the old style resolveEntity method, and a new one
that takes an XMLResourceIdentifier.  XMLResourceIdentifier does not include
a name, but it seems that these days the resolveEntity method is ignored
(comments indicate that it is not called, but that the other one is instead,
which seems true on a quick reading).  I guess the sensible approach would
be:

- Add a name member to XMLResourceIdentifier
- Find out what to put in the name every time one of these is created
- Have the new resolveEntity check for ER2 being installed and call the
appropriate method passing on the name parameter

Then I'm also left with the getExternalSubset() function, which again, I
don't know where or how to implement.

It should probably be handled the way setExternalSchemaLocation is handled: when isRoot is true and the setting has been set, the resolver is invoked and the DTD is parsed as if it had been specified in the prolog.
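
A rough sketch of that check, with hypothetical names throughout (ExternalSubsetSource, externalSubsetForStartTag and FixedSubset are not part of Xerces-C++). It also assumes the resolver is consulted only when the document declared no external subset of its own:

```cpp
#include <string>

// Hypothetical stand-in for the getExternalSubset() half of EntityResolver2.
struct ExternalSubsetSource {
    virtual ~ExternalSubsetSource() {}
    // Returns true and fills systemId when the application supplies a DTD.
    virtual bool getExternalSubset(const std::string& rootName,
                                   const std::string& baseURI,
                                   std::string& systemId) = 0;
};

// Hypothetical scanner hook, called for every start tag. Returns the system
// id of the DTD to parse, or an empty string when nothing should be loaded.
std::string externalSubsetForStartTag(bool isRoot, bool useER2,
                                      bool hasExternalSubset,
                                      ExternalSubsetSource* resolver,
                                      const std::string& rootName,
                                      const std::string& baseURI) {
    // Only the root element, with the feature on and no DTD declared, qualifies.
    if (!isRoot || !useER2 || hasExternalSubset || resolver == 0)
        return std::string();
    std::string systemId;
    if (resolver->getExternalSubset(rootName, baseURI, systemId))
        return systemId;
    return std::string();
}

// Demo source that always offers the same DTD.
struct FixedSubset : ExternalSubsetSource {
    bool getExternalSubset(const std::string&, const std::string&,
                           std::string& systemId) {
        systemId = "xhtml1-strict.dtd";
        return true;
    }
};
```

The returned system id would then go through the normal DTD scanning path, so the rest of the parser would not need to know where the subset came from.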

Alberto

There's also a problem with the ER2 vs ER method in that you are supposed to
be able to turn off using ER2 via the SAX2 feature
http://xml.org/sax/features/use-entity-resolver2.  Obviously the ER2 method
then needs some way of knowing which kind of entity resolver it is meant to
be, and the simple test above is not enough for that.  OTOMH you'd need to
do something like storing pointers to

a) an EntityResolver
and
b) an EntityResolver2

everywhere.  setEntityResolver on the SAX2 interface can then check the
feature flag and store the appropriate one (which would then pass off the
appropriate ER/ER2 to the other scanners which are invoked, e.g. for the
DTD/XSD, where each of these would need to have a slightly different
interface featuring a setEntityResolver2).

It's beginning to sound very messy (unless there is a better solution, which
there may well be), as we now have three ways of installing three different
kinds of incompatible entity resolvers.

Simply exposing setXMLEntityResolver isn't actually enough for me: I'm using
Xerces through Xalan (which I assume is fairly common), so a chunk of
Xalan (including interfaces) would need to change to expose this as well, and
that's not ideal either.  With the old method (which admittedly was
terminally broken wrt the spec, but did the job for me), there were no
external interface changes, and therefore no changes needed in Xalan, which
is a plus.

Any help as to which way to go would be appreciated.

Thanks,

Mark

> Alberto
>
> At 13.19 07/04/2004 +0100, Mark Weaver wrote:
> >Would this be permissible?  This is very useful, as the current
> >EntityResolver interface does not provide a base URI, leading to the problem
> >of it being impossible to correctly resolve a root document including a DTD
> >which includes another resource via a relative reference (and that's really
> >common -- most DTDs include other DTDs).  A trivial example of such is
> >parsing and validating something specifying:
> >
> ><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> >         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> >
> >where xhtml1-strict.dtd has as its first line:
> >
> ><!ENTITY % HTMLlat1 PUBLIC
> >    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
> >    "xhtml-lat1.ent">
> >%HTMLlat1;
> >
> >I tried to do this before by implementing EntityResolver2 (against 2.3.0)
> >but I came unstuck on the `name' parameter, where basically in a large
> >number of places I wasn't sure what to put for that.  However, the new
> >method would work just fine for me, provided that I could get at it!
> >
> >The patch below implements the change.  The code is already present, so it
> >just exposes it...
> >
> >Thanks,
> >
> >Mark
> >
> >diff -ur xerces-c-src_2_5_0\src\xercesc/sax2/SAX2XMLReader.hpp xml-xerces\src\xercesc/sax2/SAX2XMLReader.hpp
> >--- xerces-c-src_2_5_0\src\xercesc/sax2/SAX2XMLReader.hpp       2004-02-16 20:52:16.000000000 +0000
> >+++ xml-xerces\src\xercesc/sax2/SAX2XMLReader.hpp       2004-04-07 01:41:56.583227000 +0100
> >@@ -173,6 +173,7 @@
> >
> >  class ContentHandler ;
> >  class DTDHandler;
> >+class XMLEntityResolver;
> >  class EntityResolver;
> >  class ErrorHandler;
> >  class InputSource;
> >@@ -249,6 +250,13 @@
> >      virtual EntityResolver* getEntityResolver() const = 0 ;
> >
> >      /**
> >+      * This method returns the installed entity resolver.
> >+      *
> >+      * @return A pointer to the installed entity resolver object.
> >+      */
> >+    virtual XMLEntityResolver* getXMLEntityResolver() const = 0 ;
> >+
> >+       /**
> >        * This method returns the installed error handler.
> >        *
> >        * @return A pointer to the installed error handler object.
> >@@ -338,6 +346,24 @@
> >      */
> >      virtual void setEntityResolver(EntityResolver* const resolver) = 0;
> >
> >+  /** Set the entity resolver
> >+    *
> >+    * This method allows applications to install their own entity
> >+    * resolver. By installing an entity resolver, the applications
> >+    * can trap and potentially redirect references to external
> >+    * entities.
> >+    *
> >+    * <i>Any previously set entity resolver is merely dropped, since the parser
> >+    * does not own them.  If both setEntityResolver and setXMLEntityResolver
> >+    * are called, then the last one is used.</i>
> >+    *
> >+    * @param resolver  A const pointer to the user supplied entity
> >+    *                  resolver.
> >+    *
> >+    * @see #getXMLEntityResolver
> >+    */
> >+    virtual void setXMLEntityResolver(XMLEntityResolver* const resolver) = 0;
> >+
> >    /**
> >      * Allow an application to register an error event handler.
> >      *
> >
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]

