> Hi Mark,
> I guess the reason these methods have not been added to SAX2XMLReader is
> that this interface is derived from the SAX2 specs, where they are missing.
> If you can provide a patch using the EntityResolver2 extension, that would
> be a better fix for this problem.
>
> I'm not really sure how to do this. Last time, I got as far as:
- Adding an EntityResolver2 class defined as per SAX2 spec
- Adding a getEntityResolverVersion function to EntityResolver so as to be able to distinguish between the two at runtime
Maybe we could avoid this by adding an overloaded setEntityResolver(EntityResolver2*) that would automatically detect the new interface (and set the http://xml.org/sax/features/use-entity-resolver2 feature).
- Changing resolveEntity everywhere to take a name and baseURI parameter
- Calling the appropriate resolver (via getEntityResolverVersion and static_cast)
The appropriate resolver should be chosen by looking at the use-entity-resolver2 feature (which the user could decide to turn off even if she provides an EntityResolver2 interface).
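A minimal sketch of that selection logic, to make the idea concrete. It uses stand-in types, not the real Xerces-C classes: EntityResolver, EntityResolver2, ReaderSketch and TaggingResolver below are all hypothetical and simplified to return strings instead of InputSource pointers. Only the feature URI is taken from the discussion above.

```cpp
#include <string>

// Stand-in interfaces for illustration; these are NOT the Xerces-C declarations.
struct EntityResolver {
    virtual ~EntityResolver() = default;
    virtual std::string resolveEntity(const std::string& publicId,
                                      const std::string& systemId) = 0;
};

struct EntityResolver2 : EntityResolver {
    using EntityResolver::resolveEntity;  // keep the two-argument overload visible
    virtual std::string resolveEntity(const std::string& name,
                                      const std::string& publicId,
                                      const std::string& baseURI,
                                      const std::string& systemId) = 0;
};

const char* const kUseEntityResolver2 =
    "http://xml.org/sax/features/use-entity-resolver2";

// Hypothetical reader that remembers both views of the installed resolver.
class ReaderSketch {
public:
    // Plain resolver: the ER2 path is unavailable.
    void setEntityResolver(EntityResolver* r) {
        fResolver = r;
        fResolver2 = nullptr;
        fUseER2 = false;
    }
    // Overload picked for ER2 implementations; switches the feature on.
    void setEntityResolver(EntityResolver2* r) {
        fResolver = r;
        fResolver2 = r;
        fUseER2 = (r != nullptr);
    }
    // The user may still turn the feature off afterwards.
    void setFeature(const std::string& name, bool value) {
        if (name == kUseEntityResolver2) fUseER2 = value;
    }
    // Dispatch on the feature flag, falling back to the old-style call.
    std::string resolve(const std::string& name, const std::string& publicId,
                        const std::string& baseURI,
                        const std::string& systemId) const {
        if (fUseER2 && fResolver2)
            return fResolver2->resolveEntity(name, publicId, baseURI, systemId);
        if (fResolver)
            return fResolver->resolveEntity(publicId, systemId);
        return systemId;  // no resolver installed
    }

private:
    EntityResolver* fResolver = nullptr;
    EntityResolver2* fResolver2 = nullptr;
    bool fUseER2 = false;
};

// Example resolver that reports which overload was invoked.
struct TaggingResolver : EntityResolver2 {
    std::string resolveEntity(const std::string&,
                              const std::string& systemId) override {
        return "ER:" + systemId;
    }
    std::string resolveEntity(const std::string& name, const std::string&,
                              const std::string&,
                              const std::string& systemId) override {
        return "ER2:" + name + ":" + systemId;
    }
};
```

With this shape no getEntityResolverVersion or static_cast is needed: overload resolution decides at the call to setEntityResolver, and the feature flag decides at resolve time.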
Where I came unstuck was the name parameter:
name - Identifies the external entity being resolved. Either "[dtd]" for the external subset, or a name starting with "%" to indicate a parameter entity, or else the name of a general entity. This is never null when invoked by a SAX2 parser.
I haven't much of a clue how to do that.
It looks like the specs don't take into consideration XML Schema; what should "name" contain in that case?
If this resolver should be invoked only when a DTD entity is being resolved, the information is almost there; the LastExtEntityInfo structure needs to be extended with a "name" field, which getLastExtEntityInfo would fill by using the XMLEntityDecl currently in scope.
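For the DTD-only case, the naming rule quoted from the SAX2 documentation above could be sketched as follows. ExternalEntityKind and entityResolver2Name are hypothetical names, and the sketch assumes the declaration name is available without a leading "%"; how XMLEntityDecl actually stores parameter entity names is not checked here.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of what is being resolved.
enum class ExternalEntityKind { DtdExternalSubset, ParameterEntity, GeneralEntity };

// Build the "name" argument as described for EntityResolver2:
// "[dtd]" for the external subset, "%name" for a parameter entity,
// and the bare name for a general entity.
std::string entityResolver2Name(ExternalEntityKind kind,
                                const std::string& declName) {
    switch (kind) {
    case ExternalEntityKind::DtdExternalSubset:
        return "[dtd]";
    case ExternalEntityKind::ParameterEntity:
        assert(!declName.empty());  // the name is never null for a SAX2 parser
        return "%" + declName;
    case ExternalEntityKind::GeneralEntity:
        assert(!declName.empty());
        return declName;
    }
    return declName;  // unreachable for valid enum values
}
```

This leaves the XML Schema question open: nothing in the quoted description says what the name should be for a schema import or include.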
Now I see that we have the old style resolveEntity method, and a new one that takes an XMLResourceIdentifier. XMLResourceIdentifier does not include a name, but it seems that these days the resolveEntity method is ignored (comments indicate that it is not called, but that the other one is instead, which seems true on a quick reading). I guess the sensible approach would be:
- Add a name member to XMLResourceIdentifier
- Find out what to put in the name every time one of these is created
- Have the new resolveEntity check for ER2 being installed and call the appropriate method passing on the name parameter
Then I'm also left with the getExternalSubset() function, which again, I don't know where or how to implement.
It should probably be handled the way setExternalSchemaLocation is handled; when isRoot is true and the setting has been set, the resolver is invoked and the DTD is parsed as if it had been specified in the prolog.
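A rough sketch of that root-element hook, under stated assumptions: ExternalSubsetHook stands in for EntityResolver2::getExternalSubset and returns a system id string (empty meaning "decline") rather than an InputSource, and externalSubsetForRoot is a hypothetical helper, not an existing scanner function.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for getExternalSubset(name, baseURI): returns the
// system id of a DTD to use, or an empty string to decline.
using ExternalSubsetHook =
    std::function<std::string(const std::string& rootName,
                              const std::string& baseURI)>;

// Intended to be called once, when the root element is first seen.
// If the document already declared an external subset, or no hook is
// installed, nothing is supplied and parsing continues unchanged.
std::string externalSubsetForRoot(bool documentDeclaredExternalSubset,
                                  const std::string& rootName,
                                  const std::string& baseURI,
                                  const ExternalSubsetHook& hook) {
    if (documentDeclaredExternalSubset || !hook)
        return std::string();
    return hook(rootName, baseURI);
}
```

A non-empty result would then be fed to the DTD scanner as though a DOCTYPE with that system id had appeared in the prolog.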
Alberto
There's also a problem with the ER2 vs ER method in that you are supposed to be able to turn off using ER2 via the SAX2 feature http://xml.org/sax/features/use-entity-resolver2. Obviously the ER2 method then needs some way of knowing which kind of entity resolver it is meant to be, and the simple test above is not enough for that. Off the top of my head, you'd need to do something like storing pointers to

a) an EntityResolver and
b) an EntityResolver2

everywhere. setEntityResolver on the SAX2 interface can then check the feature flag and store the appropriate one (which would then pass off the appropriate ER/ER2 to the other scanners which are invoked, e.g. for the DTD/XSD, where each of these would need to have a slightly different interface featuring a setEntityResolver2).
It's beginning to sound very messy (unless there is a better solution, which there may well be), as we now have three ways of installing three different kinds of incompatible entity resolvers.
Simply exposing setXMLEntityResolver isn't actually enough for me: I'm using Xerces through Xalan (which I assume is fairly common), so a chunk of Xalan (including interfaces) would need to change to expose this as well, which is not ideal either. The old method (which admittedly was terminally broken with respect to the spec, but did the job for me) needed no external interface changes, and therefore no changes in Xalan, which is a plus.
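For context, what any of these interfaces has to make possible is resolving a relative system id against the URI of the entity that referenced it. A deliberately simplified sketch follows: resolveAgainstBase is a hypothetical helper, and it is not a full RFC 3986 implementation (no ".." handling, no absolute-path references).

```cpp
#include <string>

// Resolve a relative system id against the base URI of the referencing
// entity. System ids that already carry a scheme are returned unchanged.
std::string resolveAgainstBase(const std::string& baseURI,
                               const std::string& systemId) {
    if (baseURI.empty() || systemId.find("://") != std::string::npos)
        return systemId;
    const std::string::size_type slash = baseURI.rfind('/');
    if (slash == std::string::npos)
        return systemId;
    // Keep everything up to and including the last '/', then append.
    return baseURI.substr(0, slash + 1) + systemId;
}
```

With the XHTML case from the original report, a base of http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd and a system id of xhtml-lat1.ent give http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent, which is exactly what a resolver cannot compute when it is never told the base URI.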
Any help as to which way to go would be appreciated.
Thanks,
Mark
> Alberto
>
> At 13.19 07/04/2004 +0100, Mark Weaver wrote:
> >Would this be permissible? This is very useful, as the current
> >EntityResolver interface does not provide a base URI, leading to the problem
> >of it being impossible to correctly resolve a root document including a DTD
> >which includes another resource via a relative reference (and that's really
> >common -- most DTDs include other DTDs). A trivial example of such is
> >parsing and validating something specifying:
> >
> ><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> >    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> >
> >where xhtml1-strict.dtd has as its first line:
> >
> ><!ENTITY % HTMLlat1 PUBLIC
> >    "-//W3C//ENTITIES Latin 1 for XHTML//EN"
> >    "xhtml-lat1.ent">
> >%HTMLlat1;
> >
> >I tried to do this before by implementing EntityResolver2 (against 2.3.0)
> >but I came unstuck on the `name' parameter, where basically in a large
> >number of places I wasn't sure what to put for that. However, the new
> >method would work just fine for me, provided that I could get at it!
> >
> >The patch below implements the change. The code is already present, so it
> >just exposes it...
> >
> >Thanks,
> >
> >Mark
> >
> >diff -ur xerces-c-src_2_5_0\src\xercesc/sax2/SAX2XMLReader.hpp xml-xerces\src\xercesc/sax2/SAX2XMLReader.hpp
> >--- xerces-c-src_2_5_0\src\xercesc/sax2/SAX2XMLReader.hpp 2004-02-16 20:52:16.000000000 +0000
> >+++ xml-xerces\src\xercesc/sax2/SAX2XMLReader.hpp 2004-04-07 01:41:56.583227000 +0100
> >@@ -173,6 +173,7 @@
> >
> > class ContentHandler ;
> > class DTDHandler;
> >+class XMLEntityResolver;
> > class EntityResolver;
> > class ErrorHandler;
> > class InputSource;
> >@@ -249,6 +250,13 @@
> >     virtual EntityResolver* getEntityResolver() const = 0 ;
> >
> >     /**
> >+      * This method returns the installed entity resolver.
> >+      *
> >+      * @return A pointer to the installed entity resolver object.
> >+      */
> >+    virtual XMLEntityResolver* getXMLEntityResolver() const = 0 ;
> >+
> >+    /**
> >       * This method returns the installed error handler.
> >       *
> >       * @return A pointer to the installed error handler object.
> >@@ -338,6 +346,24 @@
> >       */
> >     virtual void setEntityResolver(EntityResolver* const resolver) = 0;
> >
> >+  /** Set the entity resolver
> >+    *
> >+    * This method allows applications to install their own entity
> >+    * resolver. By installing an entity resolver, the applications
> >+    * can trap and potentially redirect references to external
> >+    * entities.
> >+    *
> >+    * <i>Any previously set entity resolver is merely dropped, since the parser
> >+    * does not own them. If both setEntityResolver and setXMLEntityResolver
> >+    * are called, then the last one is used.</i>
> >+    *
> >+    * @param resolver  A const pointer to the user supplied entity
> >+    *                  resolver.
> >+    *
> >+    * @see #getXMLEntityResolver
> >+    */
> >+    virtual void setXMLEntityResolver(XMLEntityResolver* const resolver) = 0;
> >+
> >     /**
> >       * Allow an application to register an error event handler.
> >       *
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
