At 18.22 14/03/2002 -0800, Dean Roddey wrote:
>XMLScanner is an internal implementation, so you take your own chances if
>you use it since it could change any time. This is most likely why it isn't
>documented, since they don't want to accept the responsibility for people
>using it.
>
>Actually, some more abstract API could be added to the DOM parser to get the
>system id of the current entity. There are issues, since some entities are
>internal and have no id. The scanner, if I remember correctly, has a method
>that searches back up the reader stack to find the most nested external
>entity, skipping over any internal entities.

I think you are talking about XMLScanner::getLastExtLocation.

In any case, I want to point out that a custom EntityResolver will not 
work once an XML Schema is involved. For example, suppose you have 
an XML file like this one

<instance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="http://www.myco.com/schema.xsd">
   ....
</instance>

and the XML Schema is something like this

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:include schemaLocation="subschema.xsd"/>
   ...
</xs:schema>

Now, in your C++ program you instantiate your parser and set up the 
EntityResolver object

LocalFileInputSource inputSource(L"c:\\instance.xml");
DOMParser parser;
MyEntityResolver resolver(&parser);
parser.setEntityResolver(&resolver);
parser.parse(inputSource);

The only ways the EntityResolver::resolveEntity method can know 
the name of the file currently being parsed are:
- if you are using DOM, either give the resolver a pointer to the parser (so 
that it can call getScanner().getLastExtLocation()) or implement the 
EntityResolver interface on your own DOMParser-derived class
- if you are using SAX, implement the EntityResolver interface in the same 
object that implements DocumentHandler (DocumentHandler::setDocumentLocator 
will be called with a pointer to a Locator object, and 
EntityResolver::resolveEntity can then call Locator::getSystemId, etc.)
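
A rough, library-independent sketch of the DOM variant of this workaround. 
LocationSource, FakeParser and baseFor are names made up for illustration, 
not Xerces classes or methods; in real code the pointer would be to the 
parser, and the location would come from getScanner().getLastExtLocation().

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the parser: anything that can report the
// system id of the external entity currently being parsed.
struct LocationSource {
    virtual ~LocationSource() = default;
    virtual std::string currentSystemId() const = 0;
};

// Toy "parser" whose current location is set by hand for the demo.
struct FakeParser : LocationSource {
    std::string systemId;
    std::string currentSystemId() const override { return systemId; }
};

// The resolver is not standalone: it must be handed a location source.
class MyEntityResolver {
public:
    explicit MyEntityResolver(const LocationSource* source) : fSource(source) {
        assert(source != nullptr);
    }
    // What the resolver believes is the file containing the reference.
    std::string baseFor(const std::string& /*systemId*/) const {
        return fSource->currentSystemId();
    }
private:
    const LocationSource* fSource;
};
```

With the toy parser's systemId set to c:\instance.xml, baseFor("subschema.xsd") 
reports c:\instance.xml. The answer is only as good as the object the 
resolver was wired to.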

But when XMLScanner finds a reference to an XML Schema (because of an 
xsi:noNamespaceSchemaLocation, xs:import, xs:include or xs:redefine 
instruction) it executes this code:

void XMLScanner::resolveSchemaGrammar(...)
{
...
         IDOMParser parser;
         XMLInternalErrorHandler internalErrorHandler(fErrorHandler);
         parser.setValidationScheme(IDOMParser::Val_Never);
         parser.setDoNamespaces(true);
         parser.setErrorHandler((ErrorHandler*) &internalErrorHandler);
         parser.setEntityResolver(fEntityResolver);
...
         parser.parse(*srcToFill) ;

This means that the entity resolver you specified (fEntityResolver) will be 
silently attached to a different parser (in this case, IDOMParser). 
EntityResolver::resolveEntity will be called, for instance, to open the 
schema "subschema.xsd" (because of <xs:include 
schemaLocation="subschema.xsd"/>), but it will not be able to correctly 
determine the location of the current file. It will think we are still 
parsing c:\instance.xml, instead of http://www.myco.com/schema.xsd.
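
To make the consequence concrete, here is a deliberately simplified sketch 
of resolving a relative schemaLocation against a base. resolveAgainst is a 
hypothetical helper, not part of Xerces, and it only swaps the last path 
segment; it is not a full URI resolver.

```cpp
#include <string>

// Simplified relative-reference resolution: keep the base up to and
// including its last '/' or '\', then append the relative name.
std::string resolveAgainst(const std::string& base, const std::string& relative) {
    const std::string::size_type slash = base.find_last_of("/\\");
    if (slash == std::string::npos) {
        return relative;  // base has no directory part to reuse
    }
    return base.substr(0, slash + 1) + relative;
}
```

Resolved against http://www.myco.com/schema.xsd, "subschema.xsd" becomes 
http://www.myco.com/subschema.xsd, which is what the schema author meant. 
Resolved against c:\instance.xml it becomes c:\subschema.xsd, which is what 
a resolver that believes it is still inside the instance document would 
look for.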

This looks like a problem with the spec of the SAX interface (which defines 
EntityResolver): given the current implementation, EntityResolver cannot be 
implemented by a standalone object (it needs another interface through which 
it is handed a pointer to a parser, scanner or reader manager object), yet 
it is used here as if it could be freely attached to any parser object.

So, what is the solution? Fix the SAX interface....
I have changed EntityResolver to receive another parameter, specifying the 
name of the entity currently being parsed.
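
The idea, in a rough sketch that is not my actual patch: ContextAwareResolver 
and SiblingResolver are hypothetical names, and for brevity the method 
returns the resolved location as a string instead of an InputSource.

```cpp
#include <string>

// Hypothetical interface: like a SAX entity resolver, plus the system id
// of the entity that contains the reference being resolved.
class ContextAwareResolver {
public:
    virtual ~ContextAwareResolver() = default;
    virtual std::string resolveEntity(const std::string& publicId,
                                      const std::string& systemId,
                                      const std::string& currentEntity) = 0;
};

// Example implementation: a relative system id is looked up next to the
// entity that referenced it; an absolute one is returned unchanged.
class SiblingResolver : public ContextAwareResolver {
public:
    std::string resolveEntity(const std::string& /*publicId*/,
                              const std::string& systemId,
                              const std::string& currentEntity) override {
        if (systemId.find("://") != std::string::npos) {
            return systemId;  // already absolute
        }
        const std::string::size_type slash = currentEntity.find_last_of("/\\");
        if (slash == std::string::npos) {
            return systemId;  // no directory part to reuse
        }
        return currentEntity.substr(0, slash + 1) + systemId;
    }
};
```

Because the caller passes the current entity on every call, the resolver no 
longer needs a pointer back to any particular parser, so it keeps working 
when it is handed to a second, internal parser.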

I see now that, on Jan 30, the people working on the SAX interfaces 
recognized this use case and, in the SAX2 Extensions 1.1 (beta1), changed 
the signature of the EntityResolver2::resolveEntity function to include the 
URI and the name of the current file (see 
http://sax.sourceforge.net/apidoc/org/xml/sax/ext/EntityResolver2.html )

To conclude this long e-mail (I hope my English was readable...), my 
final question is: do you plan to add EntityResolver2 to Xerces any time soon?

P.S. My original intention was to provide my patched sources of 
EntityResolver.hpp, but now they would be non-standard...

Thanks,
Alberto
-------------------------------
Alberto Massari
eXcelon Corp.
http://www.StylusStudio.com

