Replying to my own post: I've tracked this down to what may be a bug in parser.c. In my source tree, which is 2.7.8, it's at line 3883:

    /*
     * This may look absurd but is needed to detect
     * entities problems
     */
    if ((ent->etype != XML_INTERNAL_PREDEFINED_ENTITY) &&
        (ent->content != NULL)) {
        rep = xmlStringDecodeEntities(ctxt, ent->content,
                                      XML_SUBSTITUTE_REF, 0, 0, 0);
        if (rep != NULL) {
            xmlFree(rep);
            rep = NULL;
        }
    }

What is this code doing? What "entities problems" is it avoiding? Shouldn't
it check ctxt->replaceEntities before replacing the entities? Note that this
only affects entities embedded in attribute values.
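
For reference, one possible workaround is to pull the entity names out of the attribute text by hand. What follows is only a minimal sketch, and it assumes the raw, unsubstituted attribute value (e.g. "&kConstPI;") can be obtained as a C string, which is exactly the part in question here. next_entity_ref is a hypothetical helper, not a libxml2 function. It recognizes only simple &name; references and skips character references such as &#38;.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): find the next general entity
 * reference of the form "&name;" in a raw attribute value, scanning from
 * offset *pos (which must not exceed strlen(value)).
 *
 * On success the name is copied into out (NUL-terminated), *pos is moved
 * just past the ';', and 1 is returned. Returns 0 when no further
 * reference is found. Character references ("&#...;"), unterminated
 * references, and names too long for out are skipped.
 */
int next_entity_ref(const char *value, size_t *pos, char *out, size_t outlen)
{
    assert(value != NULL && pos != NULL && out != NULL && outlen > 0);

    for (size_t i = *pos; value[i] != '\0'; i++) {
        /* Only '&' not followed by '#' can start a general entity reference. */
        if (value[i] != '&' || value[i + 1] == '#')
            continue;

        size_t start = i + 1;
        size_t end = start;

        /* Advance to the terminating ';', giving up on characters that
         * cannot appear inside a name. */
        while (value[end] != '\0' && value[end] != ';' &&
               value[end] != '&' && value[end] != ' ')
            end++;

        size_t len = end - start;
        if (value[end] == ';' && len > 0 && len < outlen) {
            memcpy(out, value + start, len);
            out[len] = '\0';
            *pos = end + 1;
            return 1;
        }
    }
    return 0;
}
```

Calling it in a loop with pos starting at 0 yields each referenced name in order. For the val attribute in the sample document, that is the single name kConstPI.
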
-Jonah
On Dec 1, 2010, at 2:51 PM, Jonah Petri wrote:
> Hello,
>
> I'm trying to use the xmlreader API to receive entity nodes (un-substituted)
> so I can do my own evaluation as I stream the document. I need to do so
> because the names of the entities are significant, as well as their decoded
> values.
>
> My XML document has an inline DTD at the top, which defines the entities I'm
> concerned about, something like:
>
> <?xml version="1.0" encoding="utf-8" ?>
> <!DOCTYPE Constants [
> <!ENTITY kConstPI "3.14">
> ]>
> <Doc>
> <Thing val="&kConstPI;" />
> </Doc>
>
> My code, boiled down, looks like:
>
>     m_pReader = xmlReaderForMemory(pUTF8XMLData, uLength, strBaseURL, NULL,
>                                    XML_PARSE_NONET);
>     xmlTextReaderSetParserProp(m_pReader, XML_PARSER_SUBST_ENTITIES, 0);
>     int uResult = xmlTextReaderRead(m_pReader);
>     while (uResult == 1) {
>
>         switch (xmlReaderTypes(xmlTextReaderNodeType(m_pReader))) {
>
>         case XML_READER_TYPE_ELEMENT:
>         case XML_READER_TYPE_END_ELEMENT:
>         case XML_READER_TYPE_TEXT:
>         case XML_READER_TYPE_ENTITY:
>         case XML_READER_TYPE_ENTITY_REFERENCE:
>         case XML_READER_TYPE_DOCUMENT:
>         case XML_READER_TYPE_NONE:
>         case XML_READER_TYPE_ATTRIBUTE:
>         case XML_READER_TYPE_CDATA:
>         case XML_READER_TYPE_PROCESSING_INSTRUCTION:
>         case XML_READER_TYPE_COMMENT:
>         case XML_READER_TYPE_DOCUMENT_TYPE:
>         case XML_READER_TYPE_DOCUMENT_FRAGMENT:
>         case XML_READER_TYPE_NOTATION:
>         case XML_READER_TYPE_WHITESPACE:
>         case XML_READER_TYPE_SIGNIFICANT_WHITESPACE:
>         case XML_READER_TYPE_END_ENTITY:
>         case XML_READER_TYPE_XML_DECLARATION:
>             printf("found: %s type %d\n",
>                    xmlTextReaderConstName(m_pReader),
>                    xmlTextReaderNodeType(m_pReader));
>             break;
>         }
>
>         uResult = xmlTextReaderRead(m_pReader);
>     }
>
> I never see any ENTITY types coming through. I must be doing something
> wrong, as this technique is specifically called out in
> http://xmlsoft.org/xmlreader.html, but I'm at a loss.
>
> Any help would be appreciated!
>
> -Jonah
