Replying to my own post: I've tracked this down to what may be a bug in
parser.c.  In my source tree (libxml2 2.7.8), it's at line 3883.

                    /*
                     * This may look absurd but is needed to detect
                     * entities problems
                     */
                    if ((ent->etype != XML_INTERNAL_PREDEFINED_ENTITY) &&
                        (ent->content != NULL)) {
                        rep = xmlStringDecodeEntities(ctxt, ent->content,
                                                  XML_SUBSTITUTE_REF, 0, 0, 0);
                        if (rep != NULL) {
                            xmlFree(rep);
                            rep = NULL;
                        }
                    }


What is this code doing?  What "entities problems" is it detecting?  Shouldn't
it check ctxt->replaceEntities before decoding the entity's content?  Note that
this only affects entity references embedded in attribute values.

-Jonah

On Dec 1, 2010, at 2:51 PM, Jonah Petri wrote:

> Hello,
> 
> I'm trying to use the xmlreader API to receive entity nodes (un-substituted) 
> so I can do my own evaluation as I stream the document.  I need to do so 
> because the names of the entities are significant, as well as their decoded 
> values.
> 
> My XML document has an inline DTD at the top, which defines the entities I'm 
> concerned about, something like:
> 
> <?xml version="1.0" encoding="utf-8" ?>
> <!DOCTYPE Constants [
>   <!ENTITY kConstPI "3.14">
> ]>
> <Doc>
> <Thing val="&kConstPI;" />
> </Doc>
> 
> My code, boiled down, looks like:
> 
>     m_pReader = xmlReaderForMemory( pUTF8XMLData, uLength, strBaseURL, NULL,
>                                     XML_PARSE_NONET );
>     xmlTextReaderSetParserProp(m_pReader, XML_PARSER_SUBST_ENTITIES, 0);
>     int uResult = xmlTextReaderRead( m_pReader );
>     while (uResult == 1) {
>      
>         switch( xmlReaderTypes(xmlTextReaderNodeType(m_pReader)) ) {
>                 
>             case XML_READER_TYPE_ELEMENT:
>             case XML_READER_TYPE_END_ELEMENT:
>             case XML_READER_TYPE_TEXT:
>             case XML_READER_TYPE_ENTITY:
>             case XML_READER_TYPE_ENTITY_REFERENCE:
>             case XML_READER_TYPE_DOCUMENT:
>             case XML_READER_TYPE_NONE:
>             case XML_READER_TYPE_ATTRIBUTE:
>             case XML_READER_TYPE_CDATA:
>             case XML_READER_TYPE_PROCESSING_INSTRUCTION:
>             case XML_READER_TYPE_COMMENT:
>             case XML_READER_TYPE_DOCUMENT_TYPE:
>             case XML_READER_TYPE_DOCUMENT_FRAGMENT:
>             case XML_READER_TYPE_NOTATION:
>             case XML_READER_TYPE_WHITESPACE:
>             case XML_READER_TYPE_SIGNIFICANT_WHITESPACE:
>             case XML_READER_TYPE_END_ENTITY:
>             case XML_READER_TYPE_XML_DECLARATION:
>                 printf("found: %s type %d\n",
>                        xmlTextReaderConstName(m_pReader),
>                        xmlTextReaderNodeType(m_pReader));
>                 break;
>         }
>         
>         uResult = xmlTextReaderRead(m_pReader);
>     }
> 
> I never see any ENTITY types coming through.  I must be doing something 
> wrong, as this technique is specifically called out in 
> http://xmlsoft.org/xmlreader.html, but I'm at a loss.
> 
> Any help would be appreciated!
> 
> -Jonah

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
