Hi all,

I've found 3 problems with xmlTextReader, used from Python. I'm providing my code and a test example so that you can reproduce them, or dismiss them, since I may have misused the API.

Some brief context: I'm interested in processing an XML file in "semi-streaming" mode: the input XML is copied unchanged to the output, except for a series of sub-trees (identified, for instance, by their node name, e.g. the <PAGE> nodes), which I want to process as DOM using the expand method of the xmlTextReader API. It sounds nice, but copying turns out not to be so easy.

The (little) problems:

Pb 1 - How to process the XML declaration, e.g. <?xml version="1.0"?>

Pb 2 - The QuoteChar() method seems to always return " even if a ' was used to enclose an attribute, e.g. a='123'

Pb 3 - In text nodes and attribute values, entities are handled strangely by the Value() method: for instance, an &amp; becomes a & in the returned string. Yet rdr.CurrentDoc().encodeEntitiesReentrant(rdr.Value()) gives a correct output, which makes it even stranger to me.

These problems are visible using the attached xmldump.py code below, which simply copies its input to its output. A test file is also attached.

Thanks for your help/comments,
JL
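
On Pb 2 and Pb 3, here is a minimal sketch of the re-escaping step a copier needs, assuming Value() hands back the parsed value with entity references already resolved (which is what the behaviour described above suggests). It deliberately uses only the Python standard library (xml.sax.saxutils) rather than libxml2, and the helper names serialize_text and serialize_attribute are hypothetical, introduced only for this illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize_text(value):
    """Escape a parsed text value (&, <, >) before writing it back as XML."""
    return escape(value)


def serialize_attribute(name, value):
    """Write name=value, escaping the value and letting quoteattr pick a quote.

    quoteattr uses double quotes by default and switches to single quotes
    when the value itself contains a double quote, so the original quote
    character does not have to be known.
    """
    return "%s=%s" % (name, quoteattr(value))


# Parsed values, as a reader would return them (hypothetical sample data).
print(serialize_text("deuxieme element du & document"))
print(serialize_attribute("AUTHOR", "some&one"))
print(serialize_attribute("a", 'say "hi"'))
```

With this approach the output is well-formed but not necessarily byte-identical to the input: the quote character and the exact form of each reference (for example a numeric character reference) are not preserved.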
Attachment: xmldump.py
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE NEWSPAPER [
<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
<NEWSPAPER AUTHOR="some&amp;one" EDITOR="pas moi">
<!-- this is my comment -->
<![CDATA[ Let's write whatever we want: & < > " ' % ]]>
<?test of a processing & instructions ?>
<elt n="1">premier &NEWSPAPER; element du D E document</elt>
<elt n="2">deuxieme element du &amp; document</elt>
<elt n="3" vide="true!"/>
</NEWSPAPER>
