Hi all,

 

I’ve found three problems with xmlTextReader when used from Python. I’m providing my code and a test file so that you can reproduce them, or dismiss them if it turns out I misused the API.

 

Some brief context: I’m interested in processing an XML file in “semi-streaming” mode. The input XML is copied unchanged to the output, except for a series of sub-trees (identified, for instance, by their node name, e.g. the <PAGE> nodes), which I want to process as DOM using the expand method of the xmlTextReader API. It sounds nice, but the copying part turns out not to be so easy.
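To make the idea concrete, here is a minimal sketch of the semi-streaming pattern. Note that it is not xmlTextReader code: it uses the Python standard library's xml.dom.pulldom as a stand-in, where expandNode plays a role comparable to the reader's expand method. The function name process_subtrees and the sample input are made up for illustration only.

```python
from xml.dom import pulldom


def process_subtrees(xml_text, tag):
    """Stream through xml_text and expand only <tag> elements into DOM.

    Hypothetical helper: returns the serialized form of each expanded
    sub-tree; everything else is skipped without being held in memory.
    """
    results = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull in the rest of this element so that `node` becomes a
            # complete DOM sub-tree that can be processed in memory.
            events.expandNode(node)
            results.append(node.toxml())
    return results


# Assumed sample input, not the test file from this message.
sample = "<DOC><PAGE n='1'><T>a &amp; b</T></PAGE><X/><PAGE n='2'/></DOC>"
pages = process_subtrees(sample, "PAGE")
for page in pages:
    print(page)
```

In this stand-in, the serialized sub-trees are regenerated from the parsed data rather than copied byte for byte, which is the same kind of difficulty as the one described here.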

 

 

The (little) problems:

Pb 1 - how to process the XML declaration, e.g. <?xml version="1.0"?>

Pb 2 - the QuoteChar() method seems to always return " even if a ' was used to enclose an attribute value, e.g. a='123'

Pb 3 - in text nodes and attribute values, the Value() method deals with entities strangely: for instance, an &amp; becomes a & in the returned string

            Actually, rdr.CurrentDoc().encodeEntitiesReentrant(rdr.Value()) gives the correct output, which makes it even stranger to me
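One possible reading of Pb 3 is that Value() returns the parsed value, so a copy program has to re-escape it on output. Below is a minimal sketch of that re-escaping step. It uses the standard library's xml.sax.saxutils rather than libxml2's encodeEntitiesReentrant, and the helper names serialize_text and serialize_attribute are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize_text(value):
    """Re-escape a parsed text value (&, <, >) before writing it out."""
    return escape(value)


def serialize_attribute(name, value):
    """Return name=value with the value escaped and quoted.

    quoteattr picks the quote character itself, so the quote style
    used in the input is not preserved (compare Pb 2).
    """
    return "%s=%s" % (name, quoteattr(value))


print(serialize_text("deuxieme element du & document"))
print(serialize_attribute("AUTHOR", "some&one"))
print(serialize_attribute("a", "123"))
```

This only covers the escaping side; it does not recover which quote character the input used.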

 

These problems can be reproduced with the attached xmldump.py, which simply copies its input to its output. A test file is included as well.

 

Thanks for your help/comments,

 

JL

 

 

Attachment: xmldump.py

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE NEWSPAPER [ 

<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)> 

<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>

<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">


]>

<NEWSPAPER AUTHOR="some&amp;one" EDITOR="pas moi">
  <!-- this is my comment -->
  <![CDATA[ Let's write whatever we want: & < > " ' %    ]]>
  <?test of a processing &amp; instructions ?>

  <elt n="1">premier  &NEWSPAPER; element du &#x44; &#x45; document</elt>
  <elt n="2">deuxieme element du &amp; document</elt>
  <elt n="3" vide="true!"/>
</NEWSPAPER>
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml
