OK. I have a patch ready to commit that does away with TextWriter and 
StringSerializer, and uses Xerces to serialize.

However, the serializer in Xerces 2 is a bit over-zealous and inserts namespace 
binding nodes for any namespaces it considers "undeclared"... And Xindice's DOM 
implementation represents namespace binding nodes in a way that is 
unfortunately not compatible with Xerces' idea of such nodes. (The real problem 
here is that DOM doesn't foresee them, so everyone has more or less their own 
way of adding "attributes" that represent namespace binding nodes...)
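To make the "binding node" idea a bit more concrete, here's a minimal sketch using the JDK's standard JAXP DOM. It is not Xindice's DOM, whose representation isn't shown here. The JDK's DOM represents an explicit binding as an ordinary attribute in the reserved namespace http://www.w3.org/2000/xmlns/. The namespace URI and the `ex` prefix are made up for illustration, as is the `build` helper. With `explicitBinding == false`, the element is still in the namespace but has no declaration attribute. That is the case where, per the above, Xerces 2 synthesizes a declaration on output while 1.4.4 writes only what is present.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NamespaceBindingSketch {
    // Hypothetical namespace URI, for illustration only.
    static final String NS = "http://example.com/ns";

    // Builds <ex:root/> in namespace NS. If explicitBinding is true, the
    // binding is also added as an attribute node "xmlns:ex" in the reserved
    // xmlns namespace; otherwise the element carries no declaration at all.
    static Document build(boolean explicitBinding) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(NS, "ex:root");
        if (explicitBinding) {
            root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:ex", NS);
        }
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        for (boolean explicit : new boolean[] {false, true}) {
            Element root = build(explicit).getDocumentElement();
            // Same element namespace either way; only the attribute count differs.
            System.out.println(root.getNamespaceURI() + " attrs=" + root.getAttributes().getLength());
        }
    }
}
```

A serializer handed the first document has to decide for itself whether to invent the `xmlns:ex` declaration. That decision is where the two Xerces versions differ.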

Falling back to Xerces 1.4.4 fixes the problem (as 1.4.4 serializes only what 
is explicitly present in the DOM). I can explain the details for those 
interested, but I want to keep this mail relatively short. 
Bottom line: my changes work only with the older Xerces 1.4.4, which makes them 
largely undesirable (one goal was to be independent of the Xerces/XML parser 
version).

On the plus side, this does allow me to introduce the notion of an output 
encoding selection switch in the command-line tools for retrieving and 
exporting documents in encodings other than UTF-8.

For example, the command

   xindiceadmin rd -c xmldb:xindice-rpc://localhost:4080/db/mycollection \
      -n somedoc.xml -f mylocalfile.xml -z iso-8859-1

will write the result in Latin-1, not UTF-8. If the document contains 
characters not representable in Latin-1, the serializer even writes numeric 
character references (&#xZZZZ;) for them! Pretty cool :)
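For anyone who wants to try the same effect outside Xindice, here's a rough sketch. It is not the patch, which uses Xerces' own serializer. Instead it swaps in the standard JAXP `Transformer` that ships with the JDK, which takes the output encoding through `OutputKeys.ENCODING`. The class and method names are hypothetical, and the `encoding` argument just stands in for whatever `-z` passes along. My understanding is that the JDK's default serializer writes characters the encoding can't represent as numeric character references, much as described above, though I wouldn't rely on it using hex rather than decimal.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodingSerializeSketch {
    // Serializes a DOM to bytes in the requested encoding (e.g. "ISO-8859-1").
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Sample document: e-acute exists in Latin-1, Cyrillic Ya (U+042F) does not.
    static Document sampleDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("doc");
        root.setTextContent("caf\u00e9 \u042f");
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(sampleDoc(), "ISO-8859-1");
        // Decode with the same charset to inspect what was actually written.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The e-acute should come out as a single raw Latin-1 byte, while U+042F should turn into a character reference rather than being lost.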

Anyway, given these possible problems, I'm again wondering: should I go ahead 
and commit, or have a more serious think about all this first?

James
