OK. I have a patch ready to commit that does away with TextWriter and
StringSerializer, and uses Xerces to serialize.
However, the serializer in Xerces 2 is a bit over-zealous and inserts namespace
binding nodes for any namespace it considers "undeclared"... And the way
Xindice's DOM implementation represents namespace binding nodes is
unfortunately not compatible with Xerces' idea of such nodes. (The real problem
here is that DOM doesn't foresee them, and so everyone has more or less their
own idea for adding "attributes" representing namespace binding nodes...)
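
To make the "attributes representing namespace binding nodes" idea concrete, here is a minimal sketch of one such convention: storing the binding as an attribute in the http://www.w3.org/2000/xmlns/ namespace, using only the JAXP DOM API shipped with the JDK. This is an illustration only, not necessarily how Xindice or Xerces represents bindings internally; the class name, the prefix and the urn:example:ns URI are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
final class NamespaceBindingSketch {
    // Namespace URI conventionally used for xmlns / xmlns:* attributes.
    static final String XMLNS_URI = "http://www.w3.org/2000/xmlns/";

    // Builds <prefix:root xmlns:prefix="namespaceUri"/> with the binding
    // stored explicitly as an attribute on the element.
    static Element buildRootWithBinding(String prefix, String namespaceUri) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // The element itself lives in the target namespace...
            Element root = doc.createElementNS(namespaceUri, prefix + ":root");
            // ...and the binding is recorded as an ordinary-looking attribute.
            root.setAttributeNS(XMLNS_URI, "xmlns:" + prefix, namespaceUri);

            doc.appendChild(root);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }
}
```

A serializer that only looks for bindings stored this way will treat a namespace as "undeclared" if another DOM implementation records the binding differently, which is the kind of mismatch described above.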
Falling back to Xerces 1.4.4 fixes the problem (as 1.4.4 serializes only what
is explicitly present in the DOM). I can explain the details for those
interested if needed, but I want to keep this mail relatively short.
Bottom line: my changes work only with the older Xerces 1.4.4, which makes them
largely undesirable (as one goal was to be independent of the Xerces/XML parser
version).
On the plus side, this does allow me to introduce the notion of an output
encoding selection switch in the command-line tools for retrieving and
exporting documents in encodings other than UTF-8.
For example, the command

    xindiceadmin rd -c xmldb:xindice-rpc://localhost:4080/db/mycollection -n somedoc.xml -f mylocalfile.xml -z iso-8859-1

will write the result in Latin-1, not UTF-8. If the document contains
characters not representable in Latin-1, the serializer even writes numeric
character references (&#xZZZZ;) for them! Pretty cool :)
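
For anyone curious what that fallback amounts to, here is a minimal sketch of the idea using only the JDK's charset classes. It is not the Xerces serializer's actual code; the class and method names are made up for illustration, and it deliberately ignores the escaping of markup characters such as < and &.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only (not part of Xerces or Xindice).
final class CharRefEscaper {

    // Returns text with every character the target encoding cannot represent
    // replaced by a hexadecimal numeric character reference (&#xZZZZ;).
    static String escapeUnencodable(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder(text.length());

        int i = 0;
        while (i < text.length()) {
            // Work on whole code points so surrogate pairs stay together.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String chunk = text.substring(i, i + length);

            if (encoder.canEncode(chunk)) {
                out.append(chunk);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            i += length;
        }
        return out.toString();
    }
}
```

With this sketch, escapeUnencodable("café €", "ISO-8859-1") returns "café &#x20AC;": the é exists in Latin-1 and is kept, while the euro sign does not and is written as a reference.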
Anyway, in view of the possible problems, again I'm wondering whether to go
ahead and commit, or maybe have a more serious think about all this first?
James