Hello,

I should probably start with a disclaimer: as of a few days ago I knew no details of XML and had never even heard of Xerces. I have, however, been able to integrate Xerces-C++ into my application very quickly and get some basic XML functionality working using the DOM API.

I obtained unexpected results when setting FormatPrettyPrint to serialize a document that was created from scratch within my application. A quick examination of the DOMWriter implementation and a search through the archives of the xerces-c-dev list confirmed that this was not user error and that it was functioning as designed. So this evening I modified DOMWriter to format its output "Pretty".

The fact that I was able to get basic XML working within my application, and even add some functionality, in a matter of a couple of days is a testament to everyone who has worked on this project. I was very impressed at how easy it was to write code based on the provided samples and even to edit the source. Everything is very well organized and documented.

My implementation of PrettyPrint seems to work with the few random XML files I was able to find. However, I will not claim it is a complete or correct solution, since my knowledge of XML is minimal.

I came up with a few rules, added to DOMWriterImpl::processNode(), which seem to do the trick when PrettyPrint is enabled:

1) All text nodes that contain ONLY whitespace are ignored.

2) Each tag begins on a new line, indented by an amount based on its level. A tag's level is the number of generations it is removed from the root element.

3) The closing tag of an Element node is printed on the same line as its opening tag if no newlines have been output as the result of any children. Otherwise the closing tag is printed on a new line, indented to the same level as the opening tag.

4) An empty line is printed just before the tag for each child of the root node.
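
The four rules above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual change to DOMWriterImpl::processNode(): Node, writeNode and prettyPrint are hypothetical stand-ins that do not use the Xerces-C++ API; attributes, character escaping and empty-element tags are left out; and the indent width is passed in as a parameter rather than fixed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT the Xerces-C++ DOMNode class.
struct Node {
    enum Kind { Element, Text };
    Kind kind;
    std::string value;            // tag name for Element, character data for Text
    std::vector<Node> children;   // used by Element nodes only
};

// True if the string is empty or holds only space, tab, CR or LF.
bool isWhitespaceOnly(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Appends the node to 'out'. Returns true if writing it started a new line.
bool writeNode(const Node& n, std::size_t level, std::size_t indentWidth,
               std::string& out) {
    if (n.kind == Node::Text) {
        if (!isWhitespaceOnly(n.value)) {
            out += n.value;       // rule 1: whitespace-only text is dropped
        }
        return false;             // text never counts as a line break in this sketch
    }

    const std::string indent(level * indentWidth, ' ');
    if (level == 1) {
        out += '\n';              // rule 4: empty line before each child of the root
    }
    if (!out.empty()) {
        out += '\n';              // rule 2: every tag starts on a new line
    }
    out += indent;
    out += '<' + n.value + '>';

    bool childBrokeLine = false;
    for (const Node& child : n.children) {
        childBrokeLine = writeNode(child, level + 1, indentWidth, out) || childBrokeLine;
    }

    if (childBrokeLine) {         // rule 3: closing tag goes on its own line
        out += '\n';
        out += indent;
    }
    out += "</" + n.value + '>';
    return true;
}

// Serializes the tree rooted at 'root' using the four rules.
std::string prettyPrint(const Node& root, std::size_t indentWidth = 2) {
    std::string out;
    writeNode(root, 0, indentWidth, out);
    out += '\n';
    return out;
}
```

Calling prettyPrint(root) on a small tree keeps an element such as <name>Big</name> on one line, because its only child is text, while an element with child elements gets its closing tag on a separate line.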


Currently the amount of indentation is hard-coded to two spaces per level. This should be user-configurable in a final implementation.

Now my concern is that rule #1 may not fly. I do not know enough about XML to know whether it will incorrectly discard some valid data. In all the XML samples I could find, the only time a text node contained only whitespace was when it sat between one element's closing tag and the next element's opening tag, where it served to make the file readable. I decided it is best to ignore all existing formatting when FormatPrettyPrint is enabled, as any attempt to combine the two would be too complex and would produce unpredictable output.
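
To make the concern about rule #1 concrete, here is a small sketch of the whitespace test together with one case where dropping such a node could change the data: in mixed content like <p><b>bold</b> <i>italic</i></p>, the single space between </b> and <i> is itself a whitespace-only text node. The XML specification also defines an xml:space="preserve" attribute for signalling that whitespace matters. The helper names isXmlWhitespaceOnly and joinText are hypothetical and are not part of Xerces-C++.

```cpp
#include <string>
#include <vector>

// True if the text is empty or consists solely of the four characters
// XML 1.0 treats as white space: space, tab, carriage return, line feed.
bool isXmlWhitespaceOnly(const std::string& text) {
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Concatenates the character data of a run of text nodes, optionally
// skipping whitespace-only pieces the way rule #1 would.
std::string joinText(const std::vector<std::string>& pieces,
                     bool dropWhitespaceOnly) {
    std::string result;
    for (const std::string& piece : pieces) {
        if (dropWhitespaceOnly && isXmlWhitespaceOnly(piece)) {
            continue;  // rule #1: ignore this node entirely
        }
        result += piece;
    }
    return result;
}
```

With the pieces "bold", " " and "italic", joining without rule #1 gives "bold italic", while joining with it gives "bolditalic", so the rule is probably only safe for documents that do not use mixed content.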

Rules 2, 3, and 4 are just my own preference in what I think looks good, and they were very easy to implement.

I do not know whether anyone is working on this, but the following thread seems to indicate not; the only more recent discussions are from people reporting that FormatPrettyPrint produced unexpected results.
http://marc.theaimsgroup.com/?l=xerces-c-dev&m=102760381301304&w=2


I would like to hear any comments on the above. I would also welcome some sample XML to run through DOMWriter, to see how it is handled with FormatPrettyPrint on. I am more than willing to share these changes and to fix any oversights on my part.

-Kevin King


Sample output using the "personal.xml" file provided in the samples. I removed 3 of the users for a briefer sample:
"domprint.exe -wfpp=on personal.xml"

<?xml version="1.0" encoding="iso-8859-1" standalone="no" ?>
<!DOCTYPE personnel>
<!-- @version: -->


<personnel>
  <person id="Big.Boss">
    <name>
      <family>Mr Boss</family>
      <given>Big</given>
    </name>
    <email>[EMAIL PROTECTED]</email>
    <link subordinates="one.worker two.worker"/>
  </person>

  <person id="one.worker">
    <name>
      <family>Worker</family>
      <given>One</given>
    </name>
    <email>[EMAIL PROTECTED]</email>
    <link manager="Big.Boss"/>
  </person>

  <person id="two.worker">
    <name>
      <family>Worker</family>
      <given>Two</given>
    </name>
    <email>[EMAIL PROTECTED]</email>
    <link manager="Big.Boss"/>
  </person>

</personnel>


