wrt fitting in with the spec.
If a DTD defines an element as mixed content then you can't insert or remove whitespace just for the purpose of being 'human readable': if the whitespace is significant, what the human reads (after pretty printing) will be different from what a machine reads.
An example (a little contrived):
<address>A person lives at <streetno>11</streetno> <street>Main Street</street></address>
<address><streetno>12</streetno> <street>Main Street</street></address>
<paragraph>Contains <bold>important</bold> information</paragraph>
<paragraph><bold>Whole paragraph is important</bold></paragraph>
With format-pretty-print, the second address and the second paragraph should not have whitespace inserted.
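To make the point concrete, here is a small sketch in Python rather than Xerces-C (an assumption made purely so it runs with the standard library; text_of is a hypothetical helper, not part of any API). It pretty prints the second address with xml.dom.minidom, re-parses the output, and compares the character data a consumer would see before and after.

```python
from xml.dom import minidom


def text_of(node):
    """Concatenate all character data below a node, as a consumer would read it."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


source = "<address><streetno>12</streetno> <street>Main Street</street></address>"

original = minidom.parseString(source)
# A pretty printer with no grammar knowledge: it indents every element child.
pretty = original.toprettyxml(indent="  ")
reparsed = minidom.parseString(pretty)

before = text_of(original.documentElement)
after = text_of(reparsed.documentElement)

print(repr(before))
print(repr(after))
```

The two strings differ only in whitespace, which is harmless for element-only content but changes the data when the element is mixed.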
I agree that the spec is somewhat grey. However, there is no implication that the transformation is allowed to make the document invalid, so should you play it safe?
wrt the difficulty.
Agreed, it could be hard. However, seeing as the parser can determine the result for DOMText::getIsWhitespaceInElementContent, why can't the parser figure out whether a Text node can be added with this attribute set? Is that a naive question? ;-)
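As a rough illustration of the decision I have in mind (a Python sketch, not Xerces-C code; classify_content_spec and may_insert_whitespace are hypothetical names), the content spec of an <!ELEMENT ...> declaration already says whether whitespace between children is ignorable. This assumes the spec string is available from the grammar and plays it safe for everything that is not pure element content.

```python
import re


def classify_content_spec(content_spec):
    """Classify the content spec of an <!ELEMENT ...> declaration.

    Returns one of "empty", "any", "mixed" or "children".
    """
    spec = content_spec.strip()
    if spec == "EMPTY":
        return "empty"
    if spec == "ANY":
        return "any"
    # Mixed content always starts with "(#PCDATA", e.g. "(#PCDATA | s1 | s2)*".
    if re.match(r"\(\s*#PCDATA\b", spec):
        return "mixed"
    return "children"


def may_insert_whitespace(content_spec):
    """Conservative rule: only element content may receive extra whitespace."""
    return classify_content_spec(content_spec) == "children"


print(may_insert_whitespace("(#PCDATA | s1 | s2)*"))
print(may_insert_whitespace("(streetno, street)"))
```

Whitespace added to EMPTY content would make the document invalid, and in mixed or ANY content it would become part of the data, so only the "children" case is treated as safe here.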
I would have thought this sort of thing would have to be sorted out with canonicalisation anyway ...
wrt two features.
Feature "format-pretty-print", that plays it safe and maintains validity.
Feature "http://apache.org/xml/features/format-pretty-print-no-grammar-check" that takes a punt according to rules such as those below.
Comments:
1./
I guess the key word in the spec is 'transformation'. So as long as there are big words that say the document may become invalid then maybe it would be OK to risk making the document invalid.
2./
If people do want particular formatting then maybe it is not a big deal for them to iterate over the DOM and insert text nodes (i.e. ignorable whitespace) as appropriate for their particular needs.
3./
Maybe having format-canonical implemented would suit people who want 'human readability' but need to maintain validity. 'Fraid I'm not up on the Canonical XML spec and/or status.
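For comment 2, the do-it-yourself approach might look something like the sketch below. It is Python with xml.dom.minidom rather than the Xerces-C DOM (an assumption, so that it runs standalone), and indent_in_place, the element names and the element_only set are all made up for illustration. The caller states which elements it knows to be element-only, and whitespace text nodes are inserted only there.

```python
from xml.dom import minidom


def indent_in_place(element, element_only_names, depth=0, unit="  "):
    """Insert indentation text nodes, but only inside elements the caller
    has declared to be element-only (application knowledge, not a DTD check)."""
    for child in list(element.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            indent_in_place(child, element_only_names, depth + 1, unit)

    children = list(element.childNodes)
    if element.tagName not in element_only_names or not children:
        return
    # Leave the element alone if it already holds text, comments, etc.
    if any(c.nodeType != c.ELEMENT_NODE for c in children):
        return

    doc = element.ownerDocument
    for child in children:
        element.insertBefore(doc.createTextNode("\n" + unit * (depth + 1)), child)
    element.appendChild(doc.createTextNode("\n" + unit * depth))


doc = minidom.parseString(
    "<order><item><sku>A1</sku><qty>2</qty></item>"
    "<note><bold>Urgent</bold></note></order>"
)
# Hypothetical application knowledge: these two elements never carry text.
indent_in_place(doc.documentElement, element_only_names={"order", "item"})
result = doc.documentElement.toxml()
print(result)
```

Here note is left untouched because it is not in the set, even though it happens to contain a single element child, which is the same situation as the second paragraph in my example.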
Gareth Reakes <[EMAIL PROTECTED]>
26/11/2002 09:54 PM
To: [EMAIL PROTECTED]
cc:
Subject: Re: FormatPrettyPrint implementation
Hi,
I don't see how we would fit that in with the spec. It states
"Formatting the output by adding whitespace to produce a pretty-printed,
indented, human-readable form. The exact form of the transformations is
not specified by this specification. Setting this feature to true will set
the feature "canonical-form" to false."
It is of course in WD so you could make a comment to the WG.
We also start to get into the world of pain that is edited
documents. My understanding of the format pretty print is that the output
may well not be valid and that this is desired. A big use case for me
is just element-only documents that users want to look at.
> Maybe there needs to be two features, one that only works if a DTD/Schema is
> available and one that takes a reasonable 'punt' at what is OK (such as
> the rules below) irrespective of any DTD/Schema if any?
Could you expand on this a little for me? The way I see it is
that even if you insert whitespace where mixed content is allowed you can
still print out an invalid document.
> ie if element must conform to:
> <!ELEMENT s (#PCDATA | s1 | s2)* >
>
> How would you format:
> <node>
> <s><s1/><s2>more text</s2>trailing text</s>
> </node>
The way I had been thinking (I am of course happy to be told I am
wrong :)) is that it did not really matter as long as it was consistent. I
felt that the purpose of this feature was literally to pretty print and
not to worry about validity. With the validation part of DOM Level 3 we
will be able to tell if something is valid. If it is then just printing it
out should provide us with a valid document. Deciding when it is OK to
insert whitespace and still produce a valid document sounds like it might
be quite hard.
Gareth
--
Gareth Reakes, Head of Product Development
DecisionSoft Ltd. http://www.decisionsoft.com
Office: +44 (0) 1865 203192
