Hi,
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
> Behalf Of John Navratil
> Sent: Tuesday, April 25, 2006 21:50
> To: [email protected]
> Subject: [xml] xmllint - Newbie THINKS there may be a
> whitespace error in 2.6.23
>
> Greetings,
>
> Using xmllint to validate a document thusly:
>
> xmllint --schema test.xsd test.xml
>
> with schema (test.xsd):
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> elementFormDefault="qualified" attributeFormDefault="unqualified">
> <xs:element name="A">
> <xs:annotation>
> <xs:documentation>asdf</xs:documentation>
> </xs:annotation>
> <xs:complexType>
> <xs:sequence>
> <xs:element name="B">
> <xs:complexType>
> <xs:attribute name="ID" type="xs:string" use="required"/>
> </xs:complexType>
> </xs:element>
> </xs:sequence>
> </xs:complexType>
> </xs:element>
> </xs:schema>
>
> and document (test.xml):
>
> <A>
> <B ID="1">
> </B>
> </A>
>
> I get the error:
>
> test.xml:2: element B: Schemas validity error : Element 'B':
> Character
> content is not allowed, because the content type is empty.
>
> I thought that --noblanks would strip the whitespace and
> eliminate the
> error, but find instead that I must modify the document to:
>
> <A>
> <B ID="1" />
> </A>
>
> Is this behavior correct? I observe it in 2.6.22 and 2.6.23
> on Fedora Core 4 and 5.
Yes, this behaviour is correct: the element "B" must not contain any
character content and, as Daniel said, the --noblanks option won't
remove such whitespace-only text nodes. --noblanks removes
whitespace-only text nodes only when you have mixed content, i.e.
when an element has character content *and* element content.
That's why the whitespace after "<A>" and before "</A>" is removed,
while the whitespace inside "B" is kept, in Daniel's example:
"
paphio:~/XML -> xmllint --noblanks test.xml
<?xml version="1.0"?>
<A><B>
</B></A>
"
When there's no mixed content, the --noblanks option considers any
whitespace significant. I think this is based on the assumption
that no one writes...
<B>
</B>
... unless they want those whitespace characters. You can write instead:
<B/> or
<B></B> or
<B><!-- No.1 the larch --></B> or
<B><?slide No.1 the larch ?></B>
All four cases of the element "B" have no content from
the viewpoint of W3C XML Schema.
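
To illustrate the point outside of libxml2, here is a small sketch
using Python's standard xml.etree.ElementTree (an assumption on my
part: it is not the libxml2 schema validator, only a parser whose
default tree drops comments and processing instructions). The helper
name has_character_content is made up for this example. It shows
that the four forms above carry no character data, while the form
with a line break inside "B" does:

```python
import xml.etree.ElementTree as ET


def has_character_content(xml_text):
    """Return True if the element holds any character data at all."""
    elem = ET.fromstring(xml_text)
    # itertext() yields the text pieces inside the element; the default
    # parser does not put comments or processing instructions in the tree.
    return "".join(elem.itertext()) != ""


forms = [
    "<B/>",
    "<B></B>",
    "<B><!-- No.1 the larch --></B>",
    "<B><?slide No.1 the larch ?></B>",
    "<B>\n</B>",  # the form from the original report
]

results = {form: has_character_content(form) for form in forms}

for form, found in results.items():
    print(repr(form), "->", "character content" if found else "empty")
```

Only the last form reports character content, which matches the
validation error for an element whose content type is empty.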
To make XML documents easier for humans to read, people start a new
line for every tag and indent nested tags. So I think the reason a
--noblanks option exists at all is to accommodate this
pretty-printing habit by removing such whitespace-only text nodes,
since they are most likely not intended to be part of the data.
So this:
<A>
<B/>
</A>
will be stripped to:
<A><B/></A>
However, we also have the xml:space mechanism, which could be used
to define exactly what is to be stripped and what is not.
So if we had an option like --noblanksall, which would remove
*all* whitespace-only text nodes, then you could use xml:space
to specify where whitespace should be preserved.
Example:
<A>
<B> </B>
<C xml:space="preserve"> <D> </D> </C>
</A>
With a --noblanksall option (this option does not exist), this
would be whitespace-stripped to:
<A><B/><C xml:space="preserve"> <D> </D> </C></A>
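
A rough sketch of what such a hypothetical --noblanksall pass could
do, written with Python's standard xml.etree.ElementTree rather than
libxml2. The function name strip_blank_text is invented for this
example, and the handling of xml:space="default" (switching stripping
back on) is my assumption about how the option would behave:

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the fixed XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def strip_blank_text(elem, preserve=False):
    """Drop whitespace-only text nodes unless xml:space="preserve" applies."""
    value = elem.get(XML_SPACE)
    if value == "preserve":
        preserve = True
    elif value == "default":
        preserve = False

    # Text before the first child (or the only text of a leaf element).
    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None

    for child in elem:
        strip_blank_text(child, preserve)
        # The tail follows the child but belongs to this element's
        # content, so this element's preserve state decides its fate.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None


doc = """<A>
<B> </B>
<C xml:space="preserve"> <D> </D> </C>
</A>"""

root = ET.fromstring(doc)
strip_blank_text(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

The whitespace inside "A" and "B" is dropped, while everything below
"C" is left untouched because xml:space="preserve" is inherited by
"D".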
> If I remove the required attribute ("ID") from the schema
> and the document, this behavior is not observed.
Check again please; I cannot reproduce this here.
Regards,
Kasimier