Hi,

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Navratil
> Sent: Tuesday, 25 April 2006 21:50
> To: [email protected]
> Subject: [xml] xmllint - Newbie THINKS there may be a whitespace error in 2.6.23
> 
> Greetings,
> 
> Using xmllint to validate a document thusly:
> 
> xmllint --schema test.xsd test.xml
> 
> with schema (test.xsd):
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> elementFormDefault="qualified" attributeFormDefault="unqualified">
>  <xs:element name="A">
>   <xs:annotation>
>    <xs:documentation>asdf</xs:documentation>
>   </xs:annotation>
>   <xs:complexType>
>    <xs:sequence>
>     <xs:element name="B">
>      <xs:complexType>
>       <xs:attribute name="ID" type="xs:string" use="required"/>
>      </xs:complexType>
>     </xs:element>
>    </xs:sequence>
>   </xs:complexType>
>  </xs:element>
> </xs:schema>
> 
> and document (test.xml):
> 
> <A>
>  <B ID="1">
>  </B>
> </A>
> 
> I get the error:
> 
> test.xml:2: element B: Schemas validity error : Element 'B': Character content is not allowed, because the content type is empty.
> 
> I thought that --noblanks would strip the whitespace and eliminate the
> error, but find instead that I must modify the document to:
> 
> <A>
>  <B ID="1" />
> </A>
> 
> Is this behavior correct?  I observe it in 2.6.22 and 2.6.23 
> on Fedora Core 4 and 5. 

Yes, this behaviour is correct: there must not be any character
content inside the element "B", and, as Daniel said, the --noblanks
option won't remove such whitespace-only text nodes. --noblanks will
remove whitespace-only text nodes when you have mixed content, i.e.
when an element has character content *and* element content.
That's why the whitespace after "<A>" and before "</A>" is removed
in Daniel's example:
"
paphio:~/XML -> xmllint --noblanks test.xml
<?xml version="1.0"?>
<A><B>
</B></A>
"
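A rough model of that rule, as a Python sketch using the standard
library's xml.etree.ElementTree. This is a simplification, not
libxml2's actual blank-detection code, and the function name
noblanks_approx is made up for illustration: a whitespace-only text
node is dropped only when its parent element also has element
children, so an element like "B" that contains nothing but whitespace
keeps it.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four XML whitespace characters


def is_blank(text):
    """True for a whitespace-only (or empty) text run."""
    return text is not None and text.strip(XML_WS) == ""


def noblanks_approx(elem):
    """Hypothetical approximation of --noblanks: strip whitespace-only
    text only inside elements that also contain child elements."""
    if len(elem):  # elem has element children
        if is_blank(elem.text):
            elem.text = None
        for child in elem:
            # child.tail is text inside elem, between its children
            if is_blank(child.tail):
                child.tail = None
    for child in elem:
        noblanks_approx(child)  # "B" has no children, so its text stays
    return elem


doc = ET.fromstring("<A>\n<B>\n</B>\n</A>")
print(ET.tostring(noblanks_approx(doc), encoding="unicode"))
# prints:
# <A><B>
# </B></A>
```

The whitespace around "<B>" disappears because "A" has an element
child; the newline inside "B" survives because "B" has none.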

When there's no mixed content, the --noblanks option considers any
whitespace significant; I think this is based on the assumption
that no one writes...
<B>
</B>
... if they don't want those space characters. You can write instead:
<B/> or 
<B></B> or 
<B><!-- No.1 the larch --></B> or
<B><?slide No.1 the larch ?></B>
In all four cases, the element "B" has no content from
the viewpoint of W3C XML Schema.
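To make "no content" concrete, here is a small Python sketch using the
standard library's xml.dom.minidom (the helper name has_empty_content
is hypothetical). It treats an element as empty when its only children
are comments or processing instructions, which is what the four forms
above have in common. It only inspects the parsed children; it does
not perform schema validation.

```python
from xml.dom import Node, minidom

# Node types that count as content for an empty content type.
CONTENT_NODES = {Node.ELEMENT_NODE, Node.TEXT_NODE, Node.CDATA_SECTION_NODE}


def has_empty_content(xml_text):
    """Hypothetical helper: True if the document element has no child
    elements and no character data; comments and PIs are ignored."""
    root = minidom.parseString(xml_text).documentElement
    return all(n.nodeType not in CONTENT_NODES for n in root.childNodes)


for sample in ['<B ID="1"/>',
               '<B ID="1"></B>',
               '<B ID="1"><!-- No.1 the larch --></B>',
               '<B ID="1"><?slide No.1 the larch ?></B>',
               '<B ID="1">\n </B>']:  # the original, rejected form
    print(repr(sample), has_empty_content(sample))
```

The first four samples report True; the last reports False because
the newline and space form a text node.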

To make an XML document easier for humans to read, people start a new
line for every new tag and indent nested tags. I think that's why
there's a --noblanks option at all: to undo this pretty-printing by
removing such whitespace-only text nodes, since they are most likely
not intended to be part of the data.
So this:
<A>
  <B/>
</A>
will be stripped to:
<A><B/></A>

However, there is also the xml:space mechanism, which could be used
to define exactly what is to be stripped and what not.
So if we had an option like --noblanksall, which would remove
*all* whitespace-only text nodes, you could use xml:space
to specify where whitespace should be preserved.
Example:
<A>
  <B> </B>
  <C xml:space="preserve"> <D> </D> </C>
</A>

With a --noblanksall option (this option does not exist), this
would be whitespace-stripped to:
<A><B/><C xml:space="preserve"> <D> </D> </C></A>
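Since --noblanksall does not exist, here is a hedged Python sketch of
what it might do, using xml.etree.ElementTree (the function name
strip_all_blanks is invented for illustration). Every whitespace-only
text node is dropped unless it lies in the scope of
xml:space="preserve"; a nested xml:space="default" switches stripping
back on. ElementTree reports the xml:space attribute under the XML
namespace URI, hence the XML_SPACE constant.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XML_WS = " \t\r\n"  # the four XML whitespace characters


def is_blank(text):
    return text is not None and text.strip(XML_WS) == ""


def strip_all_blanks(elem, preserve=False):
    """Hypothetical --noblanksall: drop every whitespace-only text node
    except inside an xml:space="preserve" scope."""
    mode = elem.get(XML_SPACE)
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False  # an inner "default" re-enables stripping
    if not preserve and is_blank(elem.text):
        elem.text = None
    for child in elem:
        # child.tail is text inside elem, so elem's mode governs it
        if not preserve and is_blank(child.tail):
            child.tail = None
        strip_all_blanks(child, preserve)  # child checks its own xml:space
    return elem


doc = ET.fromstring('<A>\n  <B> </B>\n'
                    '  <C xml:space="preserve"> <D> </D> </C>\n</A>')
print(ET.tostring(strip_all_blanks(doc), encoding="unicode"))
```

The blanks in "A" and in "B" are removed, while everything inside "C"
is kept, as in the stripped example above.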

> If I remove the required attribute ("ID") from the schema 
> and the document, this behavior is not observed.

Check again please; I cannot reproduce this here.

Regards,

Kasimier
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
