I think the parser is correct. The example given is two NMTOKEN values
separated by char refs that resolve to newlines (#xA). Normalization
happens after character reference expansion, and normalization reduces
each run of whitespace to a single space.

And besides, there are LOTS of attribute normalization tests in the various
test suites and the parser doesn't have any failures on that stuff, so I'm
relatively confident that (unless something has changed for the worse
recently) it's working correctly.

So this:

<normNames attr="A&#xa;&#xa;&#xa;B"/>

Becomes:

<normNames attr="A\n\n\nB"/>

after the char refs are expanded. And then the whitespace is folded down to
single spaces, which leaves:

<normNames attr="A B"/>
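
A rough sketch of the two steps described above, in Python. It is illustrative only: the function names are hypothetical, only numeric character references are handled, and it models the behavior as stated in this message rather than the parser's actual implementation.

```python
import re

# Matches numeric character references such as &#xa; (hex) or &#10; (decimal).
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_char_refs(value: str) -> str:
    """Step 1: replace each numeric character reference with its character."""
    def repl(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)
    return CHAR_REF.sub(repl, value)


def fold_whitespace(value: str) -> str:
    """Step 2, as described in this message: collapse each run of
    whitespace to one space, then drop leading and trailing spaces."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


def normalize_as_described(value: str) -> str:
    """Expand character references first, then fold whitespace."""
    return fold_whitespace(expand_char_refs(value))


if __name__ == "__main__":
    raw = "A&#xa;&#xa;&#xa;B"
    print(repr(expand_char_refs(raw)))
    print(repr(normalize_as_described(raw)))
```

Running the two functions separately makes it easy to see the intermediate value (three line feeds between A and B) before the folding step is applied.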

--------------
Dean Roddey
Software Geek Extraordinaire
Portal, Inc
[EMAIL PROTECTED]



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 05, 2001 10:58 AM
To: [EMAIL PROTECTED]
Subject: [Bug 1236] New - incorrect NMTOKENS attribute normalization


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1236

*** shadow/1236 Thu Apr  5 10:57:56 2001
--- shadow/1236.tmp.11194       Thu Apr  5 10:57:56 2001
***************
*** 0 ****
--- 1,57 ----
+ +============================================================================+
+ | incorrect NMTOKENS attribute normalization                                 |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 1236                        Product: Xerces-C                |
+ |       Status: NEW                         Version: 1.4                     |
+ |   Resolution:                            Platform: PC                      |
+ |     Severity: Critical                 OS/Version:                         |
+ |     Priority:                           Component: Non-Validating Parser   |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                            |
+ |  Reported By: [EMAIL PROTECTED]                                            |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ Xerces 1.4 generates incorrect output when normalizing
+ attributes of type NMTOKENS. Too many character references
+ are stripped from the attribute value. (re: XML Specification
+ 1.0, section 3.3.3, Attribute-Value Normalization)
+ 
+ I compiled DOMPrint example with 
+ Xerces 1.4 using MSDev 6.0 on Windows NT:
+ 
+ using the following test case attr.xml:
+ <!DOCTYPE normNames [
+ <!ELEMENT normNames EMPTY>
+ <!ATTLIST normNames attr NMTOKENS #IMPLIED>
+ ]>
+ <normNames attr="A&#xa;&#xa;&#xa;B"/>
+ 
+ I got the following output:
+ <!DOCTYPE normNames [
+ <!ELEMENT normNames EMPTY>
+ <!ATTLIST normNames attr NMTOKENS #IMPLIED>
+ ]>
+ <normNames attr="A B"/>
+ 
+ But the expected output according to the XML Specification
+ is
+ <!DOCTYPE normNames [
+ <!ELEMENT normNames EMPTY>
+ <!ATTLIST normNames attr NMTOKENS #IMPLIED>
+ ]>
+ <normNames attr="A #A #A #A B"/>
+ 
+ In fact, Xerces 1.4 does not seem to generate the 
+ correct output for the last two examples in section 
+ 3.3.3 of XML Specification 1.0.   The last two 
+ examples are:
+ 
+ * a="&d;&d;A&a;&a;B&da;"
+ * a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"
+ 
+ thanks!
+ 
+ --Michele

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
