The following comment has been added to this issue:

     Author: Michael Windsor
    Created: Wed, 14 Jul 2004 5:35 AM
       Body:
Having done some further investigation (and limited testing), I believe I have located 
the cause of this problem.

The function normalizeWhiteSpace() within SchemaValidator.cpp takes "chunks" of the 
input stream (it's split into these chunks elsewhere) and does its work on them. The 
last activity is to record whether whitespace is present at the end of the chunk by 
setting a boolean (fTrailing) to be true if there is. This is then used in any 
subsequent call to this function to establish how whitespace at the head of the next 
chunk should be processed.

The problem is that this flag is set if there is a trailing space but is not cleared 
if there is not, although it is cleared when reset() or certain other functions within 
this class are invoked. There are only certain circumstances when this will be 
important because in most situations, all the text between a pair of tags will be 
processed as a single chunk and the flag is reset between tags. One of the reasons 
that a chunk may end before the start of a new tag is that an entity is used within 
the element and this was the case when I noticed the error.

This error will be quite rare because data between two tags must be split up into at 
least three chunks and there must be whitespace after one but not after some 
subsequent chunk (which is not the last one).

The fix is to add an "else" to the "if" statement at the end of the 
normalizeWhiteSpace() function:

if (fCurReader->isWhitespace(*(srcPtr-1)))
   fTrailing = true;
else
   fTrailing = false;
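
To make the effect of the missing "else" concrete, here is a minimal stand-alone sketch of chunk-wise whitespace collapsing. It is not the Xerces code: ChunkNormalizer, normalizeChunks, the clearFlag switch and the emittedAny_ member are hypothetical names invented for illustration. The three-way split of the test string into chunks is also an assumption, chosen because it reproduces the reported value.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the validator's state between chunks (not Xerces code).
class ChunkNormalizer {
public:
    // clearFlag == false models the reported behaviour (flag is only ever set);
    // clearFlag == true models the proposed fix (flag is also cleared).
    explicit ChunkNormalizer(bool clearFlag) : clearFlag_(clearFlag) {}

    // Collapse whitespace in one chunk, taking into account whether the
    // previous chunk was recorded as ending in whitespace.
    std::string normalize(const std::string& chunk) {
        std::string out;
        if (chunk.empty()) return out;

        bool pendingSpace = trailing_;  // whitespace carried over from earlier chunks
        for (char c : chunk) {
            if (std::isspace(static_cast<unsigned char>(c))) {
                pendingSpace = true;    // remember the run, emit at most one space later
                continue;
            }
            if (pendingSpace && emittedAny_) out += ' ';
            pendingSpace = false;
            out += c;
            emittedAny_ = true;
        }

        const bool endsInSpace =
            std::isspace(static_cast<unsigned char>(chunk.back())) != 0;
        if (endsInSpace)
            trailing_ = true;
        else if (clearFlag_)
            trailing_ = false;          // the branch the proposed fix adds
        return out;
    }

private:
    bool clearFlag_;
    bool trailing_ = false;    // plays the role of fTrailing
    bool emittedAny_ = false;  // suppresses a leading space at the start of the value
};

// Feed all chunks of one element's text through a fresh normalizer.
std::string normalizeChunks(const std::vector<std::string>& chunks, bool clearFlag) {
    ChunkNormalizer n(clearFlag);
    std::string result;
    for (const std::string& c : chunks) result += n.normalize(c);
    return result;
}
```

Assuming the text arrives as the chunks "curiouser ", "& curiouser" and "&curiouser", calling normalizeChunks with clearFlag set to false gives "curiouser & curiouser &curiouser" (32 characters), while clearFlag set to true gives "curiouser & curiouser&curiouser" (31 characters).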
---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESC-1239?page=comments#action_36655

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1239

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1239
    Summary: Schema length validation error in unions
       Type: Bug

     Status: Unassigned
   Priority: Major

    Project: Xerces-C++
 Components: 
             Validating Parser (Schema) (Xerces 1.5 or up only)
   Versions:
             2.4.0
             2.5.0

   Assignee: 
   Reporter: Michael Windsor

    Created: Fri, 2 Jul 2004 5:24 AM
    Updated: Wed, 14 Jul 2004 5:35 AM
Environment: Tested on released Win32 execs and new 2.5.0 exec created with VC++ 7 on 
WinNT SP6a and Win XP SP1

Description:
In certain circumstances, schema validation fails to correctly calculate the length of 
a string containing &amp; (and possibly other) entity references. The following schema 
and XML produce the error in the Sax2Print example, although I first noticed the error 
when using Xerces as a validator from within Xalan-C, so it is unlikely to be a problem 
with this example only.

Test.xml:
=========
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:noNamespaceSchemaLocation="Test.xsd">
        <flibble>curiouser &amp; curiouser&amp;curiouser</flibble>
</root>

Test.xsd:
=========
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" 
attributeFormDefault="unqualified">
        <xs:element name="root">
                <xs:complexType>
                        <xs:sequence>
                                <xs:element name="flibble">
                                        <xs:simpleType>
                                                <xs:union memberTypes="TextString Null"/>
                                        </xs:simpleType>
                                </xs:element>
                        </xs:sequence>
                </xs:complexType>
        </xs:element>
        <xs:simpleType name="TextString">
                <xs:restriction base="xs:string">
                        <xs:minLength value="1" />
                        <xs:maxLength value="31" />
                </xs:restriction>
        </xs:simpleType>
        <xs:simpleType name="Null">
                <xs:restriction base="xs:string">
                        <xs:length value="0" />
                </xs:restriction>
        </xs:simpleType>
</xs:schema>

Error message:
==============
Error at file F:\My Documents\Mike\Visual Studio 
Projects\xerces-c-src_2_5_0\Build\Win32\VC7\Debug/Test.xml, line 3, char 60
  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value 
'curiouser & curiouser &curiouser' does not match any member types (of the union) .

There are a few things to note:
+ As you can see by counting the letters, the input string (31 characters) should fit 
the first member of the union, but an extra space has been put in before the second 
ampersand.
+ I have not determined the exact pattern within the string that causes this, but it 
seems to require two ampersands and that the second not have a space before it.
+ I do not know if this is restricted to &amp; or is general to any other type of 
escape sequence or a combination thereof (since more than one appears to be necessary).
+ This only happens for a union. If the schema simply provides a straight restriction 
on the length of the string, there is no complaint from validation.
+ Running Sax2Print with -s (i.e. no validation) prints the input document with the 
string processed correctly (i.e. the correct number of characters). It is only when 
the validator is switched on that the extra space is produced. This is also the case 
from XSLT operations within Xalan: the validator complains but if switched off, the 
string is output correctly to the correct length.
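
The length arithmetic behind the first note can be checked with a small sketch that applies the two length facets from Test.xsd by hand. The function names are hypothetical, and the check assumes plain ASCII text so that one byte equals one character. It only illustrates the facet limits and says nothing about how Xerces evaluates unions internally.

```cpp
#include <string>

// TextString in Test.xsd: minLength 1, maxLength 31.
bool matchesTextString(const std::string& value) {
    return value.size() >= 1 && value.size() <= 31;
}

// Null in Test.xsd: length 0.
bool matchesNull(const std::string& value) {
    return value.empty();
}

// A value is valid for the union if it is valid for at least one member type.
bool matchesUnion(const std::string& value) {
    return matchesTextString(value) || matchesNull(value);
}
```

Under these assumptions the expected value "curiouser & curiouser&curiouser" has 31 characters and matches TextString, whereas the value quoted in the error message, "curiouser & curiouser &curiouser", has 32 characters and matches neither member.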

I have spent some time trying to figure out what is going on in order to produce a 
patch. I will continue to do so, but at the moment, I am not having much luck. If 
anyone else with a better understanding of the code wants to jump in and steal my 
thunder, I won't be at all offended.



