The following comment has been added to this issue:

     Author: Michael Windsor
    Created: Wed, 14 Jul 2004 5:35 AM
       Body:
Having done some further investigation (and limited testing), I believe I have located the cause of this problem.
The function normalizeWhiteSpace() in SchemaValidator.cpp works on "chunks" of the input stream (the stream is split into these chunks elsewhere). Its last step is to record whether the chunk ends in whitespace by setting a boolean (fTrailing) to true if it does. Any subsequent call to the function then uses this flag to decide how whitespace at the head of the next chunk should be processed.

The problem is that the flag is set when there is a trailing space but is not cleared when there is not, although it is cleared when reset() or certain other functions within this class are invoked.

This matters only in certain circumstances, because in most situations all the text between a pair of tags is processed as a single chunk and the flag is reset between tags. One reason a chunk may end before the start of a new tag is that an entity is used within the element, which was the case when I noticed the error. The error will be quite rare: the data between two tags must be split into at least three chunks, and there must be whitespace at the end of one chunk but not at the end of some subsequent chunk (other than the last one).
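To make the mechanism concrete, here is a toy model of a chunk-wise whitespace collapser with a "sticky" trailing-whitespace flag. It is only a sketch and is not the Xerces-C++ source: the class ChunkCollapser, the helper normalizeAll() and the clearFlag switch are hypothetical names, and the chunk boundaries used in the note after the code are an assumption chosen because they reproduce the value shown in the reported error message.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy model only: hypothetical names, not the Xerces-C++ implementation.
class ChunkCollapser {
public:
    // clearFlag == false models the behaviour described above (the flag is
    // set but never cleared); clearFlag == true models clearing it.
    explicit ChunkCollapser(bool clearFlag) : clearFlag_(clearFlag) {}

    // Collapse whitespace within one chunk. The flag left behind by the
    // previous chunk decides whether a separating space is emitted first.
    std::string normalize(const std::string& chunk) {
        std::string out;
        if (chunk.empty())
            return out;

        // Previous chunk (supposedly) ended in whitespace: emit one space.
        if (trailing_)
            out += ' ';

        bool inWhitespace = false;
        bool seenContent = false;
        for (unsigned char ch : chunk) {
            if (std::isspace(ch)) {
                inWhitespace = true;
                continue;
            }
            // An interior run of whitespace collapses to a single space.
            // Leading whitespace of the chunk is dropped in this model.
            if (inWhitespace && seenContent)
                out += ' ';
            inWhitespace = false;
            seenContent = true;
            out += static_cast<char>(ch);
        }

        // Record whether this chunk ended in whitespace.
        if (std::isspace(static_cast<unsigned char>(chunk.back())))
            trailing_ = true;
        else if (clearFlag_)
            trailing_ = false;  // without this branch the flag stays set

        return out;
    }

private:
    bool clearFlag_;
    bool trailing_ = false;
};

// Feed every chunk through one collapser and concatenate the results.
std::string normalizeAll(const std::vector<std::string>& chunks, bool clearFlag) {
    ChunkCollapser collapser(clearFlag);
    std::string result;
    for (const auto& chunk : chunks)
        result += collapser.normalize(chunk);
    return result;
}
```

Assuming the element text arrives as the three chunks "curiouser ", "& curiouser" and "&curiouser", normalizeAll(chunks, false) gives "curiouser & curiouser &curiouser" (32 characters), because the flag set by the first chunk is still set when the third chunk is processed. normalizeAll(chunks, true) gives "curiouser & curiouser&curiouser" (31 characters).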
The fix is to add an "else" to the "if" statement at the end of the normalizeWhiteSpace() function:

    if (fCurReader->isWhitespace(*(srcPtr-1)))
        fTrailing = true;
    else
        fTrailing = false;

---------------------------------------------------------------------
View this comment:
  http://issues.apache.org/jira/browse/XERCESC-1239?page=comments#action_36655

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1239

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1239
    Summary: Schema length validation error in unions
       Type: Bug

     Status: Unassigned
   Priority: Major

    Project: Xerces-C++
 Components:
             Validating Parser (Schema) (Xerces 1.5 or up only)
   Versions:
             2.4.0
             2.5.0

   Assignee:
   Reporter: Michael Windsor

    Created: Fri, 2 Jul 2004 5:24 AM
    Updated: Wed, 14 Jul 2004 5:35 AM
Environment: Tested on released Win32 execs and new 2.5.0 exec created with VC++ 7 on WinNT SP6a and Win XP SP1

Description:
In certain circumstances, schema validation fails to correctly calculate the length of a string with &amp; (and possibly other) entity references in it. The following schema and XML produce the error in the Sax2Print example, although I first noticed the error when using Xerces as a validator from within Xalan-C, so it is unlikely to be a problem with this example only.

Test.xml:
=========
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Test.xsd">
  <flibble>curiouser &amp; curiouser&amp;curiouser</flibble>
</root>

Test.xsd:
=========
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="flibble">
          <xs:simpleType>
            <xs:union memberTypes="TextString Null"/>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="TextString">
    <xs:restriction base="xs:string">
      <xs:minLength value="1" />
      <xs:maxLength value="31" />
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Null">
    <xs:restriction base="xs:string">
      <xs:length value="0" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Error message:
==============
Error at file F:\My Documents\Mike\Visual Studio Projects\xerces-c-src_2_5_0\Build\Win32\VC7\Debug/Test.xml, line 3, char 60
  Message: Datatype error: Type:InvalidDatatypeValueException, Message:Value 'curiouser & curiouser &curiouser' does not match any member types (of the union) .

There are a few things to note:
+ As you can see by counting the letters, the input string (31 characters once the entities are expanded) should fit the first member of the union, but an extra space has been put in before the second ampersand.
+ I have not determined the exact pattern within the string that causes this, but it seems to require two ampersands and that the second not have a space before it.
+ I do not know if this is restricted to &amp; or is general to any other type of escape sequence or a combination thereof (since more than one appears to be necessary).
+ This only happens for a union. If the schema simply provides a straight restriction on the length of the string, there is no complaint from validation.
+ Running Sax2Print with -s (i.e.
  no validation) prints the input document with the string processed correctly (i.e. with the correct number of characters). It is only when the validator is switched on that the extra space is produced. This is also the case for XSLT operations within Xalan: the validator complains, but if it is switched off, the string is output at the correct length.

I have spent some time trying to figure out what is going on in order to produce a patch. I will continue to do so, but at the moment I am not having much luck. If anyone else with a better understanding of the code wants to jump in and steal my thunder, I won't be at all offended.

---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira