Message:

  A new issue has been created in JIRA.

---------------------------------------------------------------------
View the issue:
  http://issues.apache.org/jira/browse/XERCESC-1226

Here is an overview of the issue:
---------------------------------------------------------------------
        Key: XERCESC-1226
    Summary: Parser reports bogus content when parsing
       Type: Bug

     Status: Unassigned
   Priority: Major

    Project: Xerces-C++
 Components: 
             SAX/SAX2
   Versions:
             Nightly build (please specify the date)

   Assignee: 
   Reporter: David Bertoni

    Created: Thu, 10 Jun 2004 9:42 AM
    Updated: Thu, 10 Jun 2004 9:42 AM
Environment: All platforms

Description:
When parsing the following document, the parser reports garbage characters.

<?xml version="1.0"?> 
<subject>Research [&#x1D538;]rticle</subject>

I traced this down to this function in XMLReader, starting on line 612:

inline bool XMLReader::isPlainContentChar(const XMLCh toCheck)
{
    return ((fgCharCharsTable[toCheck] & gPlainContentCharMask) != 0);
}

Apparently, for the character "]" (U+005D RIGHT SQUARE BRACKET), the flags in
fgCharCharsTable indicate that it is not plain content.  The parser then mishandles
the surrounding text and delivers broken character data, including unpaired low
surrogates.

When I used the debugger to force this function to return "true" rather than "false",
the parser delivered the correct character data.
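
For anyone trying to confirm the symptom, the following is a minimal sketch of what the
character data for the <subject> element should look like in UTF-16, together with a
check for unpaired surrogates.  It is standalone C++ and does not use any Xerces-C++
API; the names toUtf16, hasOnlyPairedSurrogates and expectedSubjectText are
hypothetical helpers made up for this illustration.  It assumes the text is held as
UTF-16 code units, so U+1D538 should appear as the surrogate pair 0xD835 0xDD38.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
std::u16string toUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Code points in the BMP fit in a single code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code points are split into a high and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}

// Hypothetical helper: true if every surrogate in `s` belongs to a
// well-formed high/low pair, false if any surrogate is unpaired.
bool hasOnlyPairedSurrogates(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return false;
            ++i; // skip the low surrogate that completes the pair
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // A low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}

// The character data the <subject> element in the report should contain.
std::u16string expectedSubjectText()
{
    return u"Research [" + toUtf16(0x1D538) + u"]rticle";
}
```

Comparing the text accumulated from a handler's character callbacks against
expectedSubjectText(), or passing it to hasOnlyPairedSurrogates, is one way to tell
whether a given build shows the problem.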


---------------------------------------------------------------------
JIRA INFORMATION:
This message is automatically generated by JIRA.

If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa

If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


