http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1203

*** shadow/1203 Tue Apr 3 15:16:33 2001
--- shadow/1203.tmp.22490 Tue Apr 3 15:16:33 2001
***************
*** 0 ****
--- 1,59 ----
+ +============================================================================+
+ | Control chars as element content cause SAX fatalError event                |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 1203                        Product: Xerces-C                |
+ |       Status: NEW                         Version: 1.4                     |
+ |   Resolution:                            Platform: PC                      |
+ |     Severity: Normal                   OS/Version:                         |
+ |     Priority: Medium                    Component: Non-Validating Parser   |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                            |
+ |  Reported By: [EMAIL PROTECTED]                                            |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ While it may be an abomination to think of sloshing around control characters
+ as XML element character content (shudder), my read of XML 1.0 BNF production
+ rules 39, 43, and 15 is such that only '<', '&', and ']]>' are invalid as
+ element character content (e.g. w/o escaping as &lt; &amp; and ]]&gt;
+ respectively).
+
+ A certain software company in the Pacific Northwest has a product which seems
+ to not mind the control characters (the control chars even render as box glyphs
+ in said company's browser).
+
+ Is there a good reason why control characters are not supported in Xerces as
+ normal character content?
+
+ I used a debugger to step down through the parser and found
+
+     XMLReader::isXMLChar(const XMLCh toCheck)
+     {
+         return ((fgCharCharsTable[toCheck] & gXMLCharMask) != 0);
+     }
+
+ which leads one to
+
+     const XMLByte XMLReader::fgCharCharsTable[0x10000] =
+     { 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+       0x00, 0xD8, 0xD0, 0x00, 0x00, 0xD0, 0x00, 0x00,
+       0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+       0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+
+ So other than the whitespace controls (0x09, 0x0A, 0x0D), it looks like all
+ the other controls are treated as "non characters" per these masks:
+
+     // Masks for the fgCharCharsTable array
+     const XMLByte gBaseCharMask            = 0x1;
+     const XMLByte gSpecialCharDataMask     = 0x2;
+     const XMLByte gNameCharMask            = 0x4;
+     const XMLByte gPlainContentCharMask    = 0x8;
+     const XMLByte gSpecialStartTagCharMask = 0x10;
+     const XMLByte gLetterCharMask          = 0x20;
+     const XMLByte gXMLCharMask             = 0x40;
+     const XMLByte gWhitespaceCharMask      = 0x80;
+
+ It looks as though setting the 0x40 bit (gXMLCharMask) on each entry in the
+ first two sticks (0x00 through 0x1F) would do the trick, but I have no idea
+ how big the impact would be elsewhere throughout xerces-c.
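
For anyone who wants to check the reading of the table without a debugger, here
is a minimal standalone sketch of the lookup described above. It copies only
the 32 table entries and the mask values quoted in this report. The names
kFirstTwoSticks, isXMLCharInExcerpt, countAccepted, and withXMLCharBit are
hypothetical helpers made up for illustration and are not part of Xerces-C. The
last helper sets the bit with a bitwise OR instead of an addition, on the
assumption that entries which already carry 0x40 (such as 0xD8) should stay
unchanged.

```cpp
#include <cassert>

typedef unsigned char XMLByte;

// Mask value as quoted in the report.
const XMLByte gXMLCharMask = 0x40;

// Entries 0x00-0x1F of fgCharCharsTable as quoted in the report.
// The real table has 0x10000 entries; this excerpt is for illustration only.
const XMLByte kFirstTwoSticks[0x20] = {
    0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0xD8, 0xD0, 0x00, 0x00, 0xD0, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

// Hypothetical helper: same test as the quoted isXMLChar(), but limited
// to the 32-entry excerpt above.
bool isXMLCharInExcerpt(unsigned ch)
{
    assert(ch < 0x20);
    return (kFirstTwoSticks[ch] & gXMLCharMask) != 0;
}

// Hypothetical helper: how many of the 32 control codes pass the test.
int countAccepted()
{
    int accepted = 0;
    for (unsigned ch = 0; ch < 0x20; ++ch) {
        if (isXMLCharInExcerpt(ch)) {
            ++accepted;
        }
    }
    return accepted;
}

// Hypothetical helper: the proposed change for a single table entry.
// OR leaves entries that already have the bit (0xD8, 0xD0) untouched,
// whereas literally adding 0x40 to them would overflow the byte.
XMLByte withXMLCharBit(XMLByte entry)
{
    return static_cast<XMLByte>(entry | gXMLCharMask);
}
```

With the quoted values, only 0x09, 0x0A, and 0x0D pass the check, which matches
the observation above that every other control code is rejected. Whether
flipping the bit is safe for the rest of the parser is not something this
sketch can answer.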
