http://nagoya.apache.org/bugzilla/show_bug.cgi?id=1203

*** shadow/1203 Tue Apr  3 15:16:33 2001
--- shadow/1203.tmp.22490       Tue Apr  3 15:16:33 2001
***************
*** 0 ****
--- 1,59 ----
+ +============================================================================+
+ | Control chars as element content cause SAX fatalError event                |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 1203                        Product: Xerces-C                |
+ |       Status: NEW                         Version: 1.4                     |
+ |   Resolution:                            Platform: PC                      |
+ |     Severity: Normal                   OS/Version:                         |
+ |     Priority: Medium                    Component: Non-Validating Parser   |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                  |
+ |  Reported By: [EMAIL PROTECTED]                                    |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ While it may be an abomination to think of sloshing around control characters 
+ as XML element character content (shudder), my reading of XML 1.0 BNF 
+ productions 39, 43, and 14 is that only '<', '&', and ']]>' are invalid as 
+ element character content (i.e. unless escaped as &lt;, &amp;, and ]]&gt; 
+ respectively).
+ 
+ A certain software company in the Pacific Northwest has a product which seems 
+ to not mind the control characters (the control chars even render as box glyphs 
+ in said company's browser).
+ 
+ Is there a good reason why control characters are not supported in Xerces as 
+ normal character content?
+ 
+ I used the debugger to step down through the code and found 
+ 
+ XMLReader::isXMLChar(const XMLCh toCheck)
+ {
+     return ((fgCharCharsTable[toCheck] & gXMLCharMask) != 0);
+ } 
+ 
+ which leads one to
+ 
+ const XMLByte XMLReader::fgCharCharsTable[0x10000] =
+ {     0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+     , 0x00, 0xD8, 0xD0, 0x00, 0x00, 0xD0, 0x00, 0x00
+     , 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+     , 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+ 
+ So other than whitespace controls, it looks like all the other controls are 
+ treated as "non characters" per the 
+ 
+ // Masks for the fgCharCharsTable array
+ const XMLByte   gBaseCharMask               = 0x1;
+ const XMLByte   gSpecialCharDataMask        = 0x2;
+ const XMLByte   gNameCharMask               = 0x4;
+ const XMLByte   gPlainContentCharMask       = 0x8;
+ const XMLByte   gSpecialStartTagCharMask    = 0x10;
+ const XMLByte   gLetterCharMask             = 0x20;
+ const XMLByte   gXMLCharMask                = 0x40;
+ const XMLByte   gWhitespaceCharMask         = 0x80;
+ 
+ It looks as though setting the 0x40 bit on each entry in the first two rows 
+ would do the trick, but I have no idea how big the impact would be elsewhere 
+ throughout xerces-c.
