At 3:07 AM -0800 12/16/01, James Kass wrote:

>Tests run on non-BMP text show no problem for Plane One using
>UTF-8 encoding but error messages are generated when these
>characters are referenced as NCRs.
>

I suspect there are a lot of random mistakes like this waiting to be 
discovered. I recently added a Plane-1 musical symbol to a book I'm 
working on, and watched Xerces's XMLSerializer class trip over it. It 
emitted the character as two character references, one for each half 
of the surrogate pair, rather than one, thus producing malformed 
HTML. It worked when I switched to UTF-8 encoding though.
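
To illustrate the difference, here is a minimal sketch. It is not Xerces code, and the class and method names (NcrEscaper, escape, escapePerCodeUnit) are hypothetical. It shows the buggy pattern of escaping one Java char (UTF-16 code unit) at a time next to an approach that works on whole code points via String.codePointAt, so a supplementary character such as U+1D11E MUSICAL SYMBOL G CLEF becomes the single reference &#x1D11E; instead of the two illegal surrogate references &#xD834;&#xDD1E;. I'm assuming the goal is ASCII-only output with everything else written as hex references, and I've only handled the markup characters &, < and >.

```java
public final class NcrEscaper {

    // Hypothetical helper: escapes markup characters and every non-ASCII
    // code point as a hex numeric character reference. It walks the string
    // by code point, so a surrogate pair yields exactly one reference.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // joins a valid surrogate pair into one code point
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                // An unpaired surrogate has no legal reference in XML or HTML.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append(String.format("&#x%X;", cp));
            }
            i += Character.charCount(cp); // advance 2 chars for a supplementary code point
        }
        return out.toString();
    }

    // The buggy pattern, for comparison only: one reference per UTF-16 code
    // unit, which splits a Plane-1 character into two surrogate references.
    public static String escapePerCodeUnit(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("&#x%X;", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String gClef = new String(Character.toChars(0x1D11E)); // MUSICAL SYMBOL G CLEF
        System.out.println(escapePerCodeUnit(gClef)); // prints &#xD834;&#xDD1E; (malformed)
        System.out.println(escape(gClef));            // prints &#x1D11E;
    }
}
```

Writing the output as UTF-8, as I ended up doing, sidesteps the issue entirely, since the serializer then never has to build a reference for the character.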

I suspect a lot of our tools haven't been thoroughly tested with 
Plane-1 and are likely to have these sorts of bugs in them.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | [EMAIL PROTECTED] | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|              http://www.ibiblio.org/xml/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/     |
+----------------------------------+---------------------------------+
