At 3:07 AM -0800 12/16/01, James Kass wrote:
>Tests run on non-BMP text show no problem for Plane One using
>UTF-8 encoding but error messages are generated when these
>characters are referenced as NCRs.
>
I suspect there are a lot of random mistakes like this waiting to be
discovered. I recently added a Plane-1 musical symbol to a book I'm
working on, and watched Xerces's XMLSerializer class trip over it. It
emitted the character as two character references, one for each half
of the surrogate pair, rather than one, thus producing malformed HTML.
It worked when I switched to UTF-8 encoding, though. I suspect a lot of
our tools haven't been thoroughly tested with Plane-1 characters and
are likely to have these sorts of bugs in them.

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | [EMAIL PROTECTED]      | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001)                    |
| http://www.ibiblio.org/xml/books/bible2/                           |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/     |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/        |
| Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/      |
+----------------------------------+---------------------------------+
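
For illustration, the difference between the two outputs described above can be sketched in Java. This is not the Xerces code. The class and method names (NcrSketch, escapeNonAscii, escapePerCodeUnit) are hypothetical, and the sketch assumes a modern JDK where String.codePointAt and Character.charCount are available. It only escapes non-ASCII characters and ignores markup characters such as & and <.

```java
import java.util.Locale;

// Hypothetical illustration only; not taken from Xerces.
final class NcrSketch {

    // Writes one numeric character reference per Unicode code point,
    // so a surrogate pair becomes a single reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a valid high/low surrogate pair into one value.
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase(Locale.ROOT))
                   .append(';');
            }
            // Advance by 2 code units for a supplementary character, else 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    // Reproduces the kind of mistake described above: one reference per
    // UTF-16 code unit, so each surrogate half gets its own reference.
    static String escapePerCodeUnit(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char unit = text.charAt(i);
            if (unit < 0x80) {
                out.append(unit);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(unit).toUpperCase(Locale.ROOT))
                   .append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String gClef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF
        System.out.println(escapeNonAscii(gClef));    // &#x1D11E;
        System.out.println(escapePerCodeUnit(gClef)); // &#xD834;&#xDD1E;
    }
}
```

The second form refers to the surrogate code points D834 and DD1E individually, which are not characters in their own right, so a conforming parser is expected to reject it.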