Frank Yung-Fong Tang
> As long as a product supports UTF-8 and passes the test with MES-1, I can 
> be pretty sure that no code in between strips off any non-ISO-8859-1 
> characters, regardless of whether it supports MES-2 or MES-3.
> 
> Of course, that does not guarantee surrogate characters won't get 
> damaged, but just as someone believes, it will be <1% of the effort for me 
> to fix it later, right? :)

Since MES-1, MES-2 and MES-3 contain no characters outside the BMP,
supporting them is not enough to test surrogate handling. You can't assume
that adding a feature later, one that was never tested in any previous
release, will take <1% of the work.
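
To illustrate why a BMP-only test says nothing about surrogates, here is a
minimal Python sketch. The function naive_ucs2_filter is hypothetical: it
stands in for any intermediate stage that treats each 16-bit code unit as a
whole character and drops what it considers invalid. The sample strings are
my own, not taken from any MES test suite.

```python
import struct


def naive_ucs2_filter(text: str) -> str:
    """Hypothetical buggy stage: works on 16-bit code units and silently
    drops every unit that falls in the surrogate range."""
    data = text.encode("utf-16-be")
    units = struct.unpack(f">{len(data) // 2}H", data)
    kept = [u for u in units if not 0xD800 <= u <= 0xDFFF]
    return struct.pack(f">{len(kept)}H", *kept).decode("utf-16-be")


def survives(text: str) -> bool:
    """True if the text comes out of the filter unchanged."""
    return naive_ucs2_filter(text) == text


# BMP-only sample: Latin-1 letters plus two other BMP characters.
bmp_sample = "caf\u00e9 \u0153uvre \u20ac"
# Non-BMP sample: variation selector 17, a Deseret letter, a musical symbol.
non_bmp_sample = "A\U000E0100 \U00010400 \U0001D11E"

print("BMP-only test passes:", survives(bmp_sample))
print("non-BMP test passes: ", survives(non_bmp_sample))
```

The BMP-only sample passes through untouched, so a test built only on such
data reports success, while every non-BMP character in the second sample is
lost.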

The only way to be sure is to start supporting some character blocks outside
the BMP: for example Deseret or musical notation in plane 1, or language tags
and other special characters in plane 14, notably the extended variation
selectors. Those should be easy to support and test: they only need to be
handled so that they do not break the normal rendering of characters not
currently known to use them.

I do think that adding at least correct and fully compliant support for
variation selectors 17 to 256 (U+E0100 to U+E01EF) would be stronger proof
that nothing is broken, since they must be encoded as surrogate pairs in
UTF-16 and as 4-byte sequences in UTF-8. Adding just this small subset of
non-BMP characters does not require you to implement every script outside
the BMP. But at least it protects you from nasty surprises later, when you
find that your app needs a major revision, with changes spread all over its
code, to make it work with non-BMP characters.
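
As a rough sketch of what these selectors look like on the wire, the Python
below computes the UTF-16 surrogate pair by hand and compares it with the
built-in codecs, then prints the UTF-8 bytes. The helper name
utf16_surrogates and the constants VS17 and VS256 are mine, chosen for
illustration only.

```python
def utf16_surrogates(cp):
    """Return the (high, low) surrogate pair for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


VS17 = 0xE0100   # VARIATION SELECTOR-17, first of the plane 14 selectors
VS256 = 0xE01EF  # VARIATION SELECTOR-256, last of them

for cp in (VS17, VS256):
    ch = chr(cp)
    hi, lo = utf16_surrogates(cp)
    # Cross-check the hand computation against Python's own UTF-16 codec.
    assert ch.encode("utf-16-be") == hi.to_bytes(2, "big") + lo.to_bytes(2, "big")
    utf8 = ch.encode("utf-8")
    print(f"U+{cp:05X}: UTF-16 {hi:04X} {lo:04X}, "
          f"UTF-8 {utf8.hex()} ({len(utf8)} bytes)")
```

Any stage that splits, truncates or filters between the two surrogates, or
that rejects 4-byte UTF-8 sequences, will corrupt these characters, which
makes them a cheap probe for non-BMP handling.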

