Frank Yung-Fong Tang wrote:
> As long as a product support UTF-8 and pass the test with MES-1, I can
> pretty sure that no code in between strip off any non ISO-8859-1
> characters, regardless they support MES-2 or MES-3.
>
> Of course, that does not guarantee surrogate characters won't get
> damanaged, but just as someone believe, it will be <1% of efforts for me
> to fix it later, right? :)
MES-1, MES-2 and MES-3 do not contain any character outside the BMP, so supporting them is not enough to test compliance with, and support for, surrogates. You can't assume that adding a feature later, one that was never tested in a previous release, will take less than 1% of the effort. The only way to be sure is to start supporting some character blocks outside the BMP: for example language tags, musical notation, Deseret, or other special characters in plane 14. The extended variation selectors are a good candidate there, because they should be easy to support and test without breaking the normal rendering of characters that are not currently known to use them.

I do think that adding at least correct and fully compliant support for variation selectors 17 to 256 (U+E0100..U+E01EF) would be a more definitive proof that surrogate handling isn't broken, because these characters must be encoded as surrogate pairs in UTF-16 and need the 4-byte form in UTF-8. Adding this small subset of characters outside the BMP does not require you to implement every script outside the BMP. But at least it protects you from an unpleasant surprise later, when you discover that your apps need a major revision, with changes spread throughout the code, to make them work with non-BMP characters.
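For illustration, here is a minimal sketch of the kind of check suggested above, assuming Python 3.9+ and using only the standard `str.encode`/`bytes.decode` codecs. The helper names (`utf16_units`, `check_variation_selectors`, `survives_round_trip`) and the sample string are hypothetical, not part of any test suite. It verifies that every variation selector from VS17 to VS256 takes 4 bytes in UTF-8 and a valid surrogate pair in UTF-16, then shows how a round-trip test catches code that silently strips non-ISO-8859-1 characters.

```python
# Sketch only: hypothetical helpers for exercising non-BMP handling
# with variation selectors 17..256 (U+E0100..U+E01EF, plane 14).

VS17 = 0xE0100   # VARIATION SELECTOR-17
VS256 = 0xE01EF  # VARIATION SELECTOR-256

# Hypothetical sample: a plain base letter followed by VS17. A conforming
# renderer should simply ignore a selector it has no variant for.
SAMPLE = "A" + chr(VS17)


def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units (big-endian, no BOM) for a string."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def check_variation_selectors() -> None:
    """Assert the encoding properties of every selector from VS17 to VS256."""
    for cp in range(VS17, VS256 + 1):
        ch = chr(cp)
        # Outside the BMP, UTF-8 always needs the 4-byte form.
        assert len(ch.encode("utf-8")) == 4
        # UTF-16 needs exactly two units: high surrogate, then low surrogate.
        hi, lo = utf16_units(ch)
        assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF
        # Recombining the pair must give back the original code point.
        assert 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00) == cp


def survives_round_trip(text: str, process) -> bool:
    """True if passing text through `process` leaves it unchanged."""
    return process(text) == text


if __name__ == "__main__":
    check_variation_selectors()
    # A correct UTF-8 pipeline keeps the selector intact.
    print(survives_round_trip(SAMPLE, lambda s: s.encode("utf-8").decode("utf-8")))
    # A pipeline that drops non-ISO-8859-1 characters loses it.
    print(survives_round_trip(SAMPLE, lambda s: s.encode("latin-1", "ignore").decode("latin-1")))
```

For example, VS17 encodes as `F3 A0 84 80` in UTF-8 and as the pair `DB40 DD00` in UTF-16. A product that passes a BMP-only test can still fail the second round trip above, which is exactly the gap MES-1 testing leaves open.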