Kenneth Whistler wrote:
> Yes, it is true. All the standard *mandates* is what I quoted
> previously in this thread:
>
> "C12a When a process interprets a code unit sequence which purports
> to be in a Unicode character encoding form, it shall treat
> ill-formed code unit sequences as an error condition, and
> shall not interpret such sequences as characters."
>
> > Is it ok then, if I detect an unpaired surrogate, mutter
> > "oops I have an error" and then drop that surrogate and continue processing
> > the file, resulting in a valid utf-8 file?
>
> Hmm, I think you may be mixing the UTF-16 case with the UTF-8
> case, but...
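To make the quoted C12a wording and the "drop that surrogate and continue" question concrete, here is a minimal Python sketch of UTF-16 to UTF-8 conversion over a list of 16-bit code units. It is not taken from the thread or from any particular library; the names `utf16_to_utf8`, `IllFormedUTF16Error` and the `drop_unpaired` flag are hypothetical. By default an unpaired surrogate is reported as an error; with `drop_unpaired=True` the surrogate is skipped and conversion continues, yielding well-formed UTF-8 output.

```python
class IllFormedUTF16Error(ValueError):
    """Hypothetical error type: an unpaired surrogate was found in the input."""

    def __init__(self, index: int, unit: int) -> None:
        super().__init__(f"unpaired surrogate 0x{unit:04X} at code unit {index}")
        self.index = index
        self.unit = unit


def _encode_scalar(cp: int, out: bytearray) -> None:
    """Append the UTF-8 bytes for a Unicode scalar value (never a surrogate)."""
    if cp < 0x80:
        out.append(cp)
    elif cp < 0x800:
        out.append(0xC0 | (cp >> 6))
        out.append(0x80 | (cp & 0x3F))
    elif cp < 0x10000:
        out.append(0xE0 | (cp >> 12))
        out.append(0x80 | ((cp >> 6) & 0x3F))
        out.append(0x80 | (cp & 0x3F))
    else:
        out.append(0xF0 | (cp >> 18))
        out.append(0x80 | ((cp >> 12) & 0x3F))
        out.append(0x80 | ((cp >> 6) & 0x3F))
        out.append(0x80 | (cp & 0x3F))


def utf16_to_utf8(units: list[int], drop_unpaired: bool = False) -> bytes:
    """Convert UTF-16 code units to UTF-8.

    Unpaired surrogates raise IllFormedUTF16Error unless drop_unpaired is
    True, in which case they are skipped (the "drop and continue" option).
    """
    out = bytearray()
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: only valid if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
            elif drop_unpaired:
                i += 1
                continue
            else:
                raise IllFormedUTF16Error(i, u)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            if drop_unpaired:
                i += 1
                continue
            raise IllFormedUTF16Error(i, u)
        else:
            cp = u
            i += 1
        _encode_scalar(cp, out)
    return bytes(out)
```

For example, `utf16_to_utf8([0x41, 0xD800, 0x42])` raises, while the same call with `drop_unpaired=True` returns `b"AB"`. Whether silently dropping satisfies C12a is exactly what is being discussed here; the sketch only makes the two behaviours explicit.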
Ken, thanks for the reply. I thought that at some point along the way this thread had been discussing UTF-16 to UTF-8 conversion, which is where I was coming from. (I must have glommed some threads, or even some lists, together.)

I certainly agree that reporting an error is the right design. However, there is software out there that did not anticipate that an error could be generated during the conversion. With the advent of surrogates, and the clarification of how UTF-8 is to be generated for surrogates, this becomes an issue, and it can be difficult to address when the upper layers are not prepared to handle such an error.

Anyway, for some reason I thought the situation was also counter to the standard. Now I know it is just bad design.

tex

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:[EMAIL PROTECTED]
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------