On 2013/01/08 3:27, Markus Scherer wrote:
Also, we commonly read code points from 16-bit Unicode strings, and unpaired surrogates are returned as themselves and treated as such (e.g., in collation). That would not be well-formed UTF-16, but it's generally harmless in text processing.
Things like this are called "garbage in, garbage out" (GIGO). It may be harmless, or it may hurt you later.
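To make the behavior under discussion concrete, here is a minimal sketch of reading code points from 16-bit code units, where an unpaired surrogate is passed through as itself. The function names code_points and is_well_formed are made up for illustration; this is not the code of any particular library, only one way the described behavior could look.

```python
def code_points(units):
    """Yield code points from a sequence of 16-bit code units.

    Well-formed surrogate pairs are combined into one supplementary
    code point; an unpaired surrogate is yielded unchanged.
    (Hypothetical helper, for illustration only.)
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        i += 1
        # Lead surrogate immediately followed by a trail surrogate: combine.
        if 0xD800 <= u <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        # Otherwise u is a BMP code point or an unpaired surrogate,
        # and is returned as is ("garbage in, garbage out").
        yield u


def is_well_formed(units):
    """Return True if the code units contain no unpaired surrogate."""
    return not any(0xD800 <= cp <= 0xDFFF for cp in code_points(units))
```

With the input [0x41, 0xD83D, 0xDE00, 0xD800, 0x42], the pair D83D DE00 comes out as U+1F600, while the lone D800 comes out as 0xD800. A caller that cares about well-formed UTF-16 can check with something like is_well_formed before passing the text on, rather than finding out later.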
Regards, Martin.