> Sorry, but I have to disagree here. If a list of strings contains items
> with lone surrogates (garbage), then sorting them doesn't make the
> garbage go away, even if the items may be sorted in "correct" order
> according to some criterion.

Well, yeah, I wasn't claiming that the principled, "correct" output made the garbage go away. Let me put it this way: if my choices are

1) garbage in, garbage reliably sorted out into garbage bin, versus
2) garbage in, sorting fails with exception,

then I'll pick #1. ;-)

To give a concrete example, my implementation of UCA reliably passes the SHIFTED test cases in the conformance test, even though those test cases (deliberately) contain some ill-formed strings. If I instead did validation testing on input strings in my base implementation, it would be slower, *and* to pass the conformance test I would have to add a separate preprocessing stage that probed all the input data for ill-formed strings and filtered those cases out before engaging the test, so that it wouldn't fail with an exception when it hit the bad data.

--Ken
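
The two choices above can be illustrated with a minimal Python sketch. This is not the UCA implementation mentioned in the message: plain code point comparison stands in for collation, and the names `is_well_formed`, `sort_tolerant`, and `sort_strict` are hypothetical, introduced only for illustration. It assumes Python `str` values, where any code point in U+D800..U+DFFF is by definition a lone surrogate.

```python
def is_well_formed(s: str) -> bool:
    """Return True if s contains no surrogate code points (U+D800..U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def sort_tolerant(strings):
    """Choice 1: no validation. Ill-formed items are ordered deterministically
    (here by code point, as a stand-in for a real collation key)."""
    return sorted(strings)


def sort_strict(strings):
    """Choice 2: validate every input first and fail on ill-formed data."""
    for s in strings:
        if not is_well_formed(s):
            raise ValueError(f"ill-formed string: {s!r}")
    return sorted(strings)


# Test data that deliberately includes ill-formed strings (lone surrogates).
data = ["b", "a\ud800", "a", "\udc00"]

# Choice 1 always produces an ordering; the garbage is still there, but it
# lands in a predictable place.
tolerant_result = sort_tolerant(data)

# Choice 2 needs a separate filtering pass before it can run on this data.
clean_only = [s for s in data if is_well_formed(s)]
strict_result = sort_strict(clean_only)
```

With the strict variant, every caller that may see ill-formed input has to pay for the validation scan and also carry the extra filtering stage; the tolerant variant needs neither.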