On 09/07/2004 15:40, Michael (michka) Kaplan wrote:

From: "Peter Kirk" <[EMAIL PROTECTED]>



But Kaplan is referring to something quite different, optionally
ignoring diacritics in search operations. This is indeed desirable, so
that a single search can match both Dvorak and Dvořák for example, and
so that the one doing the search does not need to remember exactly which
diacritics are used in the name. And it is already covered by the
Unicode collation algorithm and default table, in which diacritics are
distinguished only at the second level and so folded by a top level only
collation.
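
As a rough illustration of the search behaviour described above, here is a minimal Python sketch. It does not implement the UCA or use its default table; it only approximates a top-level-only comparison by decomposing each string (NFD) and dropping combining marks, using the standard `unicodedata` module. The function names `fold_diacritics` and `matches` are hypothetical, chosen for this example.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Approximate diacritic folding: decompose, then drop combining marks.

    This is only a stand-in for a primary-level UCA comparison, not the
    UCA itself. Letters with no canonical decomposition are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(query: str, candidate: str) -> bool:
    """Return True if the two strings are equal once diacritics are folded."""
    return fold_diacritics(query) == fold_diacritics(candidate)


# "Dvo\u0159\u00e1k" is the name written with r-caron and a-acute.
print(matches("Dvorak", "Dvo\u0159\u00e1k"))
```

The searcher can type the plain form without remembering which diacritics the name carries, while the stored text keeps them.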



(a) If this were true and it were the only need, then case folding would
also just be "a UCA issue", yet case folding is in the document.



I didn't say it was the only need, but it did seem to be the need you were highlighting, whereas Everson was highlighting a very different need.


And of course companies are free to use algorithms other than the UCA, but they shouldn't expect Unicode to define more than one way of doing the same thing - although to an extent there seems to be that kind of duplication between the UCA and the folding mechanism. I wonder if it would have been better to define the UCA explicitly as one or more foldings followed by a comparison operation, which might make it easier for implementers to combine Unicode standard foldings with their existing comparison mechanisms. But I don't wish to destabilise what is already defined.
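
To make the "foldings followed by a comparison" idea concrete, here is a small Python sketch under stated assumptions. It is not how the UCA is defined (the UCA compares weighted collation elements); it only shows the shape of the alternative: a chain of independent foldings builds a key, and an existing comparison mechanism, here plain string equality, does the rest. The names `make_key`, `fold_case` and `fold_diacritics` are hypothetical.

```python
import unicodedata
from functools import reduce


def fold_diacritics(text: str) -> str:
    # Approximation only: decompose (NFD) and drop combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_case(text: str) -> str:
    # str.casefold() applies Unicode case folding to the string.
    return text.casefold()


def make_key(*foldings):
    """Compose foldings left to right into a single key function."""
    def key(text: str) -> str:
        return reduce(lambda acc, fold: fold(acc), foldings, text)
    return key


# One possible combination: ignore diacritics, then ignore case.
loose_key = make_key(fold_diacritics, fold_case)

# The comparison step is whatever the implementer already has.
print(loose_key("DVO\u0158\u00c1K") == loose_key("dvorak"))
```

An implementer could swap in a different set of foldings, or a different comparison, without touching the other half.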

...

Does diacritic folding destroy information provided by the distinctions that
diacritics provide? Of course it does. But then again, the same can be said
of all foldings. This does not diminish their potential usefulness in
specific tasks/operations.



Agreed. It's just that I don't agree that preparing texts for typesetting (at least within my European context) is one of those specific tasks/operations for diacritic folding.


-- Peter Kirk [EMAIL PROTECTED] (personal) [EMAIL PROTECTED] (work) http://www.qaya.org/



