Re: What are the issues in having U+FB06 fold to U+FB05?

Mark Davis ☕ Wed, 06 Jul 2011 13:50:18 -0700

Mark
*Il meglio è l’inimico del bene* (The best is the enemy of the good.)


On Sat, Jun 11, 2011 at 08:04, Karl Williamson <[email protected]> wrote:

> On 06/08/2011 03:33 PM, Mark Davis ☕ wrote:
>
>> As to the first, it would seem reasonable. The simple folding is not
>> covered by the following stability policies:
>>
>> http://www.unicode.org/policies/stability_policy.html#Case_Folding
>> http://www.unicode.org/policies/stability_policy.html#Case_Pair
>>
>> However, the committee may be leery of changing these even though they
>> are not covered by those policies. You can file a request form for the
>> committee to consider it, at 
>> http://unicode.org/reporting.html
>>
>> The other two are special cases; they casefold together because of the
>> way that the full case mapping is computed. Their equivalence is
>> normally captured by a canonical-equivalent folding. Because simple
>> folding works only code point by code point, and only maps to single
>> code points, they can't be added.
>>
> I didn't understand the sentence above. But would it be fair to say that
> a plausible case could be made for a simple fold from FB06 to FB05, but
> that there really shouldn't be a simple fold for the other two pairs?
>

Yes, that's what I mean. You can propose all three via the reporting form
if you want, but in my opinion only #1 (FB06 folding to FB05) is a real
possibility.
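The distinction between the ligature pair and the two Greek pairs can be checked with Python's standard `unicodedata` module. This is a minimal sketch assuming Python 3.6+; `PAIRS` and `canonically_equivalent` are illustrative names, not anything from the thread. It compares NFD forms: U+1FD3 and U+1FE3 have singleton canonical decompositions to U+0390 and U+03B0, so both Greek pairs are canonically equivalent, while U+FB05 and U+FB06 have only compatibility decompositions, which NFD does not apply.

```python
import unicodedata

# The three pairs discussed in the thread.
PAIRS = [
    ("\uFB05", "\uFB06"),  # LATIN SMALL LIGATURE LONG S T / LATIN SMALL LIGATURE ST
    ("\u0390", "\u1FD3"),  # IOTA WITH DIALYTIKA AND TONOS / ... AND OXIA
    ("\u03B0", "\u1FE3"),  # UPSILON WITH DIALYTIKA AND TONOS / ... AND OXIA
]


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have identical canonical decompositions (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for a, b in PAIRS:
    # Expected: False for the ligatures, True for both Greek pairs.
    print(f"U+{ord(a):04X} ~ U+{ord(b):04X}: {canonically_equivalent(a, b)}")
```

Python bundles a Unicode version newer than 6.0 (see `unicodedata.unidata_version`), but the result should not differ, since decomposition mappings of existing characters are covered by Unicode's normalization stability policy.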


>
>> On Sun, Jun 5, 2011 at 08:17, Karl Williamson <[email protected]> wrote:
>>
>>    There are three pairs of characters in Unicode 6.0 in which each
>>    member of the pair has a full fold to the same sequence, yet there
>>    is no simple fold relation between them.  They are:
>>
>>    U+FB05 LATIN SMALL LIGATURE LONG S T and
>>    U+FB06 LATIN SMALL LIGATURE ST
>>    both fold to 'st';
>>
>>    U+0390 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS and
>>    U+1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
>>    both fold to the sequence "U+03B9 U+0308 U+0301", that is (with the
>>    dot denoting concatenation)
>>    GREEK SMALL LETTER IOTA . COMBINING DIAERESIS . COMBINING ACUTE ACCENT;
>>
>>    U+03B0 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS and
>>    U+1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
>>    both fold to the sequence "U+03C5 U+0308 U+0301", that is
>>    GREEK SMALL LETTER UPSILON . COMBINING DIAERESIS . COMBINING ACUTE
>>    ACCENT.
>>
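The full foldings listed above can be reproduced with Python's `str.casefold()`, which the Python documentation says follows the Unicode Standard's default case folding and which applies the full mappings (it turns "ß" into "ss", for example). A minimal sketch assuming Python 3.6+; `PAIRS` and `full_fold_hex` are illustrative names. The interpreter uses its own bundled Unicode data rather than 6.0's, so treat the output as a check against that version.

```python
# The three pairs from the message above: (first member, second member).
PAIRS = [
    ("\uFB05", "\uFB06"),  # LATIN SMALL LIGATURE LONG S T / LATIN SMALL LIGATURE ST
    ("\u0390", "\u1FD3"),  # IOTA WITH DIALYTIKA AND TONOS / ... AND OXIA
    ("\u03B0", "\u1FE3"),  # UPSILON WITH DIALYTIKA AND TONOS / ... AND OXIA
]


def full_fold_hex(s: str) -> str:
    """Full case folding of s (via str.casefold) as space-separated U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s.casefold())


for a, b in PAIRS:
    # Both members of each pair print the same folded sequence.
    print(f"U+{ord(a):04X} -> {full_fold_hex(a)}")
    print(f"U+{ord(b):04X} -> {full_fold_hex(b)}")
```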
>>    Under full case folding rules, the two members of each pair are
>>    caselessly equivalent, even without adding NFD rules. Correct me if
>>    I'm wrong, but shouldn't they also be caselessly equivalent under
>>    simple folding rules? If so, what issues would there be in creating
>>    an S rule for these pairs in CaseFolding.txt, so that they would be
>>    considered caselessly equivalent even by applications that don't do
>>    full case folding?
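To illustrate why an S rule matters for applications that only do simple folding, here is a minimal sketch of both kinds of folding driven by lines in CaseFolding.txt's format (`<code>; <status>; <mapping>; # <name>`): full folding uses the C and F entries, simple folding the C and S entries. The F lines are the Unicode 6.0 entries described in the thread; the `FB06; S; FB05` line is HYPOTHETICAL and stands in for the proposal. `load_folds`, `fold` and `equivalent` are illustrative helpers, not a library API, and a real implementation would load the complete file (Python 3.9+ assumed for the type hints).

```python
# F entries as described in the thread (Unicode 6.0 has no C/S lines for these).
UNICODE_6_0_EXCERPT = """\
0390; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
1FD3; F; 03B9 0308 0301; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
FB05; F; 0073 0074; # LATIN SMALL LIGATURE LONG S T
FB06; F; 0073 0074; # LATIN SMALL LIGATURE ST
"""
# HYPOTHETICAL: the simple fold proposed in this thread; not part of Unicode 6.0.
PROPOSED_EXCERPT = UNICODE_6_0_EXCERPT + "FB06; S; FB05; # proposed\n"


def load_folds(text: str, statuses: set[str]) -> dict[int, list[int]]:
    """Map each code point to its folded code points, keeping only `statuses`."""
    table: dict[int, list[int]] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        code, status, mapping = (field.strip() for field in data.split(";")[:3])
        if status in statuses:
            table[int(code, 16)] = [int(cp, 16) for cp in mapping.split()]
    return table


def fold(s: str, table: dict[int, list[int]]) -> str:
    """Fold s code point by code point; unmapped code points stay unchanged."""
    return "".join(
        "".join(chr(cp) for cp in table.get(ord(ch), [ord(ch)])) for ch in s
    )


def equivalent(a: str, b: str, table: dict[int, list[int]]) -> bool:
    """Caseless equivalence: both strings fold to the same result."""
    return fold(a, table) == fold(b, table)


full = load_folds(UNICODE_6_0_EXCERPT, {"C", "F"})    # full folding: C + F
simple = load_folds(UNICODE_6_0_EXCERPT, {"C", "S"})  # simple folding: C + S
proposed = load_folds(PROPOSED_EXCERPT, {"C", "S"})   # simple, plus the proposal

print(equivalent("\uFB05", "\uFB06", full))      # True: both fold to "st"
print(equivalent("\uFB05", "\uFB06", simple))    # False: no simple fold in 6.0
print(equivalent("\uFB05", "\uFB06", proposed))  # True: FB06 folds to FB05
print(equivalent("\u0390", "\u1FD3", full))      # True
print(equivalent("\u0390", "\u1FD3", simple))    # False
```

Because a simple-folding table maps one code point to one code point, it cannot carry the multi-code-point targets of the F lines; the only way to make a pair simply equivalent is to map one member directly to the other, as the hypothetical FB06 line does.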
