>> We do have xml:lang, don't we?
>
> Unfortunately, it doesn't help in all cases. It's perfectly fine to write a message with xml:lang="en":
>
> "chlapec" is "boy" in Slovak
>
> This is 27 grapheme clusters, but I guess most Western people would count it as 28.
But the recipient would be able to apply the same rules regarding localization as the sender when counting grapheme clusters.
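To make the 27 versus 28 arithmetic concrete: a plain code point count of the example body gives 28, and so does a default (untailored) Unicode grapheme cluster count, because every character in it is ASCII. Only a count that treats the Slovak digraph "ch" as a single letter arrives at 27. Python's standard library has no grapheme cluster segmentation, tailored or otherwise, so the sketch below uses a hypothetical helper, `count_with_ch_digraph`, that knows about "ch" and nothing else. It illustrates language-tailored counting under that assumption and is not the UAX #29 algorithm.

```python
# Hypothetical sketch: contrast a plain code point count with a count
# tailored to Slovak, where the digraph "ch" is a single letter.
# The tailoring rule below is a deliberate simplification for illustration;
# it is not a Unicode (UAX #29) grapheme cluster implementation.

TEXT = '"chlapec" is "boy" in Slovak'


def count_code_points(text: str) -> int:
    """Number of Unicode code points, i.e. what Python's len() returns."""
    return len(text)


def count_with_ch_digraph(text: str) -> int:
    """Count units the way a Slovak reader might, treating "ch" as one letter.

    Hypothetical tailoring: every case-insensitive "ch" pair counts once.
    """
    count = 0
    i = 0
    while i < len(text):
        if text[i:i + 2].lower() == "ch":
            i += 2  # consume the digraph as a single unit
        else:
            i += 1  # any other code point is one unit
        count += 1
    return count


if __name__ == "__main__":
    print(count_code_points(TEXT))      # 28
    print(count_with_ch_digraph(TEXT))  # 27
```

The sender and the recipient only agree on 27 if both apply the same tailoring, which is exactly what xml:lang="en" fails to signal for this body.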
>> Let us ignore grapheme clusters for a moment and focus on XEP-0426: Have you considered Unicode normalization? This is especially a concern when text that was originally in decomposed form is normalized to composed form, which would corrupt the code point indexes. [..] I think that due to this, XEP-0426 should specify that counting happens with the text in NFC form. Or am I missing something?
>
> I could imagine going for something like:
Yes, that definitely goes in the right direction.
> Receiving or intermediary entities SHOULD NOT apply Unicode normalization to the text referenced from character counting.
I am not sure that you can (or that we should) put normative text that applies to intermediate hops into XEP-0426. The XEP could/should limit itself to normative clauses for the endpoints that exchange character-counting data.
> If entities apply Unicode normalization, they SHOULD update all positions, indices and lengths derived from character counting if required.
As above. I think this would need at least a discoverable disco#info feature. But even then, I doubt that this is useful in normative form. However, it probably cannot hurt for XEP-0426 to spell this out as an informative recommendation.
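A minimal Python sketch of the two concerns in this part of the thread: normalizing decomposed (NFD) text to NFC shifts code point indexes, so a range computed on the original text selects the wrong characters afterwards, and one possible way for an entity that does normalize to update its positions. The helper `remap_offset` is hypothetical; it remaps an offset by normalizing the prefix before it, which assumes the offset sits on a boundary that normalization does not compose or reorder across.

```python
import unicodedata

# "café is nice" in decomposed (NFD) form: "e" + U+0301 COMBINING ACUTE ACCENT
# instead of the precomposed U+00E9.
ORIGINAL = unicodedata.normalize("NFD", "caf\u00e9 is nice")


def remap_offset(text: str, offset: int, form: str = "NFC") -> int:
    """Hypothetical helper: map a code point offset in `text` to the
    corresponding offset in unicodedata.normalize(form, text).

    Assumes `offset` lies on a boundary that normalization does not compose
    or reorder across (here: next to a space). An offset in the middle of a
    combining sequence has no well-defined counterpart.
    """
    return len(unicodedata.normalize(form, text[:offset]))


if __name__ == "__main__":
    start = ORIGINAL.find("is")  # code point index in the NFD text
    end = start + len("is")
    normalized = unicodedata.normalize("NFC", ORIGINAL)

    # Stale indexes applied to the normalized text select the wrong span.
    print(start, end, repr(normalized[start:end]))  # 6 8 's '

    # Remapped indexes select the intended word again.
    new_start = remap_offset(ORIGINAL, start)
    new_end = remap_offset(ORIGINAL, end)
    print(new_start, new_end, repr(normalized[new_start:new_end]))  # 5 7 'is'
```

Remapping is only possible for the entity that still has the pre-normalization text; a recipient that merely sees the normalized body cannot recover the shift, which is one reason to prefer agreeing on NFC up front.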
> It is RECOMMENDED that entities creating the original stanzas use NFC form.
Now that is the part I really like and which I believe to be missing from XEP-0426. +1
I also suggest that the receiving side be considered. For example: "Entities that receive character-counted text should normalize the counted text to Unicode Normalization Form C (NFC) [1] prior to evaluating the character indexes."
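A sketch of NFC on both ends, using two hypothetical helpers, `prepare_outgoing` and `resolve_reference`: the sender normalizes to NFC before computing indexes, and the receiver normalizes to NFC before evaluating them, so a decomposition somewhere along the path does not change the result. This relies on NFC(NFD(s)) being equal to NFC(s), which holds because NFC is defined as canonical decomposition followed by canonical composition.

```python
import unicodedata


def prepare_outgoing(body: str) -> str:
    """Hypothetical sender-side step: put the body into NFC before any
    character-counting offsets are computed against it."""
    return unicodedata.normalize("NFC", body)


def resolve_reference(received_body: str, start: int, end: int) -> str:
    """Hypothetical receiver-side step: normalize to NFC first, then
    evaluate the code point indexes."""
    return unicodedata.normalize("NFC", received_body)[start:end]


if __name__ == "__main__":
    # "n" + U+0303 COMBINING TILDE composes to U+00F1 under NFC.
    sent = prepare_outgoing("Jalapen\u0303o peppers")
    start = sent.find("peppers")
    end = start + len("peppers")

    # Suppose something on the path re-encoded the body in NFD.
    received = unicodedata.normalize("NFD", sent)

    print(repr(received[start:end]))                      # ' pepper'
    print(repr(resolve_reference(received, start, end)))  # 'peppers'
```

Normalizing on receipt only helps if the sender's indexes were computed on NFC text too, which is why the sender-side recommendation and the receiver-side suggestion work as a pair.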
[1] https://unicode.org/reports/tr15/

- Florian
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
_______________________________________________