Atomicity of Grapheme Clusters

Richard Wordingham via Unicode Wed, 13 Dec 2017 10:41:04 -0800

I have been reviewing UAX#29 Unicode Text Segmentation because I have a
feeling we will be trying to do too much with the concept of grapheme
clusters, even with tailoring, when we extend it to include whole
aksharas.


What is the meaning of "Word boundaries, line boundaries, and sentence
boundaries should not occur within a grapheme cluster: in other words,
a grapheme cluster should be an atomic unit with respect to the process
of determining these other boundaries"?  In particular, whom is it
directed to?

Now, once quadrate support is added and we are able to write Ancient
Egyptian in Unicode, we will probably have two very significant
languages, Sanskrit and Ancient Egyptian, that regularly breach parts
of that rule.  (At least, I assume a whole Egyptian quadrate would be
included in a dropped capital.) Sanskrit word boundaries frequently
occur within *legacy* grapheme clusters, and sentence boundaries may
occur within quadrates.
I presume UAX#29 does not intend that we should use means other than
Unicode to write samhita Sanskrit and Ancient Egyptian.
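
To make the Sanskrit case concrete, here is a minimal Python sketch. It
assumes the samhita form "tadeva" (tat + eva), written with the vowel
sign E attached to DA, and it only approximates legacy grapheme clusters
by treating General_Category Mn and Me characters as extending the
previous cluster. The names legacy_clusters, cluster_boundaries and
WORD_BOUNDARY are made up for the illustration, and this is not a full
UAX#29 implementation.

```python
import unicodedata

# "tadeva" (tat + eva after sandhi): TA, DA, VOWEL SIGN E, VA.
TADEVA = "\u0924\u0926\u0947\u0935"

# Code point offset of the word boundary: "tad" ends after DA,
# and "eva" begins with the vowel sign.
WORD_BOUNDARY = 2


def legacy_clusters(text):
    """Split text into approximate legacy grapheme clusters.

    Assumption: a character of General_Category Mn or Me extends the
    preceding cluster. This only approximates Grapheme_Extend and ignores
    the other UAX#29 rules (CR LF, Hangul syllables, and so on).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def cluster_boundaries(text):
    """Return the code point offsets at which a cluster starts or ends."""
    offsets = {0}
    position = 0
    for cluster in legacy_clusters(text):
        position += len(cluster)
        offsets.add(position)
    return offsets


if __name__ == "__main__":
    # Cluster lengths in code points: the middle cluster is DA + VOWEL SIGN E.
    print([len(c) for c in legacy_clusters(TADEVA)])
    # Whether the word boundary coincides with a cluster boundary.
    print(WORD_BOUNDARY in cluster_boundaries(TADEVA))
```

Under these assumptions the word boundary at offset 2 lies between DA
and the vowel sign, which is inside the second cluster and not at a
cluster boundary.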

Richard.
