I have been reviewing UAX#29 (Unicode Text Segmentation) because I have a feeling we will be trying to do too much with the concept of grapheme clusters, even with tailoring, when we extend it to include whole aksharas.
What is the meaning of "Word boundaries, line boundaries, and sentence boundaries should not occur within a grapheme cluster: in other words, a grapheme cluster should be an atomic unit with respect to the process of determining these other boundaries"? In particular, whom is it directed to?

Now, once quadrate support is added and we are able to write Ancient Egyptian in Unicode, we will probably have two very significant languages that regularly breach parts of that rule. (At least, I assume a whole Egyptian quadrate would be included in a dropped capital.) Sanskrit word boundaries frequently occur within *legacy* grapheme clusters, and sentence boundaries may occur within quadrates.

I presume UAX#29 does not intend that we should use means other than Unicode to write samhita Sanskrit and Ancient Egyptian.

Richard.

