So you mean, for example, that patah - hiriq normalises not to hiriq - patah but to patah - CGJ - hiriq? This certainly looks like an interesting idea.

One thought: Ken has suggested that CGJ be used to prevent reordering of combining marks in fixed-position classes such as the Hebrew vowels. He also suggested that users should not need to be aware of the need for CGJ for this purpose, and that software can be implemented in a way that hides that detail. I'm not sure how that will work, but it makes me wonder whether we would effectively be looking at some amendment to the normalization algorithms to insert CGJ in certain enumerated contexts.
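To make the reordering concrete, here is a minimal sketch using Python's standard unicodedata module. It only shows how the existing normalization algorithm treats a CGJ that is already present in the text; it does not implement the proposed amendment, and the alef base letter and the variable names are my own choices for illustration.

```python
import unicodedata

ALEF = "\u05D0"   # base letter, chosen only for illustration
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14
CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, canonical combining class 0

# Without CGJ, canonical reordering sorts the marks by combining class,
# so patah - hiriq comes out as hiriq - patah.
plain = ALEF + PATAH + HIRIQ
nfc_plain = unicodedata.normalize("NFC", plain)

# With CGJ between the vowels, the class-0 character blocks reordering,
# so the typed order survives normalization.
guarded = ALEF + PATAH + CGJ + HIRIQ
nfc_guarded = unicodedata.normalize("NFC", guarded)

print([unicodedata.name(c) for c in nfc_plain])
print([unicodedata.name(c) for c in nfc_guarded])
```

The second string is left alone because CGJ has combining class 0, and marks are never reordered across a class-0 character.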
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
As hiriq - patah remains a valid normalised form, normalisation stability is not compromised. I wonder if it might violate the following extract from the stability policy because CGJ is not a valid character in Unicode 3.1, and so introducing it into the string during normalisation means that the string is not valid in Unicode 3.1:
If a string contains only characters from a given version* of the Unicode Standard (e.g., Unicode 3.1.1), and it is put into a normalized form in accordance with that version of Unicode, then it will be in normalized form according to any past or future versions of Unicode.
But the problem is that this paragraph is self-contradictory, or else it implies that no characters may be added to Unicode. Take any string containing only characters from Unicode 4.0, some of which are new in Unicode 4.0, and normalise it according to Unicode 4.0. This string will not be normalised according to versions of Unicode before 4.0, because it includes characters that are not defined in those previous versions.
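As a rough illustration of the point about older versions, Python ships a frozen copy of the Unicode 3.2.0 database as unicodedata.ucd_3_2_0. That is an assumption of convenience: it is 3.2 rather than 3.1, so it can show a character new in 4.0 being unassigned earlier, but not CGJ being unassigned in 3.1. The sample character U+0221 is my own pick.

```python
import unicodedata

old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 database

NEW_CHAR = "\u0221"  # LATIN SMALL LETTER D WITH CURL, first assigned in Unicode 4.0
CGJ = "\u034F"       # COMBINING GRAPHEME JOINER

# General category "Cn" means "not assigned" in that version of the database.
new_char_in_3_2 = old.category(NEW_CHAR)
new_char_now = unicodedata.category(NEW_CHAR)
cgj_in_3_2 = old.category(CGJ)

print("U+0221 under 3.2.0:", new_char_in_3_2)
print("U+0221 under", unicodedata.unidata_version + ":", new_char_now)
print("CGJ under 3.2.0:", cgj_in_3_2)
```

A string containing U+0221 is therefore outside the repertoire of the older database, whatever its normalization state.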
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/