On 24/07/2003 05:31, [EMAIL PROTECTED] wrote:

One thought: Ken has suggested that CGJ be used to prevent reordering of
combining marks in fixed-position classes such as the Hebrew vowels, and
has also suggested that users should not need to be aware of CGJ for this
purpose; instead, software can be implemented in a way that hides that
detail. I'm not sure how that would work, but it makes me wonder whether
we would effectively be looking at an amendment to the normalization
algorithms to insert CGJ in certain enumerated contexts.
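The reordering problem, and why CGJ blocks it, can be seen with a short Python sketch using the standard `unicodedata` module. Python, the helper `show`, and ALEF as the base letter are illustrative choices, not part of the discussion. Canonical ordering sorts adjacent marks with non-zero combining classes, so patah (ccc 17) typed before hiriq (ccc 14) is swapped. CGJ has ccc 0, so nothing is reordered across it.

```python
import unicodedata

ALEF = "\u05d0"   # HEBREW LETTER ALEF, used here as an arbitrary base letter
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ
PATAH = "\u05b7"  # HEBREW POINT PATAH
CGJ = "\u034f"    # COMBINING GRAPHEME JOINER

def show(label: str, s: str) -> None:
    """Print each code point of s with its canonical combining class."""
    parts = [f"U+{ord(c):04X}(ccc {unicodedata.combining(c)})" for c in s]
    print(f"{label:>6}: " + " ".join(parts))

# Patah (ccc 17) typed before hiriq (ccc 14): canonical ordering swaps them,
# because adjacent marks with non-zero classes are sorted by class.
plain = ALEF + PATAH + HIRIQ
show("input", plain)
show("NFD", unicodedata.normalize("NFD", plain))

# CGJ has ccc 0, so it acts as a barrier: the marks keep their typed order.
blocked = ALEF + PATAH + CGJ + HIRIQ
show("input", blocked)
show("NFD", unicodedata.normalize("NFD", blocked))
```

The NFD output for `plain` has hiriq before patah, while `blocked` comes back unchanged.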


- Peter



---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485







So you mean, for example, that patah - hiriq normalises not to hiriq - patah but to patah - CGJ - hiriq? That certainly looks like an interesting idea.
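The rewriting of patah - hiriq as patah - CGJ - hiriq could work as a preprocessing step, sketched below in Python. This is a hypothetical illustration, not part of any Unicode normalization form. Three things are assumptions: the helper names, the choice of Hebrew points and accents (U+0591..U+05C7, ccc > 0) as the "enumerated context", and the rule of inserting CGJ wherever an adjacent pair of such marks is in descending combining-class order. The sketch also assumes that every combining mark in a cluster comes from that range. Clusters that mix in other scripts' marks are out of scope.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0

def in_enumerated_context(ch: str) -> bool:
    """Hypothetical context: a Hebrew point or accent (U+0591..U+05C7) with ccc > 0."""
    return "\u0591" <= ch <= "\u05c7" and unicodedata.combining(ch) != 0

def protect_mark_order(s: str) -> str:
    """Insert CGJ between adjacent context marks that canonical ordering would swap.

    Hypothetical sketch only; assumes all marks in a cluster are in the context.
    """
    out = []
    prev = ""
    for ch in s:
        if (prev and in_enumerated_context(prev) and in_enumerated_context(ch)
                and unicodedata.combining(prev) > unicodedata.combining(ch)):
            out.append(CGJ)  # a ccc-0 barrier keeps prev before ch under NFD/NFC
        out.append(ch)
        prev = ch
    return "".join(out)

typed = "\u05d0\u05b7\u05b4"            # alef, patah, hiriq (typed order)
protected = protect_mark_order(typed)   # alef, patah, CGJ, hiriq
print([f"U+{ord(c):04X}" for c in protected])  # ['U+05D0', 'U+05B7', 'U+034F', 'U+05B4']
print(unicodedata.normalize("NFC", protected) == protected)  # True
```

Every descending pair gets a CGJ, so each run of marks between starters is already in canonical order. Applying NFD or NFC afterwards therefore leaves the result unchanged. The function is also idempotent, because CGJ itself lies outside the chosen range. Input that is already in hiriq - patah order passes through untouched, which matches the observation that hiriq - patah remains a valid normalised form.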

As hiriq - patah remains a valid normalised form, normalisation stability is not compromised. I wonder, though, whether it might violate the following extract from the stability policy. CGJ is not a valid character in Unicode 3.1 (it was added in Unicode 3.2), so inserting it into the string during normalisation would produce a string that is not valid in Unicode 3.1:

If a string contains only characters from a given version* of the Unicode Standard (e.g., Unicode 3.1.1), and it is put into a normalized form in accordance with that version of Unicode, then it will be in normalized form according to any past or future versions of Unicode.

But the problem is that this paragraph is either self-contradictory or implies that no characters may ever be added to Unicode. Take any string containing only characters from Unicode 4.0, some of which are new in Unicode 4.0, and normalise it according to Unicode 4.0. That string will not be normalised according to any version of Unicode before 4.0, because it includes characters not defined in those earlier versions.
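This versioning argument can be checked with Python's `unicodedata` module. Alongside the current tables, it ships a frozen copy of the Unicode 3.2.0 database as `unicodedata.ucd_3_2_0`. Python has no 3.1 or 4.0 tables, so the sketch below substitutes 3.2.0 as the "earlier version" and a later character, U+20B9 INDIAN RUPEE SIGN (added in Unicode 6.0). The sample string and the helper `unassigned_in` are hypothetical. The point it illustrates is that a string already in NFC under current data can contain code points that the older database treats as unassigned (General Category `Cn`).

```python
import unicodedata

RUPEE = "\u20b9"  # INDIAN RUPEE SIGN, added in Unicode 6.0 (well after 3.2)

def unassigned_in(ucd, s: str) -> list[str]:
    """Return the code points of s whose General Category is Cn in the given database."""
    return [f"U+{ord(c):04X}" for c in s if ucd.category(c) == "Cn"]

text = "Price: " + RUPEE + "100"
old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 data bundled with Python

# The module itself exposes the current database through the same functions.
print("current UCD", unicodedata.unidata_version,
      "NFC?", unicodedata.is_normalized("NFC", text),
      "unassigned:", unassigned_in(unicodedata, text))
print("UCD", old.unidata_version,
      "unassigned:", unassigned_in(old, text))
```

The same string is normalised under today's data, yet under the 3.2.0 tables it contains an unassigned code point. Any version-stability wording has to accommodate this situation.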

--
Peter Kirk
[EMAIL PROTECTED]
http://web.onetel.net.uk/~peterkirk/




