Peter wrote:

> One thought: Ken has suggested CGJ be used to prevent reordering of
> combining marks in fixed position classes such as the Hebrew vowels, and
> also suggested that users should not need to be aware of the need for CGJ
> for this purpose but that software can be implemented in a way that hides
> that detail. I'm not sure how that will work,

Details TBD, of course, but the essence of it is that you want the user
experience of inserting patah + hiriq to correspond to the backing store
insertion of <patah, CGJ, hiriq>, without making them explicitly have to
know about or type a "CGJ" key. There are various input and editing
strategies to accomplish this -- effectively the problem is similar to
other needs to tuck hidden characters away in the backing store for
bidirectional text.

The situation for searching is a little different. While the editing
tools may be smart about the Biblical Hebrew points, a typical query
widget might not, so in that instance, you want a query on
<patah, hiriq> to match the repository store instance of
<patah, CGJ, hiriq>.

Well, format controls and some other characters (including CGJ) are
ordinarily supposed to be ignored for searching -- unless you have
specialized tailorings for them. So the ordinary strategy would be to
keep the repository normalized, and then before local comparison against
the query string, strip out the CGJ for the match.

The situation is more complicated if the query string doesn't use a CGJ
*and* gets normalized. In that situation, you lose the distinction in
order, of course, but the search strategy should be to strip out the CGJ
locally and renormalize. That could result in false positive matches, of
course, but at least you will find what you were looking for.

> but it's making me wonder if
> effectively we'd be looking at some amendment to the normalization
> algorithms to insert CGJ in certain enumerated contexts.

No.

--Ken
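
For anyone who wants to poke at this, here is a rough Python sketch of the input and search strategies described above. It is only an illustration under stated assumptions, not a worked-out design: the function names (protect_order, matches_raw_query, matches_normalized_query) are made up for this example, the rule used to decide where an editor would insert CGJ is a guess at one possible input strategy, and plain substring matching stands in for whatever comparison a real search tool would use.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here as a base letter


def nfc(text):
    """Normalize to NFC, which includes canonical reordering of marks."""
    return unicodedata.normalize("NFC", text)


def strip_cgj(text):
    """Drop every CGJ so it is ignored for matching purposes."""
    return text.replace(CGJ, "")


def protect_order(typed):
    """Hypothetical input-side helper: insert CGJ between two adjacent
    combining marks whenever normalization would otherwise swap them
    (previous mark has a higher nonzero combining class than the next)."""
    out = []
    for ch in typed:
        if out:
            prev_ccc = unicodedata.combining(out[-1])
            cur_ccc = unicodedata.combining(ch)
            if prev_ccc > cur_ccc > 0:
                out.append(CGJ)
        out.append(ch)
    return "".join(out)


def matches_raw_query(stored, query):
    """Query has no CGJ and was NOT normalized: strip CGJ from the
    stored text and compare directly, so mark order still matters."""
    return query in strip_cgj(stored)


def matches_normalized_query(stored, query):
    """Query has no CGJ and WAS normalized: strip CGJ from the stored
    text and renormalize it. The order distinction is lost, so false
    positives are possible."""
    return nfc(query) in nfc(strip_cgj(stored))


# The user types lamed + patah + hiriq; the backing store gets the CGJ.
stored = nfc(protect_order(LAMED + PATAH + HIRIQ))

# Same marks in canonical order, i.e. a different text in the repository.
other = nfc(LAMED + HIRIQ + PATAH)

print(matches_raw_query(stored, PATAH + HIRIQ))
print(matches_raw_query(other, PATAH + HIRIQ))
print(matches_normalized_query(stored, nfc(PATAH + HIRIQ)))
print(matches_normalized_query(other, nfc(PATAH + HIRIQ)))
```

The last call is the false positive case: once the query has been normalized, <hiriq, patah> and <patah, CGJ, hiriq> can no longer be told apart, but the text being looked for is still found.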