Peter wrote:

> One thought: Ken has suggested CGJ be used to prevent reordering of
> combining marks in fixed position classes such as the Hebrew vowels, and
> also suggested that users should not need to be aware of the need for CGJ
> for this purpose but that software can be implemented in a way that hides
> that detail. I'm not sure how that will work, 

Details TBD, of course, but the essence of it is that you
want the user experience of inserting patah + hiriq
to correspond to the backing store insertion of <patah, CGJ, hiriq>,
without requiring users to know about or explicitly type a "CGJ"
key. There are various input and editing strategies to accomplish
this -- effectively the problem is similar to other needs to
tuck hidden characters away in the backing store for bidirectional
text.
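
To make the input side concrete, here is a minimal sketch of one such
strategy, not a description of any shipping editor. The function name
append_typed and the rule it applies (insert CGJ whenever the typed mark
has a lower nonzero combining class than the mark before it) are
assumptions for illustration; only the unicodedata calls and the code
points are real.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14
LAMED = "\u05DC"  # HEBREW LETTER LAMED, used here as the base letter

def append_typed(store: str, ch: str) -> str:
    """Hypothetical editor hook: append a typed character to the backing store.

    If ch is a combining mark whose combining class is lower than that of
    the mark just before it, canonical reordering would move ch ahead of
    that mark, so a CGJ is inserted between them to pin the typed order.
    """
    if store:
        prev_ccc = unicodedata.combining(store[-1])
        new_ccc = unicodedata.combining(ch)
        if new_ccc != 0 and prev_ccc > new_ccc:
            store += CGJ
    return store + ch

# The user types lamed, patah, hiriq and never sees a CGJ key.
typed = append_typed(LAMED + PATAH, HIRIQ)
assert typed == LAMED + PATAH + CGJ + HIRIQ
# Normalization leaves the stored sequence alone.
assert unicodedata.normalize("NFC", typed) == typed
```

Typing hiriq and then patah would store no CGJ under this rule, since
that order is already the canonical one.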

The situation for searching is a little different. While the
editing tools may be smart about the Biblical Hebrew points,
a typical query widget might not be, so in that instance, you
want a query on <patah, hiriq> to match the repository store
instance of <patah, CGJ, hiriq>. Well, format controls and
some other characters (including CGJ) are ordinarily supposed to
be ignored for searching -- unless you have specialized tailorings
for them. So the ordinary strategy would be to keep the
repository normalized, and then before local comparison against
the query string, strip out the CGJ for the match. The
situation is more complicated if the query string doesn't
use a CGJ *and* gets normalized. In that situation, you lose
the distinction in order, of course, but the search strategy
should be to strip out the CGJ locally and renormalize. That
could result in false positive matches, of course, but at
least you will find what you were looking for.
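
The strip-and-renormalize comparison described above can be sketched
roughly as follows. The helper names fold_for_search and matches are
made up for illustration, and the choice of NFD and plain substring
matching is an assumption; a real search would more likely go through
collation-based matching.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER
PATAH = "\u05B7"      # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"      # HEBREW POINT HIRIQ, combining class 14
LAMED = "\u05DC"      # HEBREW LETTER LAMED
FINAL_MEM = "\u05DD"  # HEBREW LETTER FINAL MEM

def fold_for_search(text: str) -> str:
    """Strip CGJ locally, then renormalize so mark order is canonical."""
    return unicodedata.normalize("NFD", text.replace(CGJ, ""))

def matches(query: str, stored: str) -> bool:
    """Hypothetical matcher: substring test on the folded forms."""
    return fold_for_search(query) in fold_for_search(stored)

# Repository text keeps <patah, CGJ, hiriq> and stays normalized.
stored = LAMED + PATAH + CGJ + HIRIQ + FINAL_MEM

# A query typed without CGJ matches ...
assert matches(LAMED + PATAH + HIRIQ, stored)
# ... and so does the same query after normalization has reordered it.
assert matches(unicodedata.normalize("NFC", LAMED + PATAH + HIRIQ), stored)

# The cost: text stored as <hiriq, patah> also matches (a false positive).
assert matches(LAMED + PATAH + HIRIQ, LAMED + HIRIQ + PATAH + FINAL_MEM)
```

The stored text itself is never modified; the folding happens only on
the local copies used for the comparison.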

> but it's making me wonder if
> effectively we'd be looking at some amendment to the normalization
> algorithms to insert CGJ in certain enumerated contexts.

No.

--Ken


