[Inadvertently sent just to me; forwarded with Philippe's permission]

On Wednesday, July 02, 2003 7:03 AM, [EMAIL PROTECTED] 
<[EMAIL PROTECTED]> wrote:
> Philippe Verdy wrote on 06/28/2003 02:48:01 AM:
> 
> > If the user strikes the two keys <patah> and <hiriq>, the input
> > method for Traditional Hebrew will generate <patah,CGJ,hiriq>
> 
> That requires* an input method that is aware of the input context (or
> of what has already been input -- but awareness of context is far more
> reliable).

Not necessarily: the keyboard driver may return host-specific PUA code 
points for the vowels, which the display interface maps so that they are 
rendered as the corresponding <CGJ+vowel> sequences. When the edited file 
is saved, these PUA code points are remapped to the standard Unicode 
sequences. Conversely, an editor aware of this use of CGJ can re-create 
these vowels on load by mapping each <CGJ+Hebrew vowel> pair to a single 
PUA code point for the duration of editing, as this simplifies the 
internal implementation of character selection and string search/replace 
operations.

Yes, it requires some knowledge of this particular encoding in the editor, 
but it's not impossible. So in Traditional Hebrew mode, every vowel 
keystroke could return either a <CGJ+vowel> sequence (not <vowel+CGJ>, 
which would be incorrect) or a PUA code point if that simplifies the 
implementation (notably for mouse selection); any unnecessary extra CGJ 
code points can easily be removed when the file is saved.
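
One possible reading of "unnecessary", sketched in Python: a CGJ can be 
dropped when removing it would not change the result of canonical 
reordering, i.e. when the highest canonical combining class among the 
marks before it (since the last class-0 character) is no greater than the 
lowest class among the marks directly after it. The function name is 
hypothetical, and the check deliberately ignores the fact that CGJ also 
blocks canonical composition; that should not matter for Hebrew points, 
whose precomposed presentation forms are excluded from composition.

```python
import unicodedata

CGJ = "\u034F"


def strip_redundant_cgj(text: str) -> str:
    """Drop each CGJ whose removal would not change canonical reordering."""
    out = []
    run_max = 0  # highest ccc among the marks since the last ccc=0 char kept
    for i, ch in enumerate(text):
        if ch == CGJ:
            # Lowest ccc among the combining marks that directly follow this CGJ.
            j, next_min = i + 1, None
            while j < len(text) and unicodedata.combining(text[j]) != 0:
                ccc = unicodedata.combining(text[j])
                next_min = ccc if next_min is None else min(next_min, ccc)
                j += 1
            if run_max == 0 or next_min is None or run_max <= next_min:
                continue  # no reordering would cross this CGJ: drop it
            out.append(ch)
            run_max = 0  # the kept CGJ (ccc=0) starts a new run of marks
            continue
        ccc = unicodedata.combining(ch)
        run_max = 0 if ccc == 0 else max(run_max, ccc)
        out.append(ch)
    return "".join(out)
```

For alef followed by <CGJ+patah><CGJ+hiriq>, only the second CGJ survives 
(patah has combining class 17, hiriq 14), which yields exactly the 
<patah,CGJ,hiriq> sequence quoted above.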

An alternative is to use a single PUA code point instead of CGJ in the 
edited text, if one wants to preserve the CGJ code points already present 
in the input stream. The editor would interpret this PUA code point as 
"don't reorder the following combining character when serializing the 
text", so that the following combining character keeps its relative order 
after normalization. Such a marker could then be completely language 
neutral.
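
A sketch of how such a marker could be serialized, in Python and under 
stated assumptions: U+E100 stands in for whatever PUA code point the 
editor reserves, the function name is hypothetical, and the marker becomes 
a CGJ only where normalization would otherwise move the following marks 
(comparing the highest canonical combining class before the marker with 
the lowest one after it). CGJs that were already in the input pass 
through untouched. Since the test relies only on canonical combining 
classes, it is not tied to Hebrew.

```python
import unicodedata

CGJ = "\u034F"
LOCK = "\uE100"  # hypothetical editor-private "keep the next mark in place"


def serialize(text: str) -> str:
    """Convert editor text to standard Unicode: each LOCK becomes a CGJ where
    normalization would otherwise reorder the following marks, and is dropped
    elsewhere. CGJs that came from the input are copied through unchanged."""
    out = []
    run_max = 0  # highest ccc among marks since the last ccc=0 char emitted
    for i, ch in enumerate(text):
        if ch == LOCK:
            # Lowest ccc among the combining marks that directly follow LOCK.
            j, next_min = i + 1, None
            while j < len(text) and unicodedata.combining(text[j]) != 0:
                ccc = unicodedata.combining(text[j])
                next_min = ccc if next_min is None else min(next_min, ccc)
                j += 1
            if next_min is not None and run_max > next_min:
                out.append(CGJ)  # needed: otherwise the marks would be swapped
                run_max = 0
            continue
        ccc = unicodedata.combining(ch)
        run_max = 0 if ccc == 0 else max(run_max, ccc)
        out.append(ch)
    return "".join(out)
```

For example, "a" + U+0301 (acute, class 230) + LOCK + U+0323 (dot below, 
class 220) serializes with a CGJ in place of the marker, while the 
reverse order needs none.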
