On Wed, 16 May 2018 17:41:12 -0500 Anshuman Pandey via Unicode <[email protected]> wrote:
> > 3. Keyboard design is more difficult because consonants like ক্ষ
> > are encoded as conjunct forms instead of atomic characters.
>
> Ignorant question on my part: is it difficult to use character
> sequences as labels for keys? I see keys for both क्ष and ज्ञ on the
> iOS Hindi keyboard, and त्र is tucked away under त.

It can be. It depends on the technology.

Pure X seems to be the worst. At the basic level, one has a bewildering map of key plus active modifier key to a single Unicode character. (The space also includes function keys.) An *application* can map keys to strings, but I know of no way of doing that for all of a user's applications, both those running and those that will run. Even the logic for dead keys has to be applied by the application, though I believe there are standard libraries that will handle that.

The old method on Windows uses sets of data tables that may be termed keyboards. Populated sets are saved as DLLs, and there are limits on what they can contain. The Microsoft Keyboard Layout Creator (MSKLC) is a popular tool for creating and packaging these DLLs. A key plus its modifiers can be mapped to:

1) A sequence of UTF-16 code units. The documented limit is, I believe, 4, but there are reports of people being able to use 6. The four sequences listed above each constitute a sequence of 3 code units, so they can be readily accommodated. This technique may well not work for a script in the SMP, and I think one cannot use the MSKLC simply to create the DLLs storing long sequences. So here is an added layer of complexity, though not relevant to the Bengali script.

2) A key can be designated a 'dead key'. I think it has to have a fallback to a BMP character, or rather, a single UTF-16 code unit. When one then presses a key that maps to a single code unit, that code unit is converted to another single code unit, which is the character that the combination types. The restriction is built into the data structure.
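
The code-unit arithmetic in 1) and the single-code-unit restriction in 2) can be checked with a rough Python sketch. It is only a model, not the real Windows keyboard DLL format. The names `ASSUMED_LIMIT`, `fits_in_key`, `DEAD_KEYS` and `dead_key_combine` are hypothetical, the limit of 4 is the figure recalled above rather than a verified one, and the fallback for an unlisted dead-key combination is an assumption.

```python
# Sketch only: models the two mapping types described above, not the real
# Windows keyboard DLL format.

ASSUMED_LIMIT = 4  # limit as recalled above; an assumption, not verified


def utf16_units(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def fits_in_key(text: str, limit: int = ASSUMED_LIMIT) -> bool:
    """True if the sequence fits within the assumed per-key limit."""
    return utf16_units(text) <= limit


# The four conjuncts from the quoted message: consonant + virama + consonant.
CONJUNCTS = {
    "Bengali kssa": "\u0995\u09CD\u09B7",
    "Devanagari kssa": "\u0915\u094D\u0937",
    "Devanagari jnya": "\u091C\u094D\u091E",
    "Devanagari tra": "\u0924\u094D\u0930",
}

# Three code points from the Brahmi block (SMP); each needs a surrogate pair.
SMP_SEQUENCE = "\U00011013\U00011046\U00011031"

# Hypothetical dead-key table: (dead code unit, base code unit) -> result.
# Every entry is a single UTF-16 code unit, mirroring the restriction in 2).
DEAD_KEYS = {
    ("\u00B4", "e"): "\u00E9",  # acute accent + e
    ("\u00B4", "a"): "\u00E1",  # acute accent + a
}


def dead_key_combine(dead: str, base: str) -> str:
    """Look up a combination; assumed fallback is to emit both characters."""
    return DEAD_KEYS.get((dead, base), dead + base)


if __name__ == "__main__":
    for name, seq in CONJUNCTS.items():
        print(name, utf16_units(seq), fits_in_key(seq))
    print("SMP sequence", utf16_units(SMP_SEQUENCE), fits_in_key(SMP_SEQUENCE))
    print(dead_key_combine("\u00B4", "e"))
```

Under these assumptions each BMP conjunct fits within the limit, while a three-character SMP sequence needs six code units and does not.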
There is a technique to chain dead keys, but that is not relevant to the difficulty or ease of typing ligatures.

The next level up I am acquainted with is the level of input methods. Here, one types a sequence of characters on a 'simple' keyboard, and this sequence controls the derivation of characters being input to the application. Modifier keys may be available to influence this derivation. Now, some of these input methods may be unreliable, and there may be problems for users who can switch between simple keyboards, e.g. US and British, or US and Hindi.

If this type of method works, then inputting sequences in response to a single keystroke is not a problem. Multiple key strokes can be a different matter, as the interface with applications may be ill-defined or broken. I have found this a problem with using the backslash key to cycle through candidate characters, and deleting SMP characters in LibreOffice has in the past resulted in the creation of lone surrogates.

Now, writing these input methods can be easy. I have fairly simple input methods for inputting both true characters and sequences perceived as characters for Emacs, ibus (using KMfL) and fcitx (using M17n). However, the ibus method has been unreliable in the past, and I have fallen back to a simple X keyboard map. When I do that, I lose the ability to input sequences by a single keystroke.

Richard.

