I'm preparing to share a spell-checker for Northern Thai in the Tai Tham script, and I'm having difficulty deciding whether to offer corrections in NFC/NFD or in unnormalised form.
The problem arises in closed syllables with tone marks. For example, ᨠᩥ᩠᩵ᨶ /kin/ 'smell' has two canonically equivalent encodings that respect the principle of phonetic ordering: the unnormalised <U+1A20 HIGH KA, U+1A65 SIGN I, U+1A75 TONE-1, U+1A60 SAKOT, U+1A36 NA>, which matches the glyph structure of four glyphs, <U+1A20>, <U+1A65>, <U+1A75> and <U+1A60, U+1A36>; and the NFC and NFD form <U+1A20, U+1A65, U+1A60, U+1A75, U+1A36>. (A quick check of the equivalence is sketched at the end of this message.)

The issues I see are:

1) The unnormalised form is a natural and easy form to type. Typing the normalised form character by character does not come naturally, and an input method for it would be more complex.

2) The unnormalised form is easier for a rendering engine. HarfBuzz actually presents the font with a non-standard canonical ordering, so that the invisible stacker, SAKOT, is reordered to just before the subscript consonant. Microsoft's Universal Shaping Engine (USE) would more naturally accommodate the unnormalised form, which has a natural unit of <halant, consonant> as an alternative to an indivisible final consonant. The USE is not designed to respect canonical equivalence.

3) The normalised form is the form preferred for the Web, but the pressure to use it has decreased.

4) The pressure on search tools to respect canonical equivalence is now relatively low. Some editors do respect it (e.g. LibreOffice); others don't (e.g. Emacs, so far as I am aware).

Therefore, the dictionary suggestions should match what the input method produces. So, should I offer normalised corrections or unnormalised corrections? And should the spell-checker accept spellings in the dispreferred state (normalised v. unnormalised)?

Richard.
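
P.S. For anyone who wants to verify the equivalence claimed above, here is a minimal sketch in Python (assuming Python 3.8+ for unicodedata.is_normalized). The two hard-coded strings are the two encodings of ᨠᩥ᩠᩵ᨶ discussed in this message; everything else is just the standard unicodedata module.

    # Check that the two encodings are canonically equivalent and
    # which one survives NFC/NFD normalisation.
    import unicodedata

    # Unnormalised, "phonetic" order: HIGH KA, SIGN I, TONE-1, SAKOT, NA
    unnormalised = "\u1A20\u1A65\u1A75\u1A60\u1A36"
    # NFC/NFD order: SAKOT (ccc 9) sorts before TONE-1 (ccc 230)
    normalised   = "\u1A20\u1A65\u1A60\u1A75\u1A36"

    assert unnormalised != normalised                                # different code point sequences
    assert unicodedata.normalize("NFC", unnormalised) == normalised  # but canonically equivalent
    assert unicodedata.normalize("NFD", unnormalised) == normalised  # NFC and NFD coincide here
    assert not unicodedata.is_normalized("NFC", unnormalised)
    assert unicodedata.is_normalized("NFC", normalised)

    # Show the combining classes that drive the reordering.
    for ch in unnormalised:
        print(f"U+{ord(ch):04X}  ccc={unicodedata.combining(ch)}  {unicodedata.name(ch)}")

The reordering happens because SAKOT has canonical combining class 9 while the tone marks have class 230, whereas the vowel signs and consonants have class 0, so only the <TONE-1, SAKOT> pair is swapped by normalisation.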

