On Sat, 27 Jan 2018 14:13:40 -0800 Shervin Afshar <[email protected]> wrote:
> On Mon, Jan 22, 2018 at 2:08 PM, Richard Wordingham via Unicode <
> [email protected]> wrote:
>
> > On Mon, 22 Jan 2018 at 16:39:57, Andre Schappo via Unicode <
> > [email protected]> wrote:
> >
> > > By way of example, one programming challenge I set to students a
> > > couple of weeks ago involves diacritics. Please see
> > > jsfiddle.net/coas/wda45gLp<https://jsfiddle.net/coas/wda45gLp/>
> >
> > Did any of them come up with the idea of using traces instead of
> > strings?
>
> Care to elaborate? Are you referring to sequence alignment methods?

No, I'm thinking of the trace monoid (see e.g.
https://en.wikipedia.org/wiki/Trace_monoid).

One way of thinking of strings is as concatenations of the NFD
decompositions of their constituent characters. The canonical
equivalence classes of these strings then form the trace monoid of
indecomposable characters, in which two characters commute exactly when
both are non-starters and their canonical combining classes differ.

The theory of regular expressions (though you may not think that
mathematical regular expressions matter) extends to trace monoids, with
the disturbing exception that the Kleene star of a regular language is
not necessarily regular. The prototypical example is the set of
sequences (xy)^n, where x and y are distinct and commute, i.e. xy and yx
are canonically equivalent in Unicode terms. A Unicode example is the
set of strings composed only of U+0F73 TIBETAN VOWEL SIGN II, which
decomposes to U+0F71 TIBETAN VOWEL SIGN AA + U+0F72 TIBETAN VOWEL SIGN I.
No finite-state machine will recognise all the strings canonically
equivalent to these, because it would have to check that, after
decomposition, U+0F71 and U+0F72 occur equally often.

One consequence of this view is that, because U+0323 (combining class
220) and U+0302 (combining class 230) commute, one has to think of
U+1EAD LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW (ậ) as being
composed both of the Vietnamese vowel letter U+00E2 LATIN SMALL LETTER
A WITH CIRCUMFLEX (â) and the tone mark U+0323 COMBINING DOT BELOW, and
also, in the spirit of the Thai ISO 11940 transliteration, of the
transliterated Thai vowel U+1EA1 LATIN SMALL LETTER A WITH DOT BELOW
(ạ), corresponding to U+0E31 THAI CHARACTER MAI HAN-AKAT, and the tone
mark U+0302 COMBINING CIRCUMFLEX ACCENT, corresponding to U+0E49 THAI
CHARACTER MAI THO.
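A minimal Python sketch of the trace view, assuming only the standard
`unicodedata` module: each character is replaced by its full canonical
decomposition, and adjacent letters are swapped whenever they commute
(both non-starters with different combining classes) until they are in
ascending class order. That is the canonical ordering step of
normalization, so the result should agree with NFD. The names
`commute`, `trace_normal_form`, `trace_equal` and `is_tibetan_ii_run`
are illustrative only, not taken from any library.

```python
import unicodedata


def ccc(ch: str) -> int:
    """Canonical combining class of a single code point (0 = starter)."""
    return unicodedata.combining(ch)


def commute(x: str, y: str) -> bool:
    """Independence relation of the trace monoid: two indecomposable
    characters commute when both are non-starters with different classes."""
    return ccc(x) != 0 and ccc(y) != 0 and ccc(x) != ccc(y)


def trace_normal_form(s: str) -> str:
    """Canonical representative of the trace (equivalence class) of s."""
    # Concatenate the full canonical decompositions of the characters.
    letters = [c for ch in s for c in unicodedata.normalize("NFD", ch)]
    # Bubble-sort adjacent commuting letters into ascending class order.
    changed = True
    while changed:
        changed = False
        for i in range(len(letters) - 1):
            a, b = letters[i], letters[i + 1]
            if commute(a, b) and ccc(a) > ccc(b):
                letters[i], letters[i + 1] = b, a
                changed = True
    return "".join(letters)


def trace_equal(s: str, t: str) -> bool:
    """Canonical equivalence, tested as equality of trace normal forms."""
    return trace_normal_form(s) == trace_normal_form(t)


def is_tibetan_ii_run(s: str) -> bool:
    """True if s is canonically equivalent to U+0F73 repeated n >= 0 times.

    The normal form must be U+0F71 * k followed by U+0F72 * k, so
    membership depends on comparing two unbounded counts, which is beyond
    any finite automaton.
    """
    nf = trace_normal_form(s)
    k = nf.count("\u0F71")
    return nf == "\u0F71" * k + "\u0F72" * k


if __name__ == "__main__":
    # Three spellings of U+1EAD: precomposed, Vietnamese-style, ISO 11940-style.
    for spelling in ["\u1EAD", "\u00E2\u0323", "\u1EA1\u0302"]:
        print(trace_equal("\u1EAD", spelling))  # True
    print(is_tibetan_ii_run("\u0F71\u0F72\u0F72\u0F71"))  # True
    print(is_tibetan_ii_run("\u0F71\u0F72\u0F72"))  # False
```

Calling `unicodedata.normalize("NFD", s)` on the whole string would give
the same normal form; the explicit swapping loop is there only to make
the commutation relation of the trace monoid visible.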
(In ISO 11940 as specified, the tone mark is actually written on the
immediately preceding consonant, not on the vowel.)

Richard.

