Mark

On Fri, Jan 13, 2017 at 7:19 PM, Richard Wordingham <richard.wording...@ntlworld.com> wrote:
> On Fri, 13 Jan 2017 10:38:30 +0100
> Mark Davis ☕️ <m...@macchiato.com> wrote:
>
> > On Thu, Jan 12, 2017 at 10:26 PM, Richard Wordingham <
> > richard.wording...@ntlworld.com> wrote:
>
> > > Using Script_Extensions to document the international
> > > combining characters that are used, for example, with Thai bases
> > > could have all sorts of undesirable knock-on effects.
>
> > If you know of combining marks whose scx values should include Thai,
> > please let us know.
>
> If you refer to the end of TUS 9.0 Section 16.1 you will find mention
> of U+0331 COMBINING MACRON BELOW and U+0303 COMBINING TILDE, which are
> thus candidates for scx ∍ Thai. One might also consider U+0359
> COMBINING ASTERISK BELOW; I have seen the combination ช͙ <U+0E0A THAI
> CHARACTER CHO CHANG, U+0359> used in a phonetic symbol for English,
> representing [ʒ].
>
> As their scx values are 'Inherited', should their values not be treated
> as though they already included Thai? I suppose, though, that they
> do not in fact match "\p{scx=Thai}". There does seem to be a view that
> scx=inherited is shorthand for some list of European scripts.

The distinction between sc=inherited and sc=common is an unfortunate one, a remnant from when we first added the script data. The distinction for a character C is purely derivable from whether gc(C) ∈ [[:mn:][:me:]] or not, so it is of little value and, with the advantage of hindsight, mostly just gets in the way.

scx=inherited is *not* a shorthand for some list of European scripts. Rather, C ∈ [[:scx=inherited:][:scx=common:]] means that either

   1. we don't have enough information about usage to be able to list the scripts that C is used with, or
   2. C can be used with so many scripts that it is not particularly productive to list them all.

> Richard.
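As a rough illustration of the remark that the sc=inherited / sc=common split follows from the General_Category, here is a minimal Python sketch. It assumes the character is already known to have sc in {Common, Inherited}; the function name `common_or_inherited` is hypothetical, and the rule is taken from the message as stated. The published Scripts.txt data may contain exceptions, so this is not a substitute for it.

```python
import unicodedata


def common_or_inherited(ch: str) -> str:
    """Guess sc for a character assumed to be either Common or Inherited.

    Applies the rule from the message: gc in {Mn, Me} -> Inherited,
    anything else -> Common. This is an approximation, not the real data.
    """
    return "Inherited" if unicodedata.category(ch) in ("Mn", "Me") else "Common"


# The three combining marks from the thread, an enclosing mark, and a hyphen.
for ch in ("\u0331", "\u0303", "\u0359", "\u20DD", "-"):
    gc = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} gc={gc} -> {common_or_inherited(ch)}")
```

Note that Python's standard `unicodedata` module exposes the General_Category but not the Script or Script_Extensions properties, so a check against scx itself would need the UCD files or another library.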