I like the more descriptive names, but I'd still like this data to be available in a supplementary table, regardless of the naming scheme.
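To make the request concrete, here is a minimal Python sketch of such a supplementary table, built only from the examples in Garth's message quoted below. The `HentaiganaEntry` class, its field names and the `BY_PROPOSED_NAME` index are hypothetical, made up for illustration; nothing here comes from the proposal itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HentaiganaEntry:
    """One row of a hypothetical supplementary table."""
    proposed_name: str              # name under the "VARIANT n" scheme
    syllable: str                   # romanised value, e.g. "E" or "YO"
    source_codepoint: int           # code point of the source ideograph
    feature: Optional[str] = None   # distinguishes same-source variants

    @property
    def source_ideograph(self) -> str:
        # The ideograph the hentaigana is derived from.
        return chr(self.source_codepoint)

    def descriptive_name(self) -> str:
        # Build a name in the "FROM CJK-XXXX [WITH FEATURE]" style.
        name = (f"HENTAIGANA LETTER {self.syllable} "
                f"FROM CJK-{self.source_codepoint:04X}")
        if self.feature:
            name += f" WITH {self.feature}"
        return name


# Rows taken from the examples in the quoted message only.
TABLE = [
    HentaiganaEntry("HENTAIGANA LETTER E VARIANT 2", "E", 0x76C8),
    HentaiganaEntry("HENTAIGANA LETTER YO VARIANT 4", "YO", 0x8207, "CROSSBAR"),
    HentaiganaEntry("HENTAIGANA LETTER YO VARIANT 5", "YO", 0x8207, "LOOP"),
    HentaiganaEntry("HENTAIGANA LETTER YO VARIANT 6", "YO", 0x8207, "ZIGZAG"),
]

# Index by the proposed name, so the source data can be looked up
# whichever naming scheme is finally adopted.
BY_PROPOSED_NAME = {entry.proposed_name: entry for entry in TABLE}

if __name__ == "__main__":
    for entry in TABLE:
        print(f"{entry.proposed_name} -> {entry.descriptive_name()} "
              f"(source {entry.source_ideograph})")
```

The same rows serve either scheme: the descriptive name is derived from the table, not the other way round, so the table stays useful even if the "VARIANT n" names are kept.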
2015-12-16 16:17 GMT-02:00 Garth Wallace <gwa...@gmail.com>:

> On Wed, Dec 9, 2015 at 7:55 AM, Nicolas Tranter
> <n.tran...@sheffield.ac.uk> wrote:
> > I comment as a western Japanologist who teaches and researches using
> > hentaigana. I have published with hentaigana using image files
> > (resulting in two publisher errors) and will publish next year with
> > hentaigana using the Koin Hentaigana font (Koin変体仮名外字明朝.tte),
> > and anticipate typesetting problems. I refer to the 2015 proposal
> > L2/15-239 to include hentaigana, including the appended paper by
> > Takada Tomokazu, Yada Tsutomu and Saito Tatsuya ('The past, present
> > and future of Hentaigana Standardization for Information
> > Interchange'). I also refer to Yada Tsutomu's support of the proposal
> > ('About the inclusion of standardized codepoints for Hentaigana',
> > L2/15-318). As the names and numbering of proposed characters is an
> > issue I deal with below, I also refer to individual hentaigana in the
> > proposal by their MJ-codes as used in the proposers' own websites
> > (e.g. http://mojikiban.ipa.go.jp/xb164/).
> >
> > SELECTION: The selection is good, consisting of 286 forms, although
> > this would be realised as 299 characters. The earlier 2009 proposal
> > referred to was based on the Mojikyo M113.ttf font, which has 213
> > hentaigana characters and includes a few major basic gaps. The Koin
> > Hentaigana font has 549 characters, which excluding separate forms
> > with voicing and 'half-voicing' diacritics consists of 330
> > hentaigana, but includes some very rare forms, including ones that do
> > not occur in late period texts.
> >
> > The selection of 'academic' hentaigana is appropriate and lacks major
> > gaps.
> > On the other hand, the Ministry of Justice hentaigana requirements
> > are ones that have been decided by the Ministry of Justice in 2004
> > for name registration purposes, and so, although one could argue
> > easily with their 2004 decision (and I would), the fact that they are
> > already official means it is pointless to argue with their inclusion
> > in Unicode.
> >
> > It's been noted that a few hentaigana are almost identical to normal
> > hiragana, especially e HENTAIGANA LETTER E VARIANT 4 = MJ090017
> > (cf. え), shi HENTAIGANA LETTER SI VARIANT 2 = MJ090072 (cf. HIRAGANA
> > LETTER SI し) and nu HENTAIGANA LETTER NU VARIANT 2 = MJ090149
> > (cf. HIRAGANA LETTER NU ぬ): their differences are solely that the
> > 'brush' is removed from the paper on a downward rather than a
> > rightward flourish, reflecting vertical handwriting. Ordinarily I
> > would argue against including them, but since the MoJ has recognised
> > them as official variants they need to be included.
> >
> > The decision to propose in most cases one codepoint for the
> > hentaigana derived from a single Chinese character is sensible, as
> > also is the decision to allow multiple codepoints in certain cases
> > where manuscripts use side-by-side significantly distinct forms
> > derived from the same Chinese character and with the same value. An
> > example of the latter is HENTAIGANA LETTER KA VARIANT 3 = MJ090025
> > and KA VARIANT 4 = MJ090026, both pronounced ka and both derived from
> > the Chinese character 可, but which are routinely both found in the
> > same manuscript by the same hand as if they were separate graphemes
> > from the Heian to the Meiji periods.
> >
> > POLYPHONY. Several hentaigana are truly polyphonous (e.g. the
> > 子-derived hentaigana = ne MJ090151 or MJ090059 ko, or the 馬-derived
> > hentaigana = me MJ090222 or ma MJ090205).
> > In particular, those hentaigana derived from 无 and associated with n
> > (MJ090298, MJ090299) historically (also the source of HIRAGANA LETTER
> > N ん) are also used for mu (MJ090214, MJ090215) and mo (MJ090224,
> > MJ090223). Diachronically, n in native Japanese words is usually
> > derived from an earlier mu. Takada et al. includes a list of 10 kanji
> > sources that this applies to in the proposed repertoire. (Strictly,
> > this affects 11 hentaigana, because the proposal has two forms for
> > 无-derived characters.) The proposal's solution is to assign
> > different identifiers, e.g. 子 = HENTAIGANA LETTER NE VARIANT 1 and
> > HENTAIGANA LETTER KO VARIANT 2, 馬 = HENTAIGANA LETTER ME VARIANT 3
> > and HENTAIGANA LETTER MA VARIANT 7, and the two derived from 无 =
> > HENTAIGANA LETTER N VARIANT 1, N VARIANT 2, MU VARIANT 1, MU VARIANT
> > 2, MO VARIANT 1 and MO VARIANT 2. This means that there would be
> > characters that are given more than one codepoint and identifier but
> > are formally and etymologically identical, adding 13 unnecessary
> > repetitions to the character set. I would favour Yada's naming
> > system, where the polyphonous characters are given a single codepoint
> > and identifier, e.g. 子 = HENTAIGANA LETTER NE-KO, 馬 = HENTAIGANA
> > ME-MA, and two 无-derived forms = HENTAIGANA LETTER N-MU-MO 1 and
> > N-MU-MO 2.
>
> Is there a reason for sticking with the "VARIANT 1"/"VARIANT 2" naming
> convention? The previous proposal was for standardized variation
> sequences, so this opaque numbering made sense (since "VARIANT 1"
> meant "using the first variation selector"), but the current one is to
> encode them all as atomic characters. Wouldn't it be more helpful to
> give them more descriptive names, possibly by identifying the
> particular ideographs each is derived from? For example, instead of
> HENTAIGANA LETTER E VARIANT 2, it could be HENTAIGANA LETTER E FROM
> CJK-76C8.
> This doesn't help with same-source variants, but physical features
> could work for that, e.g.
>
> HENTAIGANA LETTER YO VARIANT 4 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH CROSSBAR
> HENTAIGANA LETTER YO VARIANT 5 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH LOOP
> HENTAIGANA LETTER YO VARIANT 6 -> HENTAIGANA LETTER YO FROM CJK-8207 WITH ZIGZAG
>
> It's more verbose but it seems like it would be useful to be able to
> identify which variant is which from the name instead of having to
> consult the code charts (which IIRC aren't normative) or some
> supplementary table.