Peter Kirk pointed out:

> This is not some kind of unusual orthography but a
> specialist scientific notation. It is the same notation as h1, h2, h3 or
> ha, hb, hc etc (the second character subscripted in each case) used in
> all kinds of notational conventions but primarily mathematical and
> scientific ones. Some linguistics textbooks are full of this kind of
> notation. For an example chosen almost at random, I found the following
> in an old paper by Kenneth Pike (in Ruth M. Brend ed. "Advances in
> Tagmemics", North-Holland 1974, p.238):
>
> (2) eMk = eaTCaf, eaTCgf, egTCaf, egTCgf
>
> where all the lower case letters are subscripted, and examples of this
> in which the word "catch" is followed by subscript af or gf.

And I agree with him. The Indo-Europeanist usage is just a very restricted subset that fades off, even in historical linguistic usage, to general conventions of mathematical and logical formulation to express relationships of various sorts.

Another example in a paper about morphological analysis which clearly involves mathematical formulations:

http://www-2.cs.cmu.edu/~alavie/Sem-MT-wshp/ltai+Segal_paper.pdf

You could start down the road of thinking that the formulations of T<sub>1</sub>, T<sub>2</sub>, and so on should just use the compatibility subscript digits in Unicode. Then you hit T<sub>i</sub>. Is that actually U+1D62 LATIN SUBSCRIPT SMALL LETTER I or just a subscripted U+0069? And then you clearly run out of gas when you hit:

t<sub>nm<sub>n</sub></sub>

with recursive subscripting.

> My point here is that if we once start on encoding subscript letters
> used in specialist scientific notation, there is no easy place to stop.
> Either we need to accept the principle that subscripts are encodable and
> set aside space for a whole alphabet of them (and an upper case alphabet
> and a Greek alphabet as well, plus punctuation); or else we need to say
> from the start that these things are not plain text and should not be
> encoded in Unicode.

It may be reasonable for Michael to argue for the subscript a, e, and o for Indo-European, since he already got a subscript i and u encoded for the UPA. Arguably, the subscript a, e, and o *are* phonetic modifier letters, since they represent hypothesized vowel-coloring of the laryngeal symbol.

The subscript x is trickier, since it is an algebraic substitution for (a ~ e ~ o), so we are skating on thin ice there, with a notation that is arguably not a phonetic modifier letter.

And the subscript / is over the edge, as far as I am concerned. It clearly is introducing a generic notational convention into the realm where we are expecting only discrete modifier letters to require encoding as separate characters.
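
For anyone who wants to poke at the T<sub>1</sub> versus T<sub>i</sub> point, here is a minimal Python sketch, not a definitive tool. It assumes only the standard `unicodedata` module; the helper names `subscript_map` and `plain_text_subscript` are made up for illustration. It collects every character whose compatibility decomposition is tagged `<sub>`, then tries to spell a subscript string using only those characters. What it finds depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def subscript_map():
    """Map each base character to its encoded subscript clone, if any.

    A character counts as a subscript clone when its compatibility
    decomposition is '<sub> XXXX' with a single base code point.
    """
    table = {}
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        if decomp.startswith("<sub> "):
            parts = decomp.split()[1:]
            if len(parts) == 1:
                table[chr(int(parts[0], 16))] = chr(cp)
    return table


def plain_text_subscript(text, table):
    """Return text spelled with subscript clones, or None if any are missing."""
    if all(ch in table for ch in text):
        return "".join(table[ch] for ch in text)
    return None


table = subscript_map()

# The digit and the letter i both have encoded subscript clones.
t1 = "T" + plain_text_subscript("1", table)   # T followed by U+2081
ti = "T" + plain_text_subscript("i", table)   # T followed by U+1D62

# Compatibility normalization folds the clones back to ordinary characters,
# so the "subscriptness" is treated as formatting, not identity.
print(unicodedata.normalize("NFKC", t1))  # T1
print(unicodedata.normalize("NFKC", ti))  # Ti

# A capital F has no clone, so this falls back to None (needs rich text).
print(plain_text_subscript("F", table))   # None
```

Note that nothing in this approach can express nesting: a flat string of subscript clones has no way to say that one subscript hangs off another, which is the t<sub>nm<sub>n</sub></sub> case.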

And if I run into an Indo-Europeanist notation of the alternations such as:

*h<sub>1/3</sub> or *dhug'hH(<sub>e/o</sub>)ter

what is to guarantee that I won't find alternative representations of such formulations using "~" instead of "/", for example? Do we then also need a subscript tilde to handle that?

Furthermore, Michael carefully dodged the point that all of these Indo-European sources are *already* fonted, styled text. They are *not* plain text, but mix italic citations with Roman forms. Unless we are going to also head down the road of plain text italic letter clones for Indo-European, all of this material already has to be dealt with as rich text.

The proposal states:

"Styled text is not seen as appropriate for these; Indo-Europeanists already make use of the subscript digits, and superscript h and w and so on, already encoded. The characters proposed here are required for plain-text representation of Indo-European reconstructed material."

I concur that superscript h and w and so on are o.k. -- they truly are modifier letters and appropriate in transcriptional plain text. Nobody is arguing about that point.

But I think it is a mistake to be using the compatibility subscript digits for generic subscripting. Of course, I can't help it if people are already doing so, but it gets us into this conundrum of people expecting any subscripted expression to be expressible in plain text, and that is just clearly wrong -- it isn't generic or scalable. And it results in people coming back to the table asking for more of them every time some community is found making some other use of them.

As Peter Kirk pointed out, this kind of use of subscripting in linguistic material is widespread. Take an example, pulled more or less at random off the web, Topics in Tiberian Biblical Hebrew Metrical Phonology and Prosodics, by Henry Churchyard (a 1999 Ph.D. dissertation).

http://www.crossmyt.com/hc/linghebr/

(in case anyone wishes to check up on me)

This uses conventions fairly widespread in metrical phonology, where F stands for foot, lowercase-sigma stands for syllable, and lowercase-mu stands for mora. If you examine the document, you find instances of all 3 subscripted in various combinations, in addition to the typical usage of subscripted numbers and subscripted i to indicate particular consonants and matching consonants:

-C<sub>i</sub>C<sub>i</sub>#

So you find constructs like:

[<sub>F</sub>[<sub>sigma</sub>mu<sub>sigma</sub>] [<sub>sigma</sub>mumu<sub>sigma</sub>]<sub>F</sub>]

And:

sigma-with-combining-breve<sub>mu</sub>

to represent: "a light syllable which is not a bimoraic-trochee reduction structure head"

Now, if, as Michael subsequently claimed:

> Or we do what we have done so far. Encode what people have been using.

Are we missing subscript-F, subscript-sigma, and subscript-mu for metrical phonologists?

In case you missed it, that was a rhetorical question, and the answer to it should be no. :-)

By the way, as I indicated, the case for the subscript-a, e, and o seems better to me. The above dissertation, for example, makes use of the subscript-a as a transcriptional notation for the furtive patah -- the kind of evidence that argues *for* such a character as useful for a plain text representation of linguistic transcription.

--Ken