Re: (SC2WG2.609) New contribution N2705

Kenneth Whistler Tue, 17 Feb 2004 17:58:59 -0800

Peter Kirk pointed out:

> This is not some kind of unusual orthography but a 
> specialist scientific notation. It is the same notation as h1, h2, h3 or 
> ha, hb, hc etc (the second character subscripted in each case) used in 
> all kinds of notational conventions but primarily mathematical and 
> scientific ones. Some linguistics textbooks are full of this kind of 
> notation. For an example chosen almost at random, I found the following 
> in an old paper by Kenneth Pike (in Ruth M. Brend ed. "Advances in 
> Tagmemics", North-Holland 1974, p.238):
> 
> (2) eMk = eaTCaf, eaTCgf, egTCaf, egTCgf
> 
> where all the lower case letters are subscripted, and examples of this 
> in which the word "catch" is followed by subscript af or gf.


And I agree with him. The Indo-Europeanist usage is just a
very restricted subset that fades off, even in historical
linguistic usage, into the general conventions of mathematical
and logical formulation used to express relationships of various sorts.

Here is another example, from a paper on morphological analysis
that clearly involves mathematical formulations:

http://www-2.cs.cmu.edu/~alavie/Sem-MT-wshp/ltai+Segal_paper.pdf

You could start down the road of thinking that the formulations
of T<sub>1</sub>, T<sub>2</sub>, and so on should just use
the compatibility subscript digits in Unicode. Then you hit
T<sub>i</sub>. Is that actually U+1D62 LATIN SUBSCRIPT SMALL LETTER I
or just a subscripted U+0069? And then you clearly run out of
gas when you hit:

     t<sub>nm<sub>n</sub></sub>
     
with recursive subscripting.
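The T<sub>i</sub> question can be made concrete with Python's standard
unicodedata module. The sketch below is an illustration, not part of the
original thread; the helper name describe is hypothetical, and it assumes
Python 3.9+ with a Unicode database that includes U+1D62. It shows that
both U+1D62 and the subscript digits carry only a <sub> compatibility
decomposition, so NFKC normalization folds them back to plain i and 1.
The subscripting survives only as a tag on a single character, and
nothing in such a tag can express nesting like t<sub>nm<sub>n</sub></sub>.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (name, decomposition mapping, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )

for ch in ("\u1d62", "\u2081"):
    print(f"U+{ord(ch):04X}", *describe(ch))
# U+1D62 LATIN SUBSCRIPT SMALL LETTER I <sub> 0069 i
# U+2081 SUBSCRIPT ONE <sub> 0031 1
```

Any process that applies NFKC will therefore treat T<sub>i</sub> spelled
with U+1D62 as identical to plain "Ti".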


> My point here is that if we once start on encoding subscript letters 
> used in specialist scientific notation, there is no easy place to stop. 
> Either we need to accept the principle that subscripts are encodable and 
> set aside space for a whole alphabet of them (and an upper case alphabet 
> and a Greek alphabet as well, plus punctuation); or else we need to say 
> from the start that these things are not plain text and should not be 
> encoded in Unicode.

It may be reasonable for Michael to argue for the subscript a, e, and
o for Indo-European, since he already got a subscript i and u encoded
for the UPA. Arguably, the subscript a, e, and o *are* phonetic
modifier letters, since they represent hypothesized vowel-coloring
of the laryngeal symbol. The subscript x is trickier, since it
is an algebraic substitution for (a ~ e ~ o), so we are skating
on thin ice there, with a notation that is arguably not a
phonetic modifier letter. And the subscript / is over the edge,
as far as I am concerned. It clearly is introducing a generic
notational convention into the realm where we are expecting only
discrete modifier letters to require encoding as separate
characters. And if I run into an Indo-Europeanist notation for
alternations such as:

*h<sub>1/3</sub>

or

*dhug'hH(<sub>e/o</sub>)ter

what is to guarantee that I won't find alternative representations
of such formulations using "~" instead of "/", for example? Do
we then also need a subscript tilde to handle that?

Furthermore, Michael carefully dodged the point that all of these
Indo-European sources are *already* fonted, styled text. They
are *not* plain text, but mix italic citations with Roman forms.
Unless we are going to also head down the road of plain text
italic letter clones for Indo-European, all of this material already
has to be dealt with as rich text.

The proposal states:

"Styled text is not seen as appropriate for these; Indo-Europeanists
already make use of the subscript digits, and superscript h and w
and so on, already encoded. The characters proposed here are
required for plain-text representation of Indo-European reconstructed
material."

I concur that superscript h and w and so on are o.k. -- they truly
are modifier letters and appropriate in transcriptional plain
text. Nobody is arguing about that point.

But I think it is a mistake to be using the compatibility
subscript digits for generic subscripting. Of course, I can't
help it if people are already doing so, but it gets us into this
conundrum of people expecting any subscripted expression to
be expressible in plain text, and that is just clearly wrong --
it isn't generic or scalable. And it results in people coming
back to the table asking for more of them every time some
community is found making some other use of them. As Peter Kirk
pointed out, this kind of use of subscripting in linguistic
material is widespread.

Take an example, pulled more or less at random off the web:
"Topics in Tiberian Biblical Hebrew Metrical Phonology and
Prosodics", by Henry Churchyard (a 1999 Ph.D. dissertation).

http://www.crossmyt.com/hc/linghebr/

(in case anyone wishes to check up on me)

This uses conventions fairly widespread in metrical phonology,
where F stands for foot, lowercase-sigma stands for syllable,
and lowercase-mu stands for mora. If you examine the document, you
find instances of all three subscripted in various combinations,
in addition to the typical usage of subscripted numbers and
subscripted i to indicate particular consonants and matching
consonants:

   -C<sub>i</sub>C<sub>i</sub>#
   
So you find constructs like:

   [<sub>F</sub>[<sub>sigma</sub>mu<sub>sigma</sub>]
      [<sub>sigma</sub>mumu<sub>sigma</sub>]<sub>F</sub>]
      
And:

   sigma-with-combining-breve<sub>mu</sub>
   
   to represent: "a light syllable which is not a bimoraic-trochee
                  reduction structure head"
                  
Now, suppose we follow what Michael subsequently claimed:

> Or we do what we have done so far. Encode what people have been using.

Then are we missing subscript-F, subscript-sigma, and subscript-mu for
metrical phonologists?

In case you missed it, that was a rhetorical question, and the
answer to it should be no. :-)
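The claim that per-character subscripting "isn't generic or scalable" can
be illustrated with a short Python sketch (not from the original thread;
build_subscript_table and to_plain_subscript are hypothetical names).
Assuming Python 3.9+, it collects every character whose Unicode
decomposition is tagged <sub>, then tries to rewrite a label using only
those characters. The digits succeed, while a capital F, like the foot
label above, has no compatibility subscript form at all. The exact table
depends on the Unicode version bundled with Python, since some subscript
letters (such as subscript a, e, o, and x) were added after this message
was written.

```python
import sys
import unicodedata

def build_subscript_table() -> dict[str, str]:
    """Map each base character to its compatibility subscript form.

    Scans the whole code space for characters whose decomposition mapping
    is "<sub> XXXX" with a single base code point.
    """
    table: dict[str, str] = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == "<sub>":
            table[chr(int(parts[1], 16))] = chr(cp)
    return table

SUBSCRIPTS = build_subscript_table()

def to_plain_subscript(text: str) -> str:
    """Rewrite text with compatibility subscripts, or fail if any is missing."""
    missing = sorted({ch for ch in text if ch not in SUBSCRIPTS})
    if missing:
        raise ValueError(f"no plain-text subscript for {missing}")
    return "".join(SUBSCRIPTS[ch] for ch in text)

print(to_plain_subscript("12"))  # prints U+2081 U+2082, the subscript digits

# Foot, sigma, and mu labels from metrical phonology.
for label in ("F", "\u03c3", "\u03bc"):
    try:
        print(label, "->", to_plain_subscript(label))
    except ValueError as err:
        print(label, "->", err)
```

Every new community's notation shows up as another gap in a table like
this, which is the "coming back to the table" problem described above.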

By the way, as I indicated, the case for the subscript a, e, and o
seems better to me. The above dissertation, for example, makes use
of the subscript-a as a transcriptional notation for the furtive
patah -- the kind of evidence that argues *for* such a character
as useful for a plain text representation of linguistic
transcription.

--Ken
