On 11/18/2010 11:15 PM, Peter Constable wrote:
If you'd like a precedent, here's one:

Yes, I think discussion of precedents is important - it leads to the formulation of encoding principles that can then (hopefully) result in more consistency in future encoding efforts.

Let me add the caveat that I fully understand that character encoding doesn't work by applying cookbook-style recipes, and that principles are better phrased as criteria for weighing a decision rather than as formulaic rules.

With these caveats, then:
  IPA is a widely-used system of transcription based primarily on the Latin 
script. In comparison to the Janalif orthography in question, there is far more 
existing data. Also, whereas that Janalif orthography is no longer in active 
use--hence there are not new texts to be represented (there are at best only 
new citations of existing texts), IPA, as a writing system, is in active use with 
new texts being created daily; thus, the body of digitized data for IPA is 
growing much more than is data in the Janalif orthography. And while IPA is 
primarily based on Latin script, not all of its characters are Latin 
characters: bilabial and interdental fricative phonemes are represented using 
Greek letters beta and theta.

IPA has other characteristics in both its usage and its encoding that you need to consider to make the comparison valid.

First, IPA requires specialized fonts because it relies on glyphic distinctions that fonts not designed for IPA use will not guarantee (Latin a with and without hook, and g with hook versus two-story g, are just two examples). It is also a notational system that requires specific training in its use, and it is caseless - in distinction to ordinary Latin script.

While several orthographies have been based on IPA, my understanding is that some of them saw the encoding of additional characters to make them work as orthographies.

Finally, IPA, like other phonetic notations, uses distinctions between letter forms on the character level that would almost always be relegated to styling in ordinary text.
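(As a small illustration of that character-level point - a sketch only, and my own assumption about which code points are meant by the a and g examples above - the following Python shows that the IPA letter forms are separately encoded characters rather than font variants:

    import unicodedata

    # Assumed code points for the "a" and "g" distinctions mentioned above.
    for cp in (0x0061, 0x0251, 0x0067, 0x0261):
        print(f"U+{cp:04X}  {chr(cp)}  {unicodedata.name(chr(cp))}")

    # U+0061 LATIN SMALL LETTER A
    # U+0251 LATIN SMALL LETTER ALPHA
    # U+0067 LATIN SMALL LETTER G
    # U+0261 LATIN SMALL LETTER SCRIPT G

In ordinary text these would be treated as glyph variants of the same letter; for IPA data they are encoded as distinct characters.)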

Because of these special aspects of IPA, I would class it in its own category of writing systems, which makes it less useful as a precedent against which to evaluate general Latin-based orthographies.

Given a precedent of a widely-used Latin writing system for which it is 
considered adequate to have characters of central importance represented using 
letters from a different script, Greek, it would seem reasonable if someone 
made the case that it's adequate to represent an historic Latin orthography 
using Cyrillic soft sign.

I think the question can and should be asked: what is adequate for a historic orthography? (I don't know anything about the particulars of Janalif beyond what I read here, so for now I accept your categorization of it as if it were fact.)

The precedent for historic orthographies is a bit uneven in Unicode. Some scripts have extensive collections of characters (even duplicates or near-duplicates) to cover historic usage. Other historic orthographies cannot be fully represented without markup. And some are now better supported than at the beginning, because the encoding has since plugged certain gaps.

A helpful precedent in this case would be that of another minority or historic orthography (or historic minority orthography) for which the use of Greek or Cyrillic characters alongside Latin was deemed acceptable. I don't think Janalif is totally unique (although the others may not be dead). I'm thinking of the Latin OU that was encoded based on a Greek ligature, and the perennial question of the Kurdish Q and W (Latin borrowings into Cyrillic - I believe these are now 051A and 051C; see the snippet below). Again, these may be for living orthographies.
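(To double-check those code points - a hypothetical illustration on my part, not anything from the proposals - Python's standard unicodedata module reports the following character names:

    import unicodedata

    # Latin OU and the Cyrillic QA/WE pairs mentioned above; the lowercase
    # companions are included for completeness.
    for cp in (0x0222, 0x0223, 0x051A, 0x051B, 0x051C, 0x051D):
        print(f"U+{cp:04X}  {chr(cp)}  {unicodedata.name(chr(cp))}")

    # U+0222 LATIN CAPITAL LETTER OU
    # U+0223 LATIN SMALL LETTER OU
    # U+051A CYRILLIC CAPITAL LETTER QA
    # U+051B CYRILLIC SMALL LETTER QA
    # U+051C CYRILLIC CAPITAL LETTER WE
    # U+051D CYRILLIC SMALL LETTER WE

That is, the Kurdish letters ended up encoded as Cyrillic characters in their own right rather than being represented by borrowed Latin Q and W.)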

   /Against this backdrop, it would help if WG2 (and UTC) could point
   to agreed upon criteria that spell out what circumstances should
   favor, and what circumstances should disfavor, formal encoding of
   borrowed characters, in the LGC script family or in the general case./


That's the main point I'm trying to make here. I think it is not enough to somehow arrive at a decision for one orthography; it is necessary for the encoding committees to take hold of the reasoning behind that decision and work out how to apply that reasoning consistently in future cases.

This may still feel a little unsatisfactory to those whose proposal thereby becomes the test case for settling a body of encoding principles, but to that I say: there is ample precedent for doing it that way in Unicode and 10646.

So let me ask these questions:

   A. What are the encoding principles that follow from the disposition
   of the Janalif proposal?

   B. What precedents are these based on, and what precedents are
   consciously established by this decision?


A./

