On 04/06/2004 18:11, Kenneth Whistler wrote:

...

There ARE cases in which entire alphabets have been given compatibility decompositions to other alphabets.

^^^^^^^^^
The operative word here is alphabets, as should be obvious. These
are not separate scripts. If they *had* been treated as separate
scripts, they would have been named differently and would *not*
have gotten compatibility decompositions.



Well, we come back to the irreconcilable difference. You and Michael assert that Phoenician is a separate script from Hebrew. The scholars of Semitic writing who have written to this list or been quoted on it disagree, some in strong terms, although some of them agree with me that a mechanism for making plain text distinctions is also required. For example, Patrick Durusau, although in general terms he supports the proposal, wrote:


All Hudson is pointing out is that long PRIOR to Unicode, Semitic scholars reached the conclusion all Semitic languages share the same 22 characters. A long standing and quite useful conclusion that has nothing at all to do with your proposal.

But I dispute his last sentence. If the writing systems of these languages share the same abstract characters, they form a single script, which conflicts with the proposal to encode Phoenician as a separate script.




For example there are the Mathematical Alphanumeric Symbols, the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as well as superscripts, subscripts, modifier letters etc. These symbols have these compatibility decompositions because they are not considered to form a separate script,


True.



but rather to be glyph variants of characters in the Latin, Greek, Katakana etc. scripts.


Not a complete characterization. They are "presentation variants"
or other specifically styled versions of the alphabets,
encoded distinctly for one or another compatibility reason
present in the history of the encoding. Nearly all of them
have a preexisting encoded (or named entity) existence that the
standard required mapping to.



Well, the Mathematical Alphanumeric Symbols did not have a previous separate existence. And I am arguing that Phoenician and Hebrew are presentation variants of a single script, although not so specifically styled.
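
To make concrete what such a compatibility decomposition does in practice, here is a minimal sketch using Python's unicodedata module (an illustration only, not part of the standard's own machinery): compatibility normalization folds these presentation variants straight back onto the ordinary characters they are variants of.

    import unicodedata

    # NFKD applies compatibility decompositions, folding styled or
    # presentation-form characters back onto the plain letters and digits.
    print(unicodedata.normalize('NFKD', '\U0001D400'))  # MATHEMATICAL BOLD CAPITAL A -> 'A'
    print(unicodedata.normalize('NFKD', '\uFF21'))      # FULLWIDTH LATIN CAPITAL LETTER A -> 'A'
    print(unicodedata.normalize('NFKD', '\u00B2'))      # SUPERSCRIPT TWO -> '2'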


...

If Phoenician (~ Old Canaanite) is separately encoded from Hebrew,
compatibility decompositions will *not* go in for folding them.

If Phoenician (~ Old Canaanite) is *not* separately encoded from
Hebrew, then there won't be any separately encoded Phoenician
characters to have compatibility decompositions for, and thus,
again, compatibility decompositions will *not* go in for folding
them.



This is in fact my preferred solution, because Phoenician or Old Canaanite is not a separate script and so should not be encoded as such. A mechanism is required for making a plain text distinction, but without defining a separate script. The Unicode standard recognised this situation:


Occasionally the need arises in text processing to restrict or change the set of glyphs that are to be used to represent a character. Normally such changes are indicated by choice of font or style in rich text documents. In special circumstances, such a variation from the normal range of appearance needs to be expressed side-by-side in the same document in plain text contexts, where it is impossible or inconvenient to exchange formatted text.

That is a quote from TUS 4.0.1 section 15.6, p.397. In that section a mechanism, variation selectors, is defined for making such distinctions in plain text. You, Ken, have argued that this mechanism is inappropriate for a situation like this one. In that case, if we accept for the time being my premise that Phoenician is not a separate script but requires a plain text mechanism to distinguish it from Hebrew, there is a need for an alternative mechanism for such a situation.
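
Purely as an illustration of the kind of plain text distinction I mean - and this is hypothetical, since no such variation sequence is defined in the standard - a Hebrew base character could be followed by a variation selector to request Phoenician-style glyphs:

    # Hypothetical sketch only: U+05D0 HEBREW LETTER ALEF followed by
    # U+FE00 VARIATION SELECTOR-1. No such standardized variation
    # sequence exists; this just shows what the plain text form would be.
    aleph_phoenician_style = '\u05D0\uFE00'

    # A process that does not recognise the sequence can simply ignore
    # the selector, leaving ordinary Hebrew ALEF for searching and sorting.
    print(aleph_phoenician_style.replace('\uFE00', '') == '\u05D0')  # True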


And just as you claim my statement below begs the question
regarding the status of Phoenician, so does your characterization
of "Phoenician and Hebrew variant glyphs of the same script".



I accept that what I wrote depends on the understanding of Semitic scholars that these are the same script.

Cross-script equivalencing is done by transliteration algorithms,
not by normalization algorithms.
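
The distinction being drawn here can be illustrated with a small Python sketch (the Greek-to-Latin table below is a toy example, not taken from any standard): normalization never maps across a script boundary, whereas transliteration is a separate operation defined outside the normalization machinery.

    import unicodedata

    # Normalization does not cross script boundaries: Greek small alpha
    # has no compatibility decomposition to Latin 'a'.
    print(unicodedata.normalize('NFKD', '\u03B1'))  # still GREEK SMALL LETTER ALPHA

    # Transliteration is a separately defined operation; this three-letter
    # table is an illustration only.
    GREEK_TO_LATIN = {'\u03B1': 'a', '\u03B2': 'b', '\u03B3': 'g'}
    print(''.join(GREEK_TO_LATIN.get(c, c) for c in '\u03B1\u03B2\u03B3'))  # 'abg'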





This begs the question. Scholars of Semitic languages do not accept that

^^^^^^^^
Recte: Some scholars



Well, I have not seen any scholars of Semitic languages state that Phoenician is in principle a separate script from Hebrew, although some have accepted the proposal, from a misunderstanding of the character-glyph model and because it would be convenient in practice for their work.




this is a cross-script issue. They do not accept that representation of

^^^^
Recte: Some scholars of Semitic languages



No, "They" is quite adequate because if "Scholars" is corrected in the previous sentence as you already requested there is no need for further correction here. Stop picking the same nit twice.




a Phoenician, palaeo-Hebrew etc inscription with square Hebrew glyphs is transliteration. Rather, for them it is a matter of replacing an obsolete or non-standard glyph by a modern standard glyph for the same character - just as one would not describe as transliteration

^^^^^^^
Bogus analogy alert.



I accept that this analogy depends on my understanding of Phoenician as a script variety, like Fraktur, rather than a separate script.




representation in Times New Roman of a Latin script text in mediaeval handwriting or in Fraktur.



...

The issue is simply the difficulty of coming to consensus, for
certain archaic collections of writing systems, on what constitutes
an encodable script boundary and what does not.

And that, my friend, was obvious a *MONTH* ago in this
discussion.



Agreed. The problem comes from the continued inability of some to accept that the scholars of these scripts are in the best position to judge what constitutes a script boundary and what does not, or even to accept that their views are worthy of consideration and that there is some relevance to the fact that "long PRIOR to Unicode, Semitic scholars reached the conclusion all Semitic languages share the same 22 characters".


...

But I accept that this Coptic to Greek compatibility has a few problems because not all characters have mappings. However, this is not a problem for Phoenician, because *every* Phoenician character has an unambiguous compatibility mapping to an existing Hebrew character.



No Phoenician encoded character has a *compatibility* mapping
to an existing Hebrew character until the UTC says that it
does.



Well, I'll drop Coptic, and agree that my terminology was lacking at this point.


The mappings are unambiguous and obvious, I grant you.

But the chance that those mappings will be instantiated as compatibility
decomposition mappings in the standard is zero.



I don't like the notion of interleaving in the default weighting
table, and have spoken against it, but as John Cowan has pointed
out, it is at least feasible. It doesn't have the ridiculousness
factor of the compatibility decomposition approach.
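
For readers who have not followed the collation thread: interleaving would amount to giving each Phoenician letter the same primary weight as the corresponding Hebrew letter, so that text in the two scripts sorts together. A rough sketch in Python (not a UCA implementation, and assuming the code points proposed for the Phoenician block, U+10900 onwards):

    # Rough sketch of interleaved weighting, assuming the proposed
    # Phoenician code points (U+10900 onwards); not a UCA implementation.
    PHOENICIAN_TO_HEBREW = {
        0x10900: '\u05D0',  # ALF  -> ALEF
        0x10901: '\u05D1',  # BET  -> BET
        0x10902: '\u05D2',  # GAML -> GIMEL
        # ... the remaining 19 letters map one-to-one in the same way
    }

    def interleaved_key(text):
        """Fold Phoenician letters onto their Hebrew counterparts so that
        the two sort together, as interleaved primary weights would do."""
        return ''.join(PHOENICIAN_TO_HEBREW.get(ord(ch), ch) for ch in text)

    words = ['\U00010900\U00010902', '\u05D0\u05D1']  # Phoenician alf+gaml, Hebrew alef+bet
    print(sorted(words, key=interleaved_key))         # the Hebrew word sorts first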





If what I have suggested is ridiculous, so is what the UTC has already


^^^^^
Bogus analogy alert.


defined for Mathematical Alphanumeric Symbols.



The analogy is only bogus if you presuppose that Phoenician is a separate script.




...
The equivalencing of 22 Phoenician letters, one-to-one against
Hebrew characters, where the mapping is completely known and
uncontroversial, is a minor molehill.


Well, why not make these uncontroversial equivalents, between variant glyphs for the same script, compatibility decompositions?



Well, if you don't understand the technical issues, Peter, how about this for a reason why not:

Because even if you came personally to the UTC and argued the
case for your position at the meeting, I predict that your
proposal would be turned down by a 0 For, 12 Against vote.

Of course, if you also brought along the Patron Saint of
Lost Causes to help you, the vote might turn out 1 For, 11 Against.



Well, I still have some hope that the UTC might base their decisions on the theoretical character-glyph model and on the long-standing judgment of scholars of Semitic writing, rather than on the views of generalists and Indo-Europeanists supported by some who do not understand the character-glyph model. But I accept your argument that compatibility equivalence is not the best way to go.


Now I've really had my fill of rehashing and regurgitation on
Phoenician. If anybody wants to have any *actual* impact on
the encoding decisions, I would suggest they finish writing
up and submitting formal documents for the UTC discussion,
instead of spending the weekend boring the rest of this list
with a further commodious vicus of recirculation....



I was in the middle of preparing a formal submission when I realised that my arguments were tending towards compatibility equivalence and so felt the need to explore this avenue in more detail. I now accept that this is in fact a dead end.


On 04/06/2004 21:50, Simon Montagu wrote:

Peter Kirk wrote:


But I accept that this Coptic to Greek compatibility has a few problems because not all characters have mappings. However, this is not a problem for Phoenician, because *every* Phoenician character has an unambiguous compatibility mapping to an existing Hebrew character.


As I've said before, final forms in Hebrew make this not 100% true, and I have seen both mappings in use in practice. For example http://he.wikipedia.org/wiki/%D7%9E%D7%A6%D7%91%D7%AA_%D7%9E%D7%99%D7%A9%D7%A2 shows the text of the Mesha stele beginning
"×××. ×××. ××. ×××.. . ×××. ×××", and I have a book (2 Kings in the "Olam Hatanach" series) which shows it beginning
"×××. ×××. ××. ×××[××] . ×××. ×××"



Understood, and thank you for the point. I might argue that this ambiguity is not a real one but has been introduced by Unicode because of the choice it made to encode Hebrew final forms separately, but not Arabic ones. Separate encoding of Hebrew final forms is also a violation of the character-glyph model, as these are variant glyphs for the same abstract character. But I accept that in modern Hebrew and Yiddish final forms have taken on some life of their own, justifying their separate encoding - although I might want to argue that they should have been defined as compatibility equivalents of the non-final forms.
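
For what it is worth, the folding I have in mind is trivial to state. A minimal Python sketch (the letter pairings are in the standard; treating them as equivalents is only my suggestion, not anything the standard defines):

    # Sketch of folding Hebrew final forms to the non-final letters;
    # the pairings are standard, the equivalence is only a suggestion.
    FINAL_TO_NONFINAL = {
        '\u05DA': '\u05DB',  # FINAL KAF   -> KAF
        '\u05DD': '\u05DE',  # FINAL MEM   -> MEM
        '\u05DF': '\u05E0',  # FINAL NUN   -> NUN
        '\u05E3': '\u05E4',  # FINAL PE    -> PE
        '\u05E5': '\u05E6',  # FINAL TSADI -> TSADI
    }

    def fold_finals(text):
        return ''.join(FINAL_TO_NONFINAL.get(ch, ch) for ch in text)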

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



