On 05/06/2004 08:25, John Hudson wrote:

Peter Kirk wrote:

All Hudson is pointing out is that long PRIOR to Unicode, Semitic scholars reached the conclusion that all Semitic languages share the same 22 characters. That is a long-standing and quite useful conclusion, and it has nothing at all to do with your proposal.


But I dispute his last sentence. If the writing systems of these languages share the same abstract characters, they form a single script, which conflicts with the proposal to encode Phoenician as a separate script.


Did you read, also, my messages regarding the perception of instances of a script continuum? Restating your perception that the instances of Phoenician and Hebrew represent the same 'script' for Unicode purposes is just reverting to the fundamental disagreement with those who have stated a desire or need to distinguish such instances in plain text. 'Script' in Unicode is a generic term that does not necessarily relate to notions of script outside Unicode. The determining feature of a Unicode script, i.e. a labelled subset of characters, is that it is something that can be differentiated from other subsets of characters *in plain text*. Whether things so-differentiated are considered individual scripts outside of Unicode isn't very relevant to this usage. Indeed, Unicode might have avoided all this debate by not using the term script at all.


Well, I tend to agree that the word "script" has not helped. It doesn't help that the definition you use here conflicts with the one Michael Everson uses when he insists that Phoenician is a separate script. On your definition it is clearly not one until the UTC defines that it is. So we end up with a circular argument.

On your definition, the set of fullwidth forms U+FF01..U+FF5E is a separate script, because it is a labelled subset of characters which can be differentiated from any other such set in plain text. So is each of the subsets of mathematical alphanumeric symbols. But they have compatibility decompositions to regular Latin script. If these are separate scripts, I might accept that Phoenician should also be one. But Ken Whistler disagrees: he wrote yesterday "These are not separate scripts."
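
To make the point about compatibility decompositions concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name compat_info is my own, and the three sample characters are chosen only for illustration: a fullwidth letter, a mathematical bold letter, and a Hebrew letter for contrast.

```python
import unicodedata

def compat_info(ch):
    """Return (name, decomposition field, NFKC form) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),   # empty string if no decomposition
        unicodedata.normalize("NFKC", ch),
    )

# Illustrative samples: fullwidth A, mathematical bold A, Hebrew alef.
for ch in ("\uFF21", "\U0001D400", "\u05D0"):
    print(compat_info(ch))
```

The first two carry a compatibility mapping to U+0041 (tagged wide and font respectively) and fold to plain "A" under NFKC, while the Hebrew letter has no decomposition at all. So these subsets are distinguishable in plain text, yet the standard itself records them as variants of Latin letters.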

So let's drop "script" for now. My basic contention is that each letter of the Phoenician abjad is not a separate abstract character, but that it and the corresponding square Hebrew letter are glyph variants of the same abstract characters. And this is clearly the understanding of Semitic scholars, as summarised by Patrick Durusau and quoted above. On the other hand, nearly everyone agrees that there should be a mechanism for distinguishing them in plain text.

Is this a novel situation? No, for Unicode has clearly recognised this kind of situation in TUS section 15.6, which I quoted earlier. And Unicode has defined a mechanism for dealing with the situation: variation selectors. If this mechanism is not appropriate in this particular case, let the UTC come up with another mechanism to meet the user requirement. To define a new set of abstract characters for what are actually glyph variants is to ignore the character-glyph model.
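
As a rough sketch of how a variation selector keeps a distinction in plain text without adding a new abstract character, consider the following Python fragment. The pairing of Hebrew alef with VARIATION SELECTOR-1 is purely hypothetical, chosen for illustration; I am not claiming that any such sequence is registered. The helper strip_variation_selectors is likewise my own name.

```python
import unicodedata

ALEF = "\u05D0"   # HEBREW LETTER ALEF
VS1 = "\uFE00"    # VARIATION SELECTOR-1

def strip_variation_selectors(text):
    """Drop VS1..VS16 (U+FE00..U+FE0F), leaving only the base characters."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))

# Hypothetical marked sequence: NOT a registered variation sequence.
marked_alef = ALEF + VS1

print(unicodedata.name(VS1))                            # the selector's name
print(marked_alef == ALEF)                              # distinct in plain text
print(strip_variation_selectors(marked_alef) == ALEF)   # same base character
```

The marked and unmarked strings differ as code point sequences, so the distinction survives in plain text, but a process that ignores the selectors sees the same base letter in both.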

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/



