This is a particularly cogent point. The Mishna (c. 1st century C.E.) does explicitly distinguish between Paleo-Hebrew and Square Hebrew (tractate Yadayim 4:5). That's not a font-difference, that's a script-difference, I think.
There were no such things as fonts in the 1st century C.E. So it would have to be a script-difference. But what is a "script"?
"Script", as I pointed out previously, is a word of wide meaning. The difference between Paleo-Hebrew and Square Hebrew is a script difference. But the word "script" is also used for different varieties of the Square Hebrew script. Check in Google for ["rashi script"] or ["ari script"] . There is a two-volume book:
_Specimens of Medieval Hebrew Scripts_ by Malachi Beit-Arié. See http://www.bookgallery.co.il/content/english/static/book8177.asp
Check also in Google for ["italic script"], ["uncial script"], ["blackletter script" OR "black letter script"].
We are talking about exactly the same alphabet (or abjad) here: twenty-two letters in the same order, with identical meanings, originating from the same sources, recording the identical texts with identical spelling.
Compare the gradual change from blackletter "scripts" to Antiqua-style Latin characters (including the italic script) in Renaissance and post-Renaissance Europe. This is similar to the change from the Phoenician style to the Aramaic style.
This is the other really significant point: Semitic scholars may all agree, but all the world is not Semitic scholarship, and those who are not Semitic scholars have to be satisfied as well. Since the Semitic scholars are also getting what they want, where's the harm in encoding more alphabets?
Who are these non-scholars who want the Palmyrene script (for example) to be encoded separately from other Aramaic scripts? Who are the scholars who want this? How many persons in the world want Palmyrene to be encoded separately? As many as fifty? Or is there just Michael Everson?
There may be some such scholars, and if so I would like to hear the arguments they would bring forth. I'm willing to be convinced by arguments. I'm not an *expert* in Aramaic scripts. There aren't that many who are.
As to harm, where's the harm in encoding Japanese kanji separately, or Latin uncial, or a complete set of small capitals as a third case? Where's the harm in encoding Latin Renaissance scripts separately?
No harm perhaps, but no good either. There is no need or use for such encodings. Scholars using Latin letters and non-scholars using Latin letters are not asking for separate coding of the script used in the Beowulf manuscript and so forth. They don't want every Latin "script" variation encoded separately.
It's not *that* simple: one could argue (as is being done) that more alphabets would lead to confusion about which one should be used, and mess up searches. I guess we'd just have to make sure that people doing scholarly work in Semitic languages know to use Hebrew all the time (they already know that), no matter what the language.
But the point is that many of these Semitic languages use the *same* abjad with different stylings, one such styling being the letters encoded in Unicode as Hebrew letters, with default glyphs of modern Hebrew form. Only the letter shapes differ. And between some northwest Semitic "scripts" they are not very different, less so than between one Latin "script" and another.
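To make that first point concrete, here is a minimal Python sketch (the word chosen is arbitrary). The encoded characters stay the same no matter whether a font paints them with square Hebrew, Rashi or Paleo-Hebrew shapes:

    import unicodedata

    # One word, encoded once, as Hebrew-block characters.
    # Whether a font draws these with square ("modern Hebrew"),
    # Rashi or Paleo-Hebrew letter shapes is a glyph decision;
    # the underlying character data does not change.
    word = "\u05E9\u05DC\u05D5\u05DD"  # shalom
    for ch in word:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")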
Second, people doing work in Semitic languages who use the Latin alphabet also often use Latin transliterations (which do not all agree with one another). I assume there are likewise standard Cyrillic transliterations used by scholars working in the Cyrillic alphabet, and so forth.
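Such a transliteration is itself only a character-level convention. A Python sketch of one common academic romanization (the equivalences shown are one scheme among several, and the Hebrew final forms are omitted for brevity):

    # One common academic romanization of the 22-letter abjad.
    # Schemes differ in details; this is one convention among several.
    ROMANIZATION = {
        "\u05D0": "\u02BE",  # alef  -> modifier right half ring
        "\u05D1": "b", "\u05D2": "g", "\u05D3": "d", "\u05D4": "h",
        "\u05D5": "w", "\u05D6": "z",
        "\u05D7": "\u1E25",  # het   -> h with dot below
        "\u05D8": "\u1E6D",  # tet   -> t with dot below
        "\u05D9": "y", "\u05DB": "k", "\u05DC": "l", "\u05DE": "m",
        "\u05E0": "n", "\u05E1": "s",
        "\u05E2": "\u02BF",  # ayin  -> modifier left half ring
        "\u05E4": "p",
        "\u05E6": "\u1E63",  # tsadi -> s with dot below
        "\u05E7": "q", "\u05E8": "r",
        "\u05E9": "\u0161",  # shin  -> s with caron
        "\u05EA": "t",
    }

    def romanize(text):
        return "".join(ROMANIZATION.get(ch, ch) for ch in text)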
Such things are not for Unicode to regulate.
And in cases where material is to be incorporated from non-scholarly sources that used another alphabet, it can be transcoded when entered into databases to keep them uniform, if that is necessary, but presumably that wouldn't happen often.
What non-scholarly sources? Why would a non-scholar *need* or *desire* Palmyrene Aramaic encoded separately while a scholar would not? A change to a Palmyrene Aramaic font would do the job as well, for Palmyrene Aramaic and any of the various Aramaic "scripts" or "styles", just as a font change does for historical styles of European scripts when someone wants to print or display them.

In fact such fonts do the job poorly, just as a general medieval blackletter font will do poorly for anything but the exact manuscript on which it was based (if it was based on a particular manuscript at all). There were no fonts before modern times, no exactly standardized characters, no exactly standardized type styles: every scribe had a different hand. Characters in simple charts of Semitic scripts are often deceptive, just as charts of the forms taken by medieval Latin characters in particular "scripts"/"styles" are deceptive, each form often being a choice made by a scholar from among many variants.
Coding Aramaic generally as a single script in Unicode would code all the "script" variations. This has already been done by encoding the square Aramaic letters in their "modern Hebrew" forms. What more is needed for encoding? Similarly Latin has been encoded with modern Latin letter forms as the default glyphs and Greek has been encoded with modern Greek letter forms as the default glyphs. One might want some further final forms and additional punctuation for Aramaic styles (or might not). That can be decided. Otherwise, there is nothing much more to do, save perhaps add a matrix somewhere showing variant glyphs in different Aramaic "scripts"/"styles".
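(The database transcoding mentioned above would in any case be trivial, precisely because the correspondence is one-to-one. A Python sketch, in which the separate Phoenician block and its base codepoint are purely hypothetical, invented here for illustration:)

    # Transcoding a hypothetical separately encoded Phoenician block
    # into the existing Hebrew block.
    PHOENICIAN_BASE = 0x10900  # invented 22-letter block, in abjad order

    # The 22 Hebrew letters in abjad order (final forms omitted,
    # since Phoenician-style texts have none).
    HEBREW = [0x05D0, 0x05D1, 0x05D2, 0x05D3, 0x05D4, 0x05D5,
              0x05D6, 0x05D7, 0x05D8, 0x05D9, 0x05DB, 0x05DC,
              0x05DE, 0x05E0, 0x05E1, 0x05E2, 0x05E4, 0x05E6,
              0x05E7, 0x05E8, 0x05E9, 0x05EA]

    TO_HEBREW = {PHOENICIAN_BASE + i: chr(cp) for i, cp in enumerate(HEBREW)}

    def transcode(text):
        # Map the invented Phoenician codepoints to Hebrew ones;
        # pass everything else through unchanged.
        return "".join(TO_HEBREW.get(ord(ch), ch) for ch in text)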
To take another example, all runic "scripts" have been unified in Unicode, though the runic "scripts" vary greatly in the number of letters used and in the values of the letters as well as in their appearance. There is more *reason* to produce separate encodings for the various runic scripts than for northwest Semitic "scripts", though I've heard no complaints about the unification of runic "scripts" and I have no complaints myself.
Indeed, looking at the values of the characters, there is no *reason* why there could not have been a single encoding of the consonants for *all* the Semitic "scripts" related to Phoenician (with separate encodings for the pointings).
A common Semitic encoding *could* still be added to Unicode, with individuals deciding whether or not to use that coding also for Arabic, Hebrew and Syriac.
I am not recommending this.
I am pointing out how much these scripts are seen as stylistic variants of one another by anyone who can, to some extent, read them.
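But to show what such a unification would amount to in data terms, here is a toy Python sketch: one abstract identity per letter, with the existing per-style codepoints hanging off it. The names and structure are invented for illustration only, not proposed.

    # A toy model (not a proposal) of a unified Northwest Semitic
    # abjad: one abstract character identity per letter, with the
    # existing Hebrew and Syriac codepoints treated as per-style
    # representations of it.
    UNIFIED = {
        "ALEPH": {"hebrew": "\u05D0", "syriac": "\u0710"},
        "BETH":  {"hebrew": "\u05D1", "syriac": "\u0712"},
        "GIMEL": {"hebrew": "\u05D2", "syriac": "\u0713"},
        # ... the remaining nineteen letters follow the same pattern
    }

    def render(letter, style):
        # Style selection would in practice be a font or markup
        # matter, exactly as for Latin blackletter vs. roman type.
        return UNIFIED[letter][style]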
If one must split them up, charts and scholarly books do provide normal divisions of "scripts" or "styles" which correspond to those given by Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf
All that has been well worked out for the common "scripts". A normal division is:
1.) Proto-Sinaitic and other early pictographs.
2.) Old Arabic "scripts" (Old South Arabic and Old North Arabic).
3.) Northwest Semitic (the 22-character abjad including Phoenician scripts, descendant Aramaic scripts such as square Aramaic used for Hebrew and also including Syriac).
4.) Arabic (which though descended from Nabatean Aramaic became so different that it might be better encoded separately, perhaps to be compared to the Aramaic scripts in somewhat the same way as Latin might be compared to early Greek scripts).
The common 22-character Northwest Semitic abjad can be broken down into:
1.) Phoenician/Canaanite scripts including Paleo-Hebrew and its descendant Samaritan and also Paleo-Aramaic.
2.) Later Aramaic scripts.
3.) Syriac scripts which differ greatly in appearance from the other Aramaic scripts.
Note: special appearance and pointing for Hebrew and Syriac are really the only reasons to distinguish these particularly. The letters are the same in origin, and are more nearly the same in meaning than letters are between one variant Greek script and another. Greek letters in variant Greek scripts, however, are (generally) far more alike in appearance than the characters of the various early northwest Semitic "scripts"/"styles".
But should a difference in appearance count in a decision to code separately within Unicode when *every* other feature of two "scripts" is identical, including origin?
Hebrew scriptures were first written in the Phoenician script (= Paleo-Hebrew), then in the Aramaic script, which developed *very* slightly in medieval times into the normal modern Hebrew script. Everson's division would suggest that four different scripts ought to be used for coding the same texts with the same logical characters with the same names: that texts should be encoded as Phoenician or Aramaic or Hebrew or Samaritan depending on style, even when they are the same letter by letter.
Cursive Hebrew still retains the Phoenician shapes for some letters (which is very strange). Should cursive Hebrew therefore be encoded separately?
I don't see any purpose in encoding these scripts differently in Unicode when they represent *exactly* the same abjad with only different styling of the characters.
Michael Everson at http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2311.pdf could only say:
<< Note that Jony Rosenne once suggested that we should not encode Phoenician because it is a glyph variant of Hebrew. This is not true, despite the one-to-one correspondence of character entities. In the Dead Sea Scrolls, for instance, where the Tetragrammaton is written with Paleo-Hebrew letters, it is (in UCS encoding terms) the Phoenician script in which the Name is written. >>
First, there is not *just* a one-to-one correspondence of character entities, but also a one-to-one correspondence of the characters with respect to their origin and names. They *are* the same abjad in all but style.
Second, if it is argued that the use of Phoenician script for the Tetragrammaton in some texts otherwise written in square Aramaic characters indicates that Phoenician and square Aramaic characters must be encoded separately within Unicode, should not one make the same argument for medieval texts with a headline "script" imitating traditional Roman square capitals, initial paragraphs in uncial "script", and the main text in Carolingian "script" including majuscule and minuscule letters?
If Everson's argument is applied to medieval manuscripts, uncial "script" and Carolingian "script" and Roman capitals should be encoded separately within Unicode.
Also, the Tetragrammaton is represented in the English King James translation of Hebrew scriptures and in some more recent translations by the word LORD and sometimes GOD in which all but the first letter is printed in small capitals. Should small capitals therefore be encoded separately in Unicode?
(Note: these small capitals are the small capitals normally used for emphasis, and they usually appear slightly taller than normal lowercase characters lacking ascenders. They are not the same as the lowercase small capital characters encoded in Unicode as phonetic characters, which properly appear identical in height to other lowercase characters.)
That characters of one style are used in a text written predominantly in another style does not indicate that the "script" or "style" to which they belong needs to be encoded independently. That is what markup is for.
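To illustrate, a Python sketch of what such markup might look like (the class names and styling hooks are invented; this is one possibility among many):

    # A style switch handled with markup rather than a separate
    # encoding: the Tetragrammaton in Paleo-Hebrew letter shapes
    # inside square-Hebrew text, and small capitals for "LORD" in
    # an English translation.
    tetragrammaton = "\u05D9\u05D4\u05D5\u05D4"  # yod, he, vav, he
    hebrew_html = ('<span class="square-hebrew">... '
                   f'<span class="paleo-hebrew">{tetragrammaton}</span>'
                   ' ...</span>')
    english_html = 'the <span style="font-variant: small-caps">Lord</span> God'
    print(hebrew_html)
    print(english_html)

Note that "font-variant: small-caps" renders the lowercase "ord" as small capitals while the capital "L" stays full height, which matches the convention described above.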
Peter Kirk has already made this point in part.
There seems to me *no* reason why most of the Aramaic "scripts" should not be unified within Unicode with Hebrew, and almost *no* reason why Phoenician and Samaritan should not be unified.
And there seems to me *little* reason why the Hebrew/Aramaic "scripts" and the Phoenician/Samaritan "scripts" should not be unified with each other. The two families of styles use the same abjad, though the differences in appearance are too great for most of the letters to be recognized as the same letters by appearance alone.
But how much should visual distinction count when it is the *sole* difference? It appears to me that this is mostly where the dispute lies, despite the precedent of the Unicode encoding of runic "scripts".
Some may also be thinking of HTML/XML/XHTML web display of characters, where forcing a font is not reliable. One would not want a discussion of ancient Phoenician characters to display modern Hebrew forms! But this same problem currently applies to runes, medieval Latin characters, Han characters, and so forth. One shouldn't let the current shortcomings of one display method among many dictate Unicode encodings.
Jim Allan

