Hi Zdeněk,

Checking the Unicode character database[1], U+0587 is listed as having a *compatibility* decomposition to <0565,0582> (not 057E):

0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;

Likewise, the SpecialCasing.txt file[2] that defines case mappings other than simple 1:1 substitutions shows the same decomposition for the uppercase form:

0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN

So if I understand correctly, what \text_uppercase:n is doing is simply implementing what the Unicode standard defines.
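The two data-file entries above can be checked independently of TeX. This is only a minimal sketch, assuming a Python 3 interpreter whose bundled Unicode data matches the files cited; the helper name `codepoints` is made up for this illustration.

```python
import unicodedata

LIGATURE = "\u0587"  # ARMENIAN SMALL LIGATURE ECH YIWN


def codepoints(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Decomposition field from UnicodeData.txt (field 6 of the entry quoted above).
decomposition = unicodedata.decomposition(LIGATURE)

# Compatibility normalization applies that decomposition.
nfkd = unicodedata.normalize("NFKD", LIGATURE)

# str.upper() applies the full (SpecialCasing.txt) uppercase mapping.
upper = LIGATURE.upper()

print(decomposition)
print(codepoints(nfkd))
print(codepoints(upper))
```

If the output shows 0565 0582 for the decomposition and U+0535 U+0552 for the uppercase form, that matches what \text_uppercase:n produces.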

If this isn't the appropriate behavior, at least for some locales, I believe that will need custom programming at some level, but I don't know enough about it to get into any details.
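To illustrate what "custom programming" could mean, here is a rough sketch of a locale-sensitive override applied before the standard mapping. It is not how expl3 or any existing package does it; the function name `uppercase_armenian`, the `variant` parameter, and the override table are all hypothetical, and the Eastern mapping (U+0587 to U+0535 U+054E) is simply the one suggested in the quoted message below, which I cannot verify.

```python
# Hypothetical per-variant overrides, consulted before the Unicode default.
# The "eastern" entry follows the suggestion in the quoted message:
# U+0587 -> U+0535 U+054E (ECH + VEW) instead of U+0535 U+0552 (ECH + YIWN).
UPPERCASE_OVERRIDES = {
    "eastern": {"\u0587": "\u0535\u054E"},
}


def uppercase_armenian(text: str, variant: str = "default") -> str:
    """Uppercase text, applying any variant-specific overrides first."""
    overrides = UPPERCASE_OVERRIDES.get(variant, {})
    # Replace overridden characters with their already-uppercase expansion;
    # everything else falls through to the standard Unicode mapping.
    replaced = "".join(overrides.get(ch, ch) for ch in text)
    return replaced.upper()


word = "\u0561\u0580\u0587"  # the Eastern spelling of "sun" from the quoted message

default_upper = uppercase_armenian(word)
eastern_upper = uppercase_armenian(word, variant="eastern")

print(default_upper)
print(eastern_upper)
```

Keeping the overrides in a separate table means the default behavior stays exactly what Unicode defines, and only text explicitly marked as a given variant is treated differently.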

As for whether xelatex (or other engines) forms a ligature from either of the decomposed sequences, that would be entirely in the hands of the font developer. I guess such ligatures are not widely implemented (if at all).

JK

[1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
[2] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt

On 01/05/2022 12:50, Zdenek Wagner wrote:
Hi David,

when trying to explain it in greater detail I found that the situation is even more complex. As I wrote, I follow Elena Yerevan on YouTube and Facebook, so all that I know comes from her videos, from her name written in both alphabets, from Wikipedia and from https://omniglot.com/writing/armenian.htm, which means that I know practically nothing. We need clarification from people who know Armenian (հայերէն and/or հայերեն), therefore I am sending a Cc to Arthur and the authors of the ArmTeX project (hopefully at least one of the addresses still exists).

I will start with the typical use case. The title of a chapter in the book class is written in lowercase and displayed that way in the chapter title as well as in the table of contents but appears in uppercase in the running head. This is why it should work.

The case of ligatures is different. My fonts have not only ff, fl, and fi ligatures but even ffi and ffl. If I find the word "difficult" on a web page using a serif font, I see the ffi ligature, but the source shows that it has the individual characters f, f, i and the ligature was created by the shaping engine. If I copy it and paste it into a text editor such as vim or notepad, I will get the three characters. If I use it as a TeX source and typeset it with Computer Modern or Latin Modern, I will get the ffi ligature and \uppercase will work.

If I copy U+0587 from a web page and paste it into a text editor, I will get U+0587. I tried both U+0565 U+0582 (եւ) and U+0565 U+057E (եվ) but neither of them forms the U+0587 (և) ligature in XeLaTeX. I did not understand why the ligature is considered ECH and YIWN but it seems that it is more historical and bound to the shape.

If I understand it correctly, "sun" is pronounced "arew" in Armenian; արև (U+0561 U+0580 U+0587) is the Eastern spelling, while արեւ (U+0561 U+0580 U+0565 U+0582) is the classical spelling (as given in Wiktionary) and probably also the Western one. As you can see on Omniglot, the Armenian names of Eastern/Western Armenian start with "arew" with these two spellings. Even "hayeren" (Armenian) is spelled differently in the Eastern/Western variants (I have included both at the beginning of this mail).

Having found the information on variants, I saw that polyglossia supports variant=western. I tried to specify variant=eastern but it did not help. If you look at ot6enc.def, it defines uppercase variants at the end of the file, where the uppercase version of \armew is \Arm@yechvev, which is \Armyech\Armvev. I cannot try it because I do not know the transliteration, but just from the names of the characters it seems to me that it works correctly while \text_uppercase:n does not. It should know that U+0587 should be decomposed to U+0565 U+057E (not U+0582) and then uppercased to U+0535 U+054E (not U+0552), at least for the Eastern variant. I am not sure whether there are other issues and where exactly to fix it.

Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml
