Hi Zdeněk,
Checking the Unicode character database[1], U+0587 is listed as having a
*compatibility* decomposition to <0565,0582> (not <0565,057E>):
0587;ARMENIAN SMALL LIGATURE ECH YIWN;Ll;0;L;<compat> 0565 0582;;;;N;;;;;
Likewise, the SpecialCasing.txt file[2] that defines case mappings other
than simple 1:1 substitutions shows the same decomposition for the
uppercase form:
0587; 0587; 0535 0582; 0535 0552; # ARMENIAN SMALL LIGATURE ECH YIWN
So if I understand correctly, what \text_uppercase:n is doing is simply
implementing what the Unicode standard defines.
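For a quick cross-check outside TeX, the same data can be read back with
Python's standard unicodedata module. This is only a sketch: it assumes the
Unicode tables bundled with the Python build agree with the files cited
here, the helper name codepoints is mine, and it says nothing about how
\text_uppercase:n is implemented internally.

```python
import unicodedata

LIGATURE = "\u0587"  # ARMENIAN SMALL LIGATURE ECH YIWN


def codepoints(text):
    """Return the code points of text as space-separated hex values."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Decomposition field of UnicodeData.txt, as exposed by Python.
decomposition = unicodedata.decomposition(LIGATURE)

# str.upper() and str.title() use the unconditional full case mappings
# from SpecialCasing.txt.
upper = codepoints(LIGATURE.upper())
title = codepoints(LIGATURE.title())

# NFKD applies the compatibility decomposition; NFC does not recompose
# the pair ech + yiwn into the ligature.
nfkd = codepoints(unicodedata.normalize("NFKD", LIGATURE))
nfc_of_pair = codepoints(unicodedata.normalize("NFC", "\u0565\u0582"))

print(decomposition, upper, title, nfkd, nfc_of_pair, sep="\n")
```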
If this isn't the appropriate behavior, at least for some locales, I
believe that will need custom programming at some level, but I don't
know enough about it to get into any details.
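Just to illustrate what I mean by custom programming, here is a minimal
Python sketch of a locale-specific override applied before the default
uppercasing. The table and function names are hypothetical, and the
Eastern Armenian convention it encodes (U+0587 treated as U+0565 U+057E)
is taken from your message, not from anything I can confirm myself.

```python
# Hypothetical override: for Eastern Armenian, expand U+0587 to ech + vew
# (U+0565 U+057E) instead of the Unicode default ech + yiwn (U+0565 U+0582).
EASTERN_ARMENIAN_PRE_UPPER = {"\u0587": "\u0565\u057E"}


def upper_eastern_armenian(text):
    """Uppercase text after applying the assumed Eastern Armenian override."""
    expanded = "".join(EASTERN_ARMENIAN_PRE_UPPER.get(ch, ch) for ch in text)
    return expanded.upper()


word = "\u0561\u0580\u0587"  # the Eastern spelling of "sun" from your message

default_upper = word.upper()                  # Unicode default mapping
eastern_upper = upper_eastern_armenian(word)  # with the override applied

print(default_upper, eastern_upper)
```

The override runs as a separate step so that the default mapping stays
untouched for every other locale.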
As for whether xelatex (or any other engine) forms a ligature from one (or
the other) of the decomposed sequences, that would be entirely in the hands
of the font developer. I guess such ligatures are not implemented widely
(if at all).
JK
[1] https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
[2] https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt
On 01/05/2022 12:50, Zdenek Wagner wrote:
Hi David,
when trying to explain it in greater detail I found that the situation
is even more complex. As I wrote, I follow Elena Yerevan on YouTube and
Facebook, so all that I know, I know from her videos, from her name
written in both alphabets, from Wikipedia and from
https://omniglot.com/writing/armenian.htm, which means that I know
practically nothing. We need clarification from people who know Armenian
(հայերէն and/or հայերեն), therefore I am sending a Cc to Arthur and the
authors of the ArmTeX project (hopefully at least one of the addresses
still exists).
I will start with the typical use case. The title of a chapter in the
book class is written in lowercase and displayed that way in the chapter
title as well as in the table of contents but appears in uppercase in
the running head. This is why it should work.
The case of ligatures is different. My fonts have not only ff, fl, and
fi ligatures but even ffi and ffl. If I find the word "difficult" on a web
page set in a serif font, I see the ffi ligature, but the source shows that
it has the individual characters f, f, i and that the ligature was created
by the shaping engine. If I copy it and paste it into a text editor such as
vim or Notepad, I will get the three characters. If I use it as TeX
source and typeset it with Computer Modern or Latin Modern, I will get
the ffi ligature and \uppercase will work. If I copy U+0587 from a web
page and paste it into a text editor, I will get U+0587. I tried both
U+0565 U+0582 (եւ) and U+0565 U+057E (եվ) but neither of them forms the
U+0587 (և) ligature in XeLaTeX. I did not understand why the ligature is
considered ECH and YIWN but it seems that it is more historical and
bound to the shape. If I understand it correctly, "sun" is pronounced
"arew" in Armenian; արև (U+0561 U+0580 U+0587) is the Eastern
spelling, whereas արեւ (U+0561 U+0580 U+0565 U+0582) is the classical
spelling (as given in Wiktionary) and probably also the Western
one. As you can see on Omniglot, the Armenian names of Eastern and
Western Armenian start with "arew", written with these two spellings.
Even "hayeren" (Armenian) is spelled differently in the Eastern and Western
variants (I have included both at the beginning of this mail). Having
found the information on variants, I saw that polyglossia supports
variant=western. I tried to specify variant=eastern but it did not help.
If you look at ot6enc.def, it defines uppercase variants at the end of
the file where the uppercase version of \armew is \Arm@yechvev which is
\Armyech\Armvev. I cannot try it because I do not know the transliteration,
but just from the names of the characters it seems to me that it works
correctly while \text_uppercase:n does not. It should know that U+0587
should be decomposed to U+0565 U+057E (not U+0582) and then uppercased
to U+0535 U+054E (not U+0552), at least for the Eastern variant. I am
not sure whether there are other issues or where exactly to fix it.
Zdeněk Wagner
http://ttsm.icpf.cas.cz/team/wagner.shtml