On 7/5/15 09:34, Philip Taylor wrote:


Apostolos Syropoulos wrote:

The only mark that remains when making all capitals is the dieresis
(dialytika). All others vanish. This is common knowledge for people who
speak and write Greek.

Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
a native Greek speaker and Director of the Hellenic Institute.  This is
why I asked whether it was a universally-agreed truism or simply a
matter of opinion, and in view of the fact that both Dr Dendrinos (in
private correspondence) and Julian Bradfield (on this list) have offered
the alternative perspective to your own, it would seem to be a matter of
opinion rather than one of fact.  If you look at the opening folio of
George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :

        http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/

you will see a number of Greek majuscules with either psilí or daseîa,
including the very combination under discussion (GREEK CAPITAL LETTER
EPSILON WITH PSILI, on line 2), suggesting that the combination of
breathing and majuscule was common at that time.

I think there may be some confusion as to exactly what this discussion is about. Certainly, "the combination of breathing and majuscule" occurs in mixed-case polytonic text, as shown in your example. However, Apostolos is (I think) addressing the case of all-uppercase text, in which case the usual practice is to drop all marks except dieresis.

See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html; note the presence of breathing marks on initial capitals within the text, but note also their complete absence in the ALL-CAPS title.

So if a lower-to-uppercase mapping is used just to Capitalize Initial Letters, it perhaps should not discard breathing marks; but if it is used to turn a passage of text into ALL UPPERCASE, then it probably should discard them.

But things are actually trickier than that. AIUI, the most correct polytonic UPPERCASE transform for "μάιος" would be "ΜΑΪΟΣ" -- not only is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.

The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no matter what code assignments are chosen; neither can the per-character properties in Unicode. It requires a more powerful approach to case transforms.
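
To make the limitation concrete, here is a rough Python sketch of a context-sensitive all-caps transform. It is only an illustration, not a proposal for how a Greek package should be written. The function name greek_upper_sketch and the hiatus rule it applies (add a dialytika to an unmarked ι or υ that follows a vowel whose accent or breathing was dropped) are my own simplifying assumptions, and iota subscript and other fine points are ignored.

```python
import unicodedata

DIALYTIKA = "\u0308"
# Combining marks dropped in all-caps text (assumed set):
# grave, acute/tonos, perispomeni, psili, dasia.
DROPPED = {"\u0300", "\u0301", "\u0342", "\u0313", "\u0314"}
FIRST_VOWELS = set("αεηου")   # can start a digraph such as αι, ει, ου
SECOND_VOWELS = set("ιυ")     # can end one


def greek_upper_sketch(text):
    """Hypothetical helper: uppercase Greek, keeping only the dialytika."""
    # Split the decomposed text into [base, [combining marks]] clusters.
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])

    out = []
    prev_base = ""
    prev_lost_mark = False
    for base, marks in clusters:
        kept = [m for m in marks if m not in DROPPED]
        lower = base.lower()
        # A mark on the first vowel showed that the pair was not a digraph;
        # once the mark is gone, a dialytika has to carry that information.
        if (prev_lost_mark and prev_base in FIRST_VOWELS
                and lower in SECOND_VOWELS and not marks):
            kept.append(DIALYTIKA)
        out.append(base.upper() + "".join(kept))
        prev_lost_mark = any(m in DROPPED for m in marks)
        prev_base = lower
    return unicodedata.normalize("NFC", "".join(out))


print("μάιος".upper())              # per-character mapping keeps the accent
print(greek_upper_sketch("μάιος"))  # context-sensitive sketch
```

The point is that the result for ι depends on what happened to the preceding letter, which no per-character table can express.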

So I still maintain that the default code values assigned in formats such as xe(la)tex should be based directly on the Unicode properties. It would be great to have a Greek package that implements proper Greek uppercasing, but this level of language- and orthography-specific behavior does not belong in the base format.

JK


