Re: [XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-07 Thread Philip Taylor


Jonathan Kew wrote:

> I still maintain that the default code values assigned in formats
> such as xe(la)tex should be based directly on the Unicode properties. It
> would be great to have a Greek package that implements proper Greek
> uppercasing, but this level of language- and orthography-specific
> behaviour does not belong in the base format.

I would strongly support that proposal.
Philip Taylor


--
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex


Re: [XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-07 Thread Joseph Wright
On 07/05/2015 10:56, Jonathan Kew wrote:
> On 7/5/15 09:34, Philip Taylor wrote:
>>
>>
>> Apostolos Syropoulos wrote:
>>
>>> The only mark that remains when making all capitals is the dieresis
>>> (dialytika). All others vanish. This is common knowledge for people who
>>> speak and write Greek.
>>
>> Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
>> a native Greek speaker and Director of the Hellenic Institute.  This is
>> why I asked whether it was a universally-agreed truism or simply a
>> matter of opinion, and in view of the fact that both Dr Dendrinos (in
>> private correspondence) and Julian Bradfield (on this list) have offered
>> the alternative perspective to your own, it would seem to be a matter of
>> opinion rather than one of fact.  If you look at the opening folio of
>> George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :
>>
>> 
>> http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/
>>
>>
>> you will see a number of Greek majuscules with either psilí or daseîa,
>> including the very combination under discussion (GREEK CAPITAL LETTER
>> EPSILON WITH PSILI, on line 2), suggesting that the combination of
>> breathing and majuscule was common at that time.
> 
> I think there may be some confusion as to exactly what this discussion
> is about. Certainly, "the combination of breathing and majuscule" occurs
> in mixed-case polytonic text, as shown in your example. However,
> Apostolos is (I think) addressing the case of all-uppercase text, in
> which case the usual practice is to drop all marks except dieresis.
> 
> See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html;
> note the presence of breathing marks on initial capitals within the
> text, but note also their complete absence in the ALL-CAPS title.
> 
> So if a lower-to-uppercase mapping is used just to Capitalize Initial
> Letters, it perhaps should not discard breathing marks; but if it is
> used to turn a passage of text into ALL UPPERCASE, then it probably
> should discard them.
> 
> But things are actually trickier than that. AIUI, the most correct
> polytonic UPPERCASE transform for "μάιος" would be "ΜΑΪΟΣ" -- not only
> is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.
> 
> The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no
> matter what code assignments are chosen; neither can the per-character
> properties in Unicode. It requires a more powerful approach to case
> transforms.
> 
> So I still maintain that the default code values assigned in formats
> such as xe(la)tex should be based directly on the Unicode properties. It
> would be great to have a Greek package that implements proper Greek
> uppercasing, but this level of language- and orthography-specific
> behavior does not belong in the base format.

Indeed, whilst not what I was after here (which as you say is about
defaults for the formats), in the expl3 code I've written for case
changing, the idea of positional dependence is built in. There's no
question that the TeX 1-1 mapping for case changing is not applicable to
many situations, not just the case of Greek text. I'll ask a separate
question about Greek case mapping for the expl3 context later on as it
seems to have people's attention.
--
Joseph Wright






Re: [XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-07 Thread Jonathan Kew

On 7/5/15 09:34, Philip Taylor wrote:
> Apostolos Syropoulos wrote:
>
>> The only mark that remains when making all capitals is the dieresis
>> (dialytika). All others vanish. This is common knowledge for people who
>> speak and write Greek.
>
> Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
> a native Greek speaker and Director of the Hellenic Institute.  This is
> why I asked whether it was a universally-agreed truism or simply a
> matter of opinion, and in view of the fact that both Dr Dendrinos (in
> private correspondence) and Julian Bradfield (on this list) have offered
> the alternative perspective to your own, it would seem to be a matter of
> opinion rather than one of fact.  If you look at the opening folio of
> George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :
>
> http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/
>
> you will see a number of Greek majuscules with either psilí or daseîa,
> including the very combination under discussion (GREEK CAPITAL LETTER
> EPSILON WITH PSILI, on line 2), suggesting that the combination of
> breathing and majuscule was common at that time.


I think there may be some confusion as to exactly what this discussion 
is about. Certainly, "the combination of breathing and majuscule" occurs 
in mixed-case polytonic text, as shown in your example. However, 
Apostolos is (I think) addressing the case of all-uppercase text, in 
which case the usual practice is to drop all marks except dieresis.


See, for example, http://unicode.org/udhr/d/udhr_ell_polytonic.html; 
note the presence of breathing marks on initial capitals within the 
text, but note also their complete absence in the ALL-CAPS title.


So if a lower-to-uppercase mapping is used just to Capitalize Initial 
Letters, it perhaps should not discard breathing marks; but if it is 
used to turn a passage of text into ALL UPPERCASE, then it probably 
should discard them.


But things are actually trickier than that. AIUI, the most correct 
polytonic UPPERCASE transform for "μάιος" would be "ΜΑΪΟΣ" -- not only 
is the accent on ά gone, but ι has acquired a dieresis and become Ϊ.


The \uccode/\lccode tables in (Xe)TeX cannot fully capture this, no 
matter what code assignments are chosen; neither can the per-character 
properties in Unicode. It requires a more powerful approach to case 
transforms.
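
To make the point above concrete, here is a rough Python sketch (not
XeTeX code, and not taken from any existing package). It contrasts a
per-character uppercase mapping with a context-aware one. The function
name greek_upper, the short lists of marks, and the simplified rule
"an iota or upsilon directly after an accented vowel gains a dialytika"
are all assumptions made for illustration only; real Greek uppercasing
has more cases than this handles.

```python
import unicodedata

# Assumed, deliberately incomplete mark lists for this illustration.
ACCENTS = {"\u0300", "\u0301", "\u0342"}   # varia, tonos/oxia, perispomeni
BREATHINGS = {"\u0313", "\u0314"}          # psili, dasia
DIALYTIKA = "\u0308"
VOWELS = set("αεηιουω")


def greek_upper(text: str) -> str:
    """All-caps transform that drops accents and breathings but keeps
    (or adds) the dialytika. Hypothetical helper, simplified rules."""
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    after_accented_vowel = False
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        # Collect the combining marks attached to this base letter.
        marks = []
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(decomposed[i])
            i += 1
        has_accent = any(m in ACCENTS for m in marks)
        kept = [m for m in marks if m not in ACCENTS and m not in BREATHINGS]
        # Context rule: the accent on the previous vowel marked a hiatus,
        # so once it is dropped the hiatus is shown by a dialytika instead.
        if after_accented_vowel and base.lower() in ("ι", "υ") and DIALYTIKA not in kept:
            kept.append(DIALYTIKA)
        out.append(base.upper() + "".join(kept))
        after_accented_vowel = base.lower() in VOWELS and has_accent
    return unicodedata.normalize("NFC", "".join(out))


word = "\u03bc\u03ac\u03b9\u03bf\u03c2"   # μάιος
print(word.upper())        # per-character mapping: ΜΆΙΟΣ
print(greek_upper(word))   # context-aware sketch:  ΜΑΪΟΣ
```

The per-character result keeps the accent and never adds the dialytika,
because each letter is mapped without looking at its neighbours; that is
the same limitation a 1-1 \uccode table has.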


So I still maintain that the default code values assigned in formats 
such as xe(la)tex should be based directly on the Unicode properties. It 
would be great to have a Greek package that implements proper Greek 
uppercasing, but this level of language- and orthography-specific 
behavior does not belong in the base format.


JK





Re: [XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-07 Thread Philip Taylor


Apostolos Syropoulos wrote:

> The only mark that remains when making all capitals is the dieresis
> (dialytika). All others vanish. This is common knowledge for people who
> speak and write Greek.

Well, this is not the opinion of (for example) Dr Charalambos Dendrinos,
a native Greek speaker and Director of the Hellenic Institute.  This is
why I asked whether it was a universally-agreed truism or simply a
matter of opinion, and in view of the fact that both Dr Dendrinos (in
private correspondence) and Julian Bradfield (on this list) have offered
the alternative perspective to your own, it would seem to be a matter of
opinion rather than one of fact.  If you look at the opening folio of
George Etheridge's Encomium on Henry VIII, addressed to Elizabeth I :


http://hellenic-institute.rhul.ac.uk/research/Etheridge/Electronic-Edition/

you will see a number of Greek majuscules with either psilí or daseîa,
including the very combination under discussion (GREEK CAPITAL LETTER
EPSILON WITH PSILI, on line 2), suggesting that the combination of
breathing and majuscule was common at that time.

** Phil.




[XeTeX] Σχετ: Re: Assignment of codes (particularly \catcode) based on Unicode data

2015-05-06 Thread Apostolos Syropoulos

The only mark that remains when making all capitals is the dieresis
(dialytika). All others vanish. This is common knowledge for people who speak
and write Greek.


AS






