for Normalization.
Mark
__
http://www.macchiato.com
Eppur si muove
- Original Message -
From: Philippe Verdy [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Monday, July 14, 2003 11:13
Subject: Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish
- Original Message -
From: Philippe Verdy [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, July 12, 2003 14:45
Subject: Re: ISO 639 duplicate codes (was: Re: Ligatures in Turkish
and Azeri, was: Accented ij ligatures)
On Saturday, July 12, 2003 4:17 PM, Jony Rosenne
[EMAIL PROTECTED
Where does the fact of saying that a Grapheme Disjoiner...
The character you should be referring to is not a new character GDJ, but
rather is the existing ZWNJ, the functions of which include prevention of
a ligature.
- Peter
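To make Peter's point concrete, here is a minimal sketch of what the sequence f, ZWNJ, i looks like in code. The helper name `block_fi_ligature` is made up for this illustration, and the sketch assumes that you actually want to mark every f - i pair, which, as noted elsewhere in the thread, is not something anyone expects to be done across existing text.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER

def block_fi_ligature(text: str) -> str:
    """Hypothetical helper: put ZWNJ between each 'f' and a following 'i'."""
    # Only the dotted i is targeted; f followed by dotless i (U+0131) is left alone.
    return text.replace("fi", "f" + ZWNJ + "i")
```

Whether the ligature is really suppressed still depends on the font and the rendering engine honouring ZWNJ.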
On Saturday, July 12, 2003 6:51 AM, Doug Ewell [EMAIL PROTECTED] wrote:
Philippe Verdy verdy_p at wanadoo dot fr wrote:
Good luck with ISO language codes which do not even
define them, and contain many duplicate codes even in
the Alpha-2 space (he/iw, in/id), or imprecise codes
On 11/07/2003 11:18, Philippe Verdy wrote:
# T: special case for uppercase I and dotted uppercase I
#- For non-Turkic languages, this mapping is normally not used.
#- For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
snip
Is
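The special-case mapping quoted above can be sketched roughly as follows. This is only an illustration, not the SpecialCasing data itself: the names `upper_turkic` and `lower_turkic` are made up for this note, and the sketch assumes the caller already knows the text is Turkish or Azeri (tr, az) and that it uses precomposed characters.

```python
# Hypothetical helpers; the names are illustrative, not from any library.
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

def upper_turkic(text: str) -> str:
    # Map dotted i to dotted capital I first, then let the default
    # mapping handle the rest (dotless i -> I included).
    return text.replace("i", DOTTED_CAPITAL_I).upper()

def lower_turkic(text: str) -> str:
    # Handle both capital I forms before the default mapping,
    # which would otherwise turn I into dotted i.
    text = text.replace("I", DOTLESS_SMALL_I)
    text = text.replace(DOTTED_CAPITAL_I, "i")
    return text.lower()
```

For non-Turkic text the ordinary `str.upper()` and `str.lower()` apply unchanged, which matches the "normally not used" note in the quoted comment.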
At 03:25 -0700 2003-07-12, Peter Kirk wrote:
Does anyone know of a good resource on the web, or elsewhere,
listing the alphabets used for different languages around the world?
I know a project was attempted a few years ago at least for Europe.
It would be useful to have this kind of data
On Saturday, July 12, at 6:51, Doug Ewell [EMAIL PROTECTED] wrote:
The codes iw for Hebrew and in for Indonesian were deprecated
FOURTEEN YEARS AGO. It is not accurate or fair to refer to them as
duplicates of he and id. The Registration Authority deprecates
such codes, rather than deleting
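Because the deprecated codes still turn up in data, software that meets them can simply map them to the current ones. A minimal sketch, assuming only the two pairs named in this thread; the table and the function name `current_language_code` are hypothetical and not taken from any registry file:

```python
# Only the two pairs mentioned in this thread; a real table would be
# built from the ISO 639 registry, which is not reproduced here.
DEPRECATED_TO_CURRENT = {"iw": "he", "in": "id"}

def current_language_code(code: str) -> str:
    """Return the current two-letter code for a possibly deprecated one."""
    code = code.lower()
    return DEPRECATED_TO_CURRENT.get(code, code)
```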
On 12/07/2003 04:18, Michael Everson wrote:
At 03:25 -0700 2003-07-12, Peter Kirk wrote:
Does anyone know of a good resource on the web, or elsewhere, listing
the alphabets used for different languages around the world? I know a
project was attempted a few years ago at least for Europe. It
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Patrick Andries
Sent: Saturday, July 12, 2003 2:12 PM
To: Philippe Verdy; Doug Ewell
Cc: [EMAIL PROTECTED]
Subject: Re: ISO 639 duplicate codes (was: Re: Ligatures in
Turkish and Azeri, was: Accented ij
Michael Everson [EMAIL PROTECTED] wrote:
At 08:11 -0400 2003-07-12, Patrick Andries wrote:
Just out of curiosity, why was "iw" deprecated? Seems perfectly fine to
me. And why was "he" chosen (Herero, Hemba, Hellenic Greek)?
Iwrit (iw), being a German transliteration of the name of
On Saturday, July 12, 2003 4:17 PM, Jony Rosenne [EMAIL PROTECTED] wrote:
What has iw to do with Hebrew?
I wasn't involved with the change, but I'm glad it was done. Java and
other systems probably still use it because they never bothered to
check the latest version of 639. I know for certain
Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.
Adding to the long, and unfortunately getting longer, list of misleading
statements from Philippe! No, the reason for the Soft_Dotted property
was/is to mark which characters (regardless of
On Friday, July 11, 2003 1:12 PM, Kent Karlsson [EMAIL PROTECTED] wrote:
Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.
Adding to the long, and unfortunately getting longer, list of
misleading statements from Philippe! No, the reason for
On 11/07/2003 05:56, Philippe Verdy wrote:
Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.
Whatever it was that was specially created or adjusted for Turkish and
Azeri, was it specifically restricted to these two languages? These are
I
On Friday, July 11, 2003 3:50 PM, Peter Kirk [EMAIL PROTECTED] wrote:
So I hope that what is fixed by Unicode is the name not
of two languages but of an extensible family of scripts.
I think you speak about family of languages?
Good luck with ISO language codes which do not even
define them,
On 11/07/2003 08:51, Philippe Verdy wrote:
On Friday, July 11, 2003 3:50 PM, Peter Kirk [EMAIL PROTECTED] wrote:
So I hope that what is fixed by Unicode is the name not
of two languages but of an extensible family of scripts.
I think you speak about family of languages?
Not really. A set
On Friday, July 11, 2003 6:43 PM, Peter Kirk [EMAIL PROTECTED] wrote:
Agreed. But does Unicode actually treat them as non-normative samples?
Not clear here: the reference documents say that these tables are
normative for applications that want to implement a conforming
case folding. But UTR#30
Philippe Verdy verdy_p at wanadoo dot fr wrote:
Good luck with ISO language codes which do not even
define them, and contain many duplicate codes even in
the Alpha-2 space (he/iw, in/id), or imprecise codes
matching sometimes very imprecise families of languages
overlapping other language
On Thursday, July 10, 2003 12:08 PM, Peter Kirk [EMAIL PROTECTED] wrote:
On 1st July Philippe Verdy wrote:
If fonts still want to display dots on these characters, that's a
rendering problem: there already exist a lot of fonts used for
languages other than Turkish and Azeri, which do
On 10/07/2003 08:21, Philippe Verdy wrote:
In Turkish and Azeri the sequences f - i and f - dotless i both occur,
and are fairly frequent. So it is inappropriate in these languages to
use fi ligatures in which the dot on the i is lost or invisible, at
least where the second character is a dotted
On Thursday, July 10, 2003 5:41 PM, Peter Kirk [EMAIL PROTECTED] wrote:
Isn't there a Grapheme Disjoiner format control character to
force the absence of a ligature like fi, i.e. f, GDJ, i?
Maybe, but it is hardly realistic to expect all existing Turkish and
Azeri text to be recoded to
On 10/07/2003 09:34, Stefan Persson wrote:
Peter Kirk wrote:
Maybe, but it is hardly realistic to expect all existing Turkish and
Azeri text to be recoded to insert a character in the middle of each f
- i sequence.
Isn't most Turkish and Azeri text coded as ISO-8859-9 and similar
code
Peter Kirk wrote:
Maybe, but it is hardly realistic to expect all existing Turkish and
Azeri text to be recoded to insert a character in the middle of each f -
i sequence.
Isn't most Turkish and Azeri text coded as ISO-8859-9 and similar code
pages? In that case, it would be enough to add
On Thursday, July 10, 2003 6:42 PM, Peter Kirk [EMAIL PROTECTED] wrote:
Anyway, I understood from the recent discussion of Hebrew that it is
Unicode policy not to do anything which could theoretically invalidate
existing text even if it could be proved that no such text existed.
Where does
Peter Kirk asked:
In Turkish and Azeri the sequences f - i and f - dotless i both occur,
and are fairly frequent. So it is inappropriate in these languages to
use fi ligatures in which the dot on the i is lost or invisible, at
least where the second character is a dotted i. Has any
On Thursday, July 10, 2003 8:37 PM, Kenneth Whistler [EMAIL PROTECTED] wrote:
Peter Kirk asked:
In Turkish and Azeri the sequences f - i and f - dotless i both
occur, and are fairly frequent. So it is inappropriate in these
languages to use fi ligatures in which the dot on the i is
Philippe Verdy scripsit:
Where does the fact of saying that a Grapheme Disjoiner can be used
in Turkish to avoid that the f collapses the dot above a next lowercase i?
It is settled that ZWNJ is the correct character to break ligatures.
ZWJ means make a ligature if you can; if not, shape
On 10/07/2003 11:37, Kenneth Whistler wrote:
As Peter pointed out, however, it is neither expected nor reasonable
to have to go back through and drop in ZWNJ's at every relevant
location in existing Turkish or Azeri text, simply to prevent
fi ligation. Such use of ZWNJ is intended to be
See also
http://www.microsoft.com/typography/developers/opentype/detail.htm
which explains how ligatures can be turned off on a language-dependent basis.
Laurentiu
Peter Kirk asked:
In Turkish and Azeri the sequences f - i and f - dotless i both occur,
and are fairly frequent. So it is
and Philippe Verdy responded with another question:
Isn't there a Grapheme Disjoiner format control character to
force the absence of a ligature like fi, i.e. f, GDJ, i?
The answer to Philippe's rejoinder question is no, there is not
a Grapheme Disjoiner format control
"Peter" == Peter Kirk [EMAIL PROTECTED] writes:
Peter> Maybe, but it is hardly realistic to expect all existing
Peter> Turkish and Azeri text to be recoded to insert a character in
Peter> the middle of each f - i sequence.
But a lot of it already does do that. In TeX Turkish uses f{}i to
block the
31 matches