On 2003.07.07, 00:25, Peter Kirk <[EMAIL PROTECTED]> wrote:
> Maybe originally U+044B (cyrillic "y", "yery") was two separate
> letters,
It sure was (though I should provide some references to back this up? Hm,
later...)
> but it is certainly considered and used as one letter in Cyrillic
> langua
Philippe wrote:
>>I have tried several times to do it. It does not work: you may
>>effectively remove some tables you don't need, but trying
>>to extract just the normalizer is a real nightmare. I tried it
>>in the past, and abandoned: too tricky to maintain, and I
>>retried it recently (one mont
Subject: Re: ISO 639 "duplicate" codes (was: Re: Ligatures in Turkish
and Azeri, was: Accented ij ligatures)
> On Monday, July 14, 2003 5:34 AM, Mark Davis <[EMAIL PROTECTED]>
wrote:
>
> > ...
> > > Of course
> > > Java already includes some parts
On Monday, July 14, 2003 5:34 AM, Mark Davis <[EMAIL PROTECTED]> wrote:
> ...
> > Of course
> > Java already includes some parts of ICU, but other things are in
> > ICU4J are difficult now to integrate in Java, simply because IBM
> > forgot to modularize ICU so that it can be integrated slowly.
>
___
http://www.macchiato.com
► “Eppur si muove” ◄
- Original Message -
From: "Philippe Verdy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, July 12, 2003 14:45
Subject: Re: ISO 639 "duplicate" codes (was: Re: Ligatures in Turkish
a
On Saturday, July 12, 2003 4:17 PM, Jony Rosenne <[EMAIL PROTECTED]> wrote:
> What has "iw" to do with Hebrew?
>
> I wasn't involved with the change, but I'm glad it was done. Java and
> other systems probably still use it because they never bothered to
> check the latest version of 639. I know for
► “Eppur si muove” ◄
- Original Message -
From: "Philippe Verdy" <[EMAIL PROTECTED]>
To: "Doug Ewell" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, July 12, 2003 00:27
Subject: Re: ISO 639 "duplicate" codes (was: Re: Ligatures
"Michael Everson" <[EMAIL PROTECTED]> wrote:
> At 08:11 -0400 2003-07-12, Patrick Andries wrote:
>
> >Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to
> >me. And why was « he » chosen (Herero, Hemba, Hellenic Greek) ?
>
> Iwrit (iw), being a German transliteration of t
es (was: Re: Ligatures in
> Turkish and Azeri, was: Accented ij ligatures)
>
>
>
>
> On Saturday, July 12, at 6:51, Doug Ewell <[EMAIL PROTECTED]> wrote:
>
> > The codes "iw" for Hebrew and "in" for Indonesian were deprecated
> > FOURTEEN YE
At 08:11 -0400 2003-07-12, Patrick Andries wrote:
Just out of curiosity, why was « iw » deprecated ? Seems perfectly fine to
me. And why was « he » chosen (Herero, Hemba, Hellenic Greek) ?
Iwrit (iw), being a German transliteration of the name of the Hebrew
language, and Jiddisch (ji) were both t
On 12/07/2003 04:18, Michael Everson wrote:
At 03:25 -0700 2003-07-12, Peter Kirk wrote:
Does anyone know of a good resource on the web, or elsewhere, listing
the alphabets used for different languages around the world? I know a
project was attempted a few years ago at least for Europe. It woul
On Saturday, July 12, at 6:51, Doug Ewell <[EMAIL PROTECTED]> wrote:
> The codes "iw" for Hebrew and "in" for Indonesian were deprecated
> FOURTEEN YEARS AGO. It is not accurate or fair to refer to them as
> "duplicates" of "he" and "id". The Registration Authority deprecates
> such codes, rathe
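A minimal sketch of how software might accept the deprecated codes named in this thread while emitting the current ones. The mapping covers only the pairs mentioned here (iw/he, in/id) plus ji/yi for Yiddish; the table name and function name are hypothetical and not part of any standard or library.

```python
# Hypothetical table: withdrawn ISO 639 alpha-2 codes and their replacements.
# Only the pairs discussed in this thread are listed; it is not exhaustive.
DEPRECATED_ALPHA2 = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def canonical_language_code(code: str) -> str:
    """Return the current alpha-2 code for a possibly deprecated one."""
    lowered = code.strip().lower()
    # Unknown codes pass through unchanged (only lowercased and stripped).
    return DEPRECATED_ALPHA2.get(lowered, lowered)
```

Keeping the old codes as accepted input, rather than rejecting them, lets data tagged by systems that still emit "iw" or "in" remain readable.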
At 03:25 -0700 2003-07-12, Peter Kirk wrote:
Does anyone know of a good resource on the web, or elsewhere,
listing the alphabets used for different languages around the world?
I know a project was attempted a few years ago at least for Europe.
It would be useful to have this kind of data availa
On 11/07/2003 11:18, Philippe Verdy wrote:
# T: special case for uppercase I and dotted uppercase I
#- For non-Turkic languages, this mapping is normally not used.
#- For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
Is that wh
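The Turkic special case quoted above can be sketched roughly as follows. This is an illustration only, assuming precomposed characters; it ignores the rules for I followed by a combining dot above, and the function names are hypothetical.

```python
DOTTED_CAPITAL_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # LATIN SMALL LETTER DOTLESS I


def turkic_upper(text: str) -> str:
    """Uppercase text using the Turkic (tr, az) mapping for i."""
    # i -> U+0130 must be applied first; the default mapping already
    # sends dotless i (U+0131) to plain I.
    return text.replace("i", DOTTED_CAPITAL_I).upper()


def turkic_lower(text: str) -> str:
    """Lowercase text using the Turkic (tr, az) mapping for I and U+0130."""
    # U+0130 -> i first, then I -> U+0131, then the default mapping
    # handles every other letter.
    return (
        text.replace(DOTTED_CAPITAL_I, "i")
        .replace("I", DOTLESS_SMALL_I)
        .lower()
    )
```

For non-Turkic text the default `str.upper()` and `str.lower()` remain the right choice, which is why the mapping has to be selected by language and cannot be applied globally.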
On Saturday, July 12, 2003 6:51 AM, Doug Ewell <[EMAIL PROTECTED]> wrote:
> Philippe Verdy wrote:
>
> > Good luck with ISO language codes which do not even
> > define them, and contain many duplicate codes even in
> > the Alpha-2 space (he/iw, in/id), or imprecise codes
> > matching sometimes
> Where does the fact of saying that a Grapheme Disjoiner...
The character you should be referring to is not a new character GDJ, but
rather is the existing ZWNJ, the functions of which include prevention of
a ligature.
- Peter
---
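As a rough illustration of the point about ZWNJ, the sketch below inserts U+200C between f and i so that a renderer is asked not to form the fi ligature. The function name is hypothetical, and other messages in this thread argue that such use should be exceptional, with language-dependent font features preferred for ordinary Turkish or Azeri text.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def break_fi_ligature(text: str) -> str:
    """Insert ZWNJ between every 'f' and a following dotted 'i'."""
    # Only the dotted i is handled; f followed by dotless i (U+0131)
    # is left untouched.
    return text.replace("fi", "f" + ZWNJ + "i")
```

The inserted character is invisible but it changes the code point sequence, so searching and comparison code would need to ignore or strip it.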
Philippe Verdy wrote:
> Good luck with ISO language codes which do not even
> define them, and contain many duplicate codes even in
> the Alpha-2 space (he/iw, in/id), or imprecise codes
> matching sometimes very imprecise families of languages
> overlapping other language codes...
The codes "
On Friday, July 11, 2003 6:43 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> Agreed. But does Unicode actually treat them as non-normative samples?
Not clear here: the reference documents say that these tables are
normative for applications that want to implement a conforming
case folding. But UTR#
On 11/07/2003 08:51, Philippe Verdy wrote:
On Friday, July 11, 2003 3:50 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
So I hope that what is fixed by Unicode is the name not
of two languages but of an extensible family of scripts.
I think you speak about family of languages?
Not really. A se
On Friday, July 11, 2003 3:50 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> So I hope that what is fixed by Unicode is the name not
> of two languages but of an extensible family of scripts.
I think you speak about family of languages?
Good luck with ISO language codes which do not even
define th
On 11/07/2003 05:56, Philippe Verdy wrote:
Note also: the Soft_Dotted property was created and considered
specially for Turkish and Azeri.
Whatever it was that was specially created or adjusted for Turkish and
Azeri, was it specifically restricted to these two languages? These are
I think
On Friday, July 11, 2003 1:12 PM, Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > Note also: the Soft_Dotted property was created and considered
> > specially for Turkish and Azeri.
>
> Adding to the long, and unfortunately getting longer, list of
> misleading statements from Philippe! No, the reas
> Note also: the Soft_Dotted property was created and considered
> specially for Turkish and Azeri.
Adding to the long, and unfortunately getting longer, list of misleading
statements from Philippe! No, the reason for the Soft_Dotted property
was/is to mark which characters (regardless of langua
> "Peter" == Peter Kirk <[EMAIL PROTECTED]> writes:
Peter> Maybe, but it is hardly realistic to expect all existing
Peter> Turkish and Azeri text to be recoded to insert a character in
Peter> the middle of each f - i sequence.
But a lot of it already does do that. In TeX Turkish uses f{}i to
> > and Philippe Verdy responded with another question:
> >
> > > Isn't there a "Grapheme Disjoiner" format control character to
> > > force the absence of a ligature like , i.e. ?
> >
> > The answer to Philippe's rejoinder question is no, there is not
> > a "Grapheme Disjoiner" format control c
See also
http://www.microsoft.com/typography/developers/opentype/detail.htm
which explains how ligatures can be turned off on a language-dependent basis.
Laurentiu
Peter Kirk asked:
> In Turkish and Azeri the sequences f - i and f - dotless i both occur,
> and are fairly frequent. So it is inap
On 10/07/2003 11:37, Kenneth Whistler wrote:
As Peter pointed out, however, it is neither expected nor reasonable
to have to go back through and drop in ZWNJ's at every relevant
location in existing Turkish or Azeri text, simply to prevent
fi ligation. Such use of ZWNJ is intended to be exceptional
Philippe Verdy scripsit:
> Where does the fact of saying that a Grapheme Disjoiner can be used
> in Turkish to avoid that the f collapses the dot above a next lowercase i?
It is settled that ZWNJ is the correct character to break ligatures.
ZWJ means "make a ligature if you can; if not, shape cha
On Thursday, July 10, 2003 8:37 PM, Kenneth Whistler <[EMAIL PROTECTED]> wrote:
> Peter Kirk asked:
>
> > > In Turkish and Azeri the sequences f - i and f - dotless i both
> > > occur, and are fairly frequent. So it is inappropriate in these
> > > languages to use fi ligatures in which the dot on
Peter Kirk asked:
> > In Turkish and Azeri the sequences f - i and f - dotless i both occur,
> > and are fairly frequent. So it is inappropriate in these languages to
> > use fi ligatures in which the dot on the i is lost or invisible, at
> > least where the second character is a dotted i. Has any
On Thursday, July 10, 2003 6:42 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> Anyway, I understood from the recent discussion of Hebrew that it is
> Unicode policy not to do anything which could theoretically invalidate
> existing text even if it could be proved that no such text existed.
Where doe
Peter Kirk wrote:
> Maybe, but it is hardly realistic to expect all existing Turkish and
Azeri text to be recoded to insert a character in the middle of each f -
i sequence.
Isn't most Turkish and Azeri text coded as ISO-8859-9 and similar code
pages? In that case, it would be enough to add t
On 10/07/2003 09:34, Stefan Persson wrote:
Peter Kirk wrote:
> Maybe, but it is hardly realistic to expect all existing Turkish and
Azeri text to be recoded to insert a character in the middle of each f
- i sequence.
Isn't most Turkish and Azeri text coded as ISO-8859-9 and similar
code page
On Thursday, July 10, 2003 5:41 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> > Isn't there a "Grapheme Disjoiner" format control character to
> > force the absence of a ligature like , i.e. ?
> >
> Maybe, but it is hardly realistic to expect all existing Turkish and
> Azeri text to be recoded to i
On 10/07/2003 08:21, Philippe Verdy wrote:
In Turkish and Azeri the sequences f - i and f - dotless i both occur,
and are fairly frequent. So it is inappropriate in these languages to
use fi ligatures in which the dot on the i is lost or invisible, at
least where the second character is a dotted i
On Thursday, July 10, 2003 12:08 PM, Peter Kirk <[EMAIL PROTECTED]> wrote:
> On 1st July Philippe Verdy wrote:
>
> > If fonts still want to display dots on these characters, that's a
> > rendering problem: there already exists a lot of fonts used for
> > languages other than Turkish and Azeri, wh
On 1st July Philippe Verdy wrote:
If fonts still want to display dots on these characters, that's a
rendering problem: there already exists a lot of fonts used for
languages other than Turkish and Azeri, which do not display any
dot on a lowercase ASCII i or j (dotted), and display a dot on their
Maybe originally U+044B (cyrillic "y", "yery") was two separate letters,
but it is certainly considered and used as one letter in Cyrillic
languages today. Encoding it as two letters would be about as sensible
as insisting that w should be encoded as two u's or that i should be
encoded as dotl
On 2003.07.01, 15:09, Pim Blokland <[EMAIL PROTECTED]> wrote:
> Maybe it was a bad idea to include ij as a character in Unicode at all,
> but now it's there, there's no reason to ignore it when refining the
> rules, to deprecate it practically.
Food for thought: How would you compare U+0133 ("ij"
Kent Karlsson wrote:
>> Believe it or not, the IJ and ij digraphs *were* included for
>> compatibility with an 8-bit legacy character set (ISO 6937).
>
> 6937 is a multibyte encoding (one or two bytes per character).
> There are no combining characters at all in 6937, even though
> there is a com
> In either cases, the "Soft_Dotted" property is probably overkill on
> the existing or ligatures (which should have been better
There is no point in having a soft-dotted property for the capital
letter...
> named "letters" and not "ligatures") for Dutch. Or is this update
> needed to docume
> Believe it or not, the IJ and ij digraphs *were* included for
> compatibility with an 8-bit legacy character set (ISO 6937).
6937 is a multibyte encoding (one or two bytes per character).
There are no combining characters at all in 6937, even though
there is a common misunderstanding that there
Philippe Verdy wrote:
>> Maybe it was a bad idea to include ij as a character in Unicode at
>> all, but now it's there, there's no reason to ignore it when
>> refining the rules, to deprecate it practically.
>
> No, that was needed for correct Dutch support. Look at the case
> conversion of into
On Tuesday, July 01, 2003 4:09 PM, Pim Blokland <[EMAIL PROTECTED]> wrote:
> Maybe it was a bad idea to include ij as a character in Unicode at
> all, but now it's there, there's no reason to ignore it when
> refining the rules, to deprecate it practically.
No, that was needed for correct Dutch sup
Pim Blokland wrote:
When putting accents on the ij (which does happen!), the dots must
go. Simple as that.
Where should the accent be placed in that case? Should the accent be
centered over "ij"? Should there be one accent over "i" and then the
same over "j"? Or should the accent only be an ac
Michael Everson wrote:
> I think the answer is, regarding the soft dot property, please leave
> the ij ligature alone.
And I think not.
When putting accents on the ij (which does happen!), the dots must
go. Simple as that.
Maybe it was a bad idea to include ij as a character in Unicode at
all, bu
On Tuesday, July 01, 2003 1:55 PM, Kent Karlsson <[EMAIL PROTECTED]> wrote:
> > My feeling about the proposed "Public Review" document should
> > exclude the ligature, waiting for the decision about the new
> > ligature approved in the first rounds by UTC and
> > waiting for approval by ISO JTC.
> > I don't know of any instances where a ij digraph would keep the dots
> > AND get additional accent marks, nor of any where the ij would
> > appear with a dotless i and dotless j and a single dot above,
> > centered between them. Can you give examples?
>
> No of course:
So why do you care?
>
I think the answer is, regarding the soft dot property, please leave
the ij ligature alone.
--
Michael Everson * * Everson Typography * * http://www.evertype.com
On Monday, June 30, 2003 9:13 PM, James H. Cloos Jr. <[EMAIL PROTECTED]> wrote:
> So if you want two dots and an acute use ‹ij, U+0308, U+0301›: ij̈́
>
> Of course a given font’s diaeresis will often not line up with the
> stems of its ij, and a custom one should be used instead. Or
> features an
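The sequence mentioned above (U+0133 followed by U+0308 and U+0301) can be inspected with Python's standard unicodedata module. This is only a sketch for examining the code points; it says nothing about how a given font renders the dots, and the helper name is hypothetical.

```python
import unicodedata

IJ = "\u0133"                       # LATIN SMALL LIGATURE IJ
IJ_WITH_MARKS = IJ + "\u0308\u0301"  # ij + diaeresis + acute


def describe(text: str) -> list[str]:
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# U+0133 has only a compatibility decomposition to "i" + "j", so
# compatibility normalization splits it and the marks end up after "j".
decomposed = unicodedata.normalize("NFKD", IJ_WITH_MARKS)

if __name__ == "__main__":
    print(describe(IJ_WITH_MARKS))
    print(describe(decomposed))
```

After compatibility normalization the combining marks follow the j alone, which is one reason the thread treats the single code point and the two-letter spelling as behaving differently under accents.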
> "Philippe" == Philippe Verdy <[EMAIL PROTECTED]> writes:
Philippe> But if one wants to restore the previous visual behavior,
Philippe> even if it's incorrect for languages using this digraph as a
Philippe> letter, what would be the behavior of using the following
Philippe> sequence:
Philippe
On Monday, June 30, 2003 1:58 PM, Pim Blokland <[EMAIL PROTECTED]> wrote:
> Philippe Verdy wrote:
>
> > Interesting issue for the Latin Small "ij" Ligature (U+0133):
> > Normally the Soft_Dotted is supposed to make one dot disappear when
> > there's an additional diacritic above, but many appli
Philippe Verdy wrote:
> Interesting issue for the Latin Small "ij" Ligature (U+0133):
> Normally the Soft_Dotted is supposed to make one dot disappear when
> there's an additional diacritic above, but many applications may
> keep these two dots above, fitting the diacritic in the middle.
>
> Thi
54 matches