Re: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Doug Ewell
Anto'nio Martins-Tuva'lkin antonio at tuvalkin dot web dot pt wrote: Every language, whose speaking community ever contacted others, does it. , f.i., is the Chuvash name for neighbouring , which is probably still known in English as Gorky, a clumsy transcription of the 1934-1991 name .

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Well Outlook 2000 is unable to represent any e with ogonek and trema of your example. So, despite they are canonically equivalent, they are rendered differently: Everything rendered perfectly over here, on Windows 95 and Outlook

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Peter Jacobi
[EMAIL PROTECTED] wrote: [...] Note that ß (sharp s) casefolds to ss, and ſ (long s) casefolds to s. So straße, straſse, and strasse all map to the same (strasse) subname. [...] According to my Duden, sharp-s doesn't uppercase to SS when it is in a name. So 'Großmann' and
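
The folding described in this message can be checked with a short sketch. It assumes Python's built-in `str.casefold()` and `str.upper()`, which apply the default, locale-neutral Unicode mappings; it is not the procedure any registry actually uses, and the variable names are only illustrative.

```python
# Illustration only: str.casefold() applies default Unicode full case folding.
names = ["straße", "straſse", "strasse"]

# All three spellings collapse to one folded form.
folded = {name.casefold() for name in names}
print(folded)

# The default uppercase mapping expands sharp s to SS. It has no notion of
# "this word is a name", so the Duden exception mentioned above is not modelled.
upper = "Großmann".upper()
print(upper)
```

The default mapping therefore cannot distinguish a name such as 'Großmann' from an ordinary word; any name-specific rule would have to be layered on top.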

Stability of scientific names, was Stability of WG2

2003-12-17 Thread Curtis Clark
on 2003-12-16 15:27 Peter Kirk wrote: I'm no expert on this... I am. :-) but I thought that species could be transferred from genus to genus as knowledge advances. As John pointed out, the epithet stays the same. And presumably obvious spelling mistakes are corrected (contrast FHTORA in

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Marco Cimarosti
Doug Ewell wrote: I'll go farther than that. It's always bothered me that speakers of European languages, including English but especially French, have seen fit to rename the cities and internal subdivisions of other countries. Rightly said! There is reason to rename Colonia to Köln, Augusta

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Peter Kirk
On 16/12/2003 17:21, Kenneth Whistler wrote: Correcting myself: Note that none of the 3 sets of equivalence classes violates *canonical* equivalence, because none of the 8 sequences involved is canonically equivalent to any other. In other words, no matter which of the 3 approaches you take

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Peter Kirk
On 16/12/2003 19:28, John Cowan wrote: Philippe Verdy scripsit: If we just remove any 0307 from the Turkic texts, there is absolutely no problem with Turkic CaseFolding, provided that we also define Turkic-specific uppercase mappings as done above, and don't use the default locale-neutral

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread jon
There's no reason to expect that there will be any 0307 whatever in Turkish/Azeri texts: it's not a diacritic those languages use, AFAIK. There's no reason to expect that there won't be, particularly if they quote a piece in a language which does use U+0307. -- Jon Hanna |

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Philippe Verdy
Doug Ewell Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Well Outlook 2000 is unable to represent any e with ogonek and trema of your example. So, despite they are canonically equivalent, they are rendered differently: Everything rendered perfectly over here, on Windows 95

Re: Stability of WG2

2003-12-17 Thread Peter Kirk
On 16/12/2003 19:58, John Cowan wrote: Peter Kirk scripsit: I'm no expert on this... but I thought that species could be transferred from genus to genus as knowledge advances. True enough, but the specific epithet remains the same, and the old names are still available (as the jargon

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Peter Kirk
On 16/12/2003 14:59, Kent Karlsson wrote: ... Peter Kirk wrote: If the Swedish registry allows all the letters used in Swedish and Sami, and far eastern registries allow Chinese characters, the Turkish and Azerbaijani registries should allow, and be allowed to allow, all the letters of the

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread jon
Quoting Marco Cimarosti [EMAIL PROTECTED]: Doug Ewell wrote: I'll go farther than that. It's always bothered me that speakers of European languages, including English but especially French, have seen fit to rename the cities and internal subdivisions of other countries. Rightly said!

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Philippe Verdy
Peter Kirk wrote: This implies (since there are no decomposition exclusions) that NFD, used on Turkic text, violates the very sensible rule DO NOT USE COMBINING DOTS WITH I's, and leads to all sorts of potential confusion e.g. that both simple and full case folding and lowercasing applied

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Philippe Verdy
Marco Cimarosti wrote: Doug Ewell wrote: I'll go farther than that. It's always bothered me that speakers of European languages, including English but especially French, have seen fit to rename the cities and internal subdivisions of other countries. Rightly said! There is reason to

Re: Stability of scientific names, was Stability of WG2

2003-12-17 Thread Alexander Savenkov
Hello, 2003-12-17T11:06:32Z Curtis Clark [EMAIL PROTECTED] wrote: on 2003-12-16 15:27 Peter Kirk wrote: I'm no expert on this... I am. :-) but I thought that species could be transferred from genus to genus as knowledge advances. As John pointed out, the epithet stays the same.

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread jarkko.hietaniemi
Or even Aix-la-Chapelle to Aachen because that's its _current_ German name (the French name was official in the history, and is still used in French). You better tell the Bundespost about this :-) AFAIK (not being a German) Aachen is very much the current German name. (go to

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Kent Karlsson
The difference here is that Germans recognise ss and sharp s as variant spellings in the same words, Not altogether, taking into account spelling rules. They are *ordered* the same, but that is another matter. whereas in Turkish i and dotless i are quite different letters, just as in

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Arcane Jill
Far be it from me to stir things up even further, but... QUESTION - Is the rendering of {U+0065} {U+0302} (that's i, combining circumflex above) locale-dependent? I may have got this totally wrong, but it occurs to me that in non-Turkic fonts, U+0065 is "soft-dotted". That is, the dot

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Michael Everson
At 11:30 + 2003-12-17, [EMAIL PROTECTED] wrote: I doubt Christians mean offence when they refer to Jesus through any of the countless transcriptions, spellings and pronunciations used in various languages. It's odd that in English Judas and Jude are distinguished; in the original they are

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Michael Everson
At 11:04 +0100 2003-12-17, Marco Cimarosti wrote: There is reason to rename Colonia to Köln, Augusta to Augsburg, Eboraco to York, Provincia to Provence, and so on. Nicely said. Subtle irony tends to go over some people's heads on this list though. Eboraco is called Eabhrac in Irish. :-) --

Re[2]: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Alexander Savenkov
Hello, 2003-12-17T14:36:37Z Philippe Verdy [EMAIL PROTECTED] wrote: Marco Cimarosti wrote: Doug Ewell wrote: I'll go farther than that. It's always bothered me that speakers of European languages, including English but especially French, have seen fit to rename the cities and internal

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Kent Karlsson
[resending; better set the encoding to UTF-8...] Peter Kirk wrote: ... used on Turkic text, violates the very sensible rule DO NOT USE COMBINING DOTS WITH I's, and leads to all sorts of potential confusion e.g. that both simple and full case folding and lowercasing applied to NFD

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Arcane Jill
Would it not make more sense to have not two, but three different kinds of lowercase i: non-dotted i, soft-dotted i and hard-dotted i?. (And similarly for uppercase). Of course, then you might as well invent COMBINING SOFT DOT ABOVE so we can use it elsewhere. I should have mentioned

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Kent Karlsson
Peter Kirk wrote: ... used on Turkic text, violates the very sensible rule DO NOT USE COMBINING DOTS WITH I's, and leads to all sorts of potential confusion e.g. that both simple and full case folding and lowercasing applied to NFD Turkic text generate the nonsensical i, dot above.
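
A minimal sketch of the effect Peter Kirk describes, assuming Python's standard `unicodedata` module and the default (non-Turkic) case mappings built into `str.lower()` and `str.casefold()`. The variable names are illustrative only.

```python
import unicodedata

dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# NFD splits the precomposed letter into I + COMBINING DOT ABOVE.
nfd = unicodedata.normalize("NFD", dotted_capital_i)

# Default lowercasing and case folding keep the combining dot,
# producing the sequence "i, dot above" that the message calls nonsensical.
lowered = nfd.lower()
folded = nfd.casefold()

codepoints = [f"U+{ord(ch):04X}" for ch in lowered]
print(codepoints)
```

A Turkic-aware implementation would need locale-specific mappings, which Python's built-in string methods do not provide.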

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Kent Karlsson
Philippe Verdy wrote: I do hope that dotless-j and dotted-J ... Dotless j. That's in the works. A precomposed dotted uppercase J? No, I think I can predict that there will be no such encoded character. If you want a dotted uppercase J, use J, combining-dot-above. /kent k

Arabic Presentation Forms-A

2003-12-17 Thread Philippe Verdy
I was validating some internal processing of strings, and I found these intriguing decompositions for Arabic Presentation Forms-A. I was surprised to see that they are compatibility decomposed in (isolated) rows from bottom to top, in a distinct reading order from normal Arabic reading order for

Re: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread jcowan
Alexander Savenkov scripsit: You mixed everything up, Phillippe. As we say in America, General Grant [1822-1885] Still Dead. -- Do what you will, John Cowan this Life's a Fiction [EMAIL PROTECTED] And is made up of

June Ashton 1999 thesis U Sydney

2003-12-17 Thread Elaine Keown
Elaine Keown in Austin Hi, I wanted to bring the following dissertation--listed at the bottom--to the attention of the e-discussion groups. I'm going to try to have some American research library or University Microfilms make it available here in the U.S. Apparently Dr. Ashton,

Re: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread jcowan
Michael Everson scripsit: It's odd that in English Judas and Jude are distinguished; in the original they are not. Or for that matter that Jesus and Joshua are distinguished, but here we can lay the blame on Greek vs. Hebrew. -- Well, I'm back. --Sam John Cowan [EMAIL PROTECTED]

RE: [OT] CJK - CJC (Re: Corea?)

2003-12-17 Thread Marco Cimarosti
Michael Everson wrote: At 11:04 +0100 2003-12-17, Marco Cimarosti wrote: There is reason to rename Colonia to Köln, Augusta to Augsburg, Eboraco to York, Provincia to Provence, and so on. Nicely said. Subtle irony tends to go over some people's heads on this list though. Especially if

Re: Stability of WG2

2003-12-17 Thread Doug Ewell
Peter Kirk peterkirk at qaya dot org wrote: Nobody would call chimps Homo troglodytes, or orangs Simia satyrus, today, but those names can't ever be assigned to other species in future. (If chimps were folded into Homo, they would be H. troglodytes again.) And that is more or less what I

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Peter Kirk
On 17/12/2003 05:24, Kent Karlsson wrote: ... There was never an intent to deny Turkey anything. The thing was that the uppercase of i is I (usually) and the uppercase of ı is also I, so i, I, and ı used to be folded together (to i) in the drafts for IDN. Apparently that was deemed too harsh and

RE: Arabic Presentation Forms-A

2003-12-17 Thread Marco Cimarosti
Philippe Verdy wrote: #code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?; # RIAL SIGN fdfc;;;isolated 0631 06cc 0627 0644; # ??; ?; ?; The Arial Unicode MS font does not have a glyph for the Rial currency sign so I won't comment lots about it, even if it's a special ligature of
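
The RIAL SIGN row quoted above can be reproduced with a small sketch, assuming Python's standard `unicodedata` module. Results reflect whatever Unicode data ships with the interpreter, and the variable names are illustrative only.

```python
import unicodedata

rial = "\uFDFC"  # RIAL SIGN, in the Arabic Presentation Forms-A block

# The sign has only a compatibility decomposition, so NFD leaves it unchanged.
nfd = unicodedata.normalize("NFD", rial)

# NFKD expands it to its four component letters, in logical order.
nfkd = unicodedata.normalize("NFKD", rial)
codepoints = [f"{ord(ch):04x}" for ch in nfkd]
names = [unicodedata.name(ch) for ch in nfkd]

print(codepoints)
print(names)
```

Printing the character names makes it easier to compare the decomposition with the expected reading order of the ligature.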

Cuneiform Base Signs Plus Modifiers

2003-12-17 Thread Dean Snyder
[I am sending this email to both the Initiative for Cuneiform Encoding email list, [EMAIL PROTECTED], and the general Unicode email list, [EMAIL PROTECTED], in order to get comments from both the cuneiform and Unicode communities.] From the very first Initiative for Cuneiform Encoding conference

Re: Stability of WG2

2003-12-17 Thread Jim Allan
Doug Ewell wrote: But apparently, for whatever reason, it IS very important to some programmers and programs, and they have made it very clear for years and years now that the names *must not change* in the interest of stability. On the other hand, there is nothing to prevent the Unicode

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Peter Kirk
On 17/12/2003 05:30, Arcane Jill wrote: Far be it from me to stir things up even further, but... QUESTION - Is the rendering of {U+0065} {U+0302} (that's i, combining circumflex above) locale-dependent? I may have got this totally wrong, but it occurs to me that in non-Turkic fonts, U+0065 is

RE: Arabic Presentation Forms-A

2003-12-17 Thread Philippe Verdy
Philippe Verdy wrote: #code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?; # RIAL SIGN fdfc;;;isolated 0631 06cc 0627 0644; # ??; ?; ?; I should have temporarily disabled my email filter to send this one. All UTF-8 codes were replaced by ISO-8859-1 characters, substituting '?'

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Philippe Verdy
Peter Kirk wrote: Conclusion: the right thing even for Turkish is to drop the dot on i before a circumflex. I agree. The letter is rare enough to not create an exception here for the removal of dot on the soft-dotted i followed by circumflex (which is needed much more often in other languages

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Chris Jacobs
To display a dot, one can use one of the four canonical equivalents: LATIN CAPITAL LETTER I WITH DOT ABOVE, COMBINING CIRCUMFLEX; LATIN CAPITAL LETTER I WITH CIRCUMFLEX, COMBINING DOT ABOVE; LATIN CAPITAL LETTER I, COMBINING DOT ABOVE, COMBINING CIRCUMFLEX; LATIN CAPITAL LETTER I, COMBINING

RE: Case mapping of dotless lowercase letters

2003-12-17 Thread Philippe Verdy
Chris Jacobs wrote: To display a dot, one can use one of the four canonical equivalents: LATIN CAPITAL LETTER I WITH DOT ABOVE, COMBINING CIRCUMFLEX; LATIN CAPITAL LETTER I WITH CIRCUMFLEX, COMBINING DOT ABOVE; LATIN CAPITAL LETTER I, COMBINING DOT ABOVE, COMBINING CIRCUMFLEX; LATIN

American English translation of character names (was Re: Stability of WG2)

2003-12-17 Thread Kenneth Whistler
Jim Allan noted: On the other hand, there is nothing to prevent the Unicode consortium or any other body or any single person from creating a new *additional* corrected set of names if the Unicode consortium or any other body or any single person wishes to do so. That would just be an

Re: Arabic Presentation Forms-A

2003-12-17 Thread Kenneth Whistler
Philippe asked: The Arial Unicode MS font does not have a glyph for the Rial currency sign so I won't comment lots about it, even if it's a special ligature of its component letters: it's just regrettable that it's not found in Arial Unicode MS (unless this Rial sign is traditional and no

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Christopher John Fynn
However, could there be an encoding for: LATIN CAPITAL LETTER DOTLESS J with a lowercase mapping to the new: LATIN SMALL LETTER DOTLESS J Of course the former would look exactly the same as the ASCII uppercase J, except that it would have a distinct case mapping. This would avoid, for j/J

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread Christopher John Fynn
Philippe Verdy [EMAIL PROTECTED] wrote: Ohhh... I admit this is hypothetic for a possible use, but the candrabindu case is a precedent coming from romanization of non-Latin scripts: what if there's a combining x above used to interact over a diacritic and mark its suppression in corrected

Re: Cuneiform Base Signs Plus Modifiers

2003-12-17 Thread Christopher John Fynn
Dean Snyder [EMAIL PROTECTED] wrote: Recently I have had second thoughts about encoding complex signs. Modification of base, or simple, signs was a productive process for making new signs in the earlier periods of cuneiform usage, and included such modifications as adding or subtracting

Re: Stability of WG2

2003-12-17 Thread Christopher John Fynn
Jim Allan [EMAIL PROTECTED] wrote: On the other hand, there is nothing to prevent the Unicode consortium or any other body or any single person from creating a new *additional* corrected set of names if the Unicode consortium or any other body or any single person wishes to do so. That

Re: Case mapping of dotless lowercase letters

2003-12-17 Thread John Cowan
Christopher John Fynn scripsit: It introduces another difficulty though - If there are languages using a LATIN SMALL LETTER DOTLESS J There aren't. Dotless j as a character (as opposed to a glyph used with various accents above) is only used in non-IPA phonetic alphabets. I think Latin has