António Martins-Tuválkin antonio at tuvalkin dot web dot pt wrote:
Every language whose speaking community ever contacted others does
it. , f.i., is the Chuvash name for neighbouring
, which is probably still known in English as Gorky, a clumsy
transcription of the 1934-1991 name .
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Well, Outlook 2000 is unable to represent any e with ogonek and trema
of your example. So, even though they are canonically equivalent, they are
rendered differently:
Everything rendered perfectly over here, on Windows 95 and Outlook
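Whether such sequences really are canonically equivalent can be checked mechanically: two strings are canonically equivalent exactly when their NFD forms are identical. A minimal sketch using Python's standard unicodedata module. The four variants below are an assumption, since the example characters in the original message did not survive in this archive; canonically_equivalent and hex_points are hypothetical helper names.

```python
import unicodedata

# Assumed variants of "e with ogonek and diaeresis (trema)"; the original
# example characters were lost from this archive.
VARIANTS = [
    "\u0119\u0308",   # LATIN SMALL LETTER E WITH OGONEK + COMBINING DIAERESIS
    "\u00eb\u0328",   # LATIN SMALL LETTER E WITH DIAERESIS + COMBINING OGONEK
    "e\u0328\u0308",  # e + COMBINING OGONEK + COMBINING DIAERESIS
    "e\u0308\u0328",  # e + COMBINING DIAERESIS + COMBINING OGONEK
]


def canonically_equivalent(a: str, b: str) -> bool:
    """Canonically equivalent strings have identical NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def hex_points(s: str) -> str:
    """Render a string as space-separated code points, e.g. '0065 0328'."""
    return " ".join(f"{ord(c):04X}" for c in s)


for v in VARIANTS:
    # NFD sorts marks by combining class: ogonek (202) before diaeresis (230).
    print(hex_points(v),
          "-> NFD", hex_points(unicodedata.normalize("NFD", v)),
          "| NFC", hex_points(unicodedata.normalize("NFC", v)))
```

Every variant prints NFD 0065 0328 0308 and NFC 0119 0308, so a program that renders them differently is treating equivalent text inconsistently; normalizing before display would hide the difference.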
[EMAIL PROTECTED] wrote:
[...]
Note that ß (sharp s) casefolds to ss, and ſ (long s) casefolds to s. So
straße, straſse, and strasse all map to the same (strasse)
subname.
[...]
According to my Duden, sharp s is not uppercased to SS when it is in
a name. So 'Großmann' and
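The default, locale-neutral behaviour described in the quoted text can be reproduced with Python's built-in str.casefold() and str.upper(). This is a minimal sketch; fold_key is a hypothetical helper name. Note that the default mappings make no exception for proper names, so the Duden rule would have to be handled by some separate tailoring.

```python
# Sample spellings from the message above: straße, straſse, strasse.
SPELLINGS = ["stra\u00dfe", "stra\u017fse", "strasse"]


def fold_key(s: str) -> str:
    """Locale-neutral full case folding, as done by Python's str.casefold()."""
    return s.casefold()


print({s: fold_key(s) for s in SPELLINGS})

# lower() alone does not unify them: U+00DF is already lowercase.
print("stra\u00dfe".lower() == "strasse")  # False

# Default uppercasing expands sharp s, with no special case for names.
print("Gro\u00dfmann".upper())  # GROSSMANN
```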
on 2003-12-16 15:27 Peter Kirk wrote:
I'm no expert on this...
I am. :-)
but I thought that species could be transferred
from genus to genus as knowledge advances.
As John pointed out, the epithet stays the same.
And presumably obvious
spelling mistakes are corrected (contrast FHTORA in
Doug Ewell wrote:
I'll go farther than that. It's always bothered me that speakers of
European languages, including English but especially French, have seen
fit to rename the cities and internal subdivisions of other countries.
Rightly said!
There is reason to rename Colonia to Köln, Augusta
On 16/12/2003 17:21, Kenneth Whistler wrote:
Correcting myself:
Note that none of the 3 sets of equivalence classes violates
*canonical* equivalence, because none of the 8 sequences involved
is canonically equivalent to any other. In other words, no matter
which of the 3 approaches you take
On 16/12/2003 19:28, John Cowan wrote:
Philippe Verdy scripsit:
If we just remove any 0307 from the Turkic texts, there is absolutely no
problem with Turkic CaseFolding, provided that we also define
Turkic-specific uppercase mappings as done above, and don't use the default
locale-neutral
There's no reason to expect that there will be any 0307 whatever in
Turkish/Azeri texts: it's not a diacritic those languages use, AFAIK.
There's no reason to expect that there won't be, particularly if they quote a
piece in a language which does use U+0307.
--
Jon Hanna |
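Where stray U+0307 in Turkic text comes from, and what a Turkic-specific mapping would do instead, can be sketched briefly. Python's default lowercasing of U+0130 (İ) yields i followed by U+0307, which is the locale-neutral mapping Philippe wants to avoid. The tailored helpers below (turkic_upper, turkic_lower) are hypothetical and only a simplified sketch, not a full implementation of the conditional rules in SpecialCasing.txt. Because of Jon's point about quoted text, the sketch does not strip every U+0307; it only absorbs a dot directly after capital I.

```python
# Hypothetical helpers: a simplified Turkish/Azeri case tailoring.
_TR_UPPER = {"i": "\u0130", "\u0131": "I"}  # i -> İ, ı -> I
_TR_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> ı, İ -> i


def turkic_upper(s: str) -> str:
    # Per-character mapping; context-sensitive rules (e.g. Greek final
    # sigma) are not applied.
    return "".join(_TR_UPPER.get(c, c.upper()) for c in s)


def turkic_lower(s: str) -> str:
    # Treat I + COMBINING DOT ABOVE as İ first, so the dot is absorbed
    # instead of left dangling (a simplification of the After_I condition).
    s = s.replace("I\u0307", "\u0130")
    return "".join(_TR_LOWER.get(c, c.lower()) for c in s)


print(len("\u0130".lower()))          # 2: default gives i + U+0307
print(turkic_lower("\u0130") == "i")  # True
print(turkic_upper("i\u0131"))        # İI
```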
Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:
Well, Outlook 2000 is unable to represent any e with ogonek and trema
of your example. So, even though they are canonically equivalent, they are
rendered differently:
Everything rendered perfectly over here, on Windows 95
On 16/12/2003 19:58, John Cowan wrote:
Peter Kirk scripsit:
I'm no expert on this... but I thought that species could be transferred
from genus to genus as knowledge advances.
True enough, but the specific epithet remains the same, and the old names
are still available (as the jargon
On 16/12/2003 14:59, Kent Karlsson wrote:
...
Peter Kirk wrote:
If the Swedish registry allows all the letters used in Swedish and Sami,
and far eastern registries allow Chinese characters, the Turkish and
Azerbaijani registries should allow, and be allowed to allow, all the
letters of the
Quoting Marco Cimarosti [EMAIL PROTECTED]:
Doug Ewell wrote:
I'll go farther than that. It's always bothered me that speakers of
European languages, including English but especially French, have seen
fit to rename the cities and internal subdivisions of other countries.
Rightly said!
Peter Kirk wrote:
This implies (since there are no decomposition exclusions) that NFD,
used on Turkic text, violates the very sensible rule DO NOT USE
COMBINING DOTS WITH I's, and leads to all sorts of potential confusion
e.g. that both simple and full case folding and lowercasing applied
Marco Cimarosti wrote:
Doug Ewell wrote:
I'll go farther than that. It's always bothered me that speakers of
European languages, including English but especially French, have seen
fit to rename the cities and internal subdivisions of other countries.
Rightly said!
There is reason to
Hello,
2003-12-17T11:06:32Z Curtis Clark [EMAIL PROTECTED] wrote:
on 2003-12-16 15:27 Peter Kirk wrote:
I'm no expert on this...
I am. :-)
but I thought that species could be transferred
from genus to genus as knowledge advances.
As John pointed out, the epithet stays the same.
Or even Aix-la-Chapelle to Aachen because that's its _current_ German name (the
French name was official historically, and is still used in French).
You'd better tell the Bundespost about this :-) AFAIK (not being a German)
Aachen is very much the current German name.
(go to
The difference here is that Germans recognise ss and sharp s
as variant spellings in the same words,
Not altogether, taking into account spelling rules.
They are *ordered* the same, but that is another matter.
whereas in Turkish i and dotless i are
quite different letters, just as in
Far be it from me to stir things up even further, but...
QUESTION - Is the rendering of {U+0069} {U+0302} (that's i,
combining circumflex above) locale-dependent?
I may have got this totally wrong, but it occurs to me that in
non-Turkic fonts, U+0069 is "soft-dotted". That is, the dot
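Part of the question can be checked with normalization alone. i (U+0069) followed by COMBINING CIRCUMFLEX ACCENT (U+0302) is canonically equivalent to the precomposed U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX, which has no dot, so a renderer that kept the dot on the decomposed form would display equivalent text differently. A dot meant to survive has to be encoded explicitly with U+0307, and that sequence does not compose. A minimal sketch; codepoints is a hypothetical helper name.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


plain = "i\u0302"           # i + combining circumflex
explicit = "i\u0307\u0302"  # i + combining dot above + combining circumflex

for label, s in [("plain", plain), ("explicit dot", explicit)]:
    nfc = unicodedata.normalize("NFC", s)
    print(label, codepoints(s), "->", codepoints(nfc),
          [unicodedata.name(c) for c in nfc])
```

The explicit sequence stays as it is under NFC because U+0307 and U+0302 share combining class 230, so the circumflex is blocked from composing with the base letter.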
At 11:30 + 2003-12-17, [EMAIL PROTECTED] wrote:
I doubt Christians mean offence when they refer to Jesus through any of the
countless transcriptions, spellings and pronunciations used in various
languages.
It's odd that in English Judas and Jude are distinguished; in the
original they are
At 11:04 +0100 2003-12-17, Marco Cimarosti wrote:
There is reason to rename Colonia to Köln, Augusta to Augsburg,
Eboraco to York, Provincia to Provence, and so on.
Nicely said. Subtle irony tends to go over some
people's heads on this list though.
Eboraco is called Eabhrac in Irish. :-)
--
Hello,
2003-12-17T14:36:37Z Philippe Verdy [EMAIL PROTECTED] wrote:
Marco Cimarosti wrote:
Doug Ewell wrote:
I'll go farther than that. It's always bothered me that speakers of
European languages, including English but especially French, have seen
fit to rename the cities and internal
[resending; better set the encoding to UTF-8...]
Peter Kirk wrote:
...
used on Turkic text, violates the very sensible rule DO NOT USE
COMBINING DOTS WITH I's, and leads to all sorts of potential
confusion
e.g. that both simple and full case folding and lowercasing
applied to
NFD
Would it not make more sense to have not two, but three
different kinds of lowercase i: non-dotted i, soft-dotted
i and hard-dotted i? (And similarly for uppercase.) Of
course, then you might as well invent COMBINING SOFT DOT ABOVE so we
can use it elsewhere.
I should have mentioned
Peter Kirk wrote:
...
used on Turkic text, violates the very sensible rule DO NOT USE
COMBINING DOTS WITH I's, and leads to all sorts of potential
confusion
e.g. that both simple and full case folding and lowercasing
applied to
NFD Turkic text generate the nonsensical i, dot above.
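The effect Peter describes is easy to reproduce with the default (locale-neutral) mappings in Python's unicodedata and str methods. A minimal sketch using İstanbul as sample input: NFD splits U+0130 into I + U+0307, and both lowercasing and case folding then yield i + U+0307, whereas a Turkish-aware result would be a plain i.

```python
import unicodedata

word = "\u0130stanbul"  # İstanbul, spelled with precomposed U+0130
nfd = unicodedata.normalize("NFD", word)


def hex_points(s: str) -> str:
    """Render a string as space-separated code points."""
    return " ".join(f"{ord(c):04X}" for c in s)


print(hex_points(nfd[:2]))             # 0049 0307: NFD splits off the dot
print(hex_points(nfd.lower()[:2]))     # 0069 0307: i followed by dot above
print(hex_points(nfd.casefold()[:2]))  # 0069 0307 as well
```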
Philippe Verdy wrote:
I do hope that dotless-j and dotted-J ...
Dotless j. That's in the works.
A precomposed dotted uppercase J? No, I think I can predict
that there will be no such encoded character. If you want a
dotted uppercase J, use J, combining-dot-above.
/kent k
I was validating some internal processing of strings, and I found these
intriguing decompositions for Arabic Presentation Forms-A. I was surprised
to see that they are compatibility-decomposed in (isolated) rows from bottom
to top, in a reading order different from the normal Arabic reading order for
Alexander Savenkov scripsit:
You mixed everything up, Philippe.
As we say in America, General Grant [1822-1885] Still Dead.
--
Do what you will,
this Life's a Fiction
And is made up of
John Cowan [EMAIL PROTECTED]
Elaine Keown
in Austin
Hi,
I wanted to bring the following dissertation--listed
at the bottom--to the attention of the e-discussion
groups. I'm going to try to have some American
research library or University Microfilms make it
available here in the U.S.
Apparently Dr. Ashton,
Michael Everson scripsit:
It's odd that in English Judas and Jude are distinguished; in the
original they are not.
Or for that matter that Jesus and Joshua are distinguished, but here we
can lay the blame on Greek vs. Hebrew.
--
Well, I'm back. --Sam
John Cowan [EMAIL PROTECTED]
Michael Everson wrote:
At 11:04 +0100 2003-12-17, Marco Cimarosti wrote:
There is reason to rename Colonia to Köln, Augusta to
Augsburg,
Eboraco to York, Provincia to Provence, and so on.
Nicely said. Subtle irony tends to go over some
people's heads on this list though.
Especially if
Peter Kirk peterkirk at qaya dot org wrote:
Nobody would call chimps Homo troglodytes, or orangs Simia satyrus,
today, but those names can't ever be assigned to other species in
future. (If chimps were folded into Homo, they would be H.
troglodytes again.)
And that is more or less what I
On 17/12/2003 05:24, Kent Karlsson wrote:
...
There was never an intent to deny Turkey anything. The thing was that
the uppercase of i is I (usually) and the uppercase of ı is also I, so i, I,
and ı used to be folded together (to i) in the drafts for IDN. Apparently
that was deemed too harsh and
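The collision Kent describes shows up directly in the default, locale-neutral case mappings. This sketch is not the IDN/nameprep algorithm. It only illustrates why uppercasing, or an upper-then-lower fold, merges i, I and ı, while Unicode default case folding keeps dotless ı apart.

```python
letters = ["i", "I", "\u0131"]  # i, I, dotless ı

# Uppercasing maps all three letters to the same I.
print({c: c.upper() for c in letters})

# Folding via upper-then-lower collapses them all to i.
print({c.upper().lower() for c in letters})

# Default case folding (str.casefold) leaves dotless ı distinct.
print({c.casefold() for c in letters})
```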
Philippe Verdy wrote:
#code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?;
# RIAL SIGN
fdfc;;;isolated 0631 06cc 0627 0644; # ??; ?; ?;
The Arial Unicode MS font does not have a glyph for the
Rial currency sign, so I won't comment much on it, even if
it's a special ligature of
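The fdfc line above mirrors fields from UnicodeData.txt, and Python's unicodedata module exposes the same data. A minimal sketch that prints the RIAL SIGN's compatibility decomposition and the letters NFKD produces. The letters come out as reh, farsi yeh, alef and lam, that is, in logical (reading) order rather than in any visual arrangement of the ligature.

```python
import unicodedata

RIAL = "\ufdfc"

print(unicodedata.name(RIAL))           # RIAL SIGN
print(unicodedata.decomposition(RIAL))  # <isolated> tag plus code points

# NFKD applies the compatibility decomposition.
nfkd = unicodedata.normalize("NFKD", RIAL)
for c in nfkd:
    print(f"U+{ord(c):04X}", unicodedata.name(c))
```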
[I am sending this email to both the Initiative for Cuneiform Encoding
email list, [EMAIL PROTECTED], and the general Unicode email list,
[EMAIL PROTECTED], in order to get comments from both the cuneiform and
Unicode communities.]
From the very first Initiative for Cuneiform Encoding conference
Doug Ewell wrote:
But apparently, for whatever reason, it IS very important to some
programmers and programs, and they have made it very clear for years and
years now that the names *must not change* in the interest of stability.
On the other hand, there is nothing to prevent the Unicode
On 17/12/2003 05:30, Arcane Jill wrote:
Far be it from me to stir things up even further, but...
QUESTION - Is the rendering of {U+0069} {U+0302} (that's i, combining
circumflex above) locale-dependent?
I may have got this totally wrong, but it occurs to me that in
non-Turkic fonts, U+0069 is
Philippe Verdy wrote:
#code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?;
# RIAL SIGN
fdfc;;;isolated 0631 06cc 0627 0644; # ??; ?; ?;
I should have temporarily disabled my email filter before sending this one. All
UTF-8 sequences were converted to ISO-8859-1, substituting '?'
Peter Kirk wrote:
Conclusion: the right thing even for Turkish is to drop the dot on i
before a circumflex.
I agree. The letter is rare enough not to warrant an exception here to
the removal of the dot on a soft-dotted i followed by a circumflex (which
is needed much more often in other languages
To display a dot, one can use one of the four canonical equivalents:
LATIN CAPITAL LETTER I WITH DOT ABOVE, COMBINING CIRCUMFLEX
LATIN CAPITAL LETTER I WITH CIRCUMFLEX, COMBINING DOT ABOVE
LATIN CAPITAL LETTER I, COMBINING DOT ABOVE, COMBINING CIRCUMFLEX
LATIN CAPITAL LETTER I, COMBINING
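The equivalence claim can be tested with NFD. U+0307 and U+0302 both have canonical combining class 230, so normalization does not reorder them, and their relative order is significant. A quick check along these lines suggests that the four sequences fall into two canonically equivalent pairs rather than one class. The fourth sequence is assumed to be I, COMBINING CIRCUMFLEX, COMBINING DOT ABOVE, since that line is cut off here; nfd_hex is a hypothetical helper name.

```python
import unicodedata

CANDIDATES = {
    "I-dot + circumflex": "\u0130\u0302",
    "I-circumflex + dot": "\u00ce\u0307",
    "I + dot + circumflex": "I\u0307\u0302",
    "I + circumflex + dot": "I\u0302\u0307",  # assumed completion of the list
}

# Same combining class (230), so NFD keeps the marks in their given order.
assert unicodedata.combining("\u0307") == unicodedata.combining("\u0302") == 230


def nfd_hex(s: str) -> str:
    """NFD form of s as space-separated hex code points."""
    return " ".join(f"{ord(c):04X}" for c in unicodedata.normalize("NFD", s))


for label, s in CANDIDATES.items():
    print(f"{label:22} NFD = {nfd_hex(s)}")
```

The first and third entries share NFD 0049 0307 0302, and the second and fourth share 0049 0302 0307. Since marks of equal class stack outward from the base in encoded order, the two pairs place the dot and the circumflex in opposite vertical order.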
Chris Jacobs wrote:
To display a dot, one can use one of the four canonical equivalents:
LATIN CAPITAL LETTER I WITH DOT ABOVE, COMBINING CIRCUMFLEX
LATIN CAPITAL LETTER I WITH CIRCUMFLEX, COMBINING DOT ABOVE
LATIN CAPITAL LETTER I, COMBINING DOT ABOVE, COMBINING CIRCUMFLEX
LATIN
Jim Allan noted:
On the other hand, there is nothing to prevent the Unicode consortium or
any other body or any single person from creating a new *additional*
corrected set of names if the Unicode consortium or any other body or
any single person wishes to do so.
That would just be an
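Later versions of Unicode took roughly the route Jim describes. The original character names stay frozen, and formal corrections are published as aliases in NameAliases.txt. A minimal sketch using one well-known case, U+FE18, whose official name contains the misspelling BRAKCET. Python's unicodedata.lookup() accepts such name aliases from Python 3.3 onward.

```python
import unicodedata

ch = "\ufe18"

# The official name keeps its misspelling for stability.
print(unicodedata.name(ch))

# The correction alias resolves to the same character.
corrected = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
print(unicodedata.lookup(corrected) == ch)  # True
```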
Philippe asked:
The Arial Unicode MS font does not have a glyph for the Rial currency sign,
so I won't comment much on it, even if it's a special ligature of its
component letters:
it's just regrettable that it's
not found in Arial Unicode MS (unless this Rial sign is traditional and no
However, could there be an encoding for:
LATIN CAPITAL LETTER DOTLESS J
with a lowercase mapping to the new:
LATIN SMALL LETTER DOTLESS J
Of course the former would look exactly the same as the
ASCII uppercase J, except that it would have a distinct
case mapping. This would avoid, for j/J
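The round-trip problem this proposal targets can be illustrated with dotless ı, which already behaves this way: it uppercases to plain I, and I lowercases to dotted i. The proposed LATIN CAPITAL LETTER DOTLESS J is hypothetical and not in Unicode. U+0237 LATIN SMALL LETTER DOTLESS J was encoded in a later version without any uppercase mapping, so Python simply leaves it unchanged when uppercasing. A minimal sketch; round_trips is a hypothetical helper name.

```python
import unicodedata

DOTLESS_J = "\u0237"  # LATIN SMALL LETTER DOTLESS J (encoded after this thread)


def round_trips(ch: str) -> bool:
    """True if lowercasing the uppercase form gives back the original letter."""
    return ch.upper().lower() == ch


for ch in ["j", "\u0131", DOTLESS_J]:
    print(unicodedata.name(ch), "| upper:", ch.upper(),
          "| round-trips:", round_trips(ch))
```

Dotless ı fails the round trip, while U+0237 passes only because it has no uppercase at all. A distinct capital with its own lowercase mapping would let both directions preserve the letter.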
Philippe Verdy [EMAIL PROTECTED] wrote:
Ohhh... I admit this use is hypothetical, but the candrabindu
case is a precedent coming from romanization of non-Latin scripts: what if
there's a combining x above used to interact over a diacritic and mark its
suppression in corrected
Dean Snyder [EMAIL PROTECTED] wrote:
Recently I have had second thoughts about encoding complex signs.
Modification of base, or simple, signs was a productive process for
making new signs in the earlier periods of cuneiform usage, and included
such modifications as adding or subtracting
Jim Allan [EMAIL PROTECTED] wrote:
On the other hand, there is nothing to prevent the Unicode consortium or
any other body or any single person from creating a new *additional*
corrected set of names if the Unicode consortium or any other body or
any single person wishes to do so.
That
Christopher John Fynn scripsit:
It introduces another difficulty though - If there are languages using a
LATIN SMALL LETTER DOTLESS J
There aren't. Dotless j as a character (as opposed to a glyph used with
various accents above) is only used in non-IPA phonetic alphabets.
I think Latin has