Re: UCA and Russian letter Ё

2012-12-30 Thread Leo Broukhis
On Wed, Dec 26, 2012 at 11:18 AM, Whistler, Ken ken.whist...@sap.com wrote: Leo asked: My question was narrower: assuming that the strings being compared are words, could it be supported without any markup? ... where it refers to conditional weighting based on the (identified) word

RE: UCA and Russian letter Ё

2012-12-26 Thread Whistler, Ken
The UCA algorithm itself has no opinion on this issue. It is simply a specification of *how* to compare strings at multiple levels, given a multi-level collation weight table. The UCA *does* have a default behavior, of course, based on the DUCET table. And the DUCET table puts all Unicode
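
To illustrate the multi-level comparison described above, here is a minimal Python sketch. The weight table is a hypothetical toy with made-up values, not DUCET data, and `toy_sort_key` is an illustrative name, not part of any library. The only property borrowed from the default behavior is that Ё shares its primary weight with Е and differs from it at the secondary level.

```python
# Toy two-level weight table: character -> (primary, secondary).
# Values are invented for illustration; they are NOT taken from DUCET.
TOY_WEIGHTS = {
    "а": (1, 0),
    "е": (2, 0),
    "ё": (2, 1),  # same primary as е, differs only at the secondary level
    "к": (3, 0),
    "л": (4, 0),
    "т": (5, 0),
    "ь": (6, 0),
}


def toy_sort_key(word):
    """Build a key that compares all primary weights first, then all secondaries."""
    weights = [TOY_WEIGHTS[ch] for ch in word]
    primaries = tuple(p for p, _ in weights)
    secondaries = tuple(s for _, s in weights)
    return (primaries, secondaries)


words = ["тёлка", "тель", "ёлка", "ель"]
ordered = sorted(words, key=toy_sort_key)
print(ordered)
```

With these toy weights the Е/Ё difference is only consulted when the primary weights of two strings are identical, so it never separates Ё words from Е words at the first level, regardless of position in the word.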

RE: UCA and Russian letter Ё

2012-12-26 Thread Whistler, Ken
Leo asked: My question was narrower: assuming that the strings being compared are words, could it be supported without any markup? ... where it refers to conditional weighting based on the (identified) word boundary. And the answer to that is no, unless the word boundary was explicitly

Re: UCA and Russian letter Ё

2012-12-23 Thread Otto Stolz
Hello, Leo Broukhis wrote: In Russian, the difference between Е and Ё is primary at the beginning of a word as they are considered distinct letters of the alphabet, yet secondary in the middle of a word, as the dieresis over Ё is not mandatory. As an example, ель < ёлка, but тёлка

Re: UCA and Russian letter Ё

2012-12-23 Thread Leif H Silli
Ken, A basic question: does the UCA algorithm consider the Russian Ye and the Russian Yo as equal with regard to sort order? Or is it not meant to solve that issue? Leif Halvard Silli --- Original message --- From: Whistler, Ken ken.whist...@sap.com To: l...@mailcom.com,

Re: UCA and Russian letter Ё

2012-12-23 Thread Philippe Verdy
My opinion is that BOTH the UCA algorithm AND the LDML formal description of collations are just Best known practices to accommodate the collation (i.e. dictionary ordering AND string searches AND string comparisons). But neither of them can accommodate all possible orders or weak comparisons

Re: UCA and Russian letter Ё

2012-12-22 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 1:49 PM, Whistler, Ken ken.whist...@sap.com wrote: Leo Broukhis said: Granted, not yet, but by itself the argument is invalid. Unicode collation rules are descriptive; I'm not sure what you mean by that. UTS #10 is a *specification* of an algorithm, with various

UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
In Russian, the difference between Е and Ё is primary at the beginning of a word as they are considered distinct letters of the alphabet, yet secondary in the middle of a word, as the dieresis over Ё is not mandatory. As an example, ель < ёлка, but тёлка < тель, see
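
The ordering described above (Е and Ё distinct at the primary level word-initially, distinct only at the secondary level elsewhere) can be sketched in Python for the narrow case where each string is known to be a single word. This is not UCA and not a tailoring syntax; `word_key`, the weight scheme, and the doubling trick that leaves room for Ё between Е and Ж are all assumptions made for illustration. Characters outside the listed alphabet would raise a `KeyError`.

```python
# Russian alphabet without ё; ё is handled separately below.
ALPHABET = "абвгдежзийклмнопрстуфхцчшщъыьэюя"
# Doubling the index leaves a free odd slot after each letter (hypothetical scheme).
PRIMARY = {ch: 2 * i for i, ch in enumerate(ALPHABET)}


def word_key(word):
    """Sort key for a single word: ё is primary-distinct only in first position."""
    word = word.lower()
    primaries, secondaries = [], []
    for i, ch in enumerate(word):
        if ch == "ё":
            base = PRIMARY["е"]
            if i == 0:
                # Word-initial ё: its own primary weight, between е and ж.
                primaries.append(base + 1)
                secondaries.append(0)
            else:
                # Elsewhere: same primary as е, distinguished at the secondary level.
                primaries.append(base)
                secondaries.append(1)
        else:
            primaries.append(PRIMARY[ch])
            secondaries.append(0)
    return (tuple(primaries), tuple(secondaries))


ordered = sorted(["тель", "тёлка", "ёлка", "ель"], key=word_key)
print(ordered)
```

The sketch only works because the word boundary is given for free (every input is one word); for running text the boundary would have to be identified first.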

Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Leo Broukhis, Fri, 21 Dec 2012 01:31:18 -0800: In Russian, the difference between Е and Ё is primary at the beginning of a word as they are considered distinct letters of the alphabet, yet secondary in the middle of a word, as the dieresis over Ё is not mandatory. As an example, ель < ёлка,

Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
[Philippe tells me that his message that I'm quoting could have been rejected by the mailing list as spam; my answer is below.] On Fri, Dec 21, 2012 at 5:13 AM, Philippe Verdy verd...@wanadoo.fr wrote: This is an interesting case. A solution would be to be able define a distinct collation

Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 4:56 AM, Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: You say that the difference is primary in the beginning of a word but elsewhere secondary. And yes, that orthographic dictionary that you link to above, looks as you describe. However, in reality, the

Re: UCA and Russian letter Ё

2012-12-21 Thread Markus Scherer
Resending my earlier reply. Apparently, by default, Gmail sends subject lines in KOI8-R if they contain Cyrillic, and unicode.org rejects those as likely spam. I just changed my Gmail settings to Use Unicode (UTF-8) encoding for outgoing messages and hope this goes through. (*Please change the

Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Leo Broukhis, Fri, 21 Dec 2012 08:57:11 -0800: On Fri, Dec 21, 2012 at 4:56 AM, Leif Halvard Silli wrote: You say that the difference is primary in the beginning of a word but elsewhere secondary. And yes, that orthographic dictionary that you link to above, looks as you describe.

Re: UCA and Russian letter Ё

2012-12-21 Thread Jukka K. Korpela
2012-12-21 21:05, Leif Halvard Silli wrote: My Moscow Russian-Norwegian from 1987 and my Pocket Oxford Russian Dictionary from 2003 agree that both list words on Ё and Е under the same category – namely, under the letter Е. This appears to be the case in any serious dictionary. The use of

RE: UCA and Russian letter Ё

2012-12-21 Thread Joe
Fact is, again, that ёлка - in the wild - can be written ёлка and елка. Though you need a better dictionary: it's the diminutive of ель (as in Yel'tsin) meaning fir tree, and is the 4-letter word for Christmas tree. С Рождеством, Joe

Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 11:35 AM, Jukka K. Korpela jkorp...@cs.tut.fi wrote: 2012-12-21 21:05, Leif Halvard Silli wrote: My Moscow Russian-Norwegian from 1987 and my Pocket Oxford Russian Dictionary from 2003 agree that both list words on Ё and Е under the same category – namely, under the

Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Jukka K. Korpela, Fri, 21 Dec 2012 21:35:16 +0200: 2012-12-21 21:05, Leif Halvard Silli wrote: My Moscow Russian-Norwegian from 1987 and my Pocket Oxford Russian Dictionary from 2003 agree that both list words on Ё and Е under the same category – namely, under the letter Е. This appears

Re: UCA and Russian letter Ё

2012-12-21 Thread Leo Broukhis
On Fri, Dec 21, 2012 at 1:08 PM, Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: «Tolkovïj slovar’ sovremennogo russkogo jazïka» from 2005 («Dictionary of the contemporary Russian language») places words on Ё in a separate category, consisting of exactly one word: Ёмкость.

RE: UCA and Russian letter Ё

2012-12-21 Thread Whistler, Ken
Leo Broukhis said: Granted, not yet, but by itself the argument is invalid. Unicode collation rules are descriptive; I'm not sure what you mean by that. UTS #10 is a *specification* of an algorithm, with various options for tailoring and parameterization which make it possible to

RE: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Joe, Fri, 21 Dec 2012 12:48:47 -0800: Fact is, again, that ёлка - in the wild - can be written ёлка and елка. Though you need a better dictionary: it's the diminutive of ель (as in Yel'tsin) meaning fir tree, and is the 4-letter word for Christmas tree. The dictionary of Dal,[1] says:

Re: UCA and Russian letter Ё

2012-12-21 Thread Leif Halvard Silli
Leo Broukhis, Fri, 21 Dec 2012 13:43:14 -0800: On Fri, Dec 21, 2012 at 1:08 PM, Leif Halvard Silli xn--mlform-...@xn--mlform-iua.no wrote: «Tolkovïj slovar’ sovremennogo russkogo jazïka» from 2005 («Dictionary of the contemporary Russian language») places words on Ё in a separate