On Fri, Dec 21, 2012 at 4:56 AM, Leif Halvard Silli <xn--mlform-...@xn--mlform-iua.no> wrote:
>
> You say that the difference is primary in the beginning of a word but
> elsewhere secondary. And yes, that orthographic dictionary that you
> link to above, looks as you describe.
>
> However, in reality, the difference is secondary - if that is the right
> word - even as the first letter in a word. Wikipedia has the following
> example: едок > ёж > ездит.[1] And, for instance the word ёлка could
> also be written елка.
> [1] <http://en.wikipedia.org/wiki/Ё#Russian>

Wikipedia's example is sadly unsourced, unlike mine.

> Hence I would argue that the dictionary you linked to above considers
> the difference to *always* be secondary. It is just that the dictionary
> applies the sorting algorithm to a collection where the words that
> begins with the letter Ё has been separated from words that begins on
> the letter Е.

Isn't that notionally the same as having the difference primary for the first letter?

>> A cursory scan of the UCA doesn't reveal if that's implementable, and
>> experiments in a fairly fresh Linux Mint yield either
>> ель < ёлка < тель < тёлка or ель < тель < тёлка < ёлка depending on
>> the LANG setting (en_US works better than ru_RU).
>
> (Both examples consider the difference primary, but the the last
> example is incorrect as the ёлка follows after the тёлка - which is
> incorrect from every angle (except from the angle of the number of the
> letter inside Unicode.)

Right. And, ironically, the [en] collation is the correct one.

>> Could someone tell if the UCA in its current form is able to support that?
>
> Is there not a need for 3 kinds of sorting? Namely: a) Е/Ё as always
> distinct letters, b) Е/Ё as always non-distinct letters, c) Е/Ё as
> non-distinct letters except when used as the first letter. (Note that
> the last variant would only be yield correct result on collections of
> words where a first-letter Ё is guaranteed be rendered with a Ё. Thus,
> if ёлка is written елка, then the result becomes incorrect.)

We're not talking here about *words per se* that may or may not be rendered with a Ё, we're talking about letter sequences with Ё as a given. The dictionary order shows that all word-initial Ёs go after all word-initial Еs, but within a word the difference is secondary. For a set of letter sequences using canonical spelling of words, the collation algorithm should give their dictionary ordering, shouldn't it?
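To make the ordering I mean concrete, here is a minimal Python sketch. It is not the UCA and not any locale's actual tailoring; it is just a hand-rolled two-level sort key under the assumption that the input is lowercase-able Russian text with no other characters. The names dictionary_key, ALPHABET, WEIGHT and the initial_primary flag are made up for this illustration. With initial_primary=True it treats a word-initial Ё as a separate primary letter after Е (variant c above); with initial_primary=False it treats Е/Ё as a secondary difference everywhere (variant b).

```python
# Hypothetical illustration only: a two-level sort key, not a UCA implementation.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
WEIGHT = {ch: i for i, ch in enumerate(ALPHABET)}


def dictionary_key(word, initial_primary=True):
    """Return a (primary, secondary) key for a Russian word.

    Primary level: alphabet position of each letter, with ё folded to е
    everywhere except (optionally) in word-initial position.
    Secondary level: 1 where the letter is ё, 0 elsewhere, so that е sorts
    before ё when the primary levels are equal.
    Raises KeyError for characters outside the Russian alphabet.
    """
    letters = word.lower()
    primary = []
    for i, ch in enumerate(letters):
        keep_distinct = initial_primary and i == 0
        if ch == "ё" and not keep_distinct:
            ch = "е"
        primary.append(WEIGHT[ch])
    secondary = [1 if ch == "ё" else 0 for ch in letters]
    return (tuple(primary), tuple(secondary))


words = ["тёлка", "ёлка", "тель", "ель"]
print(sorted(words, key=dictionary_key))
print(sorted(["ёж", "ездит", "едок"], key=dictionary_key))
print(sorted(["ёж", "ездит", "едок"],
             key=lambda w: dictionary_key(w, initial_primary=False)))
```

Under this sketch the first list comes out as ель, ёлка, тёлка, тель: ёлка follows ель because of the word-initial rule, while тёлка precedes тель because inside a word ё counts as е at the primary level and к precedes ь. The last two calls show the difference between the variants: едок, ездит, ёж with the word-initial rule, and едок, ёж, ездит when the difference is secondary everywhere.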
Re the linguistic PS: you're right, and that proves that an approximation to the proper collation using secondary ordering is preferred to an approximation using primary ordering.

Leo