I'll try to pick out the relevant points.

> Please do. Do you really want all those letters
> between "e" and "f" interfiled with "e"? I surely
> do not.

You seem to have a misperception of what I think we should be looking at. What I think we should be examining is which of the items that are not interfiled (to use your phrasing) should be, if any. I don't think everything should be. In particular, I think John's list is the list we should be focusing on.

> John's list?

That was in my original mail, the one you were commenting on when you changed the subject line, but which you apparently didn't bother to actually read. Here is the text:

>> If you look at John's suggested file for diacritic
>> folding (http://www.ccil.org/~cowan/DiacriticFolding.txt), there are quite a
>> number that are not reflected in the UCA.

> My point is made here. It is really only in
> initial position where this is likely to be
> noticed.

This is incorrect. It will make a difference in other positions. Sorting "Søren" after "Sozar" in a long list, if someone isn't expecting it, will cause problems. They look for it after "Soret", don't see it on the page, and assume it isn't there, fooled by the fact that it is on a completely different page. Remember that the collation sequence is also used for language-sensitive matching as well as sorting.

> What I want is the status quo, however.
> Leave the template and its principles alone.

Stability is important, and we want to consider that very carefully before making any change. However, I believe that the current way we handle a few characters in UCA is distinctly suboptimal, and that a change is worth considering.

-Mark

----- Original Message -----
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 09, 2004 13:25
Subject: Re: Changing UCA primarly weights (bad idea)

> Mark, your examples are all of the
> run-of-the-mill Scandinavian variety. Trotting
> out Polish and Danish doesn't address the issue.
> The issue is all the phonetic characters, and
> all the African ones (for instance).
>
> >> 1) it destabilizes the default tailorable template of ISO/IEC 14651
> >> and the UCA which has been published for some time. Anyone who *has*
> >> tailored it would have to do all that work all over again.
> >
> >You are certainly right that this is not a slam-dunk;
>
> This noun must have been on TV a lot in the US
> recently; I have seen it a lot but it remains
> obscure, apart from being a basketball reference.
> What does it mean? That I am right that the
> proposal is not a shoo-in? Or, indeed, that I am
> right that it is not a foregone conclusion that
> the proposal will be accepted?
>
> >there are reasons for
> >and against it. And it may well be that the committee decides against it.
>
> There are two templates, which are synchronized,
> and decided about by two committees.
>
> >What we actually did was to put similar letters
> >near other letters, *and if their decompositions
> >were the same* we interfiled them.
>
> I remember. I was on the committee that helped to decide these things.
>
> >There is, however, little principled difference
> >between Ã,  ,  , Ã, Ã, ?, and Ã that would
> >cause a user to think that some should be
> >interfiled and some should not. In some
> >languages these would be seen as "separate
> >letters" (e.g. with different primary weights)
> >and in others not; but that does not line up in
> >any particular way with what is in the UCA. (see
> >also comment below).
>
> Those aren't the ones I'm worried about, and they
> are not much of a problem. We had principles for
> determining "basic letters" and those are what we
> used; what I see now is a proposal to change that.
>
> >See http://www.unicode.org/charts/collation/chart_Latin.html for many other
> >cases.
>
> Please do. Do you really want all those letters
> between "e" and "f" interfiled with "e"? I surely
> do not.
>
> >> 3) in discussions elsewhere, Mark has talked about what "most users"
> >> "expect" and I found his suggestion to be anglocentric and
> >> unsubstantiated.
> >
> >And I will refrain from saying what I think of your reasoning ability in
> >general, although circularity seems to be a particular specialty.
>
> Sweet of you to say.
>
> >I suggest that we stick to the facts instead of ad hominem attacks.
>
> Calling a thing "ad hominem" doesn't make it ad
> hominem. It is your suggestion which I
> criticized, because it seems very A-to-Z and
> alien to the principles which have been in the
> template until now.
>
> >For user expectations, check out how foreign words with unusual accents are
> >sorted in a variety of languages. I have seen no reason to believe that
> >Germans or French or others behave much differently when faced with a letter
> >like ø that is not one that they use. The key is whether they would expect
> >to see:
> >
> >a) Interleaved:
> >..oa..
> >..øb..
> >..oz..
>
> You can tailor for this now.
>
> >b) Separate but near:
> >..oz..
> >..øb..
> >..pa..
>
> This is what we have now.
>
> >c) Like a particular language (Danish)
> >..yb..
> >..øb..
>
> You can tailor for this now.
>
> My point is made here. It is really only in
> initial position where this is likely to be
> noticed. What I want is the status quo, however.
> Leave the template and its principles alone.
>
> >a) Interleaved:
> >..oa..
> >..öb..
> >..oz..
>
> This is what we have now.
>
> >b) Separate but near:
> >..oz..
> >..öb..
> >..pa..
>
> You can tailor for this now.
>
> >c) Like a particular language (Swedish or Phonebook German)
> >..yb..
> >..öb..
> >
> >..od..
> >..öz..
> >..of..
>
> You can tailor for this now.
>
> >More accurately, you believe that the correct behavior occurs.
>
> It is correct for most of the letters which would
> be affected by the change you propose.
> The overwhelming majority of the
> letters-without-diacritics which occur between
> the "main A-Z letters" are correctly filed that
> way, and would be incorrectly filed if interfiled
> with the "main" letters. Is there a discomfort in
> what happens between ö/ø? Well, that's an
> anomaly, right enough, but it is well-known and
> can easily be tailored for anyone worried about
> it. Lumping all the Engs with N or all the Schwas
> with E, however, would have only the effect of
> making a working template cease to work for the
> people who really need those letters: linguists,
> speakers of African languages, and so on. The
> only people who use the sideways "o" and the top-
> and bottom-half "o" are Uralic linguists, and the
> template works correctly for them, at least for
> those letters.
>
> >> 5) if Mark wants to make a tailoring to interfile all these letters
> >> (which can only result in what I describe as "visual seasickness" to
> >> any poor users who have to actually read such wordlists.
> >
> >Again, no evidence.
>
> It was argued years ago in TC304 and WG20. I'm
> disheartened to have to reopen the arguments now,
> particularly as it affects stability and you
> yourself have been a champion for stability.
>
> >Let's look at a particular example, letters based on
> >"O". UCA *already* interleaves the list below (UCA O List). Adding John's
> >list to that would add only the two elements:
>
> John's list?
>
> >> 6) the Latin alphabet has a lot more than 26 letters in it. In this
> >> age of the Universal Character Set, "most users" would do better to
> >> get used to this than to be hobbled by older concepts.
> >
> >I agree with the general principle, but it has
> >no bearing on the topic at hand.
>
> It is the key to the principles which are in the template now.
> --
> Michael Everson * * Everson Typography * * http://www.evertype.com
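
For readers following the thread, the difference between the "separate but near" ordering and the "interleaved" ordering debated above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the FOLD table and both key functions are hypothetical, written by hand for this example, and are not taken from John's DiacriticFolding.txt or from the UCA tables. A real implementation would use a proper collation library rather than these toy keys.

```python
import unicodedata

# Hypothetical fold table, for illustration only (not from DiacriticFolding.txt).
# "ø" has no canonical decomposition, so NFD alone never turns it into "o".
FOLD = {"ø": "o"}

def strip_marks(s):
    """Drop combining marks after canonical decomposition (handles é, ö, ...)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def interleaved_key(word):
    """Fold ø to o at the primary level; the original word is only a tie-break."""
    base = strip_marks(word).lower()
    primary = "".join(FOLD.get(c, c) for c in base)
    return (primary, word)

def separate_key(word):
    """Treat ø as its own letter that sorts immediately after o."""
    return [("o", 1) if c == "ø" else (c, 0) for c in strip_marks(word).lower()]

words = ["Sozar", "Søren", "Soret", "Spa"]
interleaved = sorted(words, key=interleaved_key)
separate = sorted(words, key=separate_key)
print(interleaved)
print(separate)
```

With the separate key, "Søren" lands after "Sozar", which is the situation described in the reply above; with the interleaved key it files among the other "Sor..." entries.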