Re: Changing UCA primarly weights (bad idea)

Mark Davis Fri, 09 Jul 2004 17:56:29 -0700

I'll try to pick out the relevant points.

> Please do. Do you really want all those letters
> between "e" and "f" interfiled with "e"? I surely
> do not.


You seem to have a misperception of what I think we should be looking at.
What I think we should be examining is which of the items that are not
interfiled (to use your phrasing) should be, if any. I don't think
everything should be. In particular, I think John's list is the list we
should be focusing on.

> John's list?

That was in my original mail, the one you were commenting on when you changed
the subject line, but which you apparently didn't bother to actually
read. Here is the text:

>> If you look at John's suggested file for diacritic
>> folding (http://www.ccil.org/~cowan/DiacriticFolding.txt), there are quite a
>> number that are not reflected in the UCA.

> My point is made here. It is really only in
> initial position where this is likely to be
> noticed.

This is incorrect. It will make a difference in other positions. Sorting
"Søren" after "Sozar" in a long list, if someone isn't expecting it, will
cause problems. They look for it after "Soret", don't see it on the page,
and assume it isn't there; fooled by the fact that it is on a completely
different page.
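
As a rough illustration of that difference, here is a toy sketch in Python. It
is not the UCA algorithm and does not use the real weight table: the 26-letter
alphabet, the two key functions, and the extra name "Sorensen" are assumptions
made up for this example. One key gives "ø" its own primary weight just after
"o"; the other gives it the same primary weight as "o" and uses the difference
only to break ties.

```python
# Toy model only: real UCA sort keys come from the published weight table,
# not from this code.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def separate_key(word):
    """Give 'ø' its own primary weight, placed just after 'o'."""
    order = ALPHABET.replace("o", "oø")
    return [order.index(ch) for ch in word.lower()]


def interleaved_key(word):
    """Give 'ø' the same primary weight as 'o'; the accent only breaks ties."""
    lowered = word.lower()
    primary = [ALPHABET.index("o" if ch == "ø" else ch) for ch in lowered]
    secondary = [1 if ch == "ø" else 0 for ch in lowered]
    return (primary, secondary)


names = ["Sozar", "Søren", "Soret", "Sorensen"]

# Separate primary weight: "Søren" lands after every "So..." name.
print(sorted(names, key=separate_key))
# ['Sorensen', 'Soret', 'Sozar', 'Søren']

# Shared primary weight: "Søren" is filed among the "Sor..." names.
print(sorted(names, key=interleaved_key))
# ['Søren', 'Sorensen', 'Soret', 'Sozar']
```

In a list of a few thousand names, the first ordering can put "Søren" pages
away from where a reader who is not expecting it will look.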

Remember that the collation sequence is also used for language-sensitive
matching as well as sorting.

> What I want is the status quo, however.
> Leave the template and its principles alone.

Stability is important, and we want to consider that very carefully before
making any change. However, I believe that the current way we handle a few
characters in UCA is distinctly suboptimal, and worth considering.

Mark

----- Original Message ----- 
From: "Michael Everson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Friday, July 09, 2004 13:25
Subject: Re: Changing UCA primarly weights (bad idea)


> Mark, your examples are all of the
> run-of-the-mill Scandinavian variety. Trotting
> out Polish and Danish doesn't address the issue.
> The issue is all the phonetic characters, and
> all the African ones (for instance).
>
> >  > 1) it destabilizes the default tailorable template of ISO/IEC 14651
> >  > and the UCA which has been published for some time. Anyone who *has*
> >  > tailored it would have to do all that work all over again.
> >
> >You are certainly right that this is not a slam-dunk;
>
> This noun must have been on TV a lot in the US
> recently; I have seen it a lot but it remains
> obscure, apart from being a basketball reference.
> What does it mean? That I am right that the
> proposal is not a shoo-in? Or, indeed, that I am
> right that it is not a foregone conclusion that
> the proposal will be accepted?
>
> >there are reasons for
> >and against it. And it may well be that the committee decides against it.
>
> There are two templates, which are synchronized,
> and decided about by two committees.
>
> >What we actually did was to put similar letters
> >near other letters, *and if their decompositions
> >were the same* we interfiled them.
>
> I remember. I was on the committee that helped to decide these things.
>
> >There is, however, little principled difference
> >between Ã, Â , Â , Ã, Ã, ?, and Ã that would
> >cause a user to think that some should be
> >interfiled and some should not. In some
> >languages these would be seen as "separate
> >letters" (e.g. with different primary weights)
> >and in others not; but that does not line up in
> >any particular way with what is in the UCA. (see
> >also comment below).
>
> Those aren't the ones I'm worried about, and they
> are not much of a problem. We had principles for
> determining "basic letters" and those are what we
> used; what I see now is a proposal to change that.
>
> >See http://www.unicode.org/charts/collation/chart_Latin.html for many other
> >cases.
>
> Please do. Do you really want all those letters
> between "e" and "f" interfiled with "e"? I surely
> do not.
>
> >  > 3) in discussions elsewhere, Mark has talked about what "most users"
> >>  "expect" and I found his suggestion to be anglocentric and
> >>  unsubstantiated.
> >
> >And I will refrain from saying what I think of your reasoning ability in
> >general, although circularity seems to be a particular specialty.
>
> Sweet of you to say.
>
> >I suggest that we stick to the facts instead of ad hominem attacks.
>
> Calling a thing "ad hominem" doesn't make it ad
> hominem. It is your suggestion which I
> criticized, because it seems very A-to-Z and
> alien to the principles which have been in the
> template until now.
>
> >For user expectations, check out how foreign words with unusual accents are
> >sorted in a variety of languages. I have seen no reason to believe that
> >Germans or French or others behave much differently when faced with a letter
> >like ø that is not one that they use. The key is whether they would expect
> >to see:
> >
> >a) Interleaved:
> >..oa..
> >..øb..
> >..oz..
>
> You can tailor for this now.
>
> >b) Separate but near:
> >..oz..
> >..øb..
> >..pa..
>
> This is what we have now.
>
> >c) Like a particular language (Danish)
> >..yb..
> >..øb..
>
> You can tailor for this now.
>
> My point is made here. It is really only in
> initial position where this is likely to be
> noticed. What I want is the status quo, however.
> Leave the template and its principles alone.
>
> >a) Interleaved:
> >..oa..
> >..öb..
> >..oz..
>
> This is what we have now.
>
> >b) Separate but near:
> >..oz..
> >..öb..
> >..pa..
>
> You can tailor for this now.
>
> >c) Like a particular language (Swedish or Phonebook German)
> >..yb..
> >..öb..
> >
> >..od..
> >..öz..
> >..of..
>
> You can tailor for this now.
>
> >More accurately, you believe that the correct behavior occurs.
>
> It is correct for most of the letters which would
> be affected by the change you propose. The
> overwhelming majority of the
> letters-without-diacritics which occur between
> the "main A-Z letters" are correctly filed that
> way, and would be incorrectly filed if interfiled
> with the "main" letters. Is there a discomfort in
> what happens between ø/ö? Well, that's an
> anomaly, right enough but it is well-known and
> can easily be tailored for anyone worried about
> it. Lumping all the Engs with N or all the Schwas
> with E, however, would have only the effect of
> making a working template cease to work for the
> people who really need those letters: linguists,
> speakers of African languages, and so on.  The
> only people who use the sideways "o" and the top-
> and bottom-half "o" are Uralic linguists, and the
> template works correctly for them, at least for
> those letters.
>
> >  > 5) if Mark wants to make a tailoring to interfile all these letters
> >>  (which can only result in what I describe as "visual seasickness" to
> >>  any poor users who have to actually read such wordlists.
> >
> >Again, no evidence.
>
> It was argued years ago in TC304 and WG20. I'm
> disheartened to have to reopen the arguments now,
> particularly as it affects stability and you
> yourself have been a champion for stability.
>
> >Let's look at a particular example, letters based on
> >"O". UCA *already* interleaves the list below (UCA O List). Adding John's
> >list to that would add only the two elements:
>
> John's list?
>
> >  > 6) the Latin alphabet has a lot more than 26 letters in it. In this
> >>  age of the Universal Character Set, "most users" would do better to
> >>  get used to this than to be hobbled by older concepts.
> >
> >I agree with the general principle, but it has
> >no bearing on the topic at hand.
>
> It is the key to the principles which are in the template now.
> -- 
> Michael Everson * * Everson Typography *  * http://www.evertype.com
>
>
>
