On Fri, 2 Nov 2018 14:27:37 -0700 Ken Whistler via Unicode <[email protected]> wrote:
> On 11/2/2018 10:02 AM, Philippe Verdy via Unicode wrote:
> > UTR#10 still does not explicitly state that its use of "0000" does
> > not mean it is a valid "weight", it's a notation only
>
> No, it is explicitly a valid weight. And it is explicitly and
> normatively referred to in the specification of the algorithm. See
> UTS10-D8 (and subsequent definitions), which explicitly depend on a
> definition of "A collation weight whose value is zero." The entire
> statement of what are primary, secondary, tertiary, etc. collation
> elements depends on that definition. And see the tables in Section
> 3.2, which also depend on those definitions.

The definition is defective in that it doesn't handle 'large weight
values' well. There is the anomaly that a mapping of collating element
to [1234.0000.0000][0200.020.002] may be compatible with WF1, but the
exactly equivalent mapping to [1234.020.002][0200.0000.0000] makes the
table ill-formed. The fractional weight definitions for UCA eliminate
this '0000' notion quite well, and I once expected the UCA to move to
the CLDRCA (CLDR Collation Algorithm) fractional weight definition.
The definition of the CLDRCA does a much better job of explaining
'large weight values'. It turns them from something exceptional to a
normal part of its functioning.

> > (but the notation is used for TWO distinct purposes: one is for
> > presenting the notation format used in the DUCET
>
> It is *not* just a notation format used in the DUCET -- it is part of
> the normative definitional structure of the algorithm, which then
> percolates down into further definitions and rules and the steps of
> the algorithm.

It's not needed for the CLDRCA! The statement of the UCA algorithm
does depend on its notation, but it can be recast to avoid these zero
weights.

Richard.
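To illustrate why the two mappings in the anomaly above count as
"exactly equivalent", here is a minimal sketch of sort key formation.
It assumes the simplest case only: three levels, forward ordering at
every level, no variable weighting, and zero weights skipped when a
level is written out. The function name sort_key and the tuple
representation of collation elements are illustrative choices, not
anything taken from UTS #10 or CLDR.

```python
# Simplified sketch of multi-level sort key formation (assumption: forward
# ordering at every level, no variable weighting, no identical level).
# Each collation element is a tuple (primary, secondary, tertiary).

def sort_key(elements):
    """Append the non-zero weights of each level, with 0 as level separator."""
    levels = max(len(ce) for ce in elements)
    key = []
    for level in range(levels):
        if level > 0:
            key.append(0)  # level separator
        # Zero weights are ignored, so it does not matter which element
        # of the sequence carries the non-zero secondary/tertiary weight.
        key.extend(ce[level] for ce in elements if ce[level] != 0)
    return tuple(key)

# [1234.0000.0000][0200.020.002]
mapping_a = [(0x1234, 0x0000, 0x0000), (0x0200, 0x0020, 0x0002)]
# [1234.020.002][0200.0000.0000]
mapping_b = [(0x1234, 0x0020, 0x0002), (0x0200, 0x0000, 0x0000)]

print(sort_key(mapping_a) == sort_key(mapping_b))
```

Under these assumptions both mappings contribute the same weights at
each level, so they cannot be told apart by comparing sort keys; the
difference is only in how the well-formedness conditions classify the
individual collation elements.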

