On Fri, 15 Mar 2013 21:12:48 -0700, Markus Scherer wrote:
On Fri, Mar 15, 2013 at 6:52 PM, Richard Wordingham wrote:
(Well, actually the send button was pressed at 01.52 GMT on Saturday.)
The point is that no sequence of
units (8-bit, 16-bit or whatever the implementation uses) can be
On Sat, Mar 16, 2013 at 4:09 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Please give an example of how the low/high split would fail. With the
primary collation weights 20, 21, 21 80 and 22 I get the following
primary collation weight sequences for one and two collating
On Sat, 16 Mar 2013 09:29:07 -0700
Markus Scherer markus@gmail.com wrote:
On Sat, Mar 16, 2013 at 4:09 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Please give an example of how the low/high split would fail. With
the primary collation weights 20, 21, 21 80 and 22 I
2013/3/16 Richard Wordingham richard.wording...@ntlworld.com:
On Sat, 16 Mar 2013 09:29:07 -0700
Markus Scherer markus@gmail.com wrote:
On Sat, Mar 16, 2013 at 4:09 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Please give an example of how the low/high split would
2013/3/16 Richard Wordingham richard.wording...@ntlworld.com:
But with the low/high split scheme, start units have to have low values
(e.g. 20, 21, 22) and continuation units have high values (e.g. 80)
just to stop this very problem.
Note also that all techniques used for data compression can
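As a rough illustration of the start = low, continuation = high scheme discussed in this thread, here is a minimal sketch. It assumes the example weights 20, 21, 21 80 and 22 are hexadecimal units; the helper names `flatten` and `order_is_preserved` and the counter-example table `BAD` are hypothetical and do not come from any of the messages. It checks, for every string of one or two collating elements, that comparing the flat unit sequences gives the same answer as comparing weight by weight.

```python
from itertools import product

# Example primary weights from the thread, read here as hexadecimal units
# (an assumption). Start units are low (0x20..0x22), continuation is high (0x80).
WEIGHTS = [(0x20,), (0x21,), (0x21, 0x80), (0x22,)]

# Hypothetical counter-example: the continuation unit reuses a low value.
BAD = [(0x20,), (0x21,), (0x21, 0x21), (0x22,)]


def flatten(elements):
    """Concatenate the units of each weight into one flat unit sequence."""
    return [unit for weight in elements for unit in weight]


def order_is_preserved(weights, max_len=2):
    """True if flat unit comparison agrees with weight-by-weight comparison
    for all strings of 1..max_len collating elements."""
    strings = [list(s)
               for n in range(1, max_len + 1)
               for s in product(weights, repeat=n)]
    return all((a < b) == (flatten(a) < flatten(b))
               for a in strings
               for b in strings)


if __name__ == "__main__":
    print("low/high split preserves order:", order_is_preserved(WEIGHTS))
    print("low continuation preserves order:", order_is_preserved(BAD))
```

With the low continuation value, the pair 21, 22 versus the single weight 21 21 is one case where the two comparisons disagree, which is the kind of problem the high continuation range is meant to stop.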
On Sat, 16 Mar 2013 21:58:02 +0100
Philippe Verdy verd...@wanadoo.fr wrote:
2013/3/16 Richard Wordingham richard.wording...@ntlworld.com:
On Sat, 16 Mar 2013 09:29:07 -0700
Markus Scherer markus@gmail.com wrote:
On Sat, Mar 16, 2013 at 4:09 AM, Richard Wordingham
2013/3/16 Richard Wordingham richard.wording...@ntlworld.com:
If you start with my start = low, continuation = high scheme, you can
convert it in an order-preserving manner to a no-prefix scheme by
the following simple transform:
If a simple weight precedes a continuation weight, add 0·8
On Thu, 14 Mar 2013 19:13:43 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, Mar 14, 2013 at 4:09 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
On Thu, 14 Mar 2013 14:49:18 -0700
Markus Scherer markus@gmail.com wrote:
While variableTop=u2FD5 ...
... but
On Fri, 15 Mar 2013 13:52:39 -0700
Markus Scherer markus@gmail.com wrote:
On Fri, Mar 15, 2013 at 12:50 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Not quite. The characterisation of variable weights knows nothing
of the concept, and that is the problem.
That's a
On Fri, Mar 15, 2013 at 3:05 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
In CLDR/ICU's FractionalUCA.txt, all but 40 or so of the primary
weights (and many of the secondary weights) use the large weights
mechanism.
No, they're 32-bit weights expressed by omitting
On Fri, 15 Mar 2013 16:03:57 -0700
Markus Scherer markus@gmail.com wrote:
On Fri, Mar 15, 2013 at 3:05 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
In CLDR/ICU's FractionalUCA.txt, all but 40 or so of the primary
weights (and many of the secondary weights) use the
On Fri, Mar 15, 2013 at 6:52 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
The fractional refers to the same kind of mechanism as the large
weight values in the UCA spec.
Yes. The problem is that formally the UCA clearly treats 'large
weights' as being in multiple
Richard Wordingham wrote:
Actually, there is a subtle and nasty difference, but probably one that
will very rarely strike practical use. Its most obvious manifestation
is in the application of the UCA parametric tailoring
topVariable=u2FD5. U+2FD5 KANGXI RADICAL FLUTE is the last symbol in
In ICU, setVariableTop() has a documented limitation: It requires that the
primary weight has only 1 or 2 bytes. Until a few years ago, this was true
for most characters. Since then, Unicode added many more characters and we
ran out of space for 2-byte weights, given our constraints. So we use
On Thu, 14 Mar 2013 14:49:18 -0700
Markus Scherer markus@gmail.com wrote:
However, it does not make a lot of sense to set the variable top to
something above the currency symbols range -- it's basically an
option for an ignore punctuation mode, and you wouldn't want to
ignore nearly every
On Thu, 14 Mar 2013 21:01:10 +
Whistler, Ken ken.whist...@sap.com wrote:
Richard Wordingham wrote:
...UCA parametric tailoring topVariable=u2FD5 ...
The parametric tailoring in question is variableTop, not
topVariable,
Sorry.
and it would be expressed u00u2FD5, not u2FD5.
No -
On Thu, Mar 14, 2013 at 4:09 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
On Thu, 14 Mar 2013 14:49:18 -0700
Markus Scherer markus@gmail.com wrote:
However, it does not make a lot of sense to set the variable top to
something above the currency symbols range -- it's
On Wed, Mar 13, 2013 at 11:38 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
One of the changes from Version 6.1.0 to 6.2.0 of the UCA (UTS#10)
was to change weights from being 16 bits to just being general
non-negative integers. Was this just to accommodate the 4th
Richard Wordingham wrote:
One of the changes from Version 6.1.0 to 6.2.0 of the UCA (UTS#10)
was to change weights from being 16 bits to just being general
non-negative integers. Was this just to accommodate the 4th weight in
DUCET (scheduled for deletion in Version 6.3.0), or is it
On Wed, 13 Mar 2013 21:07:06 +
Whistler, Ken ken.whist...@sap.com wrote:
Richard Wordingham wrote:
One of the changes from Version 6.1.0 to 6.2.0 of the UCA
(UTS#10) was to change weights from being 16 bits to just being
general non-negative integers. Was this just to
Richard Wordingham wrote:
It loosened up the spec, so that the spec itself didn't seem to be
requiring that each of the first 3 levels had to be expressed with a
full 16 bits in any collation element table.
I don't read it that way. But it did allow the 4th weight to go up to
10!
21 matches