On Sat, 19 May 2012 01:12:17 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
This will then work for DUCET
6.1.0, work for Danish, and work for my mischievous 0302 COMBINING
CIRCUMFLEX ACCENT+0067 LATIN SMALL LETTER G contraction.
There is a very similar rule in CLDR for
On Sat, 19 May 2012 01:12:17 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
Just in case you haven't already thought of it, one reasonable scheme
would be to decompose input if and only if searching for contractions
or the input character could *hide* the start of a
On Sun, 20 May 2012 16:15:24 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
CORRECTION:
For the general case, we ought to be able to express a rule such as
'ignore the countering of soft-dottedness', as in Lithuanian casing,
but I don't see any finite method of expressing it
On Sun, 20 May 2012 17:05:00 +0100
Richard Wordingham richard.wording...@ntlworld.com wrote:
CORRECTION to correction
I wrote
rules for soft-dotted indecomposable+0307+ccc=203
when, of course, I meant
rules for soft-dotted indecomposable+0307+ccc=230
Sorry about that.
Richard.
Hi Richard,
This is essentially the same problem as
http://bugs.icu-project.org/trac/ticket/9319 right? (Contractions
overlapping with decomposition mappings.)
Would you mind adding a reply to that with the Lithuanian issue?
Thanks,
markus
On Thu, 17 May 2012 21:32:19 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 4:29 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
As I've already said, DUCET 6.1.0 omits a contraction for 0FB2+0F71,
and so CE(0FB2, 0334, 0F71, 0F80) =
On Thu, 17 May 2012 21:32:19 -0700
Markus Scherer markus@gmail.com wrote:
Ok, but assuming we didn't add 0FB2+0F71, why can't we add the
contraction 0FB2+0F81 and have the 0334 and any other non-starter be
handled via discontiguous matching?
Time for me to make a pronouncement on
Back to first principles.
UCA conformance requires getting the same results as the Main Algorithm.
This can be done easily with NFD input text, or by implementing Step 1
which normalizes the input to NFD. Everything else is a performance
optimization, and there are trade-offs.
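To make the "normalize to NFD first" step concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `to_nfd` and `hex_seq` are made up for illustration and are not part of ICU or the UCA; the code points are the Tibetan ones discussed in this thread.

```python
import unicodedata

def to_nfd(text: str) -> str:
    """Step 1 of the UCA Main Algorithm: normalize the input to NFD."""
    return unicodedata.normalize("NFD", text)

def hex_seq(text: str) -> str:
    """Render a string as space-separated code points (illustrative helper)."""
    return " ".join(f"{ord(c):04X}" for c in text)

# U+0F73 and U+0F81 have canonical decompositions, so canonically
# equivalent inputs become identical once normalized.
print(hex_seq(to_nfd("\u0F73")))        # 0F71 0F72
print(hex_seq(to_nfd("\u0FB2\u0F81")))  # 0FB2 0F71 0F80
```

Any collation-element lookup done after this step sees only one spelling of each canonically equivalent string, which is why skipping it is purely an optimization.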
We also want
On Fri, 18 May 2012 09:51:34 -0700
Markus Scherer markus@gmail.com wrote:
There is nothing that requires us to get correct results *without
normalization* for all FCD strings or any other particular input
conditions (except NFD input).
So long as you don't claim conformance to the CLDR
There is an action item from the UTC and CLDR committees to clarify the
meanings of the setting; they are supposed to allow some degree of
variation.
--
Mark https://plus.google.com/114199149796022210033
*— Il meglio è l’inimico del bene —*
On Fri, May 18,
On Fri, 18 May 2012 09:51:34 -0700
Markus Scherer markus@gmail.com wrote:
On inspection, we think we can do better (and want to), probably by
adding overlap contractions. If we get into trouble with that, we
will think of alternatives. One is to decompose more characters even
in FCD
On 5/16/2012 9:46 PM, Mark Davis ☕ wrote:
No, it's not.
Including x in Lao for some pedagogical (I'm guessing) purpose is
completely out of scope. That'd be like including π in Latin because
it sometimes occurs in the middle of English text.
From: Mark Davis ☕ m...@macchiato.com
On Wed, May 16, 2012 at 9:20 PM, vanis...@boil.afraid.org wrote:
From: Ken Whistler kenw_at_sybase.com
Orthographies which mix in random characters from other scripts do not
(or should not) drive the identity of characters for *scripts* per se.
And
*Please* use a different email subject line for the x vs. Lao discussion.
markus
On Thu, May 17, 2012 at 1:57 AM, vanis...@boil.afraid.org wrote:
Well, I was speaking of the general case, not this specific example.
Orthographies which mix in random characters from other scripts do not, and
On Wed, 16 May 2012 16:03:08 -0700
Markus Scherer markus@gmail.com wrote:
The problem is a contraction x+0F72 and input text x+0F73 where the
inner 0F71 should be skipped. We can avoid this by adding a
contraction for x+0F73 (and one for the equivalent x+0F71+0F72).
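A rough sketch of how discontiguous matching skips the inner 0F71 on NFD input, loosely following steps S2.1.1 to S2.1.3 of the UCA. This is an illustration under assumptions, not ICU's implementation: `match_at` is a hypothetical helper, the contraction table is a plain Python set, and the literal letter "x" merely stands in for the placeholder x above.

```python
import unicodedata

def match_at(nfd: str, i: int, table: set):
    """Return (matched, skipped) for the match starting at index i.

    matched: longest contiguous match, extended by unblocked non-starters.
    skipped: non-starters passed over without being absorbed.
    """
    # Longest contiguous match; fall back to the single character.
    end = i + 1
    for j in range(len(nfd), i, -1):
        if nfd[i:j] in table:
            end = j
            break
    s = nfd[i:end]
    skipped = []
    max_skipped_ccc = 0
    for c in nfd[end:]:
        ccc = unicodedata.combining(c)
        if ccc == 0:
            break  # a starter blocks everything after it
        # C is unblocked only if every skipped mark has a lower ccc.
        if ccc > max_skipped_ccc and s + c in table:
            s += c  # absorb C into the match
        else:
            skipped.append(c)
            max_skipped_ccc = max(max_skipped_ccc, ccc)
    return s, "".join(skipped)

table = {"x\u0F72"}                              # hypothetical contraction
text = unicodedata.normalize("NFD", "x\u0F73")   # x, 0F71, 0F72
matched, skipped = match_at(text, 0, table)
```

Here 0F71 (ccc 129) is skipped and 0F72 (ccc 130) is still unblocked, so the contraction for x+0F72 matches and 0F71 is left over for separate processing. Without the NFD step the input x+0F73 contains no 0F72 at all, which is the case the extra contraction is meant to cover.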
On the other hand,
On Wed, 16 May 2012 21:46:17 -0700
Mark Davis ☕ m...@macchiato.com wrote:
No, it's not.
Including x in Lao for some pedagogical (I'm guessing) purpose is
completely out of scope. That'd be like including π in Latin because
it sometimes occurs in the middle of English text.
No, it's more
On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
As x = 0F71, we also need the
contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
give the pair of long vowels. We don't need to worry about
x+0F73,0F73 because that is not FCD.
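The FCD claim can be checked mechanically. The sketch below is a simplified test written for this note (the function name `is_fcd` is hypothetical and this is not ICU's code): a string is treated as FCD when, for each character, the leading combining class of its canonical decomposition is either zero or not lower than the trailing combining class of the previous character's decomposition.

```python
import unicodedata

def is_fcd(s: str) -> bool:
    """Simplified FCD check based on lead/trail canonical combining classes."""
    prev_trail = 0
    for ch in s:
        decomp = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(decomp[0])
        if lead != 0 and lead < prev_trail:
            return False  # marks would need reordering after decomposition
        prev_trail = unicodedata.combining(decomp[-1])
    return True

# 0F73 decomposes to 0F71 (ccc 129) + 0F72 (ccc 130).
print(is_fcd("\u0F73\u0F73"))        # False: lead 129 follows trail 130
print(is_fcd("\u0F71\u0F73"))        # True
print(is_fcd("\u0F71\u0F71\u0F72"))  # True
```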
I am
2012/5/17 Richard Wordingham richard.wording...@ntlworld.com:
On Wed, 16 May 2012 21:46:17 -0700
Mark Davis ☕ m...@macchiato.com wrote:
No, it's not.
Including x in Lao for some pedagogical (I'm guessing) purpose is
completely out of scope. That'd be like including π in Latin because
it
On Thu, 17 May 2012 13:39:08 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
As x = 0F71, we also need the
contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
give the pair of
On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
If using DUCET, the collation elements for 0F71+0F71+0F72 are those for
0F73, 0F71, namely (at 6.1.0):
[.2572.0020.0002.0F73][.2570.0020.0002.0F71].
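For readers unfamiliar with the bracket notation, here is a small sketch that parses collation elements written this way and forms a three-level sort key in the manner of UCA Step 3. The names `parse_ces` and `sort_key` are invented for this example, and the fourth field of each element is simply ignored here.

```python
import re

# Matches elements such as [.2572.0020.0002.0F73] or [*0209.0020.0002.0020].
CE_PATTERN = re.compile(
    r"\[[.*]([0-9A-F]{4})\.([0-9A-F]{4})\.([0-9A-F]{4})\.[0-9A-F]{4,6}\]"
)

def parse_ces(text: str):
    """Return a list of (primary, secondary, tertiary) weight tuples."""
    return [tuple(int(w, 16) for w in m) for m in CE_PATTERN.findall(text)]

def sort_key(ces):
    """Concatenate non-zero weights level by level, separated by 0000."""
    key = []
    for level in range(3):
        if level:
            key.append(0)  # level separator
        key.extend(ce[level] for ce in ces if ce[level] != 0)
    return key

ces = parse_ces("[.2572.0020.0002.0F73][.2570.0020.0002.0F71]")
key = sort_key(ces)
# key == [0x2572, 0x2570, 0, 0x20, 0x20, 0, 0x2, 0x2]
```

Comparing two such keys as plain lists gives the primary-then-secondary-then-tertiary ordering, so a wrong choice of collation elements shows up directly as a different key.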
The correct collation elements for FCD sequence
On Thu, 17 May 2012 15:42:37 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
HOWEVER, you must *not* have the added contraction for 0F71+0F71.
If we don't have this prefix contraction, then we will
On Thu, May 17, 2012 at 4:29 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
On Thu, 17 May 2012 15:42:37 -0700
Markus Scherer markus@gmail.com wrote:
On Thu, May 17, 2012 at 3:00 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
HOWEVER, you must *not*
On Tue, 15 May 2012 21:33:03 -0700
Markus Scherer markus@gmail.com wrote:
On Tue, May 15, 2012 at 4:42 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
I am puzzled as to how an implementation can compliantly implement
the tailoring of normalisation in the UCA.
I think
On Wed, May 16, 2012 at 1:24 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Section 5.1 of the UCA says that one may have a parametric
normalisation tailoring.
Aha :-)
When you write "normalisation tailoring" it sounds like you are tailoring
the normalization algorithm or
On Wed, 16 May 2012 09:17:51 -0700
Markus Scherer markus@gmail.com wrote:
On Wed, May 16, 2012 at 1:24 AM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
Section 5.1 of the UCA says that one may have a parametric
normalisation tailoring.
Section 5.1 is about runtime
On 5/16/2012 2:54 PM, Richard Wordingham wrote:
Similar remarks apply to 'reorder'. What if I move 'Q' and 'q' into
the Cyrillic sequence? (I've a recollection that this letter is used
in Kurdish written in Cyrillic.)
Obsolete recollection. See:
051A;CYRILLIC CAPITAL LETTER
On Wed, May 16, 2012 at 2:54 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
The tailoring 'locale' is not orthogonal.
Well, right, that one selects the Collation Element Table :-)
The tailoring 'caseFirst' rather reshuffles the tertiary weights. I am
not entirely convinced
From: Ken Whistler kenw_at_sybase.com
On 5/16/2012 2:54 PM, Richard Wordingham wrote:
I have been wondering if U+0078 LATIN
SMALL LETTER X should be made common script because of its use for
displaying Lao vowels, but perhaps the principle of separation of
scripts should lead to LAO
No, it's not.
Including x in Lao for some pedagogical (I'm guessing) purpose is
completely out of scope. That'd be like including π in Latin because it
sometimes occurs in the middle of English text.
--
Mark https://plus.google.com/114199149796022210033
*— Il
I am puzzled as to how an implementation can compliantly implement the
tailoring of normalisation in the UCA.
Can an implementation be said to compliantly implement the tailoring of
normalisation if nominally turning it off actually has no effect? If
it can, my puzzlement goes away.
Simply
On Tue, May 15, 2012 at 4:42 PM, Richard Wordingham
richard.wording...@ntlworld.com wrote:
I am puzzled as to how an implementation can compliantly implement the
tailoring of normalisation in the UCA.
I think you mean something like implement tailorings where contractions
overlap with