Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Thu, 17 May 2012 21:32:19 -0700 Markus Scherer markus@gmail.com wrote: On Thu, May 17, 2012 at 4:29 PM, Richard Wordingham richard.wording...@ntlworld.com wrote: As I've already said, DUCET 6.1.0 omits a contraction for 0FB2+0F71, and so CE(0FB2, 0334, 0F71, 0F80) =

Why no mathematical sans-serif capital theta?

2012-05-18 Thread Jukka K. Korpela
Is there a good reason, or some explanation, for the lack of MATHEMATICAL SANS-SERIF CAPITAL THETA in Unicode? As far as I have understood, the Mathematical Alphanumeric Symbols block has been added to make it possible to make certain distinctions at the character level. The difference

Re: Origins of w

2012-05-18 Thread Andreas Prilop
On Wed, 16 May 2012, Denis Jacquerye wrote: How about U+1E1C, U+1E1D Hebrew U+05B1 U+1E4E, U+1E4F I don't know. U+1E64, U+1E65, U+1E66, U+1E67 ? Hebrew U+FB2D and U+FB2C (in this order) Which transliteration systems are they from? ISO 259 (1984)

Re: Origins of w

2012-05-18 Thread Denis Jacquerye
Thank you Andreas. On Fri, May 18, 2012 at 10:33 AM, Andreas Prilop prilop4...@trashmail.net wrote: On Wed, 16 May 2012, Denis Jacquerye wrote: How about U+1E1C, U+1E1D Hebrew U+05B1 U+1E4E, U+1E4F I don't know. U+1E64, U+1E65, U+1E66, U+1E67 ? Hebrew U+FB2D and U+FB2C (in this

Re: Origins of w

2012-05-18 Thread Andreas Prilop
On Wed, 16 May 2012, Denis Jacquerye wrote: U+1E00 and U+1E01 are also a mystery. You can find letter a with ring below in the title Grammar of the Pasto or language of the Afghans by Ernest Trumpp, published 1873. http://www.google.co.uk/search?q=%22P%E1%B8%81%E1%B9%A3%CC%8Ct%C5%8D%22 I don't

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Thu, 17 May 2012 21:32:19 -0700 Markus Scherer markus@gmail.com wrote: Ok, but assuming we didn't add 0FB2+0F71, why can't we add the contraction 0FB2+0F81 and have the 0334 and any other non-starter be handled via discontiguous matching? Time for me to make a pronouncement on

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Markus Scherer
Back to first principles. UCA conformance requires getting the same results as the Main Algorithm. This can be done easily with NFD input text, or by implementing Step 1 which normalizes the input to NFD. Everything else is a performance optimization, and there are trade-offs. We also want
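As a rough illustration of Step 1 as described above (normalise the input to NFD before doing anything else), here is a minimal Python sketch using only the standard `unicodedata` module. The helper names `step1_nfd` and `combining_classes` are made up for this example and are not taken from ICU, CLDR or the UCA text. The test string is the 0FB2, 0334 sequence discussed in this thread, written with the precomposed U+0F81.

```python
import unicodedata


def step1_nfd(text):
    """Hypothetical stand-in for UCA Step 1: return the NFD form of text."""
    return unicodedata.normalize("NFD", text)


def combining_classes(text):
    """Return the canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]


# U+0FB2, U+0334, then the precomposed vowel sign U+0F81.
composed = "\u0FB2\u0334\u0F81"

# After Step 1, U+0F81 is replaced by its canonical decomposition
# and the non-starters are put into canonical order.
decomposed = step1_nfd(composed)

# Print code points and combining classes for inspection.
print([f"{ord(ch):04X}" for ch in decomposed])
print(combining_classes(decomposed))
```

Feeding the collation element lookup with `decomposed` rather than `composed` is the simple route to conformance; skipping this step for FCD or other input is the performance optimisation with the trade-offs mentioned above.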

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Fri, 18 May 2012 09:51:34 -0700 Markus Scherer markus@gmail.com wrote: There is nothing that requires us to get correct results *without normalization* for all FCD strings or any other particular input conditions (except NFD input). So long as you don't claim conformance to the CLDR

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Mark Davis ☕
There is an action item from the UTC and CLDR committees to clarify the meanings of the setting; they are supposed to allow some degree of variation. -- Mark https://plus.google.com/114199149796022210033 — Il meglio è l’inimico del bene — On Fri, May 18,

Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

2012-05-18 Thread Richard Wordingham
On Fri, 18 May 2012 09:51:34 -0700 Markus Scherer markus@gmail.com wrote: On inspection, we think we can do better (and want to), probably by adding overlap contractions. If we get into trouble with that, we will think of alternatives. One is to decompose more characters even in FCD