Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Asmus Freytag
On 3/21/2013 4:22 PM, Philippe Verdy wrote: 2013/3/21 Richard Wordingham richard.wording...@ntlworld.com: Further, the code chart glyphs for the ANO TELEIA and the MIDDLE DOT differ, see attachment. If they are canonically equivalent, and one is a mandatory decomposition of the other, why do

In 2013, there are still programs with huge Unicode bugs :-(

2013-03-22 Thread Stephane Bortzmeyer
This one is incredible: https://bugzilla.redhat.com/show_bug.cgi?id=922433

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Philippe Verdy
2013/3/22 Asmus Freytag asm...@ix.netcom.com: Semantic selectors are pure pseudo-coding, because if the semantic differentiation is needed it is needed in plain text - and then it should be expressible in plain character codes. We don't disagree, that's exactly what I meant here: plain

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Philippe Verdy
2013/3/22 Asmus Freytag asm...@ix.netcom.com: If you need to annotate text with the results of semantic analysis as performed by a human reader, then you either need XML, or some other format that can express that particular intent. Absolutely NO. If this encodes semantics, this is part of

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Philippe Verdy
2013/3/22 Asmus Freytag asm...@ix.netcom.com: The number of conventions that can be applicable to certain punctuation characters is truly staggering, and it seems unlikely that Unicode is the right place to a) discover all of them or b) standardize an expression for them. My intent is

Re: In 2013, there are still programs with huge Unicode bugs :-(

2013-03-22 Thread john knightley
But how do we know whether the bug is there all the time? On Fri, Mar 22, 2013 at 4:45 PM, Stephane Bortzmeyer bortzme...@nic.fr wrote: This one is incredible: https://bugzilla.redhat.com/show_bug.cgi?id=922433

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Asmus Freytag
On 3/22/2013 4:02 AM, Philippe Verdy wrote: 2013/3/22 Asmus Freytag asm...@ix.netcom.com: Semantic selectors are pure pseudo-coding, because if the semantic differentiation is needed it is needed in plain text - and then it should be expressible in plain character codes. We don't disagree,

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Asmus Freytag
On 3/22/2013 4:08 AM, Philippe Verdy wrote: 2013/3/22 Asmus Freytag asm...@ix.netcom.com: If you need to annotate text with the results of semantic analysis as performed by a human reader, then you either need XML, or some other format that can express that particular intent. Absolutely NO. If

Regular expressions level 3

2013-03-22 Thread Martinho Fernandes
Is there an implementation of a regular expression engine with full Unicode Level 3 support as per UTS #18? Kind regards, Martinho

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Asmus Freytag
On 3/22/2013 4:16 AM, Philippe Verdy wrote: 2013/3/22 Asmus Freytag asm...@ix.netcom.com: The number of conventions that can be applicable to certain punctuation characters is truly staggering, and it seems unlikely that Unicode is the right place to a) discover all of them or b) standardize an

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Richard Wordingham
On Fri, 22 Mar 2013 12:08:14 +0100 Philippe Verdy verd...@wanadoo.fr wrote: adding new variants of existing characters like what was done specifically for maths is not a stable long-term solution; solutions similar to variant selectors however are much more meaningful, and will allow for

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Karl Williamson
On 03/21/2013 04:48 PM, Richard Wordingham wrote: For linguistic analysis, you need the normalisation appropriate to the task. This is a case where Unicode normalisation generally throws away information (namely, how the author views the characters), whereas in analysing Burmese you may want to

Re: In 2013, there are still programs with huge Unicode bugs :-(

2013-03-22 Thread Stephan Stiller
This one is incredible: https://bugzilla.redhat.com/show_bug.cgi?id=922433 This sort of failure to perform input validation and/or escaping is also a sign of bad software engineering in general. I recall an important CGI form of my university refusing to let me submit because I input an

Re: In 2013, there are still programs with huge Unicode bugs :-(

2013-03-22 Thread Philippe Verdy
And how many web forms forget to check for the presence of a percent sign and execute SQL searches without checking it, using clauses similar to WHERE table.field LIKE :parameter and binding the submitted form value directly to the parameter placeholder, ignoring the fact that the percent
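The point about `LIKE` can be made concrete. Binding the form value to a placeholder protects against SQL injection, but it does not stop `%` (or `_`, the other `LIKE` wildcard) inside the value from acting as a pattern character. The following is a minimal sketch using Python's built-in `sqlite3` module; the table, its contents, and the `escape_like` helper are hypothetical, and it assumes the application wraps the user's input in `%...%` for a substring search.

```python
import sqlite3

def escape_like(value: str, escape: str = "\\") -> str:
    """Make LIKE wildcards in user input literal (hypothetical helper).

    The escape character itself is doubled first, then '%' and '_'
    (the two LIKE wildcards) are prefixed with it.
    """
    return (
        value.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [("50% off",), ("500 units",), ("5_0",)],
)

user_input = "50%"  # the user wants rows containing the literal text "50%"

# Naive: the value is bound safely, but its '%' still acts as a wildcard,
# so the pattern '%50%%' matches every name containing "50".
naive = conn.execute(
    "SELECT name FROM items WHERE name LIKE ? ORDER BY name",
    ("%" + user_input + "%",),
).fetchall()

# Escaped: '%' from the input is matched literally via the ESCAPE clause.
safe = conn.execute(
    "SELECT name FROM items WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
    ("%" + escape_like(user_input) + "%",),
).fetchall()

print(naive)  # [('50% off',), ('500 units',)]
print(safe)   # [('50% off',)]
```

Other database engines have their own escaping rules for `LIKE`, so the escape character and clause should be checked against the engine actually in use.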

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Asmus Freytag
On 3/22/2013 12:08 PM, Karl Williamson wrote: On 03/21/2013 04:48 PM, Richard Wordingham wrote: For linguistic analysis, you need the normalisation appropriate to the task. Linguistic analysis (in general) being a hugely complex undertaking, mere normalization pales in comparison, so

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Richard Wordingham
On Fri, 22 Mar 2013 13:08:01 -0600 Karl Williamson pub...@khwilliamson.com wrote: This is the first time I've heard someone suggest that one can tailor normalizations. I think the officially acceptable term is 'folding'. One would not be 'tailoring a Unicode normalisation', but subverting the

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Richard Wordingham
On Fri, 22 Mar 2013 18:01:14 -0700 Asmus Freytag asm...@ix.netcom.com wrote: On 03/21/2013 04:48 PM, Richard Wordingham wrote: However, distinguishing U+00B7 and U+0387 would fail spectacularly if the text had been converted to form NFC before you received it. That's a claim for which
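The claim about NFC can be checked with Python's standard `unicodedata` module. U+0387 GREEK ANO TELEIA has a singleton canonical decomposition to U+00B7 MIDDLE DOT, so every normalization form replaces it, and a receiver of normalized text has no way to tell which character the author originally used. A minimal sketch; the sample text and the `count_ano_teleia` helper are hypothetical stand-ins for whatever check a consumer might rely on.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # U+0387 GREEK ANO TELEIA
MIDDLE_DOT = "\u00B7"  # U+00B7 MIDDLE DOT

def count_ano_teleia(text: str) -> int:
    """Hypothetical consumer-side check that depends on seeing U+0387."""
    return text.count(ANO_TELEIA)

# No "<tag>" prefix: this is a canonical (not compatibility) decomposition.
decomp = unicodedata.decomposition(ANO_TELEIA)
print(decomp)  # 00B7

# Every normalization form maps U+0387 to U+00B7 and leaves U+00B7 alone.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, ANO_TELEIA) == MIDDLE_DOT
    assert unicodedata.normalize(form, MIDDLE_DOT) == MIDDLE_DOT

# Sample Greek text using ANO TELEIA as punctuation.
original = "\u03B1\u03B2\u03B3" + ANO_TELEIA + " \u03B4"
normalized = unicodedata.normalize("NFC", original)

print(count_ano_teleia(original))    # 1
print(count_ano_teleia(normalized))  # 0: the distinction is gone after NFC
```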

Re: Rendering Raised FULL STOP between Digits

2013-03-22 Thread Asmus Freytag
On 3/22/2013 6:17 PM, Richard Wordingham wrote: On Fri, 22 Mar 2013 18:01:14 -0700 Asmus Freytag asm...@ix.netcom.com wrote: On 03/21/2013 04:48 PM, Richard Wordingham wrote: However, distinguishing U+00B7 and U+0387 would fail spectacularly if the text had been converted to form NFC before