On 9/6/2015 11:23 PM, Richard Wordingham wrote:

On Thu, 03 Sep 2015 09:32:42 -0700 Rick McGowan <r...@unicode.org> wrote:

A proposed update to the LDML specification (UTS #35) will be available for review as of Monday, September 7 at 06:00 GMT. The open review period closes on Monday, September 14 at 06:00 GMT. (This is a short review period, because CLDR 28 is scheduled for release in the week of September 16.)

The proposed update will be at http://unicode.org/reports/tr35/proposed.html

To report bugs in the specification, please use http://unicode.org/cldr/trac/newticket

Have the implications of adding string ranges to Unicode sets been considered? I'm mentioning them on the list because their impact goes beyond locales, and I haven't worked out their implications myself.

By my reading, adding string ranges will initially make regular expression engines that don't use ICU non-compliant with Level 1 of UTS#18 Unicode Regular Expressions, in particular RL1.3 'subtraction and intersection'. I don't imagine the extra work of set operations on Unicode sets containing string ranges will be popular. It may be worst for the minority of regular expression engines that use the regularity of regular expressions.

I note that the safety feature of requiring the start and end points to have the same length has been removed from their design.

The restriction appears to have weakened to the point where the left string is allowed to be longer, and where the "excess" is then understood as a common prefix. On the face of it, that seems a mere convenience.

String ranges seem particularly vulnerable to the ill-effects of unpredictable normalisation.

If a String range is, as claimed, merely a more compact statement of what can be done with existing sets and patterns, this should be made explicit, by giving the rewrite rules.

That would answer two of your issues.

1) a preprocessor can be used to change range expressions into expressions that work with older engines
2) the normalization issues are no worse than for other sets

There may be the issue of how these play with operations on the sets themselves, like union, intersection, and difference. These cases should be covered by the required rewrite rules to make it verifiable that the ranges are simply syntactic sugar and do not have hidden new functionality.

A./

Richard.
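
To make the "rewrite rule" idea concrete, here is a minimal Python sketch of the kind of preprocessor meant in point 1. It is only an illustration, not the specification's algorithm. The function names are made up for this example, and two things are assumed: that any excess at the front of the left string is a common prefix (as described above), and that each remaining position ranges independently over its code points. The second assumption in particular should be checked against the proposed text of UTS #35.

```python
from itertools import product


def expand_string_range(first: str, last: str) -> list[str]:
    """Expand a string range into the explicit list of strings it abbreviates.

    Hypothetical helper: the semantics below are assumptions, not quoted
    from UTS #35.
    """
    if not last or len(last) > len(first):
        raise ValueError("right string must be non-empty and no longer than the left")
    # Assumption: the excess of the left string is a shared prefix.
    split = len(first) - len(last)
    prefix, tail = first[:split], first[split:]
    axes = []
    for lo, hi in zip(tail, last):
        if ord(lo) > ord(hi):
            raise ValueError(f"start {lo!r} is above end {hi!r}")
        # Assumption: each position ranges independently over code points.
        axes.append([chr(cp) for cp in range(ord(lo), ord(hi) + 1)])
    return [prefix + "".join(combo) for combo in product(*axes)]


def to_set_syntax(first: str, last: str) -> str:
    """Rewrite a range as brace-delimited strings for an engine without ranges."""
    return "".join("{" + s + "}" for s in expand_string_range(first, last))


if __name__ == "__main__":
    print(to_set_syntax("ab", "ad"))  # same length on both sides
    print(to_set_syntax("ab", "d"))   # left string longer: "a" acts as prefix
```

Under those assumptions a range is nothing more than shorthand for a finite list of strings, so union, intersection and difference reduce to the ordinary operations on the expanded set, and normalization affects the expanded strings exactly as it would if they had been written out by hand.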