Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Tom Hacohen
On 23/11/16 11:45, Daniel Bünzli wrote: On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote: I took a look at the ICU sources, and they explicitly mention this case, so it seems I was mistaken in interpreting the intention of the UAX. I still find it confusing, but based on this thread,

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Daniel Bünzli
On Wednesday 23 November 2016 at 12:28, Tom Hacohen wrote: > I took a look at the ICU sources, and they explicitly mention this case, > so it seems I was mistaken in interpreting the intention of the UAX. I > still find it confusing, but based on this thread, it seems to just be me. It's not on

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Tom Hacohen
On 23/11/16 11:20, Philippe Verdy wrote: 2016-11-23 12:00 GMT+01:00 Tom Hacohen <t...@osg.samsung.com>: Also take another look at http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Philippe Verdy
2016-11-23 12:00 GMT+01:00 Tom Hacohen: > > Also take another look at > http://www.unicode.org/reports/tr29/#Grapheme_Cluster_and_Format_Rules, specifically the table that > shows another way of writing the ignore rule. This again shows my > understanding of rule 4 is correct. > > Specially look

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Tom Hacohen
On 23/11/16 11:11, Daniel Bünzli wrote: On Wednesday 23 November 2016 at 12:00, Tom Hacohen wrote: This looks like a mistaken statement rather than a binding rule. Well at least to me it's pretty clear that this is not the case. Even if that's true, look at my second statement (which you red

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Philippe Verdy
You say "there's no case where two rules apply". I don't think this is right: rules apply in precedence order as long as they have not produced a decision, either "break here" or "no break here". This is especially important for rules that generate only a replacement, that are executed i
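
The precedence-ordered evaluation discussed in this message can be sketched as follows. This is only an illustration under stated assumptions, not the reference implementation: the property table `PROPS` is hand-written and covers just the three code points from the test line quoted in the original post, the function names are hypothetical, and only rules WB1/WB2, WB3c, WB4 and WB999 are modelled (Format and the CR/LF/Newline exceptions are left out). It tests WB3c on the raw neighbouring characters before WB4 is applied, which is the reading that reproduces the published test data.

```python
from typing import List

ZWJ, EXTEND, GAZ, OTHER = "ZWJ", "Extend", "Glue_After_Zwj", "Other"

# Hypothetical, hand-written property table: only the code points
# from the test line quoted in the thread are listed.
PROPS = {0x200D: ZWJ, 0x0308: EXTEND, 0x2764: GAZ}


def prop(cp: int) -> str:
    """Look up the (simplified) Word_Break property of a code point."""
    return PROPS.get(cp, OTHER)


def break_before(cps: List[int], i: int) -> bool:
    """Return True if a break is allowed between cps[i-1] and cps[i].

    Rules are tried in order; the first one that matches decides.
    """
    # WB1 / WB2: always break at the start and end of the text.
    if i == 0 or i == len(cps):
        return True
    left, right = prop(cps[i - 1]), prop(cps[i])
    # WB3c comes before WB4, so it sees the raw neighbours:
    # ZWJ x Glue_After_Zwj only when the two are directly adjacent.
    if left == ZWJ and right == GAZ:
        return False
    # WB4 (simplified): never break before Extend or ZWJ.
    if right in (EXTEND, ZWJ):
        return False
    # WB999: otherwise, break everywhere.
    return True


def word_breaks(cps: List[int]) -> List[bool]:
    """Break flags for every position, including start and end of text."""
    return [break_before(cps, i) for i in range(len(cps) + 1)]


if __name__ == "__main__":
    # ZWJ, COMBINING DIAERESIS, U+2764
    print(word_breaks([0x200D, 0x0308, 0x2764]))
```

With this ordering, the Extend character between the ZWJ and U+2764 prevents WB3c from matching, so a break is reported before U+2764; when the ZWJ is directly followed by U+2764, WB3c matches and no break is reported.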

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Daniel Bünzli
On Wednesday 23 November 2016 at 12:00, Tom Hacohen wrote: > This looks like a mistaken statement rather than a binding rule. Well at least to me it's pretty clear that this is not the case. > Even if that's true, look at my second statement (which you redacted in > your reply): I'm not arguing

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Tom Hacohen
On 23/11/16 10:52, Daniel Bünzli wrote: On Wednesday 23 November 2016 at 11:22, Tom Hacohen wrote: Thank you for your reply, but I don't think the UAX, specifically the line you quoted implies that. The line you quoted says that the process is terminated when a rule matches and produces a bounda

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Daniel Bünzli
On Wednesday 23 November 2016 at 11:22, Tom Hacohen wrote: > Thank you for your reply, but I don't think the UAX, specifically the > line you quoted implies that. The line you quoted says that the process > is terminated when a rule matches and produces a boundary status. In > Table 1[1], the rig

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Tom Hacohen
On 23/11/16 10:01, Daniel Bünzli wrote: On Tuesday 22 November 2016 at 13:07, Tom Hacohen wrote: However, looking at the test case and the UAX[2], this does not look correct. More specifically, because of rule 4: ZWJ Extended GAZ -> ZWJ GAZ And then according to rule 3c, there should be no break

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Daniel Bünzli
On Tuesday 22 November 2016 at 13:07, Tom Hacohen wrote: > However, looking at the test case and the UAX[2], this does not look > correct. More specifically, because of rule 4: > ZWJ Extended GAZ -> ZWJ GAZ > And then according to rule 3c, there should be no break opportunity > between them. I'd

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-23 Thread Tom Hacohen
You said: > So ignore it and test whether the last symbol glues with ZWJ (it should, > so there's no break in the reference implementation). Which makes me think you misread the example I quoted. There is a break in the reference implementation, though I argue (like you just did) that there sho

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-22 Thread Philippe Verdy
Note also this statement at the beginning of the specification: Single boundaries. Each rule has exactly one boundary position. This restriction is more a limitation on the specification methods, because a rule with multiple boundaries could be expressed instead as multiple rules. For example: *

Re: Potential contradiction between the WordBreak test data and UAX #29

2016-11-22 Thread Philippe Verdy
IMHO, the ZWJ should glue with the last symbol following your examples. But the combining diaeresis following the ZWJ extends it (even if in my opinion it is "defective" and would likely display on a dotted circle in renderers, but not defective for the string definition of combining sequences). S

Potential contradiction between the WordBreak test data and UAX #29

2016-11-22 Thread Tom Hacohen
Dear, I recently updated libunibreak[1] according to Unicode 9.0.0. I thought I implemented it correctly; however, it fails against two of the tests in the reference test data: ÷ 200D × 0308 ÷ 2764 ÷ # ÷ [0.2] ZERO WIDTH JOINER (ZWJ_FE) × [4.0] COMBINING DIAERESIS (Extend_FE) ÷ [999.0] HEAVY
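
For readers unfamiliar with the test data notation used in this message, here is a minimal sketch of how such a line can be read. It assumes that `÷` marks a position where a break is allowed, `×` marks a position where it is not, hexadecimal tokens are code points, and everything after `#` is a comment. The function name `parse_test_line` is hypothetical and is not part of libunibreak or ICU.

```python
from typing import List, Tuple

BREAK = "\u00f7"     # DIVISION SIGN: a break is allowed here
NO_BREAK = "\u00d7"  # MULTIPLICATION SIGN: no break is allowed here


def parse_test_line(line: str) -> Tuple[List[int], List[bool]]:
    """Return (code points, break flags) for one test line.

    flags[i] is True when a break is allowed before code point i;
    the last flag describes the position after the final code point.
    """
    # Drop the trailing comment, then split on whitespace.
    tokens = line.split("#", 1)[0].split()
    code_points: List[int] = []
    flags: List[bool] = []
    for token in tokens:
        if token == BREAK:
            flags.append(True)
        elif token == NO_BREAK:
            flags.append(False)
        else:
            code_points.append(int(token, 16))
    # A well-formed line has one marker around every code point.
    if len(flags) != len(code_points) + 1:
        raise ValueError("malformed test line")
    return code_points, flags


if __name__ == "__main__":
    cps, flags = parse_test_line("\u00f7 200D \u00d7 0308 \u00f7 2764 \u00f7 # comment")
    print([hex(cp) for cp in cps], flags)
```

An implementation under test can then be compared position by position against the returned flags, which makes it easy to see which boundary (here, the one before U+2764) disagrees with the reference data.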