Not to worry, these things happen to the best of us. Just glad the root of the problem was found.
Mark

On Sat, Jul 14, 2018 at 5:51 PM, Karl Williamson <[email protected]> wrote:

> On 07/09/2018 02:11 PM, Karl Williamson via Unicode wrote:
>
>> On 07/08/2018 03:21 AM, Mark Davis ☕️ wrote:
>>
>>> I'm surprised that the tests for 11.0 passed for a 10.0 implementation,
>>> because the following should have triggered a difference for WB. Can you
>>> check on this particular case?
>>>
>>> ÷ 0020 × 0020 ÷ # ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ [0.3]
>>>
>>
>> I'm one of the people who advocated for this change, and I had already
>> tailored our implementation of 10.0 to not break between horizontal white
>> space, so it's actually not surprising that this rule didn't break
>>
>
> It turns out that the fault was all mine; the Unicode 11.0 tests were
> failing on a 10.0 implementation. I'm sorry for starting this red herring
> thread.
>
> If you care to know the details, read on.
>
> The code that runs the tests knows what version of the UCD it is using,
> and it knows what version of the UAX boundary algorithms it is using. If
> these differ, it emits a warning about the discrepancy, and expects that
> there are going to be many test failures, so it marks all failing ones as
> 'To do' which suppresses their output, so as to not distract from any other
> failures that have been introduced by using the new UCD version. (Updating
> the algorithm comes last.)
>
> The solution for the future is to change the warning about the discrepancy
> to note that the failing boundary algorithm tests are suppressed. This
> will clue me (or whoever) in that all is not necessarily well.
>
>>> About the testing:
>>>
>>> The tests are generated so that they go through all the combinations of
>>> pairs, and some combinations of triples. The generated test cases use a
>>> sample from each partition of characters, to cut down on the file size to
>>> a reasonable level. That also means that some changes in the rules don't
>>> cause changes in the test results.
>>> Because it is not possible to test every
>>> combination, there is also provision for additional test cases, such as
>>> those at the end of the files, e.g.:
>>>
>>> https://unicode.org/Public/11.0.0/ucd/auxiliary/WordBreakTest.html
>>> https://unicode.org/Public/10.0.0/ucd/auxiliary/WordBreakTest.html
>>>
>>> We should extend those each time to make sure we cover combinations that
>>> aren't covered by pairs. There were some additions to that end; if they
>>> didn't cover enough cases, then we can look at your experience to add more.
>>>
>>> I can suggest four strategies for further testing:
>>>
>>> 1. To do a full test, for each row check every combination obtained by
>>> replacing each sample character by every other character in its
>>> partition. E.g., for the above line that would mean testing every
>>> <WSegSpace, WSegSpace> sequence.
>>>
>>> 2. Use a monkey test against ICU. That is, generate random combinations
>>> of characters from different partitions and check that ICU and your
>>> implementation are in sync.
>>>
>>> 3. During the beta period, test your previous-version implementation with
>>> the new test files. If there are no failures, yet there are changes in
>>> the rules, then raise that issue during the beta period so we can add
>>> tests.
>>>
>>
>> I actually did this, and as I recall, did find some test failures. In
>> retrospect, I must have screwed up somehow back then. I was under tight
>> deadline pressure, and as a result, did more cursory beta testing than
>> normal.
>>
>>> 4. If possible, during the beta period upgrade your implementation and
>>> test against the new and old test files.
>>>
>>> Anyone else have other suggestions for testing?
>>>
>>> Mark
>>>
>>
>> As an aside, a release or two ago, I implemented SB, and someone
>> immediately found a bug, and accused me of releasing software that had not
>> been tested at all. He had looked through the test suite and not found
>> anything that looked like it was testing that.
>> But he failed to find the
>> test file which bundled up all your tests, in a manner he was not
>> accustomed to, so it was easy for him to overlook. The bug only manifested
>> itself in longer runs of characters than your pairs and triples tested. I
>> looked at it, and your SB tests still seemed reasonable, and I should not
>> expect a more complete series than you furnished.
>>
>>> Mark
>>>
>>> On Sun, Jul 8, 2018 at 6:52 AM, Karl Williamson via Unicode
>>> <[email protected]> wrote:
>>>
>>>     I am working on upgrading from Unicode 10 to Unicode 11.
>>>
>>>     I used all the new files.
>>>
>>>     The algorithms for some of the boundaries, like GCB and WB, have
>>>     changed so that some of the property values no longer have code
>>>     points associated with them.
>>>
>>>     I ran the tests furnished in 11.0 for these boundaries, without
>>>     having changed the algorithms from earlier releases. All passed
>>>     100%.
>>>
>>>     Unless I'm missing something, that indicates that the tests
>>>     furnished in 11.0 do not contain instances that exercise these
>>>     changes. My guess is that the 10.0 tests were also deficient.
>>>
>>>     I have been relying on the UCD to furnish tests that have enough
>>>     coverage to sufficiently exercise the algorithms that are specified
>>>     in UAX #29, but that appears to have been naive on my part.
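
Strategy 1 in the quoted message (replace each sample character in a test row by every other character in its partition) could be sketched roughly as follows. This is only an illustration, not code from anyone in the thread: the function names, the `partition_of` and `members_of` tables, and the three-character WSegSpace subset are all assumptions made for the example. A real run would load full partition membership from the UCD property files.

```python
from itertools import product

BREAK, NO_BREAK = "\u00f7", "\u00d7"  # "÷" marks a boundary, "×" marks no boundary


def parse_break_test_line(line):
    """Parse one break-test row into (code_points, breaks).

    breaks[i] is True when a boundary is expected before code_points[i];
    the final entry describes the position after the last code point.
    """
    data = line.split("#", 1)[0].split()  # drop the trailing comment
    markers, hex_points = data[0::2], data[1::2]
    if len(markers) != len(hex_points) + 1 or any(
        m not in (BREAK, NO_BREAK) for m in markers
    ):
        raise ValueError(f"malformed test line: {line!r}")
    return [int(h, 16) for h in hex_points], [m == BREAK for m in markers]


def expand_row(code_points, partition_of, members_of):
    """Yield every sequence formed by swapping each sample character
    for any member of the same partition (hypothetical lookup tables)."""
    pools = [members_of[partition_of[cp]] for cp in code_points]
    for combo in product(*pools):
        yield list(combo)


# Assumed, deliberately tiny tables; real ones would come from the UCD.
members_of = {"WSegSpace": [0x0020, 0x2002, 0x3000]}
partition_of = {0x0020: "WSegSpace"}

row = "÷ 0020 × 0020 ÷\t#  ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ [0.3]"
code_points, breaks = parse_break_test_line(row)
expanded = list(expand_row(code_points, partition_of, members_of))
```

Each sequence in `expanded` would then be fed to the implementation under test and checked against the same `breaks` pattern, since every member of a partition is expected to behave like its sample.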

