PS, although the title was "Missing UAX#31 tests?", I assumed you were talking about UAX #29 (http://unicode.org/reports/tr29/).
Mark

On Sun, Jul 8, 2018 at 11:21 AM, Mark Davis ☕️ <m...@macchiato.com> wrote:

> I'm surprised that the tests for 11.0 passed for a 10.0 implementation,
> because the following should have triggered a difference for WB. Can you
> check on this particular case?
>
> ÷ 0020 × 0020 ÷ # ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ [0.3]
>
> About the testing:
>
> The tests are generated so that they go through all the combinations of
> pairs, and some combinations of triples. The generated test cases use a
> sample from each partition of characters, to cut the file size down to a
> reasonable level. That also means that some changes in the rules don't
> cause changes in the test results. Because it is not possible to test every
> combination, there is also provision for additional test cases, such as
> those at the end of the files, e.g.:
>
> https://unicode.org/Public/11.0.0/ucd/auxiliary/WordBreakTest.html
> https://unicode.org/Public/10.0.0/ucd/auxiliary/WordBreakTest.html
>
> We should extend those each time to make sure we cover combinations that
> aren't covered by pairs. There were some additions to that end; if they
> didn't cover enough cases, then we can look at your experience to add more.
>
> I can suggest four strategies for further testing:
>
> 1. To do a full test, for each row check every combination obtained by
>    replacing each sample character by every other character in its
>    partition. E.g., for the above line that would mean testing every
>    <WSegSpace, WSegSpace> sequence.
>
> 2. Use a monkey test against ICU. That is, generate random combinations of
>    characters from different partitions and check that ICU and your
>    implementation are in sync.
>
> 3. During the beta period, test your previous version with the new test
>    files. If there are no failures, yet there are changes in the rules,
>    then raise that issue during the beta period so we can add tests.
>
> 4. If possible, during the beta period upgrade your implementation and
>    test against the new and old test files.
>
> Does anyone else have other suggestions for testing?
>
> Mark
>
> On Sun, Jul 8, 2018 at 6:52 AM, Karl Williamson via Unicode <
> unicode@unicode.org> wrote:
>
>> I am working on upgrading from Unicode 10 to Unicode 11.
>>
>> I used all the new files.
>>
>> The algorithms for some of the boundaries, like GCB and WB, have changed
>> so that some of the property values no longer have code points associated
>> with them.
>>
>> I ran the tests furnished in 11.0 for these boundaries, without having
>> changed the algorithms from earlier releases. All passed 100%.
>>
>> Unless I'm missing something, that indicates that the tests furnished in
>> 11.0 do not contain instances that exercise these changes. My guess is
>> that the 10.0 tests were also deficient.
>>
>> I have been relying on the UCD to furnish tests that have enough coverage
>> to sufficiently exercise the algorithms that are specified in UAX 31, but
>> that appears to have been naive on my part.
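The test line quoted in the thread uses the ÷/× notation of the UCD break test files: ÷ marks an allowed boundary, × marks no boundary, code points are in hex, and everything after `#` is a comment naming the rules that fired. What follows is a minimal sketch, not an official tool, for parsing that notation and running an implementation against a file. This is the kind of run described in strategy 3 and in the 10.0-against-11.0 experiment. It assumes the implementation is exposed as a callable, here the hypothetical `boundaries`, that returns boundary offsets counted in code points.

```python
from typing import Callable, Iterable, List, Set, Tuple

BREAK = "\u00f7"     # ÷ : a boundary is allowed here
NO_BREAK = "\u00d7"  # × : no boundary here


def parse_test_line(line: str) -> Tuple[str, Set[int]]:
    """Parse one line of ÷/× break-test notation.

    Returns the test string and its expected boundaries, as offsets in code
    points: 0 is before the first character, len(s) is after the last.
    """
    body = line.split("#", 1)[0].strip()  # drop the trailing rule comment
    if not body:
        raise ValueError("empty or comment-only line")
    chars: List[str] = []
    breaks: Set[int] = set()
    for token in body.split():
        if token == BREAK:
            breaks.add(len(chars))
        elif token != NO_BREAK:
            chars.append(chr(int(token, 16)))  # hex code point, e.g. 0020
    return "".join(chars), breaks


def failures(lines: Iterable[str],
             boundaries: Callable[[str], Set[int]]) -> List[Tuple[int, str]]:
    """Return (line number, line) for every test line the implementation fails.

    `boundaries` is a hypothetical stand-in for the implementation under test.
    """
    bad = []
    for n, line in enumerate(lines, 1):
        if not line.split("#", 1)[0].strip():
            continue  # skip blank or comment-only lines
        text, expected = parse_test_line(line)
        if boundaries(text) != expected:
            bad.append((n, line.rstrip("\n")))
    return bad
```

Assuming the plain-text `WordBreakTest.txt` that accompanies the .html files, usage would be `failures(open(path, encoding="utf-8"), my_word_boundaries)`. Here `my_word_boundaries` is hypothetical. An empty result from a previous-version implementation, when the rules did change, is exactly the signal worth reporting during beta.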
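Strategy 1 replaces each sample character in a row with every other member of its partition. One way to sketch that is as a Cartesian product over the partitions of the row's sample code points. The partition lookup `partition_of` and the table `members` are hypothetical. The caller is assumed to build them, for example from WordBreakProperty.txt, and they should match the partitions the test generator actually used. Because the product grows multiplicatively with row length, the sketch caps the output at `limit` strings. That cap is an illustrative choice, not part of the strategy as stated.

```python
from itertools import islice, product
from typing import Callable, Dict, Hashable, Iterator, List, Sequence


def expand_row(sample: Sequence[int],
               partition_of: Callable[[int], Hashable],
               members: Dict[Hashable, List[int]],
               limit: int = 100_000) -> Iterator[str]:
    """Yield every string obtained by replacing each sample code point with
    each member of its partition, stopping after `limit` strings.

    Every variant should get the same boundaries as the sample row, since all
    members of a partition are meant to behave identically under the rules.
    """
    # One list of candidate code points per position in the row.
    choices = [members[partition_of(cp)] for cp in sample]
    for combo in islice(product(*choices), limit):
        yield "".join(map(chr, combo))
```

With a toy two-member table such as `{"WSegSpace": [0x0020, 0x3000]}`, the row `0020 × 0020` expands to the four two-character space sequences. Each of them should yield the boundary set `{0, 2}` if the rule applied to the sample really covers the whole partition.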
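Strategy 2, the monkey test, is differential testing. It generates random strings from characters drawn across partitions and reports any string on which two implementations disagree. In the sketch below, `reference` would be a wrapper around ICU's word break iterator, and `candidate` would be your own implementation. The wrapper itself is not shown, so any particular ICU binding is an assumption. If the reference reports UTF-16 offsets, as ICU's UnicodeString-based APIs do, they must be converted to code-point offsets before comparison.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Set, Tuple

Boundaries = Callable[[str], Set[int]]


def monkey_test(reference: Boundaries, candidate: Boundaries,
                members: Dict[Hashable, List[int]],
                trials: int = 10_000, max_len: int = 6,
                seed: Optional[int] = 0) -> List[Tuple[str, Set[int], Set[int]]]:
    """Compare two boundary functions on random strings.

    Each character is drawn by first choosing a partition uniformly, then a
    member of that partition, so small partitions are exercised about as often
    as large ones. Returns (string, reference result, candidate result) for
    each mismatch.
    """
    rng = random.Random(seed)  # a fixed seed makes any failure reproducible
    partitions = list(members)
    mismatches = []
    for _ in range(trials):
        length = rng.randint(1, max_len)
        text = "".join(chr(rng.choice(members[rng.choice(partitions)]))
                       for _ in range(length))
        want, got = reference(text), candidate(text)
        if want != got:
            mismatches.append((text, want, got))
    return mismatches
```

The design choice of picking the partition first, rather than picking a random code point, matters here. A uniform draw over all code points would almost never produce members of small classes, and those rare combinations are the ones the sampled test files are most likely to miss.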