On Sun, Jul 3, 2011 at 10:07 PM, Hao Zheng <zheng...@chromium.org> wrote:
>> There are at least two reasons for divergence: one is that the port is
>> actually doing the wrong thing, and the other is that the port is
>> doing the "right" thing but the output is different anyway (e.g., a
>> control is rendered differently). We cannot easily separate the two if
>> we have only a single convention (platform-specific -expected files),
>> but SKIPPING tests seems wrong for either category.
>
> Yes. I think separating the two categories is important. But we can do
> it without -failing files.
> 1. The port is doing the "right" thing but the output is different anyway.
>    We can 'rebaseline' these tests ('rebaseline' means checking in the
>    -expected files).
> 2. The port is actually doing the wrong thing.
>    We should NOT 'rebaseline' these. Instead, we should add them to
>    test_expectations.txt with a bug number. We can then easily track all
>    the failures we have at a given time just by reading
>    test_expectations.txt, and open the related bug if we want a more
>    detailed description.
>
> Both things can be done under the current test framework. Adding -failing
> files would make the huge layout tests effort even more complicated.
> Anyway, we only want to know which tests are failing, not to what
> extent they fail. If we want to know that, it means our tests are not
> reduced to the proper scale. Of course there are many 'big' tests,
> like the acid tests, but I think the potential problems covered by
> these tests can also be covered by other, smaller tests; if that's not
> the case, we just need to add some more small tests. So IMO -failing
> files are not necessary.
>
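[For context, since the format matters to this discussion: an entry in
the Chromium test_expectations.txt of this era tied a bug number and
optional platform/build modifiers to the failure mode expected for a
given test, roughly as sketched below. The test paths and bug numbers
here are invented for illustration.]

    BUGCR12345 WIN : fast/forms/select-zoom.html = IMAGE
    BUGWK67890 MAC DEBUG : fast/dom/iterator-crash.html = CRASH
    BUGCR54321 : fast/text/font-fallback.html = TEXT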
The problem with your approach is, I think, exactly what brought this
idea up in the first place: if you only track *that* a test is failing
via the test_expectations.txt file, but don't track *how* it is failing
(by using something like the -failing.txt idea, or a new -expected.txt
file), then you cannot tell when the failing output changes, and you
may miss significant regressions.

-- Dirk

>> It seems like -failing gives you the control you would want, no?
>> Obviously, it wouldn't help the thousands of -expected files that are
>> "wrong", but at least it could keep things from getting worse.
>>
>
> How to correct thousands of wrong files is really a big problem...
>
> On Sat, Jul 2, 2011 at 6:37 AM, Dirk Pranke <dpra...@chromium.org> wrote:
>> On Fri, Jul 1, 2011 at 3:24 PM, Darin Fisher <da...@chromium.org> wrote:
>>> On Fri, Jul 1, 2011 at 3:04 PM, Darin Adler <da...@apple.com> wrote:
>>>>
>>>> On Jul 1, 2011, at 2:54 PM, Dirk Pranke wrote:
>>>>
>>>> > Does that apply to -expected.txt files in the base directories, or
>>>> > just platform-specific exceptions?
>>>>
>>>> Base directories.
>>>>
>>>> Expected files contain output reflecting the behavior of WebKit at
>>>> the time the test was checked in: the expected result when we re-run
>>>> a test. Many expected files contain text that says “FAIL” in them.
>>>> The fact that these expected results are not successes, but rather
>>>> expected failures, does not seem to me to be a subtle point, but one
>>>> of the basic things about how these tests are set up.
>>>
>>> Right, it helps us keep track of where we are, so that we don't
>>> regress and only make forward progress.
>>>
>>>> > I wonder how it is that I've been working (admittedly, mostly on
>>>> > tooling) in WebKit for more than two years and this is the first
>>>> > I'm hearing about this.
>>>>
>>>> I’m guessing it’s because you have been working on Chrome.
>>>>
>>>> The Chrome project came up with a different system for testing,
>>>> layered on top of the original layout test machinery and based on
>>>> different concepts. I don’t think anyone ever discussed that system
>>>> with me; I was the one who created the original layout test system,
>>>> to help Dave Hyatt originally, and then later the rest of the team
>>>> started using it.
>>>
>>> The granular annotations (more than just SKIP) in
>>> test_expectations.txt were something we introduced back when Chrome
>>> was failing a large percentage of layout tests and we needed a system
>>> to help us triage the failures. It was useful to distinguish tests
>>> that crash from tests that generate bad results, for example; we then
>>> focused on the crashing tests first.
>>> In addition, we wanted to understand how divergent we were from the
>>> standard WebKit port, and we wanted to know whether we were failing
>>> to match text results or just image results. This allowed us to
>>> measure our degree of incompatibility with standard WebKit. We
>>> basically used this mechanism to classify differences that mattered
>>> and differences that didn't.
>>> I think that if we had just checked in a bunch of port-specific
>>> "failure" expectations as -expected files, then we would have had a
>>> hard time distinguishing failures we needed to fix for compat reasons
>>> from failures that were expected (e.g., because we have
>>> different-looking form controls).
>>> I'm not sure whether we are at a point now where this mechanism isn't
>>> useful, but I kind of suspect that it will always be useful.
>>> After all, it is not uncommon for a code change to result in
>>> different rendering behavior between the ports. I think it is
>>> valuable to have a measure of divergence between the various WebKit
>>> ports. We want to minimize such divergence from a web compat point of
>>> view, of course. Maybe the count of SKIPPED tests is enough? But then
>>> we suffer from not running the tests at all. At least by annotating
>>> expected IMAGE failures, we get to know that the TEXT output is the
>>> same and that we don't expect a CRASH.
>>
>> There are at least two reasons for divergence: one is that the port is
>> actually doing the wrong thing, and the other is that the port is
>> doing the "right" thing but the output is different anyway (e.g., a
>> control is rendered differently). We cannot easily separate the two if
>> we have only a single convention (platform-specific -expected files),
>> but SKIPPING tests seems wrong for either category.
>>
>> It seems like -failing gives you the control you would want, no?
>> Obviously, it wouldn't help the thousands of -expected files that are
>> "wrong", but at least it could keep things from getting worse.
>>
>> I will note that reftests might solve some issues but not all of them
>> (since obviously code could render both pages "wrong").
>>
>> -- Dirk
>>
>>> I suspect this isn't the best solution to the problem, though.
>>> -Darin
>>>
>>>>
>>>> > Are there reasons we are doing things this way?
>>>>
>>>> Sure. The idea of the layout test framework is to check whether the
>>>> code is still behaving as it did when the test was created and last
>>>> run; we want to detect any changes in behavior that are not
>>>> expected. When there are expected changes in behavior, we change the
>>>> contents of the expected results files.
>>>>
>>>> It seems possibly helpful to augment the test system with editorial
>>>> comments about which tests show bugs that we’d want to fix. But I
>>>> wouldn’t want to stop running all regression tests where the output
>>>> reflects the effects of a bug or missing feature.
>>>>
>>>> -- Darin
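[To make the -failing idea discussed above concrete, under one possible
reading of the proposal: a port would check in its current, known-wrong
output under a -failing.txt name instead of -expected.txt, so that the
"this output is wrong" classification is explicit in the tree while the
harness can still diff against the file and flag any change in the
failing output. The file names below are invented for illustration.]

    LayoutTests/fast/forms/select-zoom-expected.txt
        (cross-platform baseline: the output considered correct)
    LayoutTests/platform/chromium-win/fast/forms/select-zoom-failing.txt
        (this port's current, known-wrong output; a change here signals
         that the nature of the failure has changed)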