Re: [webkit-dev] Skipping Flakey Tests

2009-12-22 Thread Maciej Stachowiak


On Dec 22, 2009, at 10:31 AM, Darin Adler wrote:


On Dec 21, 2009, at 6:14 PM, Dirk Pranke wrote:

Given all that, Darin, what were you suggesting when you said  
"Let's fix that"?


Let's add a feature so something in the tests tree can indicate a
Chromium Windows result should be the base result, rather than the
platform/win one. We can debate exactly how it should work, but let's
come up with a design and do it.


One possibility is to have three Windows-related result directories,
win/, win-cg/ and win-chromium/. We'd use the latter two for test
results that differ between the CoreGraphics/CFNetwork/Apple port and
the Chromium/Google port. Any results that should be common to all
Windows-based platforms would stay in win/.
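Sketched in Python (run-webkit-tests itself is Perl, but the lookup logic is what matters), such a fallback might look like the following; the directory names and the resolve_baseline helper are illustrative, not existing harness features:

```python
import os

# Hypothetical fallback order per port, following the directory proposal
# above: the port-specific directory is consulted first, then shared win/.
SEARCH_PATHS = {
    "win-cg": ["platform/win-cg", "platform/win"],
    "win-chromium": ["platform/win-chromium", "platform/win"],
}

def resolve_baseline(port, test, exists=os.path.exists):
    """Return the first -expected.txt found along the port's search path,
    falling back to the cross-platform result stored next to the test."""
    name = test.replace(".html", "-expected.txt")
    for directory in SEARCH_PATHS.get(port, []):
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return name  # generic result checked in alongside the test
```

With a layout like this, a result checked into win-chromium/ overrides win/ without disturbing the Apple port, which is the granularity being discussed here.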


Note: As far as I can tell, this problem is all about granularity of  
applying the expected results, not about whether we check in known  
failing results instead of out-of-band FAIL expectations. The latter  
may or may not be a good idea(*), but it seems independent of the  
original problem. Some of Dirk and Peter's messages seemed to conflate  
these two points.


Regards,
Maciej

* - I tend to think we should track expected failure results, for the  
reasons Darin cited and also because tests that "fail" already can  
still catch further regressions.


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] Skipping Flakey Tests

2009-12-22 Thread Dirk Pranke
On Tue, Dec 22, 2009 at 4:58 PM, Darin Adler  wrote:
> On Dec 22, 2009, at 4:27 PM, Dirk Pranke wrote:
>
>> In the completely generic case, I hope we are not checking in incorrect 
>> results.
>
> We do intentionally check in incorrect results, fairly often. For example, 
> we’ve checked in whole test suites and then generated expected results 
> without studying the tests to see which ones are successful and which are 
> failures.
>

Interesting. I wasn't aware of that, and I guess I hadn't noticed it yet.

>> An alternative would be to move to the more general syntax (and hopefully, 
>> just move to the tool) that Chromium uses.
>
> I’m surprised that Chromium developed a separate tool. If instead the 
> Chromium team had enhanced the WebKit project’s shared run-webkit-tests we’d 
> be better off. How did we end up with two separate tools?!

That I couldn't tell you, as the decision predates my joining the
team; I'm sure someone else can chime in. I don't think anyone would
dispute that one tool would be better than two, and Eric Seidel and I
have been working on a plan to merge the two feature sets so that we
do end up with only one tool. The major pluses of the Chromium tool
are that it has a more expressive syntax for tracking failures across
multiple platforms, and that it can run tests in parallel across
multiple cores, so it tends to be 3x faster than the Perl version (at
least on my 4-CPU MacPro). I do know the WebKit version supports a
bunch of switches and features that the Chromium tool doesn't, but
they're mostly switches I've never needed to use, so I couldn't tell
you off the top of my head what they are.
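The parallelism Dirk credits for the ~3x speedup can be sketched as follows (an illustration only, not the Chromium tool's actual code; run_single_test is a stand-in for launching DumpRenderTree and diffing its output):

```python
from concurrent.futures import ThreadPoolExecutor

def run_single_test(test):
    # Stand-in for running one layout test and comparing its output
    # against the checked-in baseline; here every test "passes".
    return (test, "PASS")

def run_tests_in_parallel(tests, jobs=4):
    """Shard the test list across a pool of workers, the way a parallel
    harness spreads tests over multiple cores."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(run_single_test, tests))
```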

>> Second, there's the question of whether or not you want to track what the 
>> "expected incorrect" results are, separate from what the "expected correct" 
>> results are. That way, you can detect when a test fails *differently* than 
>> it has been in the past.
>
> I do think we want to track this. It’s part of why the original system worked 
> the way it did when I created it back in 2005.

Good to know. Being a fan of this feature myself, I would happily add it.

> Also, in some cases it may be difficult to generate “correct” results if the 
> engine doesn’t yet have correct behavior at the time the test is being 
> created.

True enough.

-- Dirk


Re: [webkit-dev] Skipping Flakey Tests

2009-12-22 Thread Darin Adler
On Dec 22, 2009, at 4:27 PM, Dirk Pranke wrote:

> In the completely generic case, I hope we are not checking in incorrect 
> results.

We do intentionally check in incorrect results, fairly often. For example, 
we’ve checked in whole test suites and then generated expected results without 
studying the tests to see which ones are successful and which are failures.

> An alternative would be to move to the more general syntax (and hopefully, 
> just move to the tool) that Chromium uses.

I’m surprised that Chromium developed a separate tool. If instead the Chromium 
team had enhanced the WebKit project’s shared run-webkit-tests we’d be better 
off. How did we end up with two separate tools?!

> Second, there's the question of whether or not you want to track what the 
> "expected incorrect" results are, separate from what the "expected correct" 
> results are. That way, you can detect when a test fails *differently* than it 
> has been in the past.

I do think we want to track this. It’s part of why the original system worked 
the way it did when I created it back in 2005.

Also, in some cases it may be difficult to generate “correct” results if the 
engine doesn’t yet have correct behavior at the time the test is being created.

-- Darin



Re: [webkit-dev] Skipping Flakey Tests

2009-12-22 Thread Dirk Pranke
On Tue, Dec 22, 2009 at 10:31 AM, Darin Adler  wrote:
> On Dec 21, 2009, at 6:14 PM, Dirk Pranke wrote:
>
>> Given all that, Darin, what were you suggesting when you said "Let's fix 
>> that"?
>
> Let's add a feature so something in the tests tree can indicate a Chromium 
> Windows result should be the base result, rather than the platform/win one. 
> We can debate exactly how it should work, but let's come up with a design and 
> do it.
>

For a given test, there are three cases: the test produces generic
results (and the results are checked in alongside the test); it
produces "mostly generic" results (meaning most platforms/ports can
use the generic results, but some intentionally diverge); or it
produces completely platform-specific results.

In the completely generic case, I hope we are not checking in
incorrect results. Are we concerned about this case?

I think the "mostly generic" case is probably a variant of the
"generic" case, and should have the same policy.

That leaves the "platform-specific" case. In this case, marking any
particular platform as "right" doesn't make a lot of sense, because
what's right for one platform may or may not be right for another. The
problem comes up in ports like Chromium that use a search path for
results. I would not suggest that we change anything here: if
platform/win/foo-expected.txt is "wrong", we should probably just
check in an override in platform/chromium-win/foo-expected.txt. If
too many of these situations occur, we're probably better off
dropping platform/win from the search path (which is what I think we
probably should do in our Win port, but I leave that as an
exercise for me to determine).

So, I don't think we need to change anything to address the above issues.

There are one or two other points of design.

First, there's the question of whether or not "intentionally
incorrect" results should ever be checked in. One reason to do this is
because run-webkit-tests doesn't have a "FAIL" concept, just a
"SKIPPED" concept. It would be easy to do this, and probably the best
way to do this is to add a "Failures" file alongside the "Skipped"
file, using the same syntax. An alternative would be to move to the
more general syntax (and hopefully, just move to the tool) that
Chromium uses.
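A "Failures" file reusing the Skipped syntax could be consumed along these lines (a sketch; run-webkit-tests is Perl and has no such file today, so both the file and the classify helper are assumptions):

```python
def parse_list_file(text):
    """Parse a Skipped/Failures-style list: one test or directory per
    line, with blank lines and '#' comments ignored."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            entries.append(line)
    return entries

def _listed(test, entries):
    # An entry matches a test exactly, or matches a whole directory.
    return any(test == e or test.startswith(e.rstrip("/") + "/") for e in entries)

def classify(test, skipped, failures):
    """Decide how the harness treats a test: don't run it, run it but
    expect failure, or run it expecting a pass."""
    if _listed(test, skipped):
        return "SKIP"
    if _listed(test, failures):
        return "EXPECT-FAIL"
    return "EXPECT-PASS"
```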

Second, there's the question of whether or not you want to track what
the "expected incorrect" results are, separate from what the "expected
correct" results are. That way, you can detect when a test fails
*differently* than it has been in the past. It is an open question as
to how useful and/or how much maintenance it would be to do this. If
we were to do it, I would suggest adding something like
"foo-failure.txt" files alongside the "foo-expected.txt" files.
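Sketching the "foo-failure.txt" idea (the file naming and three-way outcome here are hypothetical, not an existing harness feature):

```python
def diff_result(actual, expected, known_failure=None):
    """Compare a test's output against its baselines. A separate
    -failure.txt baseline lets the harness tell an old, known failure
    apart from a *new* way of failing."""
    if actual == expected:
        return "PASS"
    if known_failure is not None and actual == known_failure:
        return "KNOWN-FAIL"  # fails exactly as it has in the past
    return "NEW-FAIL"        # regression: the test fails differently
```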

To sum up:

(1) For platform-specific failures, we should either (a) check in new
overriding baselines or (b) fix the baseline search path. No
significant code changes are needed.

(2) For generic failures, we can (a) add a "Failures" file, (b)
implement Chromium's test-expectations syntax, (c) move to Chromium's
tools (getting (b) along the way), or (d) check in incorrect output as
the "expected results" and add platform-specific baselines for
platforms that "get it right".

(3) If you want to capture "expected incorrect" *and* "expected
correct", add a "-failure" set of expectations and mod the tools
accordingly.

I would vote for (1a) or (1b) (basically the status quo), and (2c). I
really don't like (2d), and (2b) seems like a waste of effort compared
to (2c). If we're unclear whether (2c) is really valuable, I would
volunteer to implement (2a) as a stopgap (although it won't happen
until after the holidays). I would not bother to implement (3) at this
point, but I won't stop someone else from doing it, either.

-- Dirk


Re: [webkit-dev] Skipping Flakey Tests

2009-12-22 Thread Darin Adler
On Dec 21, 2009, at 6:14 PM, Dirk Pranke wrote:

> Given all that, Darin, what were you suggesting when you said "Let's fix 
> that"?

Let's add a feature so something in the tests tree can indicate a Chromium 
Windows result should be the base result, rather than the platform/win one. We 
can debate exactly how it should work, but let's come up with a design and do it.

-- Darin



Re: [webkit-dev] Skipping Flakey Tests

2009-12-21 Thread Peter Kasting
On Mon, Dec 21, 2009 at 6:14 PM, Dirk Pranke  wrote:

> The Chromium framework doesn't look at the Skipped files, but does
> look at the Safari Win baselines (we'll use those if we don't find a
> better match).
>
> Given all that, Darin, what were you suggesting when you said "Let's fix
> that"?


I assumed it meant "There should be a distinction between all-Win-platforms
baselines and Safari/Win-only baselines", and then we could make the
Chromium harness look at the former but not the latter, and put this
particular case into the latter.

I have no idea who's making that change, or whatever other change we decide
on, though.

PK


Re: [webkit-dev] Skipping Flakey Tests

2009-12-21 Thread Dirk Pranke
Somewhere in between the two of you I got lost.

Which "the framework" are you referring to? If you're referring to the
run-webkit-tests in WebKitTools/Scripts, you are correct that it has
no way to distinguish Safari/Win from Chromium/Win. This doesn't
really matter, since this framework isn't used by Chromium.

The Chromium framework doesn't look at the Skipped files, but does
look at the Safari Win baselines (we'll use those if we don't find a
better match). To Peter's point, this means that we have to check in a
correct expected result to override the incorrect one in platform/win,
which is annoying. I don't see any obvious way to get around this,
except that I think it probably does us little to no good to use the
Win expected results at all, and we should probably just skip them in
general. That doesn't address the general case, though (for instance,
if Mac has the same problem).

Note that if/when we happen to upstream the Chromium version of
run_webkit_tests, it does support marking files as 'FAIL', although it
does not currently allow us to capture 'expected fail' results
separately from 'expected pass' results. So, if we were all using
Chromium's run_webkit_tests, we could mark the Safari/Win version as
expected to fail (separately from SKIP), but I'm not sure if this is
what the Safari/Win guys want.

Given all that, Darin, what were you suggesting when you said "Let's fix that"?

-- Dirk

On Mon, Dec 21, 2009 at 1:54 PM, Darin Adler  wrote:
> On Dec 21, 2009, at 1:50 PM, Peter Kasting wrote:
>
>> the framework doesn't seem to have a way of distinguishing Safari/Win from 
>> Chromium/Win
>
> Let's fix that.
>
>    -- Darin
>


Re: [webkit-dev] Skipping Flakey Tests

2009-12-21 Thread Darin Adler
On Dec 21, 2009, at 1:50 PM, Peter Kasting wrote:

> the framework doesn't seem to have a way of distinguishing Safari/Win from 
> Chromium/Win

Let's fix that.

-- Darin



Re: [webkit-dev] Skipping Flakey Tests

2009-12-21 Thread Peter Kasting
On Thu, Oct 1, 2009 at 10:41 AM, Drew Wilson  wrote:

> In this case, there was a failure in one of the layout tests on the windows
> platform, so following the advice below, aroben correctly checked in an
> update to the test expectations instead of skipping the tests.
>
> Downstream, this busted the Chromium tests, because that failure was not
> happening in Chromium, and now our correct test output doesn't match the
> incorrect test output that's been codified in the test expecations. We can
> certainly manage this downstream by rebaselining the test and managing a
> custom chromium test expectation, but that's a pain and is somewhat fragile
> as it requires maintenance every time someone adds a new test case to the
> test.
>

This came up again this past Friday.
http://trac.webkit.org/changeset/52324 added purposefully-failing
results for WebKit Windows, which broke Chromium downstream because we
don't fail the test.

Darin's original reply here included the line "And we should structure test
results and exceptions so that it’s easy to get the expected failure on the
right platforms and success on others."  It seems like this isn't the case
currently, since the framework doesn't seem to have a way of distinguishing
Safari/Win from Chromium/Win, meaning the only way we can express the
current state of affairs via result snapshots is to check in a bad baseline
over the good one for all Windows ports and then have each port that passes
check in a good baseline.  This is a pretty poor experience :(, and more
than "a slight inconvenience" as Darin dismissively termed it.

I liked Dirk's idea of being able to note that a test is failing, rather
than skipping it or checking in a bogus baseline.  I don't see a lot of
value in the bad-baseline strategy beyond keeping the test running, and
noting "this test fails on Safari/Win" accomplishes that same objective in a
less-misleading and more-other-port-friendly fashion.

PK


Re: [webkit-dev] Skipping Flakey Tests

2009-10-01 Thread Drew Wilson
OK, I agree as well: skipping is not a good solution here. I don't think
the status quo is perfect, but it's probably not imperfect enough to do
anything about :)
I guess there's just a process wrinkle we need to address on the Chromium
side. It's easy to rebaseline a test in Chromium, but less easy to figure
out when it's safe to un-rebaseline it.
-atw

On Thu, Oct 1, 2009 at 11:57 AM, Eric Seidel  wrote:

> I agree with Darin.  I don't think that this is a good example of
> where skipping would be useful.
>
> I think more you're identifying that there is a test hierarchy problem
> here.  Chromium really wants to base its tests off of some base "win"
> implementation, and then "win-apple", "win-chromium", "win-cairo"
> results could derive from that, similar to how "mac" and
> "mac-leopard", "mac-tiger", "mac-snowleopard" work.
>
> > I think we should skip only tests that endanger the testing strategy
> because
> > they are super-slow, crash, or adversely affect other tests in some way.
>
> Back to the original topic:  I do however see flakey tests as
> "endangering our testing strategy" because they provide false
> negatives, and greatly reduce the value of the layout tests and things
> which run the layout tests, like the buildbots or the commit-bot.
>
> I also agree with Darin's earlier comment that WebKit needs something
> like Chromium's multiple-expected results support so that we can
> continue to run flakey tests, even if they're flakey instead of having
> to resort to skipping them.  But for now, skipping is the best we
> have, and I still encourage us to use it when necessary instead of
> leaving layout tests flakey. :)
>
> -eric
>


Re: [webkit-dev] Skipping Flakey Tests

2009-10-01 Thread Darin Adler

On Oct 1, 2009, at 11:58 AM, Eric Seidel wrote:

I think more you're identifying that there is a test hierarchy  
problem here.  Chromium really wants to base its tests off of some  
base "win" implementation, and then "win-apple", "win-chromium",  
"win-cairo" results could derive from that, similar to how "mac" and  
"mac-leopard", "mac-tiger", "mac-snowleopard" work.


Something like that would be excellent if this pattern turns up often.  
I don’t think we should make the change because of one test, but if it  
comes up a lot we definitely should.


Back to the original topic:  I do however see flakey tests as  
"endangering our testing strategy" because they provide false  
negatives, and greatly reduce the value of the layout tests and  
things which run the layout tests, like the buildbots or the commit-bot.


I also agree with Darin's earlier comment that WebKit needs  
something like Chromium's multiple-expected results support so that  
we can continue to run flakey tests, even if they're flakey instead  
of having to resort to skipping them.  But for now, skipping is the  
best we have, and I still encourage us to use it when necessary  
instead of leaving layout tests flakey. :)


I agree on all of this.

Except that the two specific flakey tests that got us started on this
discussion were really serious bugs, and it was really good to fix
them rather than skip them. After this experience, I now do share
Alexey’s fear that if we had skipped them we would not have fixed the
regression. Best, if possible, would have been to notice when they
turned from reliable tests into flakey tests and to roll out the
change that made them flakey.


-- Darin



Re: [webkit-dev] Skipping Flakey Tests

2009-10-01 Thread Eric Seidel
I agree with Darin.  I don't think that this is a good example of
where skipping would be useful.

I think more you're identifying that there is a test hierarchy problem
here.  Chromium really wants to base its tests off of some base "win"
implementation, and then "win-apple", "win-chromium", "win-cairo"
results could derive from that, similar to how "mac" and
"mac-leopard", "mac-tiger", "mac-snowleopard" work.

> I think we should skip only tests that endanger the testing strategy because
> they are super-slow, crash, or adversely affect other tests in some way.

Back to the original topic:  I do however see flakey tests as
"endangering our testing strategy" because they provide false
negatives, and greatly reduce the value of the layout tests and things
which run the layout tests, like the buildbots or the commit-bot.

I also agree with Darin's earlier comment that WebKit needs something
like Chromium's multiple-expected results support so that we can
continue to run flakey tests, even if they're flakey instead of having
to resort to skipping them.  But for now, skipping is the best we
have, and I still encourage us to use it when necessary instead of
leaving layout tests flakey. :)
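The multiple-expected-results support Eric mentions amounts to accepting any of a set of checked-in outputs; a minimal sketch (the alternate-baseline file naming here is invented for illustration):

```python
def check_against_baselines(actual, baselines):
    """Return the name of whichever baseline the output matched, or None
    for a genuine failure. A flakey-but-understood test passes as long
    as it produces one of its known-good variants."""
    for name, text in baselines.items():
        if actual == text:
            return name
    return None
```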

-eric

On Thu, Oct 1, 2009 at 11:47 AM, Darin Adler  wrote:
> On Oct 1, 2009, at 11:41 AM, Drew Wilson wrote:
>
>> I don't have an opinion about flakey tests, but flat-out-busted tests
>> should get skipped. Any thoughts/objections?
>
> I object.
>
> If a test fails on some platforms and succeeds on others, we should have the
> success result checked in as the default case, and the failure as an
> exception. And we should structure test results and exceptions so that it’s
> easy to get the expected failure on the right platforms and success on
> others. Your story about a slight inconvenience because a test failed on the
> base Windows WebKit but succeeded on the Chromium WebKit does not seem like
> a reason to change this!
>
> Skipping the test does not seem like a good thing to do for the long term
> health of the project. It is good to exercise all the other code each test
> covers and also to notice when a test result gets even worse or gets better
> when a seemingly unrelated change is made.
>
> I think we should skip only tests that endanger the testing strategy because
> they are super-slow, crash, or adversely affect other tests in some way.
>
>    -- Darin
>


Re: [webkit-dev] Skipping Flakey Tests

2009-10-01 Thread Dirk Pranke
On Thu, Oct 1, 2009 at 11:47 AM, Darin Adler  wrote:
> On Oct 1, 2009, at 11:41 AM, Drew Wilson wrote:
>
>> I don't have an opinion about flakey tests, but flat-out-busted tests
>> should get skipped. Any thoughts/objections?
>
> I object.
>
> If a test fails on some platforms and succeeds on others, we should have the
> success result checked in as the default case, and the failure as an
> exception. And we should structure test results and exceptions so that it’s
> easy to get the expected failure on the right platforms and success on
> others. Your story about a slight inconvenience because a test failed on the
> base Windows WebKit but succeeded on the Chromium WebKit does not seem like
> a reason to change this!
>
> Skipping the test does not seem like a good thing to do for the long term
> health of the project. It is good to exercise all the other code each test
> covers and also to notice when a test result gets even worse or gets better
> when a seemingly unrelated change is made.
>
> I think we should skip only tests that endanger the testing strategy because
> they are super-slow, crash, or adversely affect other tests in some way.
>

I agree that skipping the test is the wrong thing to do. However,
checking in an incorrect baseline over the correct baseline is also
the wrong thing to do (because, as Drew points out, this can break
other platforms that don't have the bug).

Chromium does have the concept of marking tests as expected to FAIL,
but it does not have a way to capture what the expected failure is
(i.e., there is no way to capture a "FAIL" baseline). We discussed
this recently and punted on it because it was unclear how useful this
would really be, and -- as we all probably agree -- it's better not to
have failing tests in the first place.

Eric and Dimitry have suggested that we look into pulling the Chromium
expectations framework upstream into WebKit and adding the features
that WebKit's framework has that Chromium's doesn't. It sounds to me
like this might be the right long-term solution, and I'd be happy to
work on it.
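The Chromium expectations framework pairs tests with platform modifiers and expected outcomes; a simplified parser gives the flavor (this grammar is a rough approximation, not the exact Chromium file format):

```python
def parse_expectation(line):
    """Parse a simplified expectations line of the form
    'MODIFIERS : test/path.html = OUTCOME [OUTCOME ...]'."""
    head, _, outcomes = line.partition("=")
    modifiers, _, test = head.partition(":")
    return {
        "modifiers": modifiers.split(),
        "test": test.strip(),
        "outcomes": outcomes.split(),
    }
```

A line can name several acceptable outcomes at once, which is what lets a flakey test stay in the run without turning the bots red.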

In the meantime, maybe it makes sense to add Fail files alongside the
Skipped files? That would allow the bots to stay green while still
keeping the tests running.

-- Dirk


Re: [webkit-dev] Skipping Flakey Tests

2009-10-01 Thread Darin Adler

On Oct 1, 2009, at 11:41 AM, Drew Wilson wrote:

I don't have an opinion about flakey tests, but flat-out-busted  
tests should get skipped. Any thoughts/objections?


I object.

If a test fails on some platforms and succeeds on others, we should  
have the success result checked in as the default case, and the  
failure as an exception. And we should structure test results and  
exceptions so that it’s easy to get the expected failure on the right  
platforms and success on others. Your story about a slight  
inconvenience because a test failed on the base Windows WebKit but  
succeeded on the Chromium WebKit does not seem like a reason to change  
this!


Skipping the test does not seem like a good thing to do for the long  
term health of the project. It is good to exercise all the other code  
each test covers and also to notice when a test result gets even worse  
or gets better when a seemingly unrelated change is made.


I think we should skip only tests that endanger the testing strategy  
because they are super-slow, crash, or adversely affect other tests in  
some way.


-- Darin



Re: [webkit-dev] Skipping Flakey Tests

2009-10-01 Thread Drew Wilson
I wanted to re-open this discussion with some real-world feedback.
In this case, there was a failure in one of the layout tests on the Windows
platform, so following the advice below, aroben correctly checked in an
update to the test expectations instead of skipping the tests.

Downstream, this busted the Chromium tests, because that failure was not
happening in Chromium, and now our correct test output doesn't match the
incorrect test output that's been codified in the test expectations. We can
certainly manage this downstream by rebaselining the test and maintaining a
custom Chromium test expectation, but that's a pain and somewhat fragile,
as it requires maintenance every time someone adds a new test case to the
test.

I'd really like to suggest that we skip broken tests rather than codify
their breakages in the expectations file. Perhaps we'd make exceptions to
this rule for tests that have a bunch of working test cases (in which case
there's value in running the other test cases instead of skipping the entire
test). But in general it's less work for everyone just to skip broken tests.

I don't have an opinion about flakey tests, but flat-out-busted tests should
get skipped. Any thoughts/objections?

-atw

On Fri, Sep 25, 2009 at 1:59 PM, Darin Adler  wrote:

> Green buildbots have a lot of value.
>
> I think it’s worthwhile finding a way to have them even when there are test
> failures.
>
> For predictable failures, the best approach is to land the expected failure
> as an expected result, and use a bug to track the fact that it’s wrong. To
> me this does seem a bit like “sweeping something under the rug”; a bug
> report is much easier to overlook than a red buildbot. We don’t have a great
> system for keeping track of the most important bugs.
>
> For tests that give intermittent and inconsistent results, the best we can
> currently do is to skip the test. I think it would make sense to instead
> allow multiple expected results. I gather that one of the tools used in the
> Chromium project has this concept and I think there’s no real reason not to
> add the concept to run-webkit-tests as long as we are conscientious about
> not using it when it’s not needed. And use a bug to track the fact that the
> test gives insufficient results. This has the same downsides as landing the
> expected failure results.
>
> For tests that have an adverse effect on other tests, the best we can
> currently do is to skip the test.
>
> I think we are overusing the Skipped machinery at the moment for platform
> differences. I think in many cases it would be better to instead land an
> expected failure result. On the other hand, one really great thing about the
> Skipped file is that there’s a complete list in the file, allowing everyone
> to see the list. It makes a good to-do list, probably better than just a
> list of bugs. This made Darin Fisher’s recent “why are so many tests
> skipped, let’s fix it” message possible.
>
>-- Darin
>
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-28 Thread Alexey Proskuryakov


On Sep 28, 2009, at 5:00 PM, Maciej Stachowiak wrote:


p.s. I now have two "skipping flakey tests" changes up for review:
https://bugs.webkit.org/show_bug.cgi?id=29322


If Brady and Alexey are ok with disabling this test, then I'm fine  
with it. I like Alexey's suggestion to have a separate "flaky tests"  
list that the commit queue script can ignore, without preventing  
them from being run at all.



I'm investigating this one now, it looks like we have a refcounting  
issue with some new code in CString/CStringBuffer. I'd prefer to keep  
this test enabled (and it's a good thing that it's been causing lots  
of pain; that's what keeps the regression count low).


- WBR, Alexey Proskuryakov



Re: [webkit-dev] Skipping Flakey Tests

2009-09-28 Thread David Levin
On Mon, Sep 28, 2009 at 5:01 PM, Maciej Stachowiak  wrote:

>
> On Sep 28, 2009, at 4:47 PM, David Levin wrote:
>
> I don't believe that the test was checked in a flaky state. It was solid
> for a long time and then something happened...
>
>
> What's "the test" in this context? The network / credentials one?
>

Yep.


>
>
> I'll try to add more logging to this test this evening (after my turn at
> helping chromium stay up to date with WebKit is over).
>
> I'll ping Drew about the other test.
>
>
> It sounds like that test was always buggy, if Drew is right about the
> cause.
>
>  - Maciej
>
>
> Dave
>
>
> On Mon, Sep 28, 2009 at 4:40 PM, Eric Seidel  wrote:
>
>> On Fri, Sep 25, 2009 at 2:42 PM, Maciej Stachowiak  wrote:
>> > I like Dave Levin's idea that the first action should be to instrument
>> the
>> > tests so we can find out why they intermittently fail. Especially if the
>> > failure is reproducible on the bots but not on developer systems.
>>
>> I like this idea too.  I don't like the reality that flakey tests are
>> a burden on all developers caused by one.
>>
>> > Using the
>> > skip list should be a last resort, because that hides the failure
>> instead of
>> > helping us diagnose the cause.
>>
>> I (respectfully) disagree.  I think we shouldn't be so afraid to skip
>> tests.  We don't allow people to check in compiles which fail.  We
>> don't allow people to check in tests which fail on other platforms
>> (without skipping them) or on every other run.  Why should we allow
>> people to check in tests which fail every 10 runs?  Or worse, why
>> should we leave a known flakey test checked in/un-attended which fails
>> every 10 runs?
>>
>> If we can't easily roll out the failing tests (or the commit which
>> caused them to start failing), we should skip them to keep the bots (a
>> shared resource) green, so as not to block other work on the project.
>> No?
>>
>> I very much like WebKit's "everyone is responsible for the whole
>> project" culture, but I disagree that the burden of diagnosis should
>> be on the person trying to make a completely unrelated checkin (as is
>> the case when we leave flakey tests enabled in the tree).
>>
>> -eric
>>
>> p.s. I now have two "skipping flakey tests" changes up for review:
>> https://bugs.webkit.org/show_bug.cgi?id=29322
>> https://bugs.webkit.org/show_bug.cgi?id=29344
>> ___
>> webkit-dev mailing list
>> webkit-dev@lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>
>
>
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-28 Thread Maciej Stachowiak


On Sep 28, 2009, at 4:47 PM, David Levin wrote:

I don't believe that the test was checked in a flaky state. It was  
solid for a long time and then something happened...


What's "the test" in this context? The network / credentials one?



I'll try to add more logging to this test this evening (after my  
turn at helping chromium stay up to date with WebKit is over).


I'll ping Drew about the other test.


It sounds like that test was always buggy, if Drew is right about the  
cause.


 - Maciej



Dave


On Mon, Sep 28, 2009 at 4:40 PM, Eric Seidel  wrote:
On Fri, Sep 25, 2009 at 2:42 PM, Maciej Stachowiak   
wrote:
> I like Dave Levin's idea that the first action should be to instrument the
> tests so we can find out why they intermittently fail. Especially if the
> failure is reproducible on the bots but not on developer systems.

I like this idea too.  I don't like the reality that flakey tests are
a burden on all developers caused by one.

> Using the
> skip list should be a last resort, because that hides the failure instead of
> helping us diagnose the cause.

I (respectfully) disagree.  I think we shouldn't be so afraid to skip
tests.  We don't allow people to check in compiles which fail.  We
don't allow people to check in tests which fail on other platforms
(without skipping them) or on every other run.  Why should we allow
people to check in tests which fail every 10 runs?  Or worse, why
should we leave a known flakey test checked in/un-attended which fails
every 10 runs?

If we can't easily roll-out the failing tests (or the commit which
cause them to start failing), we should skip them to keep the bots (a
shared resource) green, so as not to block other work on the project.
No?

I very much like WebKit's "everyone is responsible for the whole
project" culture, but I disagree that the burden of diagnosis should
be on the person trying to make a completely unrelated checkin (as is
the case when we leave flakey tests enabled in the tree).

-eric

p.s. I now have two "skipping flakey tests" changes up for review:
https://bugs.webkit.org/show_bug.cgi?id=29322
https://bugs.webkit.org/show_bug.cgi?id=29344





Re: [webkit-dev] Skipping Flakey Tests

2009-09-28 Thread Maciej Stachowiak


On Sep 28, 2009, at 4:40 PM, Eric Seidel wrote:

On Fri, Sep 25, 2009 at 2:42 PM, Maciej Stachowiak   
wrote:
I like Dave Levin's idea that the first action should be to  
instrument the
tests so we can find out why they intermittently fail. Especially  
if the

failure is reproducible on the bots but not on developer systems.


I like this idea too.  I don't like the reality that flakey tests are
a burden on all developers caused by one.


Then in my opinion that's what we should do first, when a test is  
failing sporadically and the cause is unknown.



Using the
skip list should be a last resort, because that hides the failure instead of
helping us diagnose the cause.


I (respectfully) disagree.  I think we shouldn't be so afraid to skip
tests.  We don't allow people to check in compiles which fail.  We
don't allow people to check in tests which fail on other platforms
(without skipping them) or on every other run.  Why should we allow
people to check in tests which fail every 10 runs?  Or worse, why
should we leave a known flakey test checked in/un-attended which fails
every 10 runs?


If a brand new test fails every 10 runs, then we should revert the  
patch that landed it, just as if it had caused a test to always fail.  
However, I get the impression that many "flaky test" issues appear  
after the fact - a test that has been running fine for a long time  
starts failing sporadically. In that kind of case, it seems likely  
that a subsequent code change and not the original test is at fault.  
The challenge is that it may be difficult to identify the code change  
that made the test start failing sporadically. But for example if a  
test newly started failing 100% of the time, we would not consider it  
an appropriate fix to disable that test. Or at least I wouldn't.



If we can't easily roll-out the failing tests (or the commit which
cause them to start failing), we should skip them to keep the bots (a
shared resource) green, so as not to block other work on the project.
No?


The reason for the "keep the bots green" rule is to prevent  
regressions. If we maintain the rule by disabling tests, we are  
sacrificing the actual purpose of the rule for the sake of pro forma  
adherence. That's why I'd like it to be a last resort, unless the test  
is new and it's clear it was flaky from the get-go. The first resort  
should be to get the person who made the test to investigate.



I very much like WebKit's "everyone is responsible for the whole
project" culture, but I disagree that the burden of diagnosis should
be on the person trying to make a completely unrelated checkin (as is
the case when we leave flakey tests enabled in the tree).


This normally hasn't been a huge problem, other than for the commit  
queue script.




-eric

p.s. I now have two "skipping flakey tests" changes up for review:
https://bugs.webkit.org/show_bug.cgi?id=29322


If Brady and Alexey are ok with disabling this test, then I'm fine  
with it. I like Alexey's suggestion to have a separate "flaky tests"  
list that the commit queue script can ignore, without preventing them  
from being run at all.
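(To make the suggestion concrete: a minimal sketch of how a commit-queue script might consult a separate flaky-tests list, so known-flaky failures are logged but don't block landing. The `FLAKY` set, the `triage` helper, and the test names are hypothetical; the real tooling may work differently.)

```python
# Hypothetical flaky-tests list, kept separate from the Skipped file so
# the tests still run; this is a sketch, not the real commit-queue code.
FLAKY = {"http/tests/flaky-example.html"}

def triage(failures: set[str]) -> tuple[set[str], set[str]]:
    """Split failures into ones that should block landing and ones that
    are merely reported as known-flaky."""
    blocking = failures - FLAKY   # new failures: turn the bot red
    ignored = failures & FLAKY    # known-flaky: log, but don't block
    return blocking, ignored
```

A failure outside the list still blocks the queue, so the list hides only the noise, not new regressions.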



https://bugs.webkit.org/show_bug.cgi?id=29344


This one seems like the test was always buggy, so OK to turn off. I do  
think the test can be fixed however, despite comments to the contrary  
in the bug.


Regards,
Maciej



Re: [webkit-dev] Skipping Flakey Tests

2009-09-28 Thread David Levin
I don't believe that the test was checked in a flaky state. It was solid for
a long time and then something happened...
I'll try to add more logging to this test this evening (after my turn at
helping chromium stay up to date with WebKit is over).

I'll ping Drew about the other test.

Dave


On Mon, Sep 28, 2009 at 4:40 PM, Eric Seidel  wrote:

> On Fri, Sep 25, 2009 at 2:42 PM, Maciej Stachowiak  wrote:
> > I like Dave Levin's idea that the first action should be to instrument the
> > tests so we can find out why they intermittently fail. Especially if the
> > failure is reproducible on the bots but not on developer systems.
>
> I like this idea too.  I don't like the reality that flakey tests are
> a burden on all developers caused by one.
>
> > Using the
> > skip list should be a last resort, because that hides the failure instead of
> > helping us diagnose the cause.
>
> I (respectfully) disagree.  I think we shouldn't be so afraid to skip
> tests.  We don't allow people to check in compiles which fail.  We
> don't allow people to check in tests which fail on other platforms
> (without skipping them) or on every other run.  Why should we allow
> people to check in tests which fail every 10 runs?  Or worse, why
> should we leave a known flakey test checked in/un-attended which fails
> every 10 runs?
>
> If we can't easily roll-out the failing tests (or the commit which
> cause them to start failing), we should skip them to keep the bots (a
> shared resource) green, so as not to block other work on the project.
> No?
>
> I very much like WebKit's "everyone is responsible for the whole
> project" culture, but I disagree that the burden of diagnosis should
> be on the person trying to make a completely unrelated checkin (as is
> the case when we leave flakey tests enabled in the tree).
>
> -eric
>
> p.s. I now have two "skipping flakey tests" changes up for review:
> https://bugs.webkit.org/show_bug.cgi?id=29322
> https://bugs.webkit.org/show_bug.cgi?id=29344
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-28 Thread Eric Seidel
On Fri, Sep 25, 2009 at 2:42 PM, Maciej Stachowiak  wrote:
> I like Dave Levin's idea that the first action should be to instrument the
> tests so we can find out why they intermittently fail. Especially if the
> failure is reproducible on the bots but not on developer systems.

I like this idea too.  I don't like the reality that flakey tests are
a burden on all developers caused by one.

> Using the
> skip list should be a last resort, because that hides the failure instead of
> helping us diagnose the cause.

I (respectfully) disagree.  I think we shouldn't be so afraid to skip
tests.  We don't allow people to check in compiles which fail.  We
don't allow people to check in tests which fail on other platforms
(without skipping them) or on every other run.  Why should we allow
people to check in tests which fail every 10 runs?  Or worse, why
should we leave a known flakey test checked in/un-attended which fails
every 10 runs?

If we can't easily roll-out the failing tests (or the commit which
cause them to start failing), we should skip them to keep the bots (a
shared resource) green, so as not to block other work on the project.
No?

I very much like WebKit's "everyone is responsible for the whole
project" culture, but I disagree that the burden of diagnosis should
be on the person trying to make a completely unrelated checkin (as is
the case when we leave flakey tests enabled in the tree).

-eric

p.s. I now have two "skipping flakey tests" changes up for review:
https://bugs.webkit.org/show_bug.cgi?id=29322
https://bugs.webkit.org/show_bug.cgi?id=29344


Re: [webkit-dev] Skipping Flakey Tests

2009-09-25 Thread Maciej Stachowiak


On Sep 25, 2009, at 1:49 PM, Eric Seidel wrote:


Hum...  Discussion kinda died.  I'm interested in soliciting more
input, particularly from Apple folks.

Unless silence indicates that others agree with David Levin, Yaar and
Dimitri that we should skip flakey tests?

If that's the case, then I'd love a review on:
https://bugs.webkit.org/show_bug.cgi?id=29322
:)


I like Dave Levin's idea that the first action should be to instrument  
the tests so we can find out why they intermittently fail. Especially  
if the failure is reproducible on the bots but not on developer  
systems. Using the skip list should be a last resort, because that  
hides the failure instead of helping us diagnose the cause.


Regards,
Maciej



Re: [webkit-dev] Skipping Flakey Tests

2009-09-25 Thread Eric Seidel
On Fri, Sep 25, 2009 at 1:59 PM, Darin Adler  wrote:
> For tests that give intermittent and inconsistent results, the best we can
> currently do is to skip the test. I think it would make sense to instead
> allow multiple expected results. I gather that one of the tools used in the
> Chromium project has this concept and I think there’s no real reason not to
> add the concept to run-webkit-tests as long as we are conscientious about
> not using it when it’s not needed.

Not to derail the discussion, but to provide context for Darin's reply:

Yes, Chromium's version of run-webkit-tests (called
run_webkit_tests.py) has multiple-expected-results support (along with
running tests in parallel and other goodness).  But it's missing a
bunch of the nifty flags WebKit's run-webkit-tests has.  My hope is to
eventually unify them, which we've filed some bugs on:
http://code.google.com/p/chromium/issues/detail?id=23099
https://bugs.webkit.org/show_bug.cgi?id=10906

-eric

On Fri, Sep 25, 2009 at 1:59 PM, Darin Adler  wrote:
> Green buildbots have a lot of value.
>
> I think it’s worthwhile finding a way to have them even when there are test
> failures.
>
> For predictable failures, the best approach is to land the expected failure
> as an expected result, and use a bug to track the fact that it’s wrong. To
> me this does seem a bit like “sweeping something under the rug”, a bug
> report is much easier to overlook than a red buildbot. We don’t have a great
> system for keeping track of the most important bugs.
>
> For tests that give intermittent and inconsistent results, the best we can
> currently do is to skip the test. I think it would make sense to instead
> allow multiple expected results. I gather that one of the tools used in the
> Chromium project has this concept and I think there’s no real reason not to
> add the concept to run-webkit-tests as long as we are conscientious about
> not using it when it’s not needed. And use a bug to track the fact that the
> test gives insufficient results. This has the same downsides as landing the
> expected failure results.
>
> For tests that have an adverse effect on other tests, the best we can
> currently do is to skip the test.
>
> I think we are overusing the Skipped machinery at the moment for platform
> differences. I think in many cases it would be better to instead land an
> expected failure result. On the other hand, one really great thing about the
> Skipped file is that there’s a complete list in the file, allowing everyone
> to see the list. It makes a good to do list, probably better than just a
> list of bugs. This made Darin Fisher’s recent “why are so many tests
> skipped, lets fix it” message possible.
>
>    -- Darin
>
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-25 Thread Darin Adler

Green buildbots have a lot of value.

I think it’s worthwhile finding a way to have them even when there are  
test failures.


For predictable failures, the best approach is to land the expected  
failure as an expected result, and use a bug to track the fact that  
it’s wrong. To me this does seem a bit like “sweeping something under  
the rug”, a bug report is much easier to overlook than a red buildbot.  
We don’t have a great system for keeping track of the most important  
bugs.


For tests that give intermittent and inconsistent results, the best we  
can currently do is to skip the test. I think it would make sense to  
instead allow multiple expected results. I gather that one of the  
tools used in the Chromium project has this concept and I think  
there’s no real reason not to add the concept to run-webkit-tests as  
long as we are conscientious about not using it when it’s not needed.  
And use a bug to track the fact that the test gives insufficient  
results. This has the same downsides as landing the expected failure  
results.
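(As an illustration of the multiple-expected-results idea: a test could be considered passing if its output matches any of several checked-in expectations. The file layout below, like the `passes` helper, is hypothetical, not the actual run-webkit-tests or run_webkit_tests.py behavior.)

```python
# Sketch: a test passes if its actual output matches ANY checked-in
# expectation file (foo-expected.txt, foo-expected-1.txt, ...).
# The naming convention here is assumed for illustration.
from pathlib import Path

def passes(test: str, actual: str, results_dir: Path) -> bool:
    stem = Path(test).stem
    candidates = sorted(results_dir.glob(f"{stem}-expected*.txt"))
    # No expectation files at all counts as a failure.
    return any(p.read_text() == actual for p in candidates)
```

The conscientiousness Darin asks for amounts to keeping the `candidates` set small: one extra expectation per genuinely nondeterministic outcome, not a catch-all.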


For tests that have an adverse effect on other tests, the best we can  
currently do is to skip the test.


I think we are overusing the Skipped machinery at the moment for  
platform differences. I think in many cases it would be better to  
instead land an expected failure result. On the other hand, one really  
great thing about the Skipped file is that there’s a complete list in  
the file, allowing everyone to see the list. It makes a good to do  
list, probably better than just a list of bugs. This made Darin  
Fisher’s recent “why are so many tests skipped, lets fix it” message  
possible.


-- Darin



Re: [webkit-dev] Skipping Flakey Tests

2009-09-25 Thread Eric Seidel
Hum...  Discussion kinda died.  I'm interested in soliciting more
input, particularly from Apple folks.

Unless silence indicates that others agree with David Levin, Yaar and
Dimitri that we should skip flakey tests?

If that's the case, then I'd love a review on:
https://bugs.webkit.org/show_bug.cgi?id=29322
:)

Thanks for your time!

-eric

On Thu, Sep 24, 2009 at 1:40 PM, Eric Seidel  wrote:
> I think the question is most interesting in the abstract as a question
> of general policy about flakey tests.  I think handling the
> circumstances of each bug is best left for discussion in the bug
> themselves.  That said, I've attached a list of recently filed bugs
> about flakey tests, per your request.
>
> -eric
>
> https://bugs.webkit.org/show_bug.cgi?id=29322 (mentioned in the original mail)
> https://bugs.webkit.org/show_bug.cgi?id=29505 (same root bug, I suspect)
> https://bugs.webkit.org/show_bug.cgi?id=29620 (being worked on,
> skipping not yet proposed)
> https://bugs.webkit.org/show_bug.cgi?id=28845 (OS bug, resulted in 2
> skips so far)
> https://bugs.webkit.org/show_bug.cgi?id=28624 (same OS bug, not yet skipped)
> https://bugs.webkit.org/show_bug.cgi?id=29035 (same OS bug, not yet skipped)
> https://bugs.webkit.org/show_bug.cgi?id=29154 (still being
> investigated, skipping not yet proposed)
>
> On Wed, Sep 23, 2009 at 11:23 PM, Maciej Stachowiak  wrote:
>>
>> On Sep 23, 2009, at 11:09 PM, Eric Seidel wrote:
>>
>> Alexey and I have been discussing if WebKit should add flakey tests to
>> Skipped lists:
>> https://bugs.webkit.org/show_bug.cgi?id=29322
>>
>> Alexey asked that I bring the discussion to a larger audience.
>>
>> Pros:
>> - Buildbots stay green.
>> - Red bots/tests means your change caused an error.
>>
>> Cons:
>> - Skipped tests may be more likely to be forgotten (and thus never fixed).
>>
>> What does WebKit think?  Should we skip flakey tests which can't be
>> resolved in a timely manner (and instead track issues via bugs) as
>> WebKit policy?  Or should we reserve the skipped list for other
>> conditions (like bugs in the OS)?
>>
>> Thoughts?
>>
>> I'm a little concerned about sweeping failures under the carpet, but on the
>> other hand the buildbots are much less valuable if they are red too much.
>> It's hard to think about this in the abstract. Do you have any concrete
>> examples?
>>  - Maciej
>>
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-24 Thread Eric Seidel
I think the question is most interesting in the abstract as a question
of general policy about flakey tests.  I think handling the
circumstances of each bug is best left for discussion in the bug
themselves.  That said, I've attached a list of recently filed bugs
about flakey tests, per your request.

-eric

https://bugs.webkit.org/show_bug.cgi?id=29322 (mentioned in the original mail)
https://bugs.webkit.org/show_bug.cgi?id=29505 (same root bug, I suspect)
https://bugs.webkit.org/show_bug.cgi?id=29620 (being worked on,
skipping not yet proposed)
https://bugs.webkit.org/show_bug.cgi?id=28845 (OS bug, resulted in 2
skips so far)
https://bugs.webkit.org/show_bug.cgi?id=28624 (same OS bug, not yet skipped)
https://bugs.webkit.org/show_bug.cgi?id=29035 (same OS bug, not yet skipped)
https://bugs.webkit.org/show_bug.cgi?id=29154 (still being
investigated, skipping not yet proposed)

On Wed, Sep 23, 2009 at 11:23 PM, Maciej Stachowiak  wrote:
>
> On Sep 23, 2009, at 11:09 PM, Eric Seidel wrote:
>
> Alexey and I have been discussing if WebKit should add flakey tests to
> Skipped lists:
> https://bugs.webkit.org/show_bug.cgi?id=29322
>
> Alexey asked that I bring the discussion to a larger audience.
>
> Pros:
> - Buildbots stay green.
> - Red bots/tests means your change caused an error.
>
> Cons:
> - Skipped tests may be more likely to be forgotten (and thus never fixed).
>
> What does WebKit think?  Should we skip flakey tests which can't be
> resolved in a timely manner (and instead track issues via bugs) as
> WebKit policy?  Or should we reserve the skipped list for other
> conditions (like bugs in the OS)?
>
> Thoughts?
>
> I'm a little concerned about sweeping failures under the carpet, but on the
> other hand the buildbots are much less valuable if they are red too much.
> It's hard to think about this in the abstract. Do you have any concrete
> examples?
>  - Maciej
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-24 Thread Dimitri Glazkov
I wonder if there's interest in the WebKit community to improve
granularity (and detail) with which you can document test
expectations, like we have done on Chromium:

http://dev.chromium.org/developers/testing/webkit-layout-tests#TOC-Test-Expectations

This way, you could monitor known failures with the awesomeness that
is Layout Test Flakiness Dashboard:

http://src.chromium.org/viewvc/chrome/trunk/src/webkit/tools/layout_tests/flakiness_dashboard.html

Since we already have done this work on one port, we could definitely
extend this to all other ports.
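(For context, an entry in that kind of expectations file pairs a test with modifiers such as a bug number, platform, and build type, plus the set of outcomes considered acceptable. A toy parser for a line of that general shape; the exact syntax shown is illustrative, not the real file grammar:)

```python
# Parse an illustrative "MODIFIERS : test = OUTCOMES" expectation line.
# The field names and syntax are assumptions for the sketch.
def parse_expectation(line: str) -> tuple[set[str], str, set[str]]:
    head, outcomes = line.split("=")
    modifiers, test = head.split(":")
    return set(modifiers.split()), test.strip(), set(outcomes.split())
```

The win over a flat Skipped list is that a single line can say *where* and *how* a test fails (e.g. only in debug builds on one platform) while it keeps running everywhere else.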

:DG<

On Wed, Sep 23, 2009 at 11:41 PM, David Levin  wrote:
> If a test is flaky, it doesn't seem good to keep in the mix because it will
> turn red for no reason and people will think a check in was bad.
> OTOH, if a test is flaky but has enough logging in it to indicate an
> underlying cause, then it may be worth keeping in the mix so that the root
> issue may be determined.
> imo, a good solution in this case is to improve the test to print out more
> information to help track down the underlying bug.
> dave
>
> On Wed, Sep 23, 2009 at 11:23 PM, Maciej Stachowiak  wrote:
>>
>> On Sep 23, 2009, at 11:09 PM, Eric Seidel wrote:
>>
>> Alexey and I have been discussing if WebKit should add flakey tests to
>> Skipped lists:
>> https://bugs.webkit.org/show_bug.cgi?id=29322
>>
>> Alexey asked that I bring the discussion to a larger audience.
>>
>> Pros:
>> - Buildbots stay green.
>> - Red bots/tests means your change caused an error.
>>
>> Cons:
>> - Skipped tests may be more likely to be forgotten (and thus never fixed).
>>
>> What does WebKit think?  Should we skip flakey tests which can't be
>> resolved in a timely manner (and instead track issues via bugs) as
>> WebKit policy?  Or should we reserve the skipped list for other
>> conditions (like bugs in the OS)?
>>
>> Thoughts?
>>
>> I'm a little concerned about sweeping failures under the carpet, but on
>> the other hand the buildbots are much less valuable if they are red too
>> much. It's hard to think about this in the abstract. Do you have any
>> concrete examples?
>>  - Maciej
>>
>>
>
>
>
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-23 Thread David Levin
If a test is flaky, it doesn't seem good to keep in the mix because it will
turn red for no reason and people will think a check in was bad.
OTOH, if a test is flaky but has enough logging in it to indicate an
underlying cause, then it may be worth keeping in the mix so that the root
issue may be determined.

imo, a good solution in this case is to improve the test to print out more
information to help track down the underlying bug.

dave

On Wed, Sep 23, 2009 at 11:23 PM, Maciej Stachowiak  wrote:

>
> On Sep 23, 2009, at 11:09 PM, Eric Seidel wrote:
>
> Alexey and I have been discussing if WebKit should add flakey tests to
> Skipped lists:
> https://bugs.webkit.org/show_bug.cgi?id=29322
>
> Alexey asked that I bring the discussion to a larger audience.
>
> Pros:
> - Buildbots stay green.
> - Red bots/tests means your change caused an error.
>
> Cons:
> - Skipped tests may be more likely to be forgotten (and thus never fixed).
>
> What does WebKit think?  Should we skip flakey tests which can't be
> resolved in a timely manner (and instead track issues via bugs) as
> WebKit policy?  Or should we reserve the skipped list for other
> conditions (like bugs in the OS)?
>
> Thoughts?
>
>
> I'm a little concerned about sweeping failures under the carpet, but on the
> other hand the buildbots are much less valuable if they are red too much.
> It's hard to think about this in the abstract. Do you have any concrete
> examples?
>
>  - Maciej
>
>
>
>


Re: [webkit-dev] Skipping Flakey Tests

2009-09-23 Thread Maciej Stachowiak


On Sep 23, 2009, at 11:09 PM, Eric Seidel wrote:


Alexey and I have been discussing if WebKit should add flakey tests to
Skipped lists:
https://bugs.webkit.org/show_bug.cgi?id=29322

Alexey asked that I bring the discussion to a larger audience.

Pros:
- Buildbots stay green.
- Red bots/tests means your change caused an error.

Cons:
- Skipped tests may be more likely to be forgotten (and thus never  
fixed).


What does WebKit think?  Should we skip flakey tests which can't be
resolved in a timely manner (and instead track issues via bugs) as
WebKit policy?  Or should we reserve the skipped list for other
conditions (like bugs in the OS)?

Thoughts?


I'm a little concerned about sweeping failures under the carpet, but  
on the other hand the buildbots are much less valuable if they are red  
too much. It's hard to think about this in the abstract. Do you have  
any concrete examples?


 - Maciej



[webkit-dev] Skipping Flakey Tests

2009-09-23 Thread Eric Seidel
Alexey and I have been discussing if WebKit should add flakey tests to
Skipped lists:
https://bugs.webkit.org/show_bug.cgi?id=29322

Alexey asked that I bring the discussion to a larger audience.

Pros:
- Buildbots stay green.
- Red bots/tests means your change caused an error.

Cons:
- Skipped tests may be more likely to be forgotten (and thus never fixed).

What does WebKit think?  Should we skip flakey tests which can't be
resolved in a timely manner (and instead track issues via bugs) as
WebKit policy?  Or should we reserve the skipped list for other
conditions (like bugs in the OS)?

Thoughts?

-eric