Hi,

There are different issues here:
- EWS bots don't run webkitpy tests, except the Mac WK1 and WK2 bots
(running webkitpy tests is part of running the tests). Maybe we should
run webkitpy tests on the non-tester EWS bots too.

- poor webkitpy test coverage: Windows buildbots should have
noticed the mentioned failure after the patch landed.
(Or is it possible that the Win EWS and the buildbot have different configurations?)

- EWS can't properly check whether a patch breaks its own code, because
it applies the patch and then does the build, tests, etc. without
restarting the process. By then most of the code is already in
memory, and changing the code of a running process on disk won't
influence the actual run. This isn't trivial to fix, and I'm not
sure it is that important.
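
To illustrate the last point, here is a minimal Python sketch of the effect. It is not EWS code: the module name ews_demo_module and the helper functions are made up for the demonstration. It imports a module, rewrites the file on disk (the "patch"), and shows that the running process still executes the old code.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, used only for this demonstration.
MODULE_NAME = "ews_demo_module"


def write_module(directory, value):
    """Write a tiny module whose answer() returns the given value."""
    path = os.path.join(directory, MODULE_NAME + ".py")
    with open(path, "w") as f:
        f.write("def answer():\n    return %r\n" % value)


def run_demo():
    with tempfile.TemporaryDirectory() as directory:
        sys.path.insert(0, directory)
        try:
            write_module(directory, "old")
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)
            before = module.answer()

            # "Apply the patch": the file on disk changes ...
            write_module(directory, "new")

            # ... but the module object already in memory does not.
            after_patch = module.answer()

            # Importing again returns the cached entry from sys.modules.
            reimported = importlib.import_module(MODULE_NAME).answer()
            return before, after_patch, reimported
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(directory)
            sys.modules.pop(MODULE_NAME, None)


if __name__ == "__main__":
    print(run_demo())
```

All three values come from the code that was loaded before the file changed. Only a fresh interpreter (or an explicit reload) would pick up the new file.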

Issues of this kind are very rare and can be caught easily if the port
maintainers / gardeners monitor the EWS queues and buildbots
continuously, not only once a day or less often. Maybe we could add
a heartbeat feature to webkitbot. It could ping maintainers on IRC
or send an email if a buildbot or EWS bot is offline, or if the queue has been too long for a while.
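
A rough sketch of what such a heartbeat check could look like, in Python. Everything here is an assumption for illustration: QueueStatus, find_alerts, and both thresholds are made-up names and values, not existing webkitbot code. The sketch only decides which bots deserve an alert; delivering it over IRC or email is left out, and so is tracking how long a queue has stayed too long.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueueStatus:
    """Hypothetical snapshot of one bot; not a real webkitbot type."""
    name: str
    last_heartbeat: datetime
    queue_length: int


def find_alerts(statuses, now,
                max_silence=timedelta(minutes=30),  # assumed threshold
                max_queue_length=50):               # assumed threshold
    """Return one message per bot that looks offline or backed up."""
    alerts = []
    for status in statuses:
        silence = now - status.last_heartbeat
        if silence > max_silence:
            minutes = int(silence.total_seconds() // 60)
            alerts.append("%s: no heartbeat for %d minutes"
                          % (status.name, minutes))
        elif status.queue_length > max_queue_length:
            alerts.append("%s: %d patches waiting in the queue"
                          % (status.name, status.queue_length))
    return alerts
```

Passing "now" in as a parameter keeps the check a pure function, so it can be tested without a live bot.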

br,
Ossy

Maciej Stachowiak wrote:
You said it did not detect the failure until many builds later. That seems bad. People expect EWS validation to happen on their bug, not out of band 10-13 builds later. Is there any way to fix this limitation? That seems better than asking people to remember exceptions about patches that EWS can't validate the normal way.



On Apr 1, 2015, at 9:29 PM, Brent Fulgham <[email protected]> wrote:

The Windows EWS bots process patches fairly quickly. Once I corrected the problem today, it managed to process about 97 patches in about an hour.

I do think one bottleneck is due to individual EWS bots "locking" patches. The first bot to reach a patch locks the patch against other bots handling it. If the patch happens to be 'consumed' by a bot with some kind of problem (e.g., bad local configuration, a full disk drive, etc.), that patch will not be touched again --- even if the other eight EWS bots are sitting dormant.

Is there some other processing metric you are concerned about?

- Brent Fulgham - Apple Inc.



On Apr 1, 2015, at 2:26 PM, Maciej Stachowiak <[email protected]> wrote:


Is it possible to make EWS start processing changes more promptly?

On Apr 1, 2015, at 12:42 PM, Brent Fulgham <[email protected]> wrote:

Hi Everyone,

We lost Windows EWS coverage for the past 36 hours due to a very benign-appearing change to some webkitpy code. I haven't yet figured out why this particular set of changes caused the Windows bots to start failing, but it has to do with various differences between the Cygwin Python 2.7.8 build and the versions used on our other EWS bots.

This does not seem like something developers SHOULD have to worry about, but it's an unfortunate reality that they really do need to.

To make matters worse, the patch that introduced the problem passed EWS. This is because the EWS bots only really begin using changes to webkitpy when they restart processing (about once every 10-13 build iterations).

To help combat this problem, I'd like to request that when making changes to webkitpy, please keep an eye on the various EWS bots to make sure they continue processing. If they do start failing, please roll the patch back out and we can work together to resolve the issue.

I apologize for how manual and inconvenient this needs to be (at least for now), but keeping the EWS up and running is critical to the smooth function of this project.

If you have any questions, please don't hesitate to e-mail me or look for me on IRC.

Thanks!

-Brent
_______________________________________________
webkit-dev mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-dev
