Hi,
There are different issues here:
- EWS bots don't run the webkitpy tests, except for Mac WK1 and WK2
(running the webkitpy tests is part of the test run). Maybe we should
run the webkitpy tests on the non-tester EWS bots too.
- Poor webkitpy test coverage: the Windows buildbots should have
noticed the mentioned failure after the patch landed.
(Or is it possible that the Win EWS and the buildbot have different configs?)
- EWS can't properly check whether a patch breaks its own code, because
it applies the patch and then does the build, test, etc. without
restarting the process. By that time most of the code is already in
memory, and changing the code of a running process on disk won't
influence the actual run. It isn't trivial to fix this issue, and I'm
not sure it is that important.
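The third point can be reproduced outside EWS. The following is only a minimal sketch, not EWS code: the module name ews_demo_tool and the function demo_stale_code are made up for illustration. It imports a module, rewrites the file on disk (the "patch"), and shows that the running process still executes the old code while a freshly started interpreter picks up the new one.

```python
import importlib
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_stale_code():
    """Return (before_patch, same_process_after_patch, new_process_after_patch)."""
    sys.dont_write_bytecode = True  # keep cached .pyc files out of the demo
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module standing in for a piece of webkitpy.
        module_path = Path(tmp) / "ews_demo_tool.py"
        module_path.write_text("def version():\n    return 'original'\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import ews_demo_tool
            before = ews_demo_tool.version()

            # "Apply the patch" on disk while this process keeps running.
            module_path.write_text("def version():\n    return 'patched code'\n")
            same_process = ews_demo_tool.version()  # still the in-memory code

            # A restarted process imports the file from disk again.
            child_code = (
                "import sys; sys.path.insert(0, sys.argv[1]); "
                "import ews_demo_tool; print(ews_demo_tool.version())"
            )
            new_process = subprocess.run(
                [sys.executable, "-c", child_code, tmp],
                capture_output=True, text=True, check=True,
            ).stdout.strip()
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("ews_demo_tool", None)
    return before, same_process, new_process


print(demo_stale_code())  # ('original', 'original', 'patched code')
```

Only the restarted interpreter sees the patched function, which matches the behaviour described above: a patch to the tooling is not exercised until the bot process restarts.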
These kinds of issues are very rare and can be caught easily if the port
maintainers / gardeners monitor the EWS queues and buildbots
continuously, not only once a day or less often. Maybe we could add
a heartbeat feature to webkitbot. It could ping maintainers on IRC
or send an email if a buildbot or EWS bot is offline or the queue has
been too long for a while.
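To make the heartbeat idea a bit more concrete, here is a rough sketch of the check such a feature might run. Everything in it is an assumption for illustration: BotStatus, heartbeat_alerts, the bot names and the thresholds are hypothetical and not part of webkitbot, and actually delivering the messages over IRC or email is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BotStatus:
    name: str
    last_seen: datetime  # last time the bot reported in
    # When the queue first exceeded its limit, or None if it is short.
    queue_long_since: Optional[datetime] = None


def heartbeat_alerts(statuses, now,
                     max_silence=timedelta(minutes=30),
                     max_long_queue=timedelta(hours=1)):
    """Return one message per bot that looks offline or backed up."""
    alerts = []
    for status in statuses:
        if now - status.last_seen > max_silence:
            alerts.append(f"{status.name} appears to be offline")
        elif (status.queue_long_since is not None
              and now - status.queue_long_since > max_long_queue):
            alerts.append(f"{status.name} has had a long queue for a while")
    return alerts


# Made-up example data.
now = datetime(2015, 4, 2, 12, 0)
statuses = [
    BotStatus("win-ews", last_seen=now - timedelta(hours=36)),
    BotStatus("mac-ews", last_seen=now - timedelta(minutes=5)),
    BotStatus("gtk-ews", last_seen=now - timedelta(minutes=1),
              queue_long_since=now - timedelta(hours=2)),
]
alerts = heartbeat_alerts(statuses, now)
print(alerts)
```

Keeping the check a pure function of the statuses and the current time would make it easy to test separately from whatever sends the notifications.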
br,
Ossy
Maciej Stachowiak wrote:
You said it did not detect the failure until many builds later. That
seems bad. People expect EWS validation to happen on their bug, not out
of band 10-13 builds later. Is there any way to fix this limitation?
That seems better than asking people to remember exceptions about
patches that EWS can't validate the normal way.
On Apr 1, 2015, at 9:29 PM, Brent Fulgham wrote:
The Windows EWS bots process patches fairly quickly. Once I corrected
the problem today, it managed to process about 97 patches in about an
hour.
I do think one bottleneck is due to individual EWS bots "locking"
patches. The first bot to reach a patch locks the patch against other
bots handling it. If the patch happens to be 'consumed' by a bot with
some kind of problem (e.g., bad local configuration, a full disk
drive, etc.), that patch will not be touched again --- even if the other
eight EWS bots are sitting dormant.
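A toy model of the locking behaviour described above, as a sketch only: PatchQueue, claim_next, the bot names and the patch id are invented for illustration and do not reflect the real EWS queue implementation.

```python
class PatchQueue:
    """Toy model: the first bot to reach a patch locks it against the others."""

    def __init__(self, patch_ids):
        self._pending = list(patch_ids)
        self.locked_by = {}  # patch id -> name of the bot holding the lock

    def claim_next(self, bot_name):
        """Lock and return the first unlocked patch, or None if none is left."""
        for patch_id in self._pending:
            if patch_id not in self.locked_by:
                self.locked_by[patch_id] = bot_name
                return patch_id
        return None


queue = PatchQueue([101])  # made-up patch id
# A misconfigured bot claims the patch and then never finishes it.
stuck = queue.claim_next("bot-with-full-disk")
# A healthy bot finds nothing to do, because the lock is never released.
idle = queue.claim_next("healthy-bot")
print(stuck, idle)  # 101 None
```

In this model the patch stays locked by the broken bot indefinitely, so the healthy bot stays idle even though work is outstanding.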
Is there some other processing metric you are concerned about?
- Brent Fulgham - Apple Inc.
On Apr 1, 2015, at 2:26 PM, Maciej Stachowiak wrote:
Is it possible to make EWS start processing changes more promptly?
On Apr 1, 2015, at 12:42 PM, Brent Fulgham wrote:
Hi Everyone,
We lost Windows EWS coverage for the past 36 hours due to a very
benign-appearing change to some webkitpy code. I haven't yet figured
out why this particular set of changes caused the Windows bots to
start failing, but it has to do with various differences between the
Cygwin Python 2.7.8 build and the versions used on our other EWS bots.
This does not seem like something developers SHOULD have to worry
about, but it's an unfortunate reality that they really do need to.
To make matters worse, the patch that introduced the problem passed
EWS. This is because the EWS bots only really begin using changes to
webkitpy when they restart processing (about once every 10-13 build
iterations).
To help combat this problem, I'd like to request that when making
changes to webkitpy, please keep an eye on the various EWS bots to
make sure they continue processing. If they do start failing, please
roll the patch back out and we can work together to resolve the issue.
I apologize for how manual and inconvenient this needs to be (at
least for now), but keeping the EWS up and running is critical to
the smooth function of this project.
If you have any questions, please don't hesitate to e-mail me or
look for me on IRC.
Thanks!
-Brent
_______________________________________________
webkit-dev mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-dev