Hi everyone,

Mostly, our system has been running well. This summer we had an intern whose job was both to update our masters and to make our UI changes work on the latest version, so we're currently running 1.2.0 on our masters. We're further behind on the workers, because updating those is painful for us.

We did run into an odd problem over the weekend. As you might remember, we divide our builders essentially two ways. The smaller set produces our installers. The larger set uses those installers to run tests. Our installer builders use locks to make sure that only one build is running at a time. We can't use the usual mechanism because we also have one builder for monitoring that needs to run concurrently with the 'active' build.

We switched from one branch to another after the old branch's builds had already started. Because we construct builder names from the branch names, a switch like this can create new builders or resurrect older ones (which is what happened in this case). Naturally, we no longer see the old builders by default.

Every one of those builders got stuck acquiring locks. We do see this problem from time to time, but it's usually a single host's builds.

Cancelling the build won't solve the problem. The next build will also get stuck. Restarting the worker doesn't help, nor does a reconfiguration. Restarting the master will solve the problem, but that's pretty drastic, and we tend to lose a lot of work when we do that.

If this happens to you and you need to fix it, here's what we do. We open the manhole in twistd and run the following:

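# Look up the first lock wanted by the stuck builder's running build.
foo = master.botmaster.namedServices['<name of stuck builder>'].building[0].locks[0][0]
# Release it for its first recorded owner; each owners entry is an (owner, access) pair.
foo.release(foo.owners[0][0], foo.owners[0][1])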

The index of 'building' might not always be 0, but almost always is.
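
For anyone who would rather not retype the indices by hand, here is a rough sketch of the manhole snippet above wrapped in a function with a few guards. The function name and its parameters are made up for illustration and are not part of Buildbot. It only touches the attributes used above (namedServices, building, locks, owners, release) and assumes they behave as they do in that snippet.

```python
def release_first_owner(master, builder_name, build_index=0, lock_index=0):
    """Release the first recorded owner of one lock wanted by a stuck build.

    Hypothetical helper: same attribute accesses as the manhole snippet,
    plus checks so a wrong index returns False instead of raising.
    """
    builder = master.botmaster.namedServices[builder_name]

    # The stuck build is usually building[0], but not always.
    if build_index >= len(builder.building):
        return False
    build = builder.building[build_index]

    # Each entry in build.locks is assumed to be a (lock, access) pair.
    if lock_index >= len(build.locks):
        return False
    lock, _wanted_access = build.locks[lock_index]

    # Nothing to release if the lock has no recorded owner.
    if not lock.owners:
        return False

    # Each entry in lock.owners is assumed to be an (owner, access) pair.
    owner, access = lock.owners[0]
    lock.release(owner, access)
    return True
```

From the manhole this would be called as release_first_owner(master, '<name of stuck builder>'). A False return means there was nothing at those indices to release, so try a different build_index before assuming the lock is free.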

I remember some discussion of whether newer versions of twistd still had the manhole. Because we need to do this regularly (though thankfully not frequently), I was dreading updating twistd. But it looks like it survives.

Thanks for your time reading this!

Neil Gilmore
grammatech.com
_______________________________________________
users mailing list
users@buildbot.net
https://lists.buildbot.net/mailman/listinfo/users
