Hi Neil,

The "master election" process for schedulers is not perfect, in my view. It would need a bit more work to really be up to date with state-of-the-art master election best practices. So indeed, at the moment I would advise having a single master run the schedulers.
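A minimal sketch of that layout, assuming every master shares one config file and is told its role at startup. The role names, component labels and helper functions here are hypothetical, not Buildbot API; the point is only that exactly one master ends up defining schedulers.

```python
# Hypothetical helper for a shared master.cfg; the role names and
# component labels are illustrative only, not part of Buildbot.
SCHEDULER_ROLE = "frontend"


def components_for(role):
    """Return the config sections a master with this role should define."""
    components = ["workers", "builders"]  # every master can run builds
    if role == SCHEDULER_ROLE:
        # Only this master carries schedulers, the web UI and the hooks,
        # so scheduler election between masters never comes into play.
        components += ["schedulers", "www", "change_hooks"]
    return components


def scheduler_owners(roles):
    """List the masters that would define schedulers (should be exactly one)."""
    return [name for name, role in roles.items()
            if "schedulers" in components_for(role)]


# Hypothetical three-master deployment.
masters = {"master-fe": "frontend", "master-b1": "builder", "master-b2": "builder"}
owners = scheduler_owners(masters)
assert len(owners) == 1, "schedulers must live on exactly one master"
print(owners)
```

In a real master.cfg the same role test would guard the assignment to c['schedulers'], leaving it empty on the other masters.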
For example, Scality has a single "frontend" master which runs the www, the schedulers, and the hooks:
https://docs.google.com/presentation/d/1CA932aTgicnOpIReOhZcqijHE9BGapXfpkIqx9nSRnw/edit#slide=id.g2219c85b58_1_532

This is not an "HA" setup, but in any case the Buildbot scheduler election has a 10-minute recovery timer for when a master fails.

As for claims and collapsing, we recently found some annoying bugs in those:
https://github.com/buildbot/buildbot/pull/3411
https://github.com/buildbot/buildbot/pull/3152

On Wed, Jul 5, 2017 at 6:20 PM Neil Gilmore <[email protected]> wrote:

> Hi everyone.
>
> Well, now that I can (reliably) release locks through Twisted's manhole,
> things are a bit brighter. We have a somewhat rare problem in which all
> of a worker's builders are 'acquiring locks'. Doesn't happen often, but
> it keeps things from running. Remember, we can't use the worker
> configuration to limit builds.
>
> But we're having another problem that seems to be getting a bit worse. I
> seem to recall Pierre saying that in a multi-master configuration, if
> there was a scheduler that existed on multiple masters, that scheduler
> would only be active on a single master. Other masters might activate
> that scheduler if the first master went away. So there should only be
> one master's scheduler scheduling particular builds.
>
> Well, that isn't happening for us. It's not a problem most of the time,
> because the builds do collapse, most of the time. Except when they don't.
>
> For example, last weekend we had 3 builds schedule and build for the
> same sourcestamp (according to the debug information in the UI). The
> builds were scheduled within 3 seconds of each other. However, they were
> claimed many hours apart. It appears that the first build completed
> before the second was claimed, etc. Is this how it ought to go? I
> haven't quite cracked the submitted/claimed/started timing.
>
> We had a similar claiming problem last week where a build went unclaimed
> for 44 days. So when it popped up, it appeared that we had gone back in
> time (as the revision was quite old at that time).
>
> Do I just need to figure out how to not put schedulers on more than 1
> master?
>
> Neil Gilmore
> grammatech.com
> _______________________________________________
> users mailing list
> [email protected]
> https://lists.buildbot.net/mailman/listinfo/users
>
