https://bugzilla.wikimedia.org/show_bug.cgi?id=70597
--- Comment #11 from Antoine "hashar" Musso (WMF) <has...@free.fr> ---
I forgot to update this bug after my debugging session on Oct 24th. Here is the rough brain dump:

The Gearman plugin source code is at https://review.openstack.org/openstack-infra/gearman-plugin.git

I have added a logger for the Gearman plugin (hudson.plugins.gearman.logger @ INFO):
https://integration.wikimedia.org/ci/log/Plugins%20-%20Gearman/
Whenever the issue occurs, we can switch it to FINE and get some debug messages.

IIRC the bastion executor threads were held in the lock() function of src/main/java/hudson/plugins/gearman/NodeAvailabilityMonitor.java. It takes a worker as a parameter and calls wait(5000). I think the debug message showed a 'null' worker, which could lead to a dead end: the node is always considered busy, so the thread keeps waiting.

That might be a bad interaction with the jobs that are scheduled by Jenkins itself, such as the hourly database update that also runs on deployment bastion.

And of course: I have no idea how to reproduce it, nor how to hook a debugger into Jenkins, nor how to dump the state of variables :-/

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
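To make the suspected dead end concrete, here is a minimal sketch of a guarded wait loop of the kind described for lock(). It is NOT the plugin's NodeAvailabilityMonitor code: the class AvailabilityLockSketch, its fields, and its release rule are hypothetical, assuming the lock records its holder and only lets that same non-null holder release it. Under that assumption, a lock taken on behalf of a null worker can never be released, so every later lock() call wakes up after each timed wait and goes straight back to waiting.

```java
import java.util.Objects;

/**
 * Hypothetical model of a per-node lock, NOT the gearman-plugin's code.
 * Names and the release rule are assumptions made for illustration.
 */
public class AvailabilityLockSketch {
    private boolean locked = false; // true while some holder has the node
    private Object owner = null;    // recorded holder; may end up null (the suspected bug)

    /**
     * Waits until the node is free (or already held by this worker), then takes it.
     * Returns false if it gave up after maxWaits timed waits or was interrupted.
     */
    public synchronized boolean lock(Object worker, long waitMillis, int maxWaits) {
        int waits = 0;
        while (locked && !Objects.equals(owner, worker)) {
            if (waits++ >= maxWaits) {
                return false; // bound exists only so this demo terminates
            }
            try {
                wait(waitMillis); // the bug report mentions wait(5000) here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        locked = true;
        owner = worker;
        return true;
    }

    /** Releases the node, but only for the recorded non-null owner. */
    public synchronized void unlock(Object worker) {
        // With a null owner this check never passes, so the node stays "busy" forever.
        if (owner != null && owner.equals(worker)) {
            locked = false;
            owner = null;
            notifyAll();
        }
    }

    public static void main(String[] args) {
        AvailabilityLockSketch node = new AvailabilityLockSketch();
        node.lock(null, 10, 0);  // stale lock recorded with a null worker
        node.unlock(null);       // release attempt is silently ignored
        boolean acquired = node.lock("bastion-executor-1", 10, 3); // gives up after 3 waits
        System.out.println("acquired = " + acquired); // prints "acquired = false"
    }
}
```

The maxWaits bound and the short 10 ms wait are only there so the demo finishes quickly; without a bound, a thread in this state would loop indefinitely, which would be consistent with the executor threads staying in lock().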
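On dumping state: a JVM thread dump at least shows where each executor thread is parked. The JDK's jstack <pid> tool does this from the command line; programmatically, Thread.getAllStackTraces() returns every live thread together with its stack. The sketch below lists the threads currently inside a given class and method. The class StuckThreadFinder is hypothetical; the class name hudson.plugins.gearman.NodeAvailabilityMonitor is derived from the source path above, and it is an assumption that the stuck threads show exactly that frame. It only finds anything when run inside the Jenkins JVM itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StuckThreadFinder {
    /** Returns "threadName (STATE)" for every live thread currently inside className.methodName. */
    public static List<String> threadsIn(String className, String methodName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    Thread t = e.getKey();
                    hits.add(t.getName() + " (" + t.getState() + ")");
                    break; // one hit per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed frame to look for, based on the file path quoted in the bug.
        for (String hit : threadsIn("hudson.plugins.gearman.NodeAvailabilityMonitor", "lock")) {
            System.out.println(hit);
        }
    }
}
```

A thread reported as TIMED_WAITING in that frame over several consecutive dumps would support the "always considered busy" theory.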