https://bugzilla.wikimedia.org/show_bug.cgi?id=70597

--- Comment #11 from Antoine "hashar" Musso (WMF) <has...@free.fr> ---
I forgot to update this bug after my debugging session on Oct 24th, so here is
the rough brain dump:

The Gearman plugin source code is at
https://review.openstack.org/openstack-infra/gearman-plugin.git

I have added a logger for the Gearman plugin (hudson.plugins.gearman.logger at
level INFO):

https://integration.wikimedia.org/ci/log/Plugins%20-%20Gearman/

Whenever the issue occurs again, we can switch it to FINE and get debug messages.
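Jenkins log recorders are built on java.util.logging, so switching the recorder to FINE comes down to lowering a logger's level. The following is a minimal plain-JVM sketch of that idea. It assumes the plugin's debug output reaches a logger named hudson.plugins.gearman.logger. GearmanLogLevel and setLevel() are hypothetical names, and inside Jenkins the log recorder page above already does this for you.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class GearmanLogLevel {
    // Logger name taken from this comment; an assumption about where the plugin logs.
    static final String GEARMAN_LOGGER = "hudson.plugins.gearman.logger";

    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an unreferenced logger can be lost to GC.
    private static final Logger LOGGER = Logger.getLogger(GEARMAN_LOGGER);
    private static final ConsoleHandler HANDLER = new ConsoleHandler();

    static {
        // The default ConsoleHandler filters at INFO, so attach one that lets FINE through.
        HANDLER.setLevel(Level.ALL);
        LOGGER.addHandler(HANDLER);
        LOGGER.setUseParentHandlers(false); // avoid printing INFO+ records twice
    }

    /** Level.INFO for normal operation, Level.FINE while the executors are stuck. */
    static Logger setLevel(Level level) {
        LOGGER.setLevel(level);
        return LOGGER;
    }

    public static void main(String[] args) {
        Logger log = setLevel(Level.INFO);
        log.fine("dropped: logger is at INFO");
        setLevel(Level.FINE);
        log.fine("printed: logger is now at FINE");
    }
}
```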


IIRC the bastion executor threads were held in the lock() method of
src/main/java/hudson/plugins/gearman/NodeAvailabilityMonitor.java. That method
takes a worker as a parameter and calls wait(5000).

I think the debug message showed a 'null' worker, which could lead to a dead
end: the node is always considered busy, so the thread keeps waiting.

That might be a bad interaction with jobs scheduled by Jenkins itself, such as
the hourly database update that also runs on deployment bastion.
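The actual body of lock() is not reproduced here. What follows is a hypothetical, simplified model of the suspected failure mode, not the plugin's implementation. NodeMonitorSketch, unlock(), isBusy(), and the rule that only a non-null holder can be released are all invented for illustration. The model shows how a lock recorded for a null worker (for instance, a build started by Jenkins itself rather than through Gearman) could leave every later lock() caller re-checking every 5 seconds indefinitely.

```java
// Hypothetical model of the suspected dead end; NOT the Gearman plugin's real code.
public final class NodeMonitorSketch {
    private Object holder;  // worker currently holding the node (may be null!)
    private boolean busy;   // true while the node is reserved

    /** Waits until the node is free, re-checking at least every 5 s like wait(5000). */
    public synchronized boolean lock(Object worker) {
        while (busy) {
            try {
                wait(5000); // woken early by notifyAll() in unlock(), else re-check after 5 s
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false; // gave up without acquiring the node
            }
        }
        busy = true;
        holder = worker;
        return true;
    }

    /** Hypothetical flaw: a lock recorded for a null worker is never released. */
    public synchronized void unlock(Object worker) {
        if (holder != null && holder == worker) {
            busy = false;
            holder = null;
            notifyAll(); // wake threads waiting in lock()
        }
    }

    public synchronized boolean isBusy() {
        return busy;
    }

    public static void main(String[] args) throws InterruptedException {
        NodeMonitorSketch monitor = new NodeMonitorSketch();
        monitor.lock(null);   // e.g. a build Jenkins scheduled on its own, with no Gearman worker
        monitor.unlock(null); // silently ignored, so the node stays "busy"
        Thread executor = new Thread(() -> monitor.lock(new Object()));
        executor.start();
        executor.join(1000);  // give the executor a second to get the node
        System.out.println("busy=" + monitor.isBusy() + ", executor stuck=" + executor.isAlive());
        executor.interrupt(); // only so the demo can exit
        executor.join();
    }
}
```

Running main prints `busy=true, executor stuck=true`. The interrupt at the end exists only so that the demo terminates. A real stuck executor would keep waking up every 5 seconds.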


And of course: I have no idea how to reproduce it, how to hook a debugger into
Jenkins, or how to dump the state of variables :-/
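A debugger is not the only way to see which threads are parked in that wait(5000). Any JVM thread dump records the monitor each waiting thread is stuck on, and the JDK's `jstack <pid>` prints one from outside the process. The following is a minimal in-process sketch using the standard java.lang.management API. MonitorWaiters, FakeMonitor, and demo() are hypothetical names. Against Jenkins, the class-name fragment to search for would presumably be "NodeAvailabilityMonitor".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

public final class MonitorWaiters {
    /** Lists live threads blocked or waiting on a monitor whose class name contains classFragment. */
    static List<String> waitersOn(String classFragment) {
        List<String> found = new ArrayList<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            String lock = info.getLockName(); // "ClassName@hash", or null if not blocked/waiting
            if (lock != null && lock.contains(classFragment)) {
                found.add(info.getThreadName() + " " + info.getThreadState() + " on " + lock);
            }
        }
        return found;
    }

    /** Hypothetical stand-in for the plugin's monitor object, used only by demo(). */
    static final class FakeMonitor {}

    /** Parks one thread in a wait(5000) loop, reports it, then interrupts it. */
    static List<String> demo() {
        FakeMonitor monitor = new FakeMonitor();
        Thread executor = new Thread(() -> {
            synchronized (monitor) {
                try {
                    while (true) monitor.wait(5000); // re-check every 5 s, never satisfied
                } catch (InterruptedException e) {
                    // interrupted by demo(): fall through and let the thread end
                }
            }
        }, "executor-0");
        executor.start();
        while (executor.getState() != Thread.State.TIMED_WAITING) {
            Thread.yield(); // spin until the thread is parked in wait(5000)
        }
        List<String> report = waitersOn("FakeMonitor");
        executor.interrupt();
        return report;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```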

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
