[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Antoine "hashar" Musso changed:

What    |Removed                                                 |Added
Summary |[worked around] Zuul is ultra slow post Jenkins upgrade |Zuul is ultra slow post Jenkins upgrade

-- You are receiving this mail because: You are on the CC list for the bug. ___ Wikibugs-l mailing list Wikibugs-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #10 from Antoine "hashar" Musso --- I have pinged wikitech-l about the two crashes related to this issue that happened today: http://lists.wikimedia.org/pipermail/wikitech-l/2013-May/069261.html
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #11 from Antoine "hashar" Musso --- This happened again overnight. Will attach some debugging output.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #12 from Antoine "hashar" Musso --- Created attachment 12286 --> https://bugzilla.wikimedia.org/attachment.cgi?id=12286&action=edit jstack -F -m -l using the PID of the stalled thread
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #13 from Antoine "hashar" Musso --- Created attachment 12287 --> https://bugzilla.wikimedia.org/attachment.cgi?id=12287&action=edit jmap using the PID of the stalled thread
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #14 from Antoine "hashar" Musso --- Created attachment 12288 --> https://bugzilla.wikimedia.org/attachment.cgi?id=12288&action=edit Jenkins threaddump 1
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #15 from Antoine "hashar" Musso --- Created attachment 12289 --> https://bugzilla.wikimedia.org/attachment.cgi?id=12289&action=edit Jenkins threaddump 2

Jenkins has an internal thread dumper which is available via the web interface at https://integration.wikimedia.org/ci/threadDump and whose output is also written to the /var/log/jenkins/jenkins.log file. The stalled thread is still 24524. The two attached thread dumps were taken 10 minutes apart.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #16 from Antoine "hashar" Musso --- The stalled thread has the ID 24524 (0x5fcc in hexadecimal). It shows up in both thread dumps as: "VM Thread" prio=10 tid=0x7ff398079000 nid=0x5fcc runnable
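The mapping between the decimal thread ID and the hexadecimal nid in the dump can be checked with a small script. This is only a sketch: the helper names nid_for and find_thread are made up for illustration, and it assumes the dump prints the native thread ID as a lowercase hexadecimal nid=0x... token, as in the line quoted above.

```python
def nid_for(thread_id: int) -> str:
    """Format a decimal thread ID as the nid token seen in the thread dump."""
    return f"nid={thread_id:#x}"


def find_thread(dump_text: str, thread_id: int) -> list[str]:
    """Return the dump lines that carry the nid token for thread_id.

    The token is compared against whole whitespace-separated words so that
    nid=0x5fcc does not also match a longer value such as nid=0x5fcc1.
    """
    needle = nid_for(thread_id)
    return [line for line in dump_text.splitlines() if needle in line.split()]


if __name__ == "__main__":
    sample = '"VM Thread" prio=10 tid=0x7ff398079000 nid=0x5fcc runnable'
    print(nid_for(24524))
    print(find_thread(sample, 24524))
```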
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #17 from Antoine "hashar" Musso --- Last week Timo had the Java Melody plugin installed on Jenkins, which lets us monitor the Java process. The entry page is https://integration.wikimedia.org/ci/monitoring and the heap histogram is available via https://integration.wikimedia.org/ci/monitoring?part=heaphisto
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #18 from Antoine "hashar" Musso --- After a bit more investigation: our build.xml files contain references to org.jvnet.hudson.plugins.DownstreamBuildViewAction, which belonged to a plugin that displayed the dependencies between builds. When upgrading Jenkins I removed that plugin, but the build files still contain references to it.

When Jenkins parses the build history, it detects that this data is no longer used and logs a message which is kept in memory. The data can be discarded manually via the 'Manage old data' page: https://integration.wikimedia.org/ci/administrativeMonitor/OldData/manage

With thousands of builds in the history, the old data store ends up holding too many items for the Java heap size, which most probably causes the memory exhaustion.

The slow way to solve that is to ask Jenkins to parse the history files for a job, then manually discard all the data. The fast but hard way would be to go through all the build.xml files and remove the XML snippet.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #19 from Antoine "hashar" Musso --- And the super long filtering command:

    find . -name build.xml \
      -exec fgrep -q 'DownstreamBuild' {} \; \
      -exec sed -i-back \
      '/ /d' {} \;
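For illustration, the same filtering idea can be sketched in Python. This is not the command that was actually run: the function name strip_marker_lines is hypothetical, the marker string is simply the 'DownstreamBuild' text used by the fgrep call above, and the sketch assumes that each stale reference sits on a single line of a UTF-8 build.xml. Like sed -i-back, it keeps a copy of every file it rewrites.

```python
import shutil
from pathlib import Path


def strip_marker_lines(root: Path, marker: str, filename: str = "build.xml") -> int:
    """Drop every line containing marker from the matching files under root.

    A backup named <filename>-back is written next to each modified file.
    Returns the number of files that were rewritten.
    """
    changed = 0
    # Materialise the list first so the backups created below cannot
    # interfere with the directory walk.
    for path in list(root.rglob(filename)):
        text = path.read_text(encoding="utf-8")
        if marker not in text:
            continue  # same role as the fgrep -q pre-check
        kept = [line for line in text.splitlines(keepends=True) if marker not in line]
        shutil.copy2(path, path.with_name(path.name + "-back"))
        path.write_text("".join(kept), encoding="utf-8")
        changed += 1
    return changed


if __name__ == "__main__":
    print(strip_marker_lines(Path("."), "DownstreamBuild"))
```

Matching on a plain substring rather than a regular expression keeps the sketch simple; it would not handle an element that spans several lines.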
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #20 from Antoine "hashar" Musso --- Running the above command on mediawiki-core-lint.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #21 from Antoine "hashar" Musso --- Launched for all jobs using: find . -wholename '*/builds/*/build.xml'

That is running in a tmux window on gallium.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #22 from Antoine "hashar" Musso --- All build.xml now have the org.jvnet.hudson.plugins.DownstreamBuildViewAction snippet removed.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #23 from Antoine "hashar" Musso --- Sent a summary on wikitech-l: http://lists.wikimedia.org/pipermail/wikitech-l/2013-May/069303.html

I will monitor this bug through Monday, then probably close it.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Antoine "hashar" Musso changed:

What       |Removed  |Added
Status     |ASSIGNED |RESOLVED
Resolution |---      |FIXED

--- Comment #24 from Antoine "hashar" Musso --- That issue is solved for now. The build.xml rewrite fixed it.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Antoine "hashar" Musso changed:

What     |Removed                        |Added
Priority |Unprioritized                  |Immediate
Status   |NEW                            |ASSIGNED
Assignee |wikibugs-l@lists.wikimedia.org |has...@free.fr

--- Comment #1 from Antoine "hashar" Musso --- yeah for working over night :-]
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #2 from Antoine "hashar" Musso --- The workaround did get rid of the calls to the slow API. Looking at the Apache logs in /var/log/apache2/integration_access.log, I noticed that it took up to 8 seconds to update a job description. Zuul does that very often, and it prevents Zuul from doing anything else in the meantime. Jenkins had at least one thread at 100% CPU and the main process (java) was reporting 100% CPU too. I have restarted Jenkins.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #3 from Antoine "hashar" Musso --- Tips to investigate:

    tail -f /var/log/apache2/integration_access.log | grep Python

That will show the Zuul requests to Jenkins. The logged time is when the query started, so any timestamp that goes backward indicates a query that took a long time. On completion, the query sends a 302, which in turn triggers a GET from Jenkins. Example:

    02/May/2013:23:17:53 "POST /ci/job/parsoid-parsertests-run-harder/65/submitDescription 302"
    02/May/2013:23:17:53 "GET /ci/job/parsoid-parsertests-run-harder/65/ 200"

That one was fast.

Before triggering a job, Zuul looks up whether the job exists. It does that through the python-jenkins module, which hits an API call that is very slow on our setup. I have overridden the existence check with Gerrit Change #62095.

For now it seems Jenkins is more or less working after the last restart.
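Spotting backward timestamps by eye is tedious, so the check could be scripted along these lines. This is a rough sketch, not a tool that was used on the bug: the function name slow_requests is hypothetical, it assumes timestamps in the 02/May/2013:23:17:53 form shown above, and it relies on English month abbreviations being parsed by the current locale. Since each line is written when the request completes but carries its start time, a line that starts earlier than the latest start already seen ran for at least the difference.

```python
import re
from datetime import datetime

# Timestamp as it appears in the example lines, e.g. 02/May/2013:23:17:53
STAMP = re.compile(r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}")


def slow_requests(lines):
    """Return (minimum_seconds, line) for every line whose start time goes backward."""
    latest = None
    found = []
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue  # not an access log entry we can date
        started = datetime.strptime(match.group(0), "%d/%b/%Y:%H:%M:%S")
        if latest is not None and started < latest:
            # Logged after a request that started later: a lower bound on its duration.
            found.append(((latest - started).total_seconds(), line.rstrip("\n")))
        else:
            latest = started
    return found


if __name__ == "__main__":
    import sys

    for seconds, entry in slow_requests(sys.stdin):
        print(f"{seconds:.0f}s+ {entry}")
```

The script reads the log from standard input, so it can be fed the same grep-filtered stream as the tail command above.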
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #29 from Gerrit Notification Bot --- Change 102465 merged by Hashar: WMF: less build description updates https://gerrit.wikimedia.org/r/102465
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Gerrit Notification Bot changed:

What       |Removed  |Added
Status     |RESOLVED |PATCH_TO_REVIEW
Resolution |FIXED    |---
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #28 from Gerrit Notification Bot --- Change 102465 had a related patch set uploaded by Hashar: WMF: less build description updates https://gerrit.wikimedia.org/r/102465
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Antoine "hashar" Musso changed:

What       |Removed         |Added
Status     |PATCH_TO_REVIEW |RESOLVED
Resolution |---             |FIXED

--- Comment #30 from Antoine "hashar" Musso --- Gerrit change 102465 has been deployed in production and appears to have solved the issue for good. Summary sent on wikitech-l: http://lists.wikimedia.org/pipermail/wikitech-l/2013-December/073703.html
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Gerrit Notification Bot changed:

What       |Removed  |Added
Status     |RESOLVED |PATCH_TO_REVIEW
Resolution |FIXED    |---
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #25 from Gerrit Notification Bot --- Change 86812 had a related patch set uploaded by Matanya: Removed pstack package since bug 48025 is resolved. https://gerrit.wikimedia.org/r/86812
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 Antoine "hashar" Musso changed:

What       |Removed         |Added
Status     |PATCH_TO_REVIEW |RESOLVED
Resolution |---             |FIXED

--- Comment #26 from Antoine "hashar" Musso --- Unrelated change.
[Bug 48025] Zuul is ultra slow post Jenkins upgrade
https://bugzilla.wikimedia.org/show_bug.cgi?id=48025 --- Comment #27 from Gerrit Notification Bot --- Change 86812 merged by Dzahn: Removed pstack package since bug 48025 is resolved. https://gerrit.wikimedia.org/r/86812