[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-29 Thread Smalyshev
Smalyshev added a comment.
Do we have anything left to do here?TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: SmalyshevCc: Smalyshev, Framawiki, Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-25 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-06-25T12:00:11Z]  repooling wdqs1005 it has catched up on updates - T198042TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: StashbotCc: Framawiki, Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-25 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-06-25T07:46:25Z]  depooling wdqs1005 to allow it to catch up on updates - T198042TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: StashbotCc: Framawiki, Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-25 Thread Gehel
Gehel added a comment.
The pattern of banned / throttled request as seen on wdqs matches a pattern of HTTP 500 seen on varnish. It is the same user agent / IP. I was expecting all those banned / throttled requests to be 403 / 429, but it looks like this is not the case. Something is wrong...TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Framawiki, Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-25 Thread Gehel
Gehel added a comment.
Looking at thread dumps on wdqs1005, there is > 5000 threads waiting logging (see stack trace below). We could improve the situation with an AsyncAppender (probably a good idea anyway), but that's only treating the symptoms, not the root cause.

com.bigdata.journal.Journal.executorService20427 - priority:5 - threadId:0x7f5c8013f800 - nativeId:0x350c - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x7f5d69220380> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at ch.qos.logback.core.OutputStreamAppender.writeBytes(OutputStreamAppender.java:197)
at ch.qos.logback.core.OutputStreamAppender.subAppend(OutputStreamAppender.java:231)
at ch.qos.logback.core.OutputStreamAppender.append(OutputStreamAppender.java:102)
at ch.qos.logback.core.UnsynchronizedAppenderBase.doAppend(UnsynchronizedAppenderBase.java:84)
at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:421)
at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
at ch.qos.logback.classic.Logger.log(Logger.java:765)
at org.apache.log4j.Category.differentiatedLog(Category.java:193)
at org.apache.log4j.Category.warn(Category.java:257)
at com.bigdata.util.concurrent.Haltable.logCause(Haltable.java:474)
at com.bigdata.util.concurrent.Haltable.halt(Haltable.java:197)
at com.bigdata.bop.join.PipelineJoin$JoinTask$BindingSetConsumerTask.call(PipelineJoin.java:1022)
at com.bigdata.bop.join.PipelineJoin$JoinTask.consumeSource(PipelineJoin.java:739)
at com.bigdata.bop.join.PipelineJoin$JoinTask.call(PipelineJoin.java:623)
at com.bigdata.bop.join.PipelineJoin$JoinTask.call(PipelineJoin.java:382)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1346)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTaskWrapper.run(ChunkedRunningQuery.java:926)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
at com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run(ChunkedRunningQuery.java:821)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Framawiki, Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-24 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-06-24T22:26:46Z]  restarting wdqs1004 - T198042TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: StashbotCc: Framawiki, Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-24 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-06-24T21:51:23Z]  restarting wdqs1005 after taking a few thread dumps - T198042TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: StashbotCc: Stashbot, Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-24 Thread Gehel
Gehel added a comment.
wdqs1005 was lagging on updates. A few thread dumps for further analysis before restarting it: F22597390: threads.tar.gzTASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T198042: WDQS timeout on the public eqiad cluster

2018-06-24 Thread Gehel
Gehel added a comment.
Situation is better, but still not entirely stable (I just restarted blazegraph on wdqs1004). Looking at logstash, the number of Haltable errors is high, but it has been just as high in the last 7 days without major issues.

Side note, we might want to have a log of long running queries, to see more easily if we can spot something...TASK DETAILhttps://phabricator.wikimedia.org/T198042EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: GehelCc: Gehel, Aklapper, AndyTan, Davinaclare77, Qtn1293, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Avner, Zppix, Jonas, FloNight, Xmlizer, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, faidon, Mbch331, Jay8g, fgiunchedi___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs