[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-10-22 Thread Smalyshev
Smalyshev added a comment.
I also notice that the timestamp on wdqs1003 is not advancing:

Oct 23 00:54:40 wdqs1003 wdqs-updater[10071]: 00:54:40.692 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-10-23T00:43:08Z at (4.9, 8.0, 5.1) updates per second and (1.8, 2256.6, 3258.3) milliseconds per second
Oct 23 00:56:21 wdqs1003 wdqs-updater[10071]: 00:56:21.063 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-10-23T00:43:08Z at (9.5, 9.4, 6.0) updates per second and (0.3, 1616.9, 2915.6) milliseconds per second
Oct 23 00:54:40 wdqs1003 wdqs-updater[10071]: 00:54:40.692 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2018-10-23T00:43:08Z at (4.9, 8.0, 5.1) updates per second and (1.8, 2256.6, 3258.3) milliseconds per second
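
A stall like the one in these lines can be spotted mechanically. The following is only a minimal sketch: the regular expression is derived from the log excerpts above, while the function names, the 60-second window, and the assumed year (syslog stamps carry none) are illustrative choices, not part of the updater.

```python
import re
from datetime import datetime

# Pattern for the updater's progress line, modelled on the excerpts above.
POLLED = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ wdqs-updater\[\d+\]: .*"
    r"Polled up to (?P<polled>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z"
)


def polled_positions(lines, year=2018):
    """Yield (log_time, polled_up_to) for every updater progress line.

    The syslog prefix has no year, so `year` is an assumption supplied
    by the caller.
    """
    for line in lines:
        m = POLLED.search(line)
        if not m:
            continue
        log_time = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        polled = datetime.strptime(m["polled"], "%Y-%m-%dT%H:%M:%S")
        yield log_time, polled


def is_stalled(lines, min_window_seconds=60):
    """Return True if the log clock advanced by at least
    `min_window_seconds` while the "Polled up to" timestamp did not move."""
    pairs = sorted(polled_positions(lines))
    if len(pairs) < 2:
        return False
    (first_log, first_polled), (last_log, last_polled) = pairs[0], pairs[-1]
    window = (last_log - first_log).total_seconds()
    return window >= min_window_seconds and last_polled <= first_polled
```

Fed the lines above, `is_stalled` reports a stall: the log clock moves from 00:54:40 to 00:56:21 while the polled position stays at 00:43:08.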

This does not happen on other hosts. It looks like something may be wrong with the communication with Kafka?

TASK DETAIL
https://phabricator.wikimedia.org/T200563

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: gerritbot, Volans, Stashbot, Gehel, Aklapper, Smalyshev, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Baloch007, Darkminds3113, Bsandipan, Lordiis, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-10-22 Thread Smalyshev
Smalyshev added a comment.
Also getting this now:

Oct 23 00:41:49 wdqs1003 wdqs-updater[10071]: 00:41:49.901 [main] ERROR o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-1, groupId=wdqs1003] Offset commit failed on partition eqiad.mediawiki.page-undelete-0 at offset 136517: The request timed out.
Oct 23 00:41:55 wdqs1003 wdqs-updater[10071]: 00:41:55.009 [main] ERROR o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-1, groupId=wdqs1003] Offset commit failed on partition eqiad.mediawiki.page-undelete-0 at offset 136517: The request timed out.
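
To see how widespread such failures are, the errors can be tallied per partition. This is a rough sketch only: the pattern is taken from the two lines above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the ConsumerCoordinator error lines above.
COMMIT_FAILED = re.compile(
    r"Offset commit failed on partition (?P<partition>\S+) "
    r"at offset (?P<offset>\d+): (?P<reason>.+)$"
)


def commit_failures(lines):
    """Count offset-commit failures per (partition, reason)."""
    counts = Counter()
    for line in lines:
        m = COMMIT_FAILED.search(line)
        if m:
            counts[(m["partition"], m["reason"].strip())] += 1
    return counts
```

Running it over the updater logs of several hosts would show whether the timeouts are specific to wdqs1003 or to particular partitions.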

Looks like we have some serious network problems or something else is seriously messed up there, since Kafka is now failing too.


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-10-02 Thread gerritbot
gerritbot added a comment.
Change 463248 merged by Gehel:
[operations/puppet@production] wdqs: don't send nginx logs to logstash

https://gerrit.wikimedia.org/r/463248


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-09-28 Thread Gehel
Gehel added a comment.

In T200563#4623531, @Smalyshev wrote:
Great work!

Thanks (I'll forward to @Volans)

In T200563#4623531, @Smalyshev wrote:
I am not sure, though, why logging would be that much of an issue; shouldn't the logging code take care of batching it, etc.? As for not logging nginx: do we have these logs somewhere else? If yes, then I guess we can stop that. We could probably tune Kafka logging too; right now it's kinda verbose and we most likely don't want that in Logstash.

nginx sends its logs over syslog, which means one UDP packet per log message. Honestly, I'm not sure it will make much difference: the dropped packets are on RX, and nginx/syslog should be TX only. They still compete for the same CPU, though, so dropping those logs could help.
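
One way to check whether such a change reduces the RX drops is to compare the kernel's per-interface drop counters before and after. The sketch below assumes a Linux host and the usual `/proc/net/dev` column layout; the function names and the `eth0` interface name in the usage note are illustrative.

```python
def parse_net_dev(text):
    """Parse the contents of /proc/net/dev into
    {interface: {"rx_drop": int, "tx_drop": int}}.

    Assumes the usual layout: two header lines, then per interface eight
    receive counters followed by eight transmit counters, with "drop" as
    the fourth counter of each group.
    """
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 16:
            continue
        stats[name.strip()] = {
            "rx_drop": int(values[3]),
            "tx_drop": int(values[11]),
        }
    return stats


def rx_drop_delta(before, after, interface):
    """Number of RX packets dropped on `interface` between two snapshots."""
    return after[interface]["rx_drop"] - before[interface]["rx_drop"]
```

Taking one snapshot with `parse_net_dev(open("/proc/net/dev").read())`, waiting a while, taking a second, and calling `rx_drop_delta(before, after, "eth0")` gives the drops for that interval.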

In the end, the problem is CPU contention, so anything that can reduce it is worth doing. The nginx logs are the low-hanging fruit. Spreading the NIC-related workload across more CPUs should be doable. And making sure that Blazegraph's CPU consumption is bounded is what we should really address (but that is probably quite a bit more complex).

As far as I understand, access logs should be available via varnish kafka, in the analytics cluster / hive.


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-09-27 Thread Smalyshev
Smalyshev added a comment.
Great work!
I am not sure, though, why logging would be that much of an issue; shouldn't the logging code take care of batching it, etc.? As for not logging nginx: do we have these logs somewhere else? If yes, then I guess we can stop that. We could probably tune Kafka logging too; right now it's kinda verbose and we most likely don't want that in Logstash.


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-09-27 Thread gerritbot
gerritbot added a comment.
Change 463254 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: cleanup logback configuration

https://gerrit.wikimedia.org/r/463254


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-09-27 Thread gerritbot
gerritbot added a comment.
Change 463248 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] wdqs: don't send nginx logs to logstash

https://gerrit.wikimedia.org/r/463248


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-09-14 Thread Smalyshev
Smalyshev added a comment.
So, weird thing: now that we switched data centers, wdqs2003 is showing the same anomaly. Could it be that our load balancing is not balancing the load evenly for these hosts?


[Wikidata-bugs] [Maniphest] [Commented On] T200563: wdq1003 is anomalous

2018-07-31 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2018-07-31T12:29:06Z] rebalance LVS weights to send less traffic to wdqs1003 - T200563