[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-03 Thread jcrespo
jcrespo added a comment. $ check_mariadb.py -h db1052 --slave-status --primary-dc=eqiad {"datetime": 1501777331.898183, "ssl_expiration": 1619276854.0, "connection": "ok", "connection_latency": 0.07626748085021973, "ssl": true, "total_queries": 15981662418, "heartbeat": {"s1": 0.400536}, "uptime":
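The JSON status shown above lends itself to machine consumption. As a minimal sketch (not part of the actual script), the following reads such a document and reports the largest heartbeat value. It assumes that `heartbeat` maps a replication section name to a lag in seconds; the helper name `max_heartbeat_lag` is hypothetical, and the sample contains only fields visible in the truncated output above.

```python
import json


def max_heartbeat_lag(status_json):
    """Return the largest heartbeat value in a status document, or None.

    Assumption: "heartbeat" maps section names (e.g. "s1") to lag in seconds.
    """
    status = json.loads(status_json)
    heartbeat = status.get("heartbeat") or {}
    return max(heartbeat.values(), default=None)


# Sample restricted to fields that appear in the (truncated) output above.
sample = '{"connection": "ok", "ssl": true, "heartbeat": {"s1": 0.400536}}'
lag = max_heartbeat_lag(sample)
print("no heartbeat data" if lag is None else f"max lag: {lag:.3f}s")
```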

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-03 Thread gerritbot
gerritbot added a comment. Change 369397 merged by Jcrespo: [operations/puppet@production] mariadb: Add new python3 script to check the health of a server https://gerrit.wikimedia.org/r/369397

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. Wikidata goes into read-only the subscriptions mentioned Yes, definitely some extensions in the past did not behave perfectly and did not respect MediaWiki's read-only mode. I do not know what the state of Wikidata is, but from what you say, a ticket should be filed so its sta

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread gerritbot
gerritbot added a comment. Change 369397 had a related patch set uploaded (by Jcrespo; owner: Jcrespo): [operations/puppet@production] mariadb: Add new python3 script to check the health of a server https://gerrit.wikimedia.org/r/369397

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. I have started working on more complete monitoring, useful if we go down the route of human monitoring rather than automation. Here is one example: $ ./check_mariadb.py --icinga -h db1052.eqiad.wmnet --check_read_only=0 Version 10.0.28-MariaDB, Uptime 16295390s, read_only:
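To illustrate the idea behind a check such as `--check_read_only=0`, here is a minimal sketch of comparing an observed `read_only` value against the expected one and mapping the result to the standard Nagios/Icinga plugin exit codes. It is not the actual `check_mariadb.py` code: the function name `evaluate_read_only` and the message format are assumptions, and the observed value is passed in rather than queried from a server.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_read_only(observed, expected):
    """Compare the observed read_only value with the expected one.

    Hypothetical helper: `observed` would come from the server
    (e.g. the value of @@global.read_only); None means it could not be read.
    Returns an (exit_code, message) pair.
    """
    if observed is None:
        return UNKNOWN, "read_only: could not be determined"
    if bool(observed) == bool(expected):
        return OK, f"read_only: {bool(observed)}"
    return CRITICAL, f"read_only: {bool(observed)}, expected {bool(expected)}"


# An active master is expected to be writable (read_only=0).
code, message = evaluate_read_only(observed=1, expected=0)
print(code, message)
```

Alerting on a mismatch, rather than changing the setting automatically, leaves the decision to go read-write again with a human.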

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread mark
mark added a comment. I agree; there's a very good reason for setting masters to read-only when something has happened, because it needs manual intervention to investigate whether it's safe to go read-write again. Any automation to do that should be REALLY thoroughly thought through, covering all cases

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread Marostegui
Marostegui added a comment. From my side, I would prefer option "b" (monitoring read-only status on the active masters). My reasoning for this is: I wouldn't like puppet to automatically change settings, especially on the masters. And if a master crashes, I want to investigate why it crashed (in cas

[Wikidata-bugs] [Maniphest] [Commented On] T171928: Wikidata and dewiki databases locked

2017-08-01 Thread jcrespo
jcrespo added a comment. I've almost finished the above incident documentation. However, I am unsure about which are the right actionables and their priorities (last section). Let's use this ticket to agree on what would be the best follow-up: a) making puppet change read-only state of the db serv