[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-05-31 Thread Stashbot
Stashbot added a comment.
Mentioned in SAL (#wikimedia-operations) [2017-05-31T13:19:58Z] Synchronized wmf-config/InitialiseSettings.php: Log "api-readonly" errors (T164191, T123867) (duration: 00m 43s)

TASK DETAIL
  https://phabricator.wikimedia.org/T123867

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Lydia_Pintscher, Stashbot
Cc: Stashbot, gerritbot, jcrespo, Marostegui, Ladsgroup, Aklapper, StudiesWorld, aaron, daniel, aude, Lydia_Pintscher, Multichill, hoo, GoranSMilovanovic, Adik2382, Th3d3v1ls, Ramalepe, Liugev6, QZanden, Lewizho99, Maathavan, Izno, Wikidata-bugs, Mbch331, Jay8g

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-05-18 Thread gerritbot
gerritbot added a project: Patch-For-Review.


[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-04-28 Thread Multichill
Multichill added a comment.
Thanks @jcrespo for your response. The task was filed because we noticed the database going read-only every once in a while, and because T100123 hasn't been implemented yet, this has an impact. Something might have been wrong. I'll just sum up your response as: works as designed.

That leaves us without any actions in this task, so I think it's best to just close this one. Do you agree, @hoo?


[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-04-24 Thread jcrespo
jcrespo edited projects, added MediaWiki-Database; removed DBA.
jcrespo added a comment.
That is the max lag, and it is normal on the slaves that MediaWiki does not wait for. This issue has nothing to do with the databases; MediaWiki does what it is programmed to do: if it detects lag, even of a few seconds, it disables the API (by design). If you do not like that functionality, a specific change should be directed to the maintainers of MediaWiki's persistence functionality. If you do not like your bot erroring out or crashing, complain to the API client application, as the response clearly tells you to retry in X seconds. The databases themselves have no issues: individual servers lag at times, and that is the way it is programmed (which to me makes sense, but I have no say on that).

Note I am not saying this bug is invalid; I am saying there is nothing broken on the databases or their hardware. For example, there is a vslow slave on each shard that lags all the time, and that is OK because it is not used for main queries (it has weight 0). As you can see here, only 1 server is slower, and normally by 1 second ( https://grafana.wikimedia.org/dashboard/db/mysql-replication-lag?panelId=5=1=1492933452444=1493019852444=codfw%20prometheus%2Fops ). How the application responds to that is not a database issue, as that is a normal state.

Select queries do not cause lag, unless something is so impacting that it blocks all available resources (InnoDB reads never block writes). Only writes (or blocking queries) do, so while the queries mentioned may be bad, lag is caused by writes: either bots writing too many rows at a time, or large write transactions. MediaWiki's current response is to put things in read-only mode when lots of load is detected. If that is undesired (or maybe can be tuned better), direct your complaints to #mediawiki-database (not sure if that is handled by performance, platform or who?), not #DBA. (I think queries should only be blocked if the majority of slaves are lagged, and not when only one is lagged.) Replication is asynchronous by design, and there will always be some lag, unless you want us to slow down writes and write synchronously, reducing the throughput to 1/1000 of the current one (and creating even more errors).
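
The policy suggested in the paragraph above (block only when the majority of slaves are lagged, and ignore servers with weight 0 such as the vslow slave) could be sketched roughly as follows. This is a minimal illustration and not MediaWiki's actual lag detection code: the Replica type, the should_go_read_only function, and the 5-second default threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of one replica as seen by a load balancer."""
    name: str
    weight: int          # 0 means "not used for main queries" (e.g. vslow)
    lag_seconds: float   # measured replication lag


def should_go_read_only(replicas: Iterable[Replica], max_lag: float = 5.0) -> bool:
    """Return True only if more than half of the serving replicas are lagged.

    Replicas with weight 0 are ignored, since they do not serve main queries.
    The max_lag default is an assumption made for this sketch.
    """
    serving = [r for r in replicas if r.weight > 0]
    if not serving:
        # Assumption: with nothing to wait for, do not block writes.
        return False
    lagged = [r for r in serving if r.lag_seconds > max_lag]
    # Strict majority: a single slow server is not enough to block writes.
    return len(lagged) * 2 > len(serving)
```

With this rule, one lagging server out of three serving replicas would not trigger read-only mode, while two out of three would.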

I can tell you what I think happens: Wikidata is highly populated by bots that create lots of traffic, and I think that often puts higher-than-normal stress on s5. A better approach than stopping serving all requests would be to rate-limit the bad users, but I think there is no such technology yet (and it is not easy to implement). I have also seen that some job-related writes are sometimes too intensive (see T163544). What I can tell you is what we as DBAs will be doing to minimize Wikidata's impact: we are going to give it dedicated hardware on its own separate shard and call it s8. Will that solve this issue? Probably not. I think this is not a bug, and I would support making it invalid: if your client detects an error, it should retry once after 1 minute (or the time you are told to wait). The only thing I would support is making the detection less strict and its effects smaller (like allowing more than X seconds of lag, or not blocking certain actions in read-only mode). For example, I think the lag detection code is too strict, complaining when the error is >1 second and the detection error is 1+ seconds, and I have already transmitted my opinion on that. But the general logic, I think, works as intended.
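
The client-side behaviour recommended in the comment above (on a read-only error, wait for the time you are told, or about a minute, and retry once) could look something like the sketch below. It is only an illustration under stated assumptions: call_with_retry and its parameters are hypothetical, the response is assumed to be a parsed JSON dict with an "error" object carrying a "code", and the "retry_after" key stands in for however the client exposes the server's suggested wait. The request itself is injected as a callable so the sketch does not depend on any particular HTTP or bot library.

```python
import time
from typing import Any, Callable, Dict

# Assumed set of lag-related error codes worth retrying.
RETRYABLE_CODES = {"readonly", "maxlag"}


def call_with_retry(
    send: Callable[[], Dict[str, Any]],
    max_retries: int = 1,
    default_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call send(); on a lag-related error, wait and retry up to max_retries times."""
    attempt = 0
    while True:
        response = send()
        error = response.get("error")
        if not error or error.get("code") not in RETRYABLE_CODES:
            # Success, or an error that waiting will not fix.
            return response
        if attempt >= max_retries:
            # Give up and hand the error back instead of crashing.
            return response
        # Prefer the wait suggested by the server (hypothetical key),
        # falling back to the one-minute default.
        wait = float(response.get("retry_after", default_wait))
        sleep(wait)
        attempt += 1
```

Returning the final error to the caller, rather than raising, leaves it to the bot to decide whether to skip the edit or stop cleanly.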


[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-04-24 Thread Marostegui
Marostegui added a comment.
Looks like db2059 (api slave) had some peaks of lag around that time, which matches some slow queries too. Could be related to all the stuff being discussed here: T163495


[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2017-04-17 Thread Marostegui
Marostegui added a comment.

In T123867#3183910, @Multichill wrote:
In the last 10 minutes:

pywikibot.data.api.APIError: readonly: The database has been automatically locked while the slave database servers catch up to the master [readonlyreason:Waiting for 7 lagged database(s); help:See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce; for notice of API deprecations and breaking changes.]
 
 CRITICAL: Closing network session.



pywikibot.data.api.APIError: readonly: The database has been automatically locked while the slave database servers catch up to the master [readonlyreason:Waiting for 5 lagged database(s); help:See https://www.wikidata.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce; for notice of API deprecations and breaking changes.]
 
 CRITICAL: Closing network session.


This is most likely due to T148609#3183908, as we had issues with most of the servers there, which were having spikes in threads, load, lag, etc.
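
When collecting reports like the two quoted above, it can help to pull the number of lagged databases out of the error text so that occurrences can be compared over time. The helper below is a small sketch that assumes the message keeps the wording shown in the quoted errors ("readonlyreason:Waiting for N lagged database(s)"); the lagged_count function is a hypothetical name, not part of pywikibot.

```python
import re
from typing import Optional

# Pattern based on the wording of the errors quoted in this thread.
_LAGGED = re.compile(r"readonlyreason:Waiting for (\d+) lagged database\(s\)")


def lagged_count(message: str) -> Optional[int]:
    """Return the number of lagged databases named in the message, or None."""
    match = _LAGGED.search(message)
    return int(match.group(1)) if match else None
```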


[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2016-02-04 Thread jcrespo
jcrespo added a comment.

Could be related to either https://phabricator.wikimedia.org/T122429 or 
https://phabricator.wikimedia.org/T109943.




[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2016-02-04 Thread jcrespo
jcrespo added a blocked task: T95501: Fix causes of slave lag and get it to 
under 5 seconds at peak (tracking).



[Wikidata-bugs] [Maniphest] [Updated] T123867: Repeated reports of wikidatawiki (s5) API going read only

2016-01-25 Thread jcrespo
jcrespo added a comment.

I do not own such changes, and you should discuss them with Aaron, who I 
think is trying to lower all those timeouts. Or better, with the Wikidata 
developers, to avoid long queries and to monitor lag, as I have reported 
several times: https://phabricator.wikimedia.org/T111769.

