[Wikidata-bugs] [Maniphest] [Commented On] T151681: DispatchChanges: Avoid long-lasting connections to the master DB

2018-06-15 Thread Ladsgroup
Ladsgroup added a comment.
That's another topic, I think we can call this resolved.

TASK DETAIL
https://phabricator.wikimedia.org/T151681

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup
Cc: PokestarFan, elukey, Ladsgroup, aaron, Marostegui, jcrespo, Aklapper, Jonas, Lydia_Pintscher, hoo, daniel, AndyTan, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, LawExplorer, Vali.matei, Minhnv-2809, Volker_E, Luke081515, Wikidata-bugs, aude, GWicke, Mbch331, Jay8g, Krenair
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


2016-11-27 Thread Marostegui
Marostegui added a comment.
Holding connections on the master: if there are 5-10 jobs running, it shouldn't be a big deal, as I assume at most 10 connections will be open but idle (not doing anything, right?).
It can be a problem if the master is under heavy stress; however, if the lock does nothing but keep the connection open, it isn't a massive deal.

2016-11-28 Thread daniel
daniel added a comment.
I would favor doing the locking using an alternative mechanism for which we already have infrastructure. Setting up a separate Maria server just to serve 10 mostly idle connections seems overkill.

However, I'm not sure the available locking mechanisms based on Memc and Redis have sufficient protection against stale (orphan) locks. Perhaps @aaron can shed some light on this.
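
A minimal sketch of one common form of protection against stale (orphan) locks, assuming a lease with a time-to-live: if the holder dies without releasing, the lock simply expires. `LeaseLockStore` and its methods are hypothetical names, and the dict is an in-memory stand-in for a key-value store with expiring keys, not the Memc or Redis lock managers referred to above.

```python
import time
import uuid


class LeaseLockStore:
    """In-memory stand-in for a store with expiring keys (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # lock name -> (owner token, expiry time)

    def acquire(self, name, ttl_seconds):
        """Return an owner token, or None if an unexpired lock is held."""
        now = self._clock()
        held = self._locks.get(name)
        if held is not None and held[1] > now:
            return None
        # Either the lock is free or the previous lease expired (stale lock).
        token = uuid.uuid4().hex
        self._locks[name] = (token, now + ttl_seconds)
        return token

    def release(self, name, token):
        """Release only if the caller still owns the lock."""
        held = self._locks.get(name)
        if held is not None and held[0] == token:
            del self._locks[name]
            return True
        return False
```

The owner token matters: a worker whose lease has expired must not be able to release a lock that another worker has since acquired. The trade-off is that the time-to-live has to exceed the longest expected run, or the lease has to be renewed while work is in progress.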

2016-11-28 Thread daniel
daniel added a comment.
Possibly relevant: https://github.com/mlanett/redis-lock

2016-11-28 Thread daniel
daniel added a comment.
Using memcached is probably not feasible, since locks may be dropped at any time. If Redis turns out not to be an option, we may consider using a more powerful distributed locking service. Running another service just for this seems a bit of overkill, but if the alternative is to set up a separate Maria host for a single 1000-row table, it may still be the better option.

Two options that come to mind:


http://research.google.com/archive/chubby.html
https://zookeeper.apache.org/

2016-11-28 Thread daniel
daniel added a comment.

In T151681#2826465, @jcrespo wrote:
@Manuel, @daniel Actually it is a problem, because masters have a limit of CPU# or 32 active threads on the pool of connections, which means half of the connections are reserved but doing nothing, so you are limiting the master throughput to 50% of the reality.


I don't quite understand - why would an idle connection be hogging CPU threads? These connections are not active for 15 minutes, just open... Sure, they count towards the connection limit, but that's higher than just 32, no?

2016-11-28 Thread daniel
daniel added a comment.

In T151681#2826465, @jcrespo wrote:
@Marostegui, @daniel Actually it is a problem, because masters have a limit of CPU# or 32 active threads on the pool of connections, which means half of the connections are reserved but doing nothing, so you are limiting the master throughput to 50% of the reality. Also, running connections for 15 minutes means that in case of a master problem, those connections do not get new configuration (like a master failover) or a new mediawiki version, something that limits High Availability- if there is a problem with the master, it cannot be easily failed over. All long running connections should be short, or easy to kill.

The wiki master is a SPOF, and it should not be used for coordination. For example, as @aaron will be able to tell you, the real master will not be easily accessible from a remote datacenter, which means this functionality as it is now, limits cross-dc scalability.

There is absolutely no reason to do this on a master, when it could be easily done on a separate server, that doesn't interact with other functionality.



2016-11-28 Thread jcrespo
jcrespo added a comment.
max_connections is 5000; the maximum number of active threads is 32, enforced on the connection pool. No connections should be open and idle, and a typical connection should take less than 1 second; otherwise it risks getting killed by the watchdog looking for idle connections (and we are not going to make an exception because it is Wikidata). Extra connections make other connections take longer to connect, which increases the timeout error rate for regular connections.

Even if your connections don't count toward the limit, they hold an open transaction (lock), which creates issues with metadata locking on things like ALTER TABLE and also causes purge issues.

2016-11-28 Thread aaron
aaron added a comment.
How often would locks be dropped? Using ScopedLock would handle exceptions in non-lock code. The shutdown handler usually catches SIGINT. I guess there are still fatal errors, though I'd hope that sort of thing would be rare. In that case, the redis lock manager used by our FileBackend instances could be reused.
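
To illustrate the scoped-lock idea in general terms, here is a small Python analogue. It is not MediaWiki's ScopedLock API; `scoped_lock`, `LockHeldError` and the `held` set are hypothetical names chosen for the sketch. The lock is released whenever the block exits, including through an exception.

```python
from contextlib import contextmanager


class LockHeldError(RuntimeError):
    """Raised when the named lock is already held (hypothetical)."""


@contextmanager
def scoped_lock(name, held):
    """Hold `name` in the `held` set for the duration of the with-block."""
    if name in held:
        raise LockHeldError(name)
    held.add(name)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions raised inside the block.
        held.discard(name)


held_locks = set()
try:
    with scoped_lock("dispatch", held_locks):
        raise ValueError("error in non-lock code")
except ValueError:
    pass
# The lock was released even though the block raised.
assert "dispatch" not in held_locks
```

What this cannot cover is a process that dies without unwinding (a hard kill or a fatal error), since the cleanup never runs. That is the residual case the comment above describes as hopefully rare.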

2016-11-28 Thread aaron
aaron added a comment.
There is also a flip-side to automatically dropping on connection loss, which is that loss can happen (possibly due to the net_wait_timeout options) while the connection to DBs actually being updated stays alive. In that case, multiple threads could run on the same client wiki. Not sure if that actually happens though, there would be reconnection events in logstash if it did.

2016-11-28 Thread daniel
daniel added a comment.

In T151681#2827708, @aaron wrote:
How often would locks be dropped? Using ScopedLock would handle exceptions in non-lock code. The shutdown handler usually catches SIGINT. I guess there are still fatal errors, though I'd hope that sort of thing would be rare. In that case, the redis lock manager used by our FileBackend instances could be reused.


Reusing the code from FileBackend sounds like it's worth looking into, thanks @aaron!

2016-11-30 Thread jcrespo
jcrespo added a comment.
Another example of why long-running connections are a problem: I am depooling es1017 for important maintenance. I have depooled it, so I expect connections to finish within a few seconds, with the exception of wikiadmin's known long-running queries, but I just see 2 sleeping connections:

| 2188112525 | wikiuser| 10.64.32.39:47222 | wikidatawiki   | Sleep   | 1138 | | NULL |0.000 |
| 2188171306 | wikiuser| 10.64.16.64:43004 | wikidatawiki   | Sleep   |  584 | | NULL |0.000 |

Should I kill them? Should I not be able to depool the server, even if I was in an emergency? I do not mind long running connections, if I know I can kill them at any time.

2016-11-30 Thread jcrespo
jcrespo added a comment.
I also do not want to make you work more than necessary. If you only need 1000 rows, and it contains no private data, I can give you access to a misc server shared with other resources; there is no need for a dedicated server.

2016-11-30 Thread hoo
hoo added a comment.

In T151681#2836131, @jcrespo wrote:
Another example of why long-running connections are a problem: I am depooling es1017 for important maintenance. I have depooled it, so I expect connections to finish within a few seconds, with the exception of wikiadmin's known long-running queries, but I just see 2 sleeping connections:

| 2188112525 | wikiuser| 10.64.32.39:47222 | wikidatawiki   | Sleep   | 1138 | | NULL |0.000 |
| 2188171306 | wikiuser| 10.64.16.64:43004 | wikidatawiki   | Sleep   |  584 | | NULL |0.000 |

Should I kill them? Should I not be able to depool the server, even if I was in an emergency? I do not mind long running connections, if I know I can kill them at any time.


Hm, these are both job runners, jobs (probably) shouldn't run for so long. I wonder what's causing this.

2016-11-30 Thread jcrespo
jcrespo added a comment.
Hm, these are both job runners, jobs (probably) shouldn't run for so long. I wonder what's causing this.

Separate issue then, but heads up for it.

2017-05-08 Thread Ladsgroup
Ladsgroup added a comment.
Okay, it's up and running. What's next?

2017-05-10 Thread Ladsgroup
Ladsgroup added a comment.
For the record: this chart shows the number of errors from the master going read-only because replicas were lagged:
F8015751: image.png
Guess when Redis dispatching was deployed.
On ordinary days, we had around 1,700 errors from the master going read-only. Now it's zero.

2017-05-10 Thread Ladsgroup
Ladsgroup added a comment.
Yes. Strangely, after I removed all selectors in logstash, added them back one by one, and made some changes, it worked properly. This is the correct result:
F8016523: image.png
Since deployment, we still see read-only mode, but comparing the last 24 hours with the same period last week, the errors dropped to 80% of their original volume. That's not super bad. I will keep monitoring this over a full week to see whether it's having a real impact.

2017-05-10 Thread hoo
hoo added a comment.
With the current state, we still have the same amount of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore. The queries are also unchanged (apart from the GET_LOCK etc.).

Because of this, it would be surprising to me if this change alone improved the situation (it was mostly a prerequisite for that).

2017-05-10 Thread jcrespo
jcrespo added a comment.
With the current state, we still have the same amount of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore.

And that for me is a huge win alone.

2017-06-28 Thread Ladsgroup
Ladsgroup added a comment.
With the dispatching changes deployed, we had around 1,800 cases of going read-only in the last 24 hours, versus 3,482 on the same day last week. That is a tangible improvement.
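
As a quick check of the two figures above, treating "around 1,800" as exactly 1,800 (an approximation), the relative drop works out to a little under half. The helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


# 3,482 read-only cases last week vs. roughly 1,800 in the last 24 hours.
print(f"{relative_reduction(3482, 1800):.1%}")  # prints 48.3%
```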