[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-07-04 Thread tfmorris
tfmorris added a comment.


  In T240442#5851541, @Addshore wrote:
  
  > In T240442#5834866, @Ladsgroup wrote:
  >
  >> Very broad idea, feel free to discard, I think using industry-wide
  >> standards for throttling like `token bucket`, `leaky bucket`,
  >> `fixed-window counter` or `sliding-window counter` might help here.
  >
  > One of the primary questions we need to answer is do we want to keep doing
  > this client side self throttling, or switch to something more server side.
  
  I would have thought it obvious that this can't be done client side. Clients 
can cheat. They don't know what the other clients are doing. They don't know 
what other factors are affecting the servers.
  
  As @Ladsgroup hints, this is a basic distributed systems engineering problem 
with known answers. In addition to rate limiting at ingress, it may be helpful 
to add backpressure signals between the various internal servers as well as add 
jitter to the Retry-After signals sent to clients.
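
  A minimal sketch of the jitter idea, assuming a hypothetical server-side 
helper named `retry_after_with_jitter`. The base delay and the `spread` 
parameter are illustrative and not something MediaWiki is known to do today. 
The point is only that clients throttled at the same moment are told to come 
back at slightly different times.

```python
import random


def retry_after_with_jitter(base_seconds, spread=0.5, rng=random):
    """Return a delay drawn uniformly from
    [base_seconds, base_seconds * (1 + spread)].

    Hypothetical helper: randomising the delay keeps clients that were
    throttled together from all retrying together.
    """
    if base_seconds < 0 or spread < 0:
        raise ValueError("base_seconds and spread must be non-negative")
    return base_seconds * (1.0 + rng.uniform(0.0, spread))


# Five clients throttled at once get five different retry delays.
rng = random.Random(42)
print([round(retry_after_with_jitter(5, rng=rng), 2) for _ in range(5)])
```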

TASK DETAIL
  https://phabricator.wikimedia.org/T240442

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: tfmorris
Cc: tfmorris, valhallasw, Strainu, Xqt, Dvorapa, Ladsgroup, ArthurPSmith, 
Addshore, Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, 
QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, 
Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-03-09 Thread Addshore
Addshore added a comment.


  In T240442#5945682, @Ladsgroup wrote:
  
  > What do you think?
  
  Definitely worth considering.
  Could be worth an RFC to get wider involvement?
  This is essentially edit rate limiting for an entire site.
  
  I'm not sure how ops would feel about artificially inflating save timing on 
Wikidata for the app servers, though.

To: Addshore
Cc: valhallasw, Strainu, Xqt, Dvorapa, Ladsgroup, ArthurPSmith, Addshore, 
Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-03-05 Thread Ladsgroup
Ladsgroup added a comment.


  I have an idea. I think we should use PoolCounter (which is basically SaaS, 
semaphore as a service) to put a cap on the number of edits happening on 
Wikidata at the same time. It is already used when an article is being 
reparsed, so that not too many mw nodes parse the same article at the same 
time (the Michael Jackson effect).
  
  Basically, once a request realizes it's going to make an edit in Wikidata, it 
decrements the "edit cap on Wikidata" semaphore (let's say initialized to 10, 
meaning only ten edits can happen in Wikidata at the same time). Once the 
semaphore reaches zero, PoolCounter keeps the 11th mw node that tries to lock 
waiting, and responds once one of the ten current ones finishes. If there are 
more than, let's say, twenty, it just responds with "Too many edits 
happening". This means edit saving time might be artificially slow when more 
than ten edits are happening at the same time. Note that this already works 
fine for parsing articles (look at the blog post). I used this a while back on 
ORES to prevent more than four IPs requesting ORES at the same time, to avoid 
intentional and unintentional DDoSes. It works fine as well.
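
  The behaviour described above can be sketched in-process with Python's 
`threading` module. This is only an analogy under stated assumptions: the 
class `EditPool` and its parameters `workers` and `max_queue` are hypothetical 
names, `max_queue` is assumed to count callers that are running or waiting, 
and nothing here speaks the real PoolCounter protocol.

```python
import threading


class EditPool:
    """In-process analogy of an edit cap (not the PoolCounter protocol).

    Up to `workers` edits run at once, further callers wait for a slot,
    and once `max_queue` callers are running or waiting new ones are
    rejected straight away.
    """

    def __init__(self, workers=10, max_queue=20):
        self._slots = threading.Semaphore(workers)
        self._lock = threading.Lock()
        self._pending = 0  # callers currently running or waiting
        self._max_queue = max_queue

    def acquire(self, timeout=None):
        """Return True once a slot is held, False if rejected or timed out."""
        with self._lock:
            if self._pending >= self._max_queue:
                return False  # "Too many edits happening"
            self._pending += 1
        if self._slots.acquire(timeout=timeout):
            return True
        with self._lock:
            self._pending -= 1  # gave up waiting for a slot
        return False

    def release(self):
        """Give the slot back after the edit has been saved."""
        self._slots.release()
        with self._lock:
            self._pending -= 1
```

  A caller would wrap the save in `if pool.acquire(timeout=...): try: ... 
finally: pool.release()` and return an error to the client otherwise.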
  
  PoolCounter is a pretty reliable service with almost zero downtime and 
already has good support inside MediaWiki.
  
  What do you think?

To: Ladsgroup
Cc: valhallasw, Strainu, Xqt, Dvorapa, Ladsgroup, ArthurPSmith, Addshore, 
Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-03-04 Thread Xqt
Xqt added a comment.


  In addition: should read access also be throttled?

To: Xqt
Cc: Strainu, Xqt, Dvorapa, Ladsgroup, ArthurPSmith, Addshore, Aklapper, 
darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-02-05 Thread Addshore
Addshore added a comment.


  In T240442#5834866, @Ladsgroup wrote:
  
  > Very broad idea, feel free to discard, I think using industry-wide
  > standards for throttling like `token bucket`, `leaky bucket`,
  > `fixed-window counter` or `sliding-window counter` might help here.
  
  One of the primary questions we need to answer is do we want to keep doing 
this client side self throttling, or switch to something more server side.

To: Addshore
Cc: Strainu, Xqt, Dvorapa, Ladsgroup, ArthurPSmith, Addshore, Aklapper, 
darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-01-27 Thread Ladsgroup
Ladsgroup added a comment.


  Very broad idea, feel free to discard, I think using industry-wide standards 
for throttling like `token bucket`, `leaky bucket`, `fixed-window counter` or 
`sliding-window counter` might help here.
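
  For readers unfamiliar with these, here is a minimal sketch of the first 
one, a token bucket. The class name `TokenBucket` and the parameters `rate`, 
`capacity` and `clock` are illustrative choices, not an existing MediaWiki 
API. Tokens refill continuously at `rate` per second up to `capacity`, and 
each edit spends one token, so short bursts are allowed but the long-run 
rate is bounded.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` actions, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the bucket can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens and return True, or return False if too few are left."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```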

To: Ladsgroup
Cc: Ladsgroup, ArthurPSmith, Addshore, Aklapper, darthmon_wmde, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-01-19 Thread Addshore
Addshore added a comment.


  It's possible that we could add some sort of suggested wait between actions 
to the output of maxlag, if that could make things easier.
  It would save individuals from trying to figure out how long to wait.
  
  That's kind of what maxlag is: the time that you should wait before knowing 
that whatever you have written is replicated everywhere on the SQL servers.
  We of course now have dispatching and the query service updates piled in 
there, which have slightly different dynamics.

To: Addshore
Cc: ArthurPSmith, Addshore, Aklapper, darthmon_wmde, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-01-17 Thread Pintoch
Pintoch added a comment.


  It is actually possible to retrieve the current maxlag value from the API 
without making any edit (see @Addshore's comment above).
  So, just retrieve the current maxlag value and compute your desired edit rate 
for this maxlag with the function plotted above. Then sleep for the appropriate 
amount of time between any two edits to achieve this rate. Refresh the maxlag 
value from the server periodically.
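
  The procedure above can be sketched as follows. The plotted function is not 
reproduced in this thread, so `edit_rate` below assumes a simple shape: full 
speed up to a lag of 2.5 seconds, then a linear ramp down to zero at 5 
seconds. The callables `get_lag` and `do_edit` are hypothetical placeholders 
for the client's own API code.

```python
import time


def edit_rate(lag, max_rate=1.0, low=2.5, high=5.0):
    """Edits per second allowed at the given lag (in seconds).

    Assumed shape, not taken from the plot: full speed up to `low`,
    then a linear ramp down to zero at `high`.
    """
    if lag <= low:
        return max_rate
    if lag >= high:
        return 0.0
    return max_rate * (high - lag) / (high - low)


def sleep_between_edits(lag, max_wait=60.0):
    """Seconds to sleep between two edits, capped so the lag gets re-checked."""
    rate = edit_rate(lag)
    if rate <= 0.0:
        return max_wait
    return min(max_wait, 1.0 / rate)


def run(edits, get_lag, do_edit, sleep=time.sleep, refresh_every=10):
    """Apply each edit, sleeping between edits according to the current lag."""
    lag = get_lag()
    for i, edit in enumerate(edits):
        if i and i % refresh_every == 0:
            lag = get_lag()  # refresh the lag value periodically
        do_edit(edit)
        sleep(sleep_between_edits(lag))
```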

To: Pintoch
Cc: ArthurPSmith, Addshore, Aklapper, Pintoch, darthmon_wmde, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2020-01-17 Thread ArthurPSmith
ArthurPSmith added a comment.


  Just saw this - I'm wondering technically how you would implement it? You 
could generate a random number between 2.5 and 5, and if maxlag is greater than 
your random number deny the edit?
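
  A minimal sketch of that random-threshold idea, with the hypothetical 
function name `allow_edit`. Drawing the threshold uniformly between 2.5 and 5 
means an edit at lag `x` in that range is accepted with probability 
`(5 - x) / 2.5`, so the acceptance rate falls off linearly instead of 
dropping at a single cut-off.

```python
import random


def allow_edit(lag, low=2.5, high=5.0, rng=random):
    """Accept the edit unless the lag exceeds a threshold drawn from [low, high]."""
    threshold = rng.uniform(low, high)
    return lag <= threshold


# Rough check of the acceptance rate halfway up the range (expected near 0.5).
rng = random.Random(0)
accepted = sum(allow_edit(3.75, rng=rng) for _ in range(10000))
print(accepted / 10000)
```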

To: ArthurPSmith
Cc: ArthurPSmith, Addshore, Aklapper, Pintoch, darthmon_wmde, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2019-12-11 Thread Pintoch
Pintoch added a comment.


  Thanks! I think dynamically changing the maxlag value is likely to still 
introduce some thresholds, whereas a continuous slowdown (retrieving the lag 
and computing one's edit rate from it) should in theory reach an equilibrium 
point.
  
  In the meantime, Wikidata is really unusable with mass-editing tools at the 
moment. It is hard to convince people to respect maxlag=5 when that prevents 
them from editing half of the time, so I think it would be worth raising the 
WDQS factor again. We have identified which tools need to comply better, and 
having a small factor was useful for that. We probably do not want to stay in 
this state for weeks (Widar is likely to take a long time to get fixed). We 
might not want to punish the polite ones too hard!

To: Pintoch
Cc: Addshore, Aklapper, Pintoch, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2019-12-11 Thread Addshore
Addshore added a comment.


  As reported in IRC, maxlag can be checked with, for example, 
https://www.wikidata.org/w/api.php?action=query&format=json&maxlag=-1
  Clients could also consider dynamically changing their maxlag value, rather 
than always having it set to 5.
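
  A small sketch of reading the lag out of such a response. It assumes that 
the API answers a request with a negative `maxlag` with a `maxlag` error 
whose JSON body carries a numeric `lag` field. The sample payload is 
illustrative, and the function name `lag_from_response` is hypothetical.

```python
import json


def lag_from_response(body):
    """Return the lag in seconds from a maxlag error body, or None if absent."""
    data = json.loads(body)
    error = data.get("error", {})
    if error.get("code") != "maxlag":
        return None
    lag = error.get("lag")  # assumed field name, see the note above
    return None if lag is None else float(lag)


# Illustrative payload; the values are made up.
sample = (
    '{"error": {"code": "maxlag", '
    '"info": "Waiting for a database server: 1.2 seconds lagged.", '
    '"lag": 1.2, "type": "db"}}'
)
print(lag_from_response(sample))
```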

To: Addshore
Cc: Addshore, Aklapper, Pintoch, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331


[Wikidata-bugs] [Maniphest] [Commented On] T240442: Design a continuous throttling policy for Wikidata bots

2019-12-11 Thread Pintoch
Pintoch added a comment.


  If clients are able to retrieve the current lag periodically (through some 
MediaWiki API call? which one?), then this should not require any server-side 
change. Clients can continue to use `maxlag=5` but also throttle themselves 
using the smoothed function proposed.

To: Pintoch
Cc: Aklapper, Pintoch, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331