Ladsgroup added a comment.

  In T252091#6154167 <https://phabricator.wikimedia.org/T252091#6154167>, 
@tstarling wrote:
  
  > This proposal is effectively a dynamic rate limit except that instead of 
delivering an error message when it is exceeded, we will just hold the 
connection open, forcing the bot to wait. That's expensive in terms of server 
resources -- we'd rather have the client wait using only its own resources. A 
rate limit has a tunable parameter (the rate) which is not really knowable. 
Similarly, this proposal has a tunable parameter (the pool size) which is not 
really knowable. You have to tune the pool size down until the replag stops 
increasing, but then if the nature of the edits changes, or if the hardware 
changes, the optimal pool size will change.
  >
  > I suggested at T202107 <https://phabricator.wikimedia.org/T202107> that the 
best method for globally controlling replication lag would be with a PID 
controller <https://en.wikipedia.org/wiki/PID_controller>. A PID controller 
suppresses oscillation by having a memory of recent changes in the metric. The 
P (proportional) term is essentially as proposed at T240442 
<https://phabricator.wikimedia.org/T240442> -- just back off proportionally as 
the lag increases. The problem with this is that it will settle into an 
equilibrium lag somewhere in the middle of the range. The I (integral) term 
addresses this by maintaining a rolling average and adjusting the control value 
until the average meets the desired value. This allows it to maintain 
approximately the same edit rate but with a lower average replication lag. The 
D (derivative) term causes the control value to be reduced more aggressively if 
the metric is rising quickly.
  >
  > My proposal is to use a PID controller to set the Retry-After header. 
Clients would be strongly encouraged to respect that header. We could have say 
maxlag=auto to opt in to this system.
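  
  For reference, the control loop described above can be sketched as follows. This is only an illustration of the PID idea, not a concrete proposal: the gains `K_p`, `K_i`, `K_d`, the target lag, and the class name are all placeholder assumptions.
  
  ```python
  class PIDController:
      """Discrete PID controller mapping replication lag (seconds)
      to a suggested Retry-After delay (seconds).

      The gains and target lag below are illustrative placeholders;
      tuning them is exactly the open question discussed here.
      """

      def __init__(self, k_p=1.0, k_i=0.1, k_d=0.5, target_lag=1.0):
          self.k_p = k_p
          self.k_i = k_i
          self.k_d = k_d
          self.target_lag = target_lag
          self.integral = 0.0
          self.prev_error = None

      def retry_after(self, current_lag, dt=1.0):
          # P term: how far the observed lag is above the target.
          error = current_lag - self.target_lag
          # I term: accumulated error over time (would need persistent
          # storage in a multi-server deployment).
          self.integral += error * dt
          # D term: rate of change of the error since the last sample.
          derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
          self.prev_error = error
          output = self.k_p * error + self.k_i * self.integral + self.k_d * derivative
          # Never ask a client to wait a negative amount of time.
          return max(0.0, output)
  ```
  
  With these example gains, a lag well below the target yields a Retry-After of 0, and the suggested delay grows as lag rises above the target and keeps rising.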
  
  I quite like the idea of using a PID controller, but there are three notes I 
want to mention:
  
  - With PID, we need to define three constants: K_p, K_i and K_d. If finding 
the pool size was already a problem, this makes tuning three times more 
complicated (I didn't find a standard way to determine these coefficients; 
maybe I'm missing something obvious)
  - We currently don't have the infrastructure to hold the "maxlag" data over 
time so that we can calculate its derivative and integral. Should we use Redis? 
What would that look like? These are questions I don't have answers for. Do 
you have ideas?
  - I'm not sure "Retry-After" is a good header for 2xx responses. It's like 
"We accepted your edit but "retry" it after 2 seconds". I looked at RFC 7231 
and it doesn't explicitly say we can't use it in 2xx requests but I haven't 
seen anywhere use it in 2xx responses. We might be able to find another better 
header?
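  
  To make the second note concrete, the bookkeeping the I and D terms need might look roughly like this in-memory sketch. Whether this state lives in Redis or somewhere else is the open question; the class and method names here are hypothetical.
  
  ```python
  from collections import deque

  class LagWindow:
      """Fixed-size rolling window of (timestamp, lag) samples,
      enough to approximate the integral and derivative of maxlag.
      """

      def __init__(self, maxlen=60):
          self.samples = deque(maxlen=maxlen)

      def add(self, timestamp, lag):
          self.samples.append((timestamp, lag))

      def integral(self):
          # Trapezoidal approximation of the area under the lag curve.
          total = 0.0
          pairs = list(self.samples)
          for (t0, l0), (t1, l1) in zip(pairs, pairs[1:]):
              total += (l0 + l1) / 2.0 * (t1 - t0)
          return total

      def derivative(self):
          # Slope between the two most recent samples.
          if len(self.samples) < 2:
              return 0.0
          (t0, l0), (t1, l1) = self.samples[-2], self.samples[-1]
          return (l1 - l0) / (t1 - t0) if t1 != t0 else 0.0
  ```
  
  The point of the sketch is that only a short window of recent samples is required, which keeps the storage question (Redis or otherwise) fairly small.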

TASK DETAIL
  https://phabricator.wikimedia.org/T252091

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup
Cc: tstarling, Joe, Dvorapa, daniel, Krinkle, Aklapper, Jakob_WMDE, 
Lydia_Pintscher, WMDE-leszek, darthmon_wmde, Addshore, Ladsgroup, Demian, 
DannyS712, Nandana, kostajh, Lahi, Gq86, GoranSMilovanovic, RazeSoldier, 
QZanden, LawExplorer, elukey, _jensen, rosalieper, D3r1ck01, Scott_WUaS, Jonas, 
Izno, SBisson, Perhelion, Wikidata-bugs, Base, aude, GWicke, Bawolff, jayvdb, 
fbstj, santhosh, Jdforrester-WMF, Mbch331, Rxy, Jay8g, Ltrlg, bd808, Legoktm
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs