Ladsgroup added a comment.

  In T252091#6171258 <https://phabricator.wikimedia.org/T252091#6171258>, 
@tstarling wrote:
  
  > I hope you don't mind if I contradict my previous comment a bit, since my 
thinking is still evolving on this.
  
  No worries at all. I'm also changing my mind quickly here.
  
  > One problem with using lag as the metric is that it doesn't go negative, so 
the integral will not be pulled down while the service is idle. We could 
subtract a target lag, say 1 minute, but that loses some of the supposed 
benefit of including an integral term. A better metric would be updater load, 
i.e. demand/capacity. When the load is more than 100%, the lag increases at a 
rate of 1 second per second, but there's no further information in there as to 
how heavily overloaded it is. When the load is less than 100%, lag decreases 
until it reaches zero. While it's decreasing, the slope tells you something 
about how underloaded it is, but once it hits zero, you lose that information.
  >
  > Load is average queue size, if you take the currently running batch as 
being part of the queue. WDQS currently does not monitor the queue size. I 
gather (after an hour or so of research, I'm new to all this) that with some 
effort, KafkaPoller could obtain an estimate of the queue size by subtracting 
the current partition offsets from KafkaConsumer.endOffsets() 
<https://kafka.apache.org/25/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#endOffsets-java.util.Collection->.
  >
  > Failing that, we can make a rough approximation from available data. We can 
get the average utilisation of the importer from the 
rdf-repository-import-time-cnt metric. You can see in Grafana 
<https://grafana.wikimedia.org/d/000000489/wikidata-query-service?panelId=5&fullscreen&orgId=1&refresh=1m>
 that the derivative of this metric hovers between 0 and 1 when WDQS is not 
lagged, and remains near 1 when WDQS is lagged. The metric I would propose is 
to add replication lag to this utilisation metric, appropriately scaled: 
//utilisation + K_lag * lag - 1// where K_lag is say 1/60s. This is a metric 
which is -1 at idle, 0 when busy with no lag, and 1 with 1 minute of lag. The 
control system would adjust the request rate to keep this metric (and its 
integral) at zero.
  >
  >> With PID, we need to define three constants K_p, K_i and K_d. If we had a problem finding the pool size, this is going to be three times more complicated (I didn't find a standard way to determine these coefficients; maybe I'm missing something obvious).
  >
  > One way to simplify it is with K_d=0, i.e. make it a PI controller. Having 
the derivative in there probably doesn't add much. Then it's only two times 
more complicated. Although I added K_lag so I suppose we are still at 3. The 
idea is that it shouldn't matter too much exactly what K_p and K_i are set to 
-- the system should be stable and have low lag with a wide range of parameter 
values. So you just pick some values and see if it works.
  >
  >> We currently don't have infrastructure to hold the "maxlag" data over time so that we can calculate its derivative and integral. Should we use Redis? What would that look like? These are questions I don't have answers to. Do you have ideas for that?
  >
  > WDQS lag is currently obtained by having an ApiMaxLagInfo hook handler 
which queries Prometheus, caching the result. Prometheus has a query language 
which can perform derivatives ("rate") and integrals ("sum_over_time") on 
metrics. So it would be the same system as now, just with a different 
Prometheus query.
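  
  Side note on the queue-size estimate: if we went that route, I imagine KafkaPoller could compute it roughly like this with the Java consumer API (just a sketch; the class and method names are made up, not existing updater code):

```
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

class QueueSizeEstimator {
    /**
     * Rough backlog estimate: end offset minus current position, summed
     * over all partitions currently assigned to this consumer.
     */
    static long estimateBacklog(KafkaConsumer<?, ?> consumer) {
        Set<TopicPartition> assignment = consumer.assignment();
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assignment);
        long backlog = 0;
        for (TopicPartition tp : assignment) {
            backlog += endOffsets.get(tp) - consumer.position(tp);
        }
        return backlog;
    }
}
```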
  
  I might be a little YAGNI here, but I would love to have the maxlag numbers kept over time and to build the PI controller on the maxlag value rather than on WDQS lag, mostly because WDQS will hopefully be fixed and handled later, while there will always be some sort of edit-rate bottleneck (job queue, replication, you name it). That said, if you think we should work on WDQS for now, I'm okay with that. My thinking was to start with a P controller based on maxlag and to build the infrastructure to keep the data over time (maybe Prometheus? query statsd? We already store all maxlag values there 
<https://grafana.wikimedia.org/d/000000156/wikidata-dispatch?panelId=22&fullscreen&orgId=1&refresh=1m&from=now-6h&to=now>, but it seems broken at the moment) and add it there; a rough sketch of what such a controller could look like is below. I think oscillating around 3s is much better than oscillating around 5s, because above 5s the system doesn't accept the edit and the user has to re-send it.
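  
  To make that concrete, here is a minimal sketch of the PI loop as you describe it (K_d = 0), using the //utilisation + K_lag * lag - 1// metric from your proposal. The gains and the mapping from controller output to an allowed request rate are placeholder assumptions, not a worked-out design; with K_I = 0 and maxlag fed in as the error signal it degenerates to the P controller I mention above:

```
/** Minimal PI controller sketch; gains and scaling are untuned placeholders. */
class LoadController {
    private static final double K_P = 0.5;          // proportional gain (assumption)
    private static final double K_I = 0.1;          // integral gain (assumption)
    private static final double K_LAG = 1.0 / 60.0; // 1 minute of lag -> error of 1

    private double integral = 0.0;

    /**
     * @param utilisation importer utilisation in [0, 1], e.g. the derivative of
     *                    rdf-repository-import-time-cnt from Prometheus
     * @param lagSeconds  replication lag in seconds
     * @param dtSeconds   time since the previous controller update
     * @return multiplier to apply to the allowed edit/request rate
     */
    double rateMultiplier(double utilisation, double lagSeconds, double dtSeconds) {
        // -1 at idle, 0 when busy with no lag, +1 at one minute of lag.
        double error = utilisation + K_LAG * lagSeconds - 1.0;
        integral += error * dtSeconds;
        double output = K_P * error + K_I * integral;
        // Positive output = overloaded: scale the allowed rate down (floor at 10%).
        return Math.max(0.1, 1.0 - output);
    }
}
```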
  
  > The wording in RFC 7231 suggests to me that it is acceptable to use 
Retry-After in a 2xx response. "Servers send the "Retry-After" header field to 
indicate how long the user agent ought to wait before making a follow-up 
request." That seems pretty close to what we're doing.
  
  Ack. I think we should communicate this to the tool developers (and the Pywikibot folks) so they start respecting the header all the time.
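  
  On the client side, respecting it could be as simple as something like this (a sketch using Java's built-in HTTP client; Pywikibot would of course do its own thing in Python, and only the delay-seconds form of the header is handled here):

```
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RetryAfterAwareClient {
    private final HttpClient client = HttpClient.newHttpClient();

    /** Sends a request and honours Retry-After on any response, 2xx included. */
    HttpResponse<String> send(HttpRequest request) throws Exception {
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        response.headers().firstValue("Retry-After").ifPresent(value -> {
            try {
                // RFC 7231 also allows an HTTP-date here; this sketch only
                // handles the delay-seconds form.
                long seconds = Long.parseLong(value.trim());
                Thread.sleep(seconds * 1000L);
            } catch (NumberFormatException e) {
                // HTTP-date form: ignored in this sketch.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return response;
    }
}
```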
