dcausse added a comment.
The ElasticaWrite job seems to be receiving roughly the same number of messages (150/s per partition on average) before and after the switch.

Looking at the partitioned topic ElasticWrite, it has been heavily backlogged since the switch:
F34569267: Capture d’écran du 2021-07-29 10-57-33.png <https://phabricator.wikimedia.org/F34569267>

A huge backlog of 5 million messages accumulated from Jun 28 and has been getting absorbed since Jul 2. This matches a big bump in the processing rates of the changeprop consumers, which is explained by a restart of some of the cpjobqueue pods that were behaving poorly:
F34569457: Capture d’écran du 2021-07-29 15-15-03.png <https://phabricator.wikimedia.org/F34569457>

Job timings as reported by changeprop do not suggest an increase (quite the opposite), but the cirrus backend logs for `send_data_write` show a clear bump in request_time (see P16924 <https://phabricator.wikimedia.org/P16924>, roughly +15ms). This could perhaps be explained by the jobrunners running in codfw now having to write to two distant elasticsearch clusters (prod-eqiad and cloudelastic) as opposed to one. Also, the jobqueue reports a processing time of 250 to 300ms both before and after the switch, so I'm not sure that +15ms between mw and elastic could cause such a change.

The consumer group lag suggests that we lack processing power, but we don't seem to produce more ElasticaWrite jobs, or at least I can't find any evidence of it... The increase in processing rate could perhaps be explained by the backlog causing the jobqueue to consume messages in bursts?

So, possible causes of the backlog:
- more messages produced to cirrusElasticWrite after the switch?
  - the week before the switch this topic ingested 128,916,000 messages; the week after, 147,295,000 (+14%).
- processing time of the job increased?
  - can't find any evidence of this except a small increase in mw <-> elastic timings, but it is not visible in the job processing time
- changeprop not giving enough room to this job?
  - seemed to have been the case just after the switch, but then the processing rates went higher than usual

TASK DETAIL
  https://phabricator.wikimedia.org/T287563
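As a rough sanity check on the figures quoted in the comment, the sketch below recomputes the week-over-week increase on the topic, the average ingest rates this implies, and the average shortfall in consumption needed to build a 5 million message backlog. The weekly counts and the backlog size come from the comment; the 4-day accumulation window (Jun 28 to Jul 2) and all variable names are assumptions made for illustration only.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def average_rate(messages: float, seconds: float) -> float:
    """Average number of messages per second over a time window."""
    return messages / seconds


# Weekly message counts quoted in the comment.
week_before = 128_916_000
week_after = 147_295_000

increase_pct = percent_change(week_before, week_after)
rate_before = average_rate(week_before, SECONDS_PER_WEEK)
rate_after = average_rate(week_after, SECONDS_PER_WEEK)

# Backlog size quoted in the comment.
backlog = 5_000_000
# Assumption: the backlog built up over about 4 days (Jun 28 to Jul 2).
accumulation_seconds = 4 * SECONDS_PER_DAY
# Average amount by which production outpaced consumption in that window.
deficit_rate = average_rate(backlog, accumulation_seconds)

print(f"increase: {increase_pct:.1f}%")
print(f"avg ingest before: {rate_before:.1f} msg/s")
print(f"avg ingest after:  {rate_after:.1f} msg/s")
print(f"avg consumption shortfall: {deficit_rate:.1f} msg/s")
```

Under these assumptions the increase comes out at about 14.3%, consistent with the +14% quoted, and the average shortfall (about 14.5 msg/s) is smaller than the rise in average ingest rate (about 30 msg/s). This is only a back-of-the-envelope comparison of averages; it says nothing about bursts or per-partition behaviour.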