Addshore added a comment.
We can use `page.page_links_updated` to see when the page links / a refresh links job has last been run on the page! (thanks again @Krinkle) In PHP we can do ` WikiPage::factory(Title::newFromText('Category:Interior of Chapelle des Anges de Neunhoffen à Dambach (Bas-Rhin)'))->getLinksTimestamp() ` If we run the job synchronously on a debug server, we can see the timestamp get bumped correctly. >>> WikiPage::factory(Title::newFromText('Category:Interior of Chapelle des Anges de Neunhoffen à Dambach (Bas-Rhin)'))->getLinksTimestamp(); => "20210505193823" >>> (new RefreshLinksJob( Title::newFromText('Category:Interior of Chapelle des Anges de Neunhoffen à Dambach (Bas-Rhin)'), ['rootJobTimestamp'=>wfTimestampNow()] ))->run(); => true >>> WikiPage::factory(Title::newFromText('Category:Interior of Chapelle des Anges de Neunhoffen à Dambach (Bas-Rhin)'))->getLinksTimestamp(); => "20210505195510" Trying to schedule a job in the queue manually (and wait for it to execute) for the commons category: >>> WikiPage::factory(Title::newFromText('Category:Interior of Chapelle des Anges de Neunhoffen à Dambach (Bas-Rhin)'))->getLinksTimestamp(); => "20210505195510" >>> wfTimestampNow(); => "20210505195834" >>> $pu->scheduleRefreshLinks( [ Title::newFromText('Category:Interior of Chapelle des Anges de Neunhoffen à Dambach (Bas-Rhin)') ], ['rootJobTimestamp'=>wfTimestampNow()], 'update', 'uid:addshore' ); => null .... It takes a while and at the time of writing this comment I am still waiting for it to execute... We could see the job enter the kafka queue: addshore@stat1007:~$ kafkacat -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.refreshLinks' -o -1000000 | grep commons.wikimedia.org | grep Neunhoffen {"$schema":"/mediawiki/job/1.0.0","meta":{"uri":"https://placeholder.invalid/wiki/Special:Badtitle","request_id":"766bd8c164f5e0761ffc3a89","id":"9f61fcb8-85db-4e66-906c-155a16fddb16","dt":"2021-05-05T19:58:17Z","domain":"commons.wikimedia.org","stream":"mediawiki.job.refreshLinks"},"database":"commonswiki","type":"refreshLinks","sha1":"3a14ea46843e2f7b3b687bdf45112fbca14ac89a","params":{"rootJobTimestamp":"20210505195817","causeAction":"update","causeAgent":"uid:addshore","namespace":14,"title":"Interior_of_Chapelle_des_Anges_de_Neunhoffen_à_Dambach_(Bas-Rhin)","requestId":"766bd8c164f5e0761ffc3a89"},"mediawiki_signature":"32c72f98713180f9aefc8059c33016bb221d6950"} It took a while so at the same time I tried on testwiki: >>> WikiPage::factory(Title::newFromText('PagePropsies'))->getLinksTimestamp(); => "20210505124631" >>> wfTimestampNow(); => "20210505201007" >>> $pu->scheduleRefreshLinks( [ Title::newFromText('PagePropsies') ], ['rootJobTimestamp'=>wfTimestampNow()], 'update', 'uid:addshore' ); => null ... >>> WikiPage::factory(Title::newFromText('PagePropsies'))->getLinksTimestamp(); => "20210505202436" So on testwiki this took 14 mins 30 seconds (roughly what @Jakob_WMDE @Michael & I saw testing). And a commonswiki category took.... TBA a time when it get executed from the queue -------------------------- Some more relevant conversation from IRC > 9:37 PM <Pchelolo> addshore: is your job refreshLinks or refreshLinksPrioritized? > 9:38 PM <addshore> How do I know that? I was assuming refreshLinks, but perhaps that's where I went wrong? > 9:38 PM <Pchelolo> since those 2 have VERY different guarantees.. 
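(For reference, `$pu` in the above is a Wikibase `WikiPageUpdater`; the construction used on the debug server is shown again further down in this comment, and looks roughly like this:)

```
// How $pu was constructed (same code as further down in this comment);
// a shell.php / eval.php session on the client wiki is assumed.
$services = MediaWiki\MediaWikiServices::getInstance();
$pu = new \Wikibase\Client\Changes\WikiPageUpdater(
	JobQueueGroup::singleton(),
	\Wikibase\Client\WikibaseClient::getLogger( $services ),
	$services->getStatsdDataFactory()
);
```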
--------------------------

Some more relevant conversation from IRC:

> 9:37 PM <Pchelolo> addshore: is your job refreshLinks or refreshLinksPrioritized?
> 9:38 PM <addshore> How do I know that? I was assuming refreshLinks, but perhaps that's where I went wrong?
> 9:38 PM <Pchelolo> since those 2 have VERY different guarantees.. for the prioritized one mean delay is a few seconds, while for regular it's 6 hours
> 9:45 PM <Pchelolo> kafkacat -b kafka-main1001.eqiad.wmnet
> 9:45 PM <Pchelolo> kafka-jumbo is an analytics cluster where jobs were replicated at some point, but ottomata is in the process of stopping the mirroring
> 9:47 PM <Pchelolo> so, with kafka-main you should see your job in the non-prioritized non-partitioned topic. e.g kafkacat -b kafka-main1001.eqiad.wmnet:9092 -p 0 -t eqiad.mediawiki.job.refreshLinks
> 9:47 PM <Pchelolo> if you are sending it via wikibase WikiPageUpdater like in your paste on phab

----------------------

So applying this to investigate another one of these seemingly more "random" failures:

- I grabbed a commons category page from https://www.wikidata.org/w/index.php?title=Special:Contributions&offset=20210503061116&limit=500&target=Pi+bot&namespace=all&tagfilter=&newOnly=1&start=&end=
- On 2 May at 18:48 the sitelink was added: https://www.wikidata.org/w/index.php?title=Q106686114&diff=1413255927&oldid=1413255916
- I confirmed with `(MediaWiki\MediaWikiServices::getInstance()->getParserCache())->get(Title::newFromText('Category:Views_from_Palacio_Municipal_(Montevideo)')->toPageRecord(),ParserOptions::newCanonical())->getProperty('wikibase_item');` that the property exists in the parser output
- I confirmed with `PageProps::getInstance()->getProperties([Title::newFromText('Category:Views_from_Palacio_Municipal_(Montevideo)')],'wikibase_item');` that no page_prop is currently in the db
- I then looked for the jobs from 3-4 days ago...
- For refreshLinks:
  - An offset of `-50000000` starts the topic with dates from `2021-05-01T19:54:24Z`, which is far enough back
  - So `kafkacat -C -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.refreshLinks' -o -50000000 | grep commons.wikimedia.org | grep Montevideo | grep Palacio | grep Views` should show us any refreshLinks jobs for this page that were triggered after the edit, before reaching the end of the topic
  - I didn't find anything: P15790 <https://phabricator.wikimedia.org/P15790>
- I should also find an htmlCacheUpdate job...
  - An offset of `-17000000` gave me timestamps of around `2021-05-02T10:50:26Z`, so that should do as a starting point
  - So `kafkacat -C -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.htmlCacheUpdate' -o -17000000 | grep commons.wikimedia.org | grep Montevideo | grep Palacio | grep Views` should show us any such jobs for this page that were triggered
  - I did find this job: P15791 <https://phabricator.wikimedia.org/P15791>

So it looks like the refreshLinks job is not being scheduled? (Which would explain the missing page_props in the table even after waiting days.)
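(Aside: the parser-cache, page_props and links-timestamp checks used here and in the next example could be bundled into a small helper. This is only a sketch for a shell.php session on the affected wiki, not existing code:)

```
// Sketch only (not existing code): bundle the checks used in this comment so a
// single call shows the parser cache time, the wikibase_item property in the
// cached ParserOutput, the page_props row(s), and page_links_updated.
function checkWikibaseItemProp( string $titleText ): array {
	$title = Title::newFromText( $titleText );
	$parserCache = MediaWiki\MediaWikiServices::getInstance()->getParserCache();
	$out = $parserCache->get( $title->toPageRecord(), ParserOptions::newCanonical() );
	return [
		'parser_cache_time' => $out ? $out->getCacheTime() : null,
		'parser_output_prop' => $out ? $out->getProperty( 'wikibase_item' ) : null,
		'page_props' => PageProps::getInstance()->getProperties( [ $title ], 'wikibase_item' ),
		'links_updated' => WikiPage::factory( $title )->getLinksTimestamp(),
	];
}
```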
---------

Checking the job situation on another commons category that was recently edited...

https://www.wikidata.org/wiki/Q106685791

The sitelink was added on the 2nd of May, after 17:00 (https://www.wikidata.org/w/index.php?title=Q106685791&diff=1413223762&oldid=1413223751), to the commons category https://commons.wikimedia.org/wiki/Category:Views_from_the_IND_Fulton_Street_Line

```
$t = "Category:Views_from_the_IND_Fulton_Street_Line";
(MediaWiki\MediaWikiServices::getInstance()->getParserCache())->get(Title::newFromText($t)->toPageRecord(),ParserOptions::newCanonical())->getCacheTime();
(MediaWiki\MediaWikiServices::getInstance()->getParserCache())->get(Title::newFromText($t)->toPageRecord(),ParserOptions::newCanonical())->getProperty('wikibase_item');
PageProps::getInstance()->getProperties([Title::newFromText($t)],'wikibase_item');
```

This showed that the parser cache entry apparently dates from before the time of the edit, and also that the page_props table was not populated:

```
>>> (MediaWiki\MediaWikiServices::getInstance()->getParserCache())->get(Title::newFromText($t)->toPageRecord(),ParserOptions::newCanonical())->getCacheTime();
=> "20210424003324"
>>> (MediaWiki\MediaWikiServices::getInstance()->getParserCache())->get(Title::newFromText($t)->toPageRecord(),ParserOptions::newCanonical())->getProperty('wikibase_item');
=> false
>>> PageProps::getInstance()->getProperties([Title::newFromText($t)],'wikibase_item');
=> []
```

Looking for the relevant jobs...

```
addshore@stat1007:~$ kafkacat -C -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.htmlCacheUpdate' -o -17000000 | grep commons.wikimedia.org | grep Views_from_the_IND_Fulton_Street_Line
% Reached end of topic eqiad.mediawiki.job.htmlCacheUpdate [0] at offset 5360231743
addshore@stat1007:~$ kafkacat -C -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.refreshLinks' -o -50000000 | grep commons.wikimedia.org | grep Views_from_the_IND_Fulton_Street_Line
% Reached end of topic eqiad.mediawiki.job.refreshLinks [0] at offset 16377843525
```

In this case, neither of the jobs was scheduled or run?
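(A one-off fix for a page in this state would presumably be the same synchronous job run used for the Neunhoffen category above; a sketch, assuming a shell.php session on commonswiki:)

```
// Sketch: re-run the links update synchronously on a debug server, which should
// repopulate page_props for this category, as seen with the Neunhoffen example.
$title = Title::newFromText( 'Category:Views_from_the_IND_Fulton_Street_Line' );
( new RefreshLinksJob( $title, [ 'rootJobTimestamp' => wfTimestampNow() ] ) )->run();
```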
And then manually making these jobs from Wikibase PHP code on a debug server...

```
$services = MediaWiki\MediaWikiServices::getInstance();
$pu = new \Wikibase\Client\Changes\WikiPageUpdater( JobQueueGroup::singleton(), \Wikibase\Client\WikibaseClient::getLogger( $services ), $services->getStatsdDataFactory());
$pu->purgeWebCache( [ Title::newFromText('Category:Views_from_the_IND_Fulton_Street_Line') ], ['rootJobTimestamp'=>wfTimestampNow()], 'update', 'uid:addshore' );
$pu->scheduleRefreshLinks( [ Title::newFromText('Category:Views_from_the_IND_Fulton_Street_Line') ], ['rootJobTimestamp'=>wfTimestampNow()], 'update', 'uid:addshore' );
```

Looking for those by-hand jobs:

```
addshore@stat1007:~$ kafkacat -C -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.htmlCacheUpdate' -o -1000000 | grep commons.wikimedia.org | grep Views_from_the_IND_Fulton_Street_Line
{"$schema":"/mediawiki/job/1.0.0","meta":{"uri":"https://placeholder.invalid/wiki/Special:Badtitle","request_id":"5eea20a0a6fd2a89a842fde5","id":"619b6d62-ce04-473d-ac61-695ca21ee627","dt":"2021-05-05T22:48:47Z","domain":"commons.wikimedia.org","stream":"mediawiki.job.htmlCacheUpdate"},"database":"commonswiki","type":"htmlCacheUpdate","sha1":"6b6626bec7f564a9945e696aa5f92c93f49e27b1","root_event":{"signature":"params:ee961b0b6ee828a2b414c3d895df758dea65ec94","dt":"2021-05-05T22:48:47Z"},"params":{"pages":{"104243054":[14,"Views_from_the_IND_Fulton_Street_Line"]},"rootJobSignature":"params:ee961b0b6ee828a2b414c3d895df758dea65ec94","rootJobTimestamp":"20210505224847","causeAction":"update","causeAgent":"uid:addshore","namespace":-1,"title":"Badtitle/Wikibase\\Client\\Changes\\WikiPageUpdater","requestId":"5eea20a0a6fd2a89a842fde5"},"mediawiki_signature":"42cd76073d5b45e84255a2460d2bd4fb262556bc"}
% Reached end of topic eqiad.mediawiki.job.htmlCacheUpdate [0] at offset 5360234541
addshore@stat1007:~$ kafkacat -C -b kafka-main1001.eqiad.wmnet -p 0 -t 'eqiad.mediawiki.job.refreshLinks' -o -1000000 | grep commons.wikimedia.org | grep Views_from_the_IND_Fulton_Street_Line
{"$schema":"/mediawiki/job/1.0.0","meta":{"uri":"https://placeholder.invalid/wiki/Special:Badtitle","request_id":"5eea20a0a6fd2a89a842fde5","id":"f32e87fd-202b-493f-bc57-8b2edb857de1","dt":"2021-05-05T22:48:48Z","domain":"commons.wikimedia.org","stream":"mediawiki.job.refreshLinks"},"database":"commonswiki","type":"refreshLinks","sha1":"1bf24cbffca3ae3120f434f9167099eb1bea7ec3","params":{"rootJobTimestamp":"20210505224848","causeAction":"update","causeAgent":"uid:addshore","namespace":14,"title":"Views_from_the_IND_Fulton_Street_Line","requestId":"5eea20a0a6fd2a89a842fde5"},"mediawiki_signature":"88689ad9d37babfe1e5408b46a1620c315ddf645"}
% Reached end of topic eqiad.mediawiki.job.refreshLinks [0] at offset 16377856891
```

I was then going to check the `wb_changes` table for this edit, but unfortunately it seems to have happened just before the current cutoff...

```
mysql:research@dbstore1005.eqiad.wmnet [wikidatawiki]> select change_time from wb_changes ORDER BY change_time ASC limit 2;
+----------------+
| change_time    |
+----------------+
| 20210502224506 |
| 20210502224506 |
+----------------+
2 rows in set (0.001 sec)
```

But I could also look for the `ChangeNotificationJob` that would have been scheduled if the change did exist (in the `eqiad.mediawiki.job.ChangeNotification` topic in kafka). An offset of `-2000000` currently gives us timestamps of `20210430041324`, which is plenty far back enough to start looking. But these jobs only include the change ids, not other info we can grep for...
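(If the change were still in `wb_changes`, the change ids to grep for in that topic could be looked up along these lines; a sketch for a shell session on wikidatawiki, assuming the `change_object_id` column holds the entity ID, whose case may vary:)

```
// Sketch: find recent wb_changes rows for the item, so the resulting change_ids
// could then be grepped for in eqiad.mediawiki.job.ChangeNotification.
// Assumes change_object_id holds the entity ID serialization.
$dbr = wfGetDB( DB_REPLICA );
$rows = $dbr->select(
	'wb_changes',
	[ 'change_id', 'change_time', 'change_type' ],
	[ 'change_object_id' => [ 'Q106685791', 'q106685791' ] ],
	__METHOD__
);
foreach ( $rows as $row ) {
	echo "{$row->change_id} {$row->change_time} {$row->change_type}\n";
}
```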
-----------

So these jobs (htmlCacheUpdate & refreshLinks) are sometimes not getting into kafka. Sometimes it's one job, sometimes it's both, and seemingly sometimes both do get scheduled. This may be an issue in Wikibase or in MediaWiki, but it seems to happen between the edit occurring and the jobs landing in kafka, so this could be:

- wb_changes table entries failing?
- ChangeNotificationJob failing?
- htmlCacheUpdate / refreshLinks jobs failing?

It would probably be beneficial to add some more logging in the Wikibase code around this area to see if we can get to the bottom of this.
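The kind of extra logging meant here might look something like the following (purely a sketch, not a patch; the `$this->logger` / `$this->jobQueueGroup` property names are assumptions about `WikiPageUpdater`'s internals rather than copied from the real class):

```
// Sketch only: illustrative logging around job scheduling in WikiPageUpdater,
// so the logs would show whether the jobs were ever pushed at all.
private function pushAndLog( array $jobs, string $jobType, array $titles ): void {
	$this->logger->info(
		'WikiPageUpdater: pushing {count} {jobType} job(s) for {titles}',
		[
			'count' => count( $jobs ),
			'jobType' => $jobType,
			'titles' => implode( '|', array_map( static function ( Title $t ) {
				return $t->getPrefixedDBkey();
			}, $titles ) ),
		]
	);
	$this->jobQueueGroup->push( $jobs );
}
```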