Addshore added subscribers: Jakob_WMDE, daniel, Krinkle.
Addshore added a comment.


  After the look that we (@Jakob_WMDE, @Michael & I) took yesterday, we think 
we have figured out a few things.
  The reproduction case in T233520#6898755 
<https://phabricator.wikimedia.org/T233520#6898755> half works, but we seem to 
consistently see the page_props table eventually get populated with the needed 
value.
  
  Seemingly the following is happening:
  
  - Create a sitelink
    - Dispatching process
      - The sitelink edit kicks off the dispatching process for getting the 
change to the needed clients
      - Within a few seconds we see the parser cache / output timestamp change 
to the current time (implying that dispatching is done and the page was purged 
and now reparsed). This can be seen in a HTML comment of the page on the client.
      - This lines up with the delay in dispatching (which is now very short, 
only a few seconds) https://www.wikidata.org/wiki/Special:DispatchStats and 
with the fact that the jobs on the client also get executed quickly 
https://grafana.wikimedia.org/d/CbmStnlGk/jobqueue-job?orgId=1&var-dc=eqiad%20prometheus%2Fk8s&var-job=ChangeNotification
    - ParserOutput generation after dispatch
      - We see the page reparse, as the timestamp for ParserOutput generation 
changes to the current time.
      - Using `shell.php` we can see that the page props added by the Wikibase 
hook are immediately available in the new `ParserOutput` object stored in the 
cache.
        - This can be checked with something like 
`(MediaWiki\MediaWikiServices::getInstance()->getParserCache())->get(Title::newFromText('PagePropsies')->toPageRecord(),ParserOptions::newCanonical())->getProperty('wikibase_item');`
      - Using the API or looking directly in the DB we do not see the page 
property yet, for example: 
https://test.wikipedia.org/w/api.php?action=query&format=json&prop=pageprops&titles=PagePropsies
    - Some time later the value seems to appear in the page_props table (the 
parser output timestamp does not change here)
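
  To watch the gap described above from the outside, something like the 
following rough sketch could be used to check API responses for the property. 
The helper names `pageprops_url` and `wikibase_item` are made up for this 
example, the two sample responses are hand-written (not real API output), `Q1` 
is a placeholder item ID, and the response layout assumes the default JSON 
format with pages keyed by page ID.

```python
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://test.wikipedia.org/w/api.php"  # endpoint from the link above


def pageprops_url(title: str) -> str:
    """Build the same pageprops query URL as the one linked above."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageprops",
        "titles": title,
    }
    return f"{API_URL}?{urlencode(params)}"


def wikibase_item(response: dict) -> Optional[str]:
    """Return the wikibase_item page prop from a response, or None if absent."""
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        item = page.get("pageprops", {}).get("wikibase_item")
        if item is not None:
            return item
    return None


# Hand-written sample responses (assumed shape, placeholder IDs).
# "before": the page was reparsed but page_props is not populated yet.
before = {"query": {"pages": {"1": {"pageid": 1, "ns": 0, "title": "PagePropsies"}}}}
# "after": some time later the property shows up.
after = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "PagePropsies",
                "pageprops": {"wikibase_item": "Q1"},
            }
        }
    }
}
```

  Fetching `pageprops_url('PagePropsies')` repeatedly and passing the decoded 
JSON to `wikibase_item` until it stops returning `None` would give a rough 
measure of how long the gap is.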
  
  I have asked @daniel whether this gap between generating the 
`ParserOutput` with our property in it and the population of the `page_props` 
table is normal or to be expected (to try and fully understand the flow here).
  @Krinkle also popped in on IRC and noted that we could be seeing a user view 
cause the initial parse, which would lead to the `page_props` population 
occurring in a job afterwards (TODO: check this).
  
  Another thing that we then discussed was the fact that this dispatching 
process now only takes a few seconds.
  It could be a reasonable assumption that the issue seen in T233520: 
page_props wikibase_item is sometimes not added to client pages when a sitelink 
is added on a repo <https://phabricator.wikimedia.org/T233520> is now caused by 
a race condition due to this fast process.
  The ticket was created in 2019 and dispatching got fast in 2018 (see T205865: 
Investigate decrease in wikidata dispatch times due to eqiad -> codfw DC switch 
<https://phabricator.wikimedia.org/T205865>). Throughout multiple 
investigations we have not really found any other reason for this change in 
behaviour.
  
  **The possible race condition here would be:**
  
  - Edit adding the sitelink to an Item happens on Wikidata
  - `wb_changes` table is initially populated via `notifyOnPageModified` during 
the main request (see 
https://www.mediawiki.org/wiki/Manual:Hooks/RevisionFromEditComplete)
  - In `ItemHandler::getSecondaryDataUpdates` we add updates to call 
`saveLinksOfItem` or `deleteLinksOfItem` which can be executed in a deferred 
fashion (post request), and I believe this can also land in a job?
  
  This can lead to `wb_changes` being written some time before the sitelinks 
end up in the secondary store, which is where the client reads from to 
populate the page_props of the client page.
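
  The ordering problem can be illustrated with a small toy model. This is only 
a sketch of the suspected race, not Wikibase code: `Repo`, `add_sitelink`, 
`run_deferred` and `client_reparse` are hypothetical names, `Q1` is a 
placeholder item ID, and the order of events is forced by hand instead of 
using real deferred updates or jobs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Repo:
    """Toy repo: a change log plus a secondary sitelink store."""

    changes: List[str] = field(default_factory=list)  # stands in for wb_changes
    sitelinks: Dict[str, str] = field(default_factory=dict)  # page -> item ID
    deferred: List[Callable[[], None]] = field(default_factory=list)

    def add_sitelink(self, item_id: str, page: str) -> None:
        # Main request: the change row is written immediately...
        self.changes.append(f"{item_id}: sitelink to {page} added")

        # ...while the sitelink store write is queued to run after the request.
        def save_links() -> None:
            self.sitelinks[page] = item_id

        self.deferred.append(save_links)

    def run_deferred(self) -> None:
        """Run all queued secondary data updates in order."""
        while self.deferred:
            self.deferred.pop(0)()


def client_reparse(repo: Repo, page: str) -> Optional[str]:
    """Client side: look up the item for a page in the sitelink store."""
    return repo.sitelinks.get(page)


# Fast dispatch: the client reparses before the deferred update has run.
fast = Repo()
fast.add_sitelink("Q1", "PagePropsies")
prop_fast = client_reparse(fast, "PagePropsies")
fast.run_deferred()

# Slow dispatch: the deferred update runs first, then the client reparses.
slow = Repo()
slow.add_sitelink("Q1", "PagePropsies")
slow.run_deferred()
prop_slow = client_reparse(slow, "PagePropsies")
```

  In this model both runs write the same change row, but only the slow run 
finds the sitelink at reparse time (`prop_fast` is `None`, `prop_slow` is 
`'Q1'`).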
  
  A similar race condition probably leads to what we see in T248984: Client 
recentchanges entries sometimes don't have their wb_changes.change_id reference 
set <https://phabricator.wikimedia.org/T248984>
  `$changeStore->saveChange` is called in `onRecentChangeSave`, which adds 
recent change info to the already stored wb_changes row.
  But this row may already have been read by the client due to fast 
dispatching, so this data ends up missing on the client.
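
  The same kind of toy model can show this second case. Again this is only a 
sketch under assumptions: the dict standing in for the `wb_changes` table, the 
`rc_info` field name, the helper names and the value `500` are all made up for 
the example.

```python
from typing import Dict, Optional

# Toy stand-in for the wb_changes table: change ID -> row
wb_changes: Dict[int, Dict[str, Optional[int]]] = {}


def save_change(change_id: int, rc_info: Optional[int] = None) -> None:
    """Insert the row on the first call; later calls fill in recent change info."""
    row = wb_changes.setdefault(change_id, {"change_id": change_id, "rc_info": None})
    if rc_info is not None:
        row["rc_info"] = rc_info


def dispatch_to_client(change_id: int) -> Dict[str, Optional[int]]:
    """The client gets a copy of the row as it is at dispatch time."""
    return dict(wb_changes[change_id])


save_change(1)  # row written during the main request
seen_by_client = dispatch_to_client(1)  # fast dispatch reads it right away
save_change(1, rc_info=500)  # recent change info is added afterwards
```

  Here the stored row ends up complete, but the copy the client already read 
still has `rc_info` unset.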

TASK DETAIL
  https://phabricator.wikimedia.org/T280627

To: Michael, Addshore
Cc: Krinkle, daniel, Jakob_WMDE, Mike_Peel, Aklapper, Jheald, 
Lucas_Werkmeister_WMDE, Addshore, WMDE-leszek, aaron, tstarling, LibrErli, 
Lea_Lacroix_WMDE, Ladsgroup, RolandUnger, Urbanecm, Bencemac, Tacsipacsi, 
Kizule, CCicalese_WMF, Lydia_Pintscher, Invadibot, maantietaja, Muchiri124, 
Hazizibinmahdi, CBogen, Akuckartz, Iflorez, WDoranWMF, alaa_wmde, holger.knust, 
EvanProdromou, Nandana, Lahi, Gq86, Ramsey-WMF, GoranSMilovanovic, QZanden, 
LawExplorer, Poyekhali, _jensen, rosalieper, Agabi10, Taiwania_Justo, 
Scott_WUaS, Pchelolo, Jonas, Ixocactus, Wong128hk, Wikidata-bugs, aude, 
El_Grafo, Dinoguy1000, Steinsplitter, Mbch331, Keegan