Lucas_Werkmeister_WMDE added a comment.
In T309445#8023699 <https://phabricator.wikimedia.org/T309445#8023699>, @Lydia_Pintscher wrote:

> Does any of the lists of potential merge candidates at https://www.wikidata.org/wiki/Help:Merge#Finding_items_to_merge help?

Thanks, I might try some of those later.

----

(Aside about localhost: I tried to make my wiki serve API requests in parallel more, because I now suspect this is a race condition. Currently, I can send requests in parallel and they’ll be processed in parallel; and after setting `output_buffering = 0` in php.ini, and therefore letting MediaWiki take care of buffering and send a `Content-Length` response header, the browser will also perceive an API request to have completed as soon as the main part of the request is done, so that secondary data updates actually take place post-send as expected and can potentially run in parallel to subsequent API requests. But for some reason, a `sleep()` in a post-send update will still prevent //later// API requests from starting processing, so that post-send updates still won’t overlap with requests that came in after the first request finished sending its response. It’s possible that session_write_close() <https://www.php.net/manual/en/function.session-write-close.php> would help with this, or at least that it’s related to sessions – I stopped looking into it at this point, and started experimenting on Test Wikidata instead.)

It turns out the merge gadget doesn’t do as many different API things as I thought. It has some pre-merge checks and some error handling, but at the API level, the merge it does is just:

1. wbmergeitems (with `ignoreconflicts=description`)
2. purge (“// XXX: Do we even need to purge? Why?”)
3. load target item

There is no extra API request for blanking the item or anything – if the merge makes several edits, they all come from `wbmergeitems`.

If you just create two new items and then merge them with `wbmergeitems` on Test Wikidata <https://test.wikidata.org/wiki/Special:BlankPage>, and then immediately get their `entityterms`:

  api = new mw.Api();
  now = new Date().toISOString();
  item1 = ( await api.postWithEditToken( {
      action: 'wbeditentity',
      new: 'item',
      summary: 'T309445',
      data: JSON.stringify( { labels: { en: { language: 'en', value: `${now} (label)` } } } )
  } ) ).entity.id;
  console.log( item1 );
  item2 = ( await api.postWithEditToken( {
      action: 'wbeditentity',
      new: 'item',
      summary: 'T309445',
      data: JSON.stringify( { descriptions: { en: { language: 'en', value: `${now} (description)` } } } )
  } ) ).entity.id;
  console.log( item2 );
  await api.postWithEditToken( {
      action: 'wbmergeitems',
      fromid: item2,
      toid: item1,
      summary: 'T309445'
  } );
  await api.get( {
      action: 'query',
      prop: 'entityterms',
      wbetlanguage: 'en',
      titles: item1,
      formatversion: 2
  } ).then( r => r.query.pages[ 0 ].entityterms );

Then the `entityterms` will fairly often just have the terms of the original item (a `label`), without the merged terms of the other item (a `description`). But this resolves itself after a second or so (once the secondary data updates of the merge finish), and if you repeat the `entityterms` request it’ll show all the terms as expected.
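(To put a rough number on that delay, something like the following could poll `entityterms` until the merged description shows up – `waitForMergedTerms`, the 100 ms interval and the retry limit are just arbitrary choices of mine for illustration, not anything the gadget does:)

  // hypothetical helper: repeat the entityterms request until the merged
  // description appears, to measure roughly when the secondary data updates
  // of the merge have finished; assumes `api` and `item1` from above
  async function waitForMergedTerms( api, title, maxTries = 50 ) {
      const start = performance.now();
      for ( let i = 0; i < maxTries; i++ ) {
          const terms = ( await api.get( {
              action: 'query',
              prop: 'entityterms',
              wbetlanguage: 'en',
              titles: title,
              formatversion: 2
          } ) ).query.pages[ 0 ].entityterms;
          if ( terms && terms.description ) {
              console.log( `terms complete after ~${ Math.round( performance.now() - start ) } ms`, terms );
              return terms;
          }
          await new Promise( resolve => setTimeout( resolve, 100 ) ); // brief pause before retrying
      }
      console.warn( `terms still incomplete after ${ maxTries } tries` );
  }
  await waitForMergedTerms( api, item1 );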
So I tried matching the merge gadget more closely, by also purging the page and then getting its HTML:

  api = new mw.Api();
  now = new Date().toISOString();
  item1 = ( await api.postWithEditToken( {
      action: 'wbeditentity',
      new: 'item',
      summary: 'T309445',
      data: JSON.stringify( { labels: { en: { language: 'en', value: `${now} (label)` } } } )
  } ) ).entity.id;
  console.log( item1 );
  item2 = ( await api.postWithEditToken( {
      action: 'wbeditentity',
      new: 'item',
      summary: 'T309445',
      data: JSON.stringify( { descriptions: { en: { language: 'en', value: `${now} (description)` } } } )
  } ) ).entity.id;
  console.log( item2 );
  await api.postWithEditToken( {
      action: 'wbmergeitems',
      fromid: item2,
      toid: item1,
      summary: 'T309445'
  } );
  await api.post( { action: 'purge', titles: item1 } );
  await fetch( `https://test.wikidata.org/wiki/${item1}` );
  await api.get( {
      action: 'query',
      prop: 'entityterms',
      wbetlanguage: 'en',
      titles: item1,
      formatversion: 2
  } ).then( r => r.query.pages[ 0 ].entityterms );

But with this version, the logged `entityterms` are actually always complete (at least every time I tried this). It looks like the purge + render gives the secondary data updates enough time to finish.

In theory, I think both the merge and the page view could attempt to write terms to the term store. (Though I’m not sure if loading an item page that isn’t in the parser cache will actually cause the secondary data updates to run again.) However, the page view should always have the merged item state, thanks to MediaWiki’s ChronologyProtector, so there shouldn’t be any race condition there. The only possible race condition I can see at the moment is if someone //else// happens to view the item at the critical moment (ChronologyProtector only protects the chronology of requests from one source):

1. Client A merges the items. Client A’s secondary data updates start.
2. Client A purges the item.
3. Client A loads the item.
4. Client B loads the item (randomly?) and happens to get a database replica that hasn’t seen the merge yet.
5. Client A’s secondary data updates finish, writing post-merge terms to the term store.
6. Client B’s secondary data updates start, based on the pre-merge revision from the replica.
7. Client B’s secondary data updates finish, writing pre-merge terms to the term store.

This seems very unlikely, and I’m still not sure if it’s even possible (does a page view really run the secondary data updates again?). I think this still needs more investigation; I might just try merging enough duplicates on Wikidata with the network panel open, and if the bug happens, look at the network panel more closely to see which requests happened in which order.
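(If eyeballing the network panel turns out to be too fiddly, the browser’s Resource Timing API could reconstruct the request order after the fact – `logApiRequestOrder` is just an illustrative name, and filtering on `api.php` is my assumption about which requests are interesting here:)

  // hypothetical sketch: list all api.php requests seen by this page,
  // ordered by start time, with start and end timestamps in ms
  function logApiRequestOrder() {
      performance.getEntriesByType( 'resource' )
          .filter( entry => entry.name.includes( 'api.php' ) )
          .sort( ( a, b ) => a.startTime - b.startTime )
          .forEach( entry => console.log(
              `${ Math.round( entry.startTime ) }–${ Math.round( entry.responseEnd ) } ms`,
              entry.name
          ) );
  }
  logApiRequestOrder();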