Lucas_Werkmeister_WMDE added a comment.

  In T309445#8023699 <https://phabricator.wikimedia.org/T309445#8023699>, 
@Lydia_Pintscher wrote:
  
  > Does any of the lists of potential merge candidates at 
https://www.wikidata.org/wiki/Help:Merge#Finding_items_to_merge help?
  
  Thanks, I might try some of those later.
  
  ----
  
  (Aside about localhost: I tried to make my wiki serve more API requests in 
parallel, because I now suspect this is a race condition. Currently, I can 
send requests in parallel and they’ll be processed in parallel; and after 
setting `output_buffering = 0` in php.ini, thereby letting MediaWiki take 
care of buffering and send a `Content-Length` response header, the browser 
also perceives the API requests as complete as soon as the main request is 
done, so that secondary data updates actually take place post-send as 
expected and can potentially run in parallel to subsequent API requests. But 
for some reason, a `sleep()` in a post-send update will still prevent 
//later// API requests from starting to be processed, so post-send updates 
still won’t overlap with requests that came in after the first request 
finished sending its response. It’s possible that session_write_close() 
<https://www.php.net/manual/en/function.session-write-close.php> would help 
with this, or at least that the behavior is session-related – I stopped 
looking into it at this point and started experimenting on Test Wikidata 
instead.)
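
  (Something like the following console snippet shows the effect – a minimal 
sketch, not exactly what I ran; the `sleep()` patched into a post-send 
deferred update on the local wiki and the plain `action=query` requests as 
stand-ins are assumptions:)

    // minimal sketch of the overlap test; assumes a sleep() has been
    // patched into a post-send deferred update on the local wiki,
    // and uses plain action=query requests as stand-ins
    t0 = performance.now();
    elapsed = () => `${( performance.now() - t0 ).toFixed( 0 )}ms`;
    // first request – its post-send updates include the artificial sleep()
    await fetch( '/w/api.php?action=query&format=json' );
    console.log( 'first response finished after', elapsed() );
    // second request, sent only after the first response fully arrived;
    // if it takes about as long as the sleep(), post-send updates are
    // still blocking later requests (perhaps via the session lock)
    await fetch( '/w/api.php?action=query&format=json' );
    console.log( 'second response finished after', elapsed() );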
  
  It turns out the merge gadget doesn’t do as many different API things as I 
thought. It has some pre-merge checks and some error handling, but at the API 
level, the merge it does is just:
  
  1. wbmergeitems (with `ignoreconflicts=description`)
  2. purge (“// XXX: Do we even need to purge? Why?”)
  3. load target item
  
  There is no extra API request for blanking the item or anything – if the 
merge makes several edits, they all come from `wbmergeitems`.
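
  In mw.Api terms, that sequence amounts to something like this – a sketch, 
not the gadget’s actual code; `fromId`/`toId` are placeholders, and 
navigating to the target is just one way to “load” it:

    // sketch of the gadget’s API-level merge sequence (not its real code);
    // fromId and toId are placeholder item IDs
    api = new mw.Api();
    // 1. the merge itself – every resulting edit comes from this request
    await api.postWithEditToken( {
        action: 'wbmergeitems', fromid: fromId, toid: toId,
        ignoreconflicts: 'description',
    } );
    // 2. purge the target page (the “XXX: Do we even need to purge?” step)
    await api.post( { action: 'purge', titles: toId } );
    // 3. load the target item (here simply by navigating to it)
    location.href = mw.util.getUrl( toId );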
  
  If you just create two new items, merge them with `wbmergeitems` on Test 
Wikidata <https://test.wikidata.org/wiki/Special:BlankPage>, and then 
immediately get the merge target’s `entityterms`:
  
    api = new mw.Api();
    now = new Date().toISOString();
    // create the merge target, with an English label
    item1 = ( await api.postWithEditToken( {
        action: 'wbeditentity', new: 'item', summary: 'T309445',
        data: JSON.stringify( { labels: { en: { language: 'en', value: `${now} (label)` } } } ),
    } ) ).entity.id;
    console.log( item1 );
    // create the merge source, with an English description
    item2 = ( await api.postWithEditToken( {
        action: 'wbeditentity', new: 'item', summary: 'T309445',
        data: JSON.stringify( { descriptions: { en: { language: 'en', value: `${now} (description)` } } } ),
    } ) ).entity.id;
    console.log( item2 );
    // merge item2 into item1…
    await api.postWithEditToken( { action: 'wbmergeitems', fromid: item2, toid: item1, summary: 'T309445' } );
    // …and immediately get the merged item’s terms
    await api.get( {
        action: 'query', prop: 'entityterms', wbetlanguage: 'en', titles: item1, formatversion: 2,
    } ).then( r => r.query.pages[ 0 ].entityterms );
  
  Then the `entityterms` will fairly often contain only the terms of the 
target item (a `label`), without the merged-in terms of the other item (a 
`description`). But this resolves itself after a second or so (once the 
merge’s secondary data updates finish), and if you repeat the `entityterms` 
request, it’ll show all the terms as expected.
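
  (To put a number on “a second or so”, the `entityterms` request can simply 
be polled until both terms show up – a rough sketch, reusing `api` and 
`item1` from the snippet above:)

    // rough sketch: poll entityterms until both terms are present,
    // reusing api and item1 from the snippet above
    for ( retries = 0; ; retries++ ) {
        terms = ( await api.get( {
            action: 'query', prop: 'entityterms', wbetlanguage: 'en',
            titles: item1, formatversion: 2,
        } ) ).query.pages[ 0 ].entityterms;
        if ( terms && terms.label && terms.description ) {
            console.log( `complete after ${retries} retries`, terms );
            break;
        }
        await new Promise( ( resolve ) => setTimeout( resolve, 250 ) );
    }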
  
  So I tried matching the merge gadget more closely, by also purging the page 
and then getting its HTML:
  
    api = new mw.Api();
    now = new Date().toISOString();
    // create the merge target, with an English label
    item1 = ( await api.postWithEditToken( {
        action: 'wbeditentity', new: 'item', summary: 'T309445',
        data: JSON.stringify( { labels: { en: { language: 'en', value: `${now} (label)` } } } ),
    } ) ).entity.id;
    console.log( item1 );
    // create the merge source, with an English description
    item2 = ( await api.postWithEditToken( {
        action: 'wbeditentity', new: 'item', summary: 'T309445',
        data: JSON.stringify( { descriptions: { en: { language: 'en', value: `${now} (description)` } } } ),
    } ) ).entity.id;
    console.log( item2 );
    await api.postWithEditToken( { action: 'wbmergeitems', fromid: item2, toid: item1, summary: 'T309445' } );
    // like the merge gadget: purge the target page…
    await api.post( { action: 'purge', titles: item1 } );
    // …and load its HTML
    await fetch( `https://test.wikidata.org/wiki/${item1}` );
    await api.get( {
        action: 'query', prop: 'entityterms', wbetlanguage: 'en', titles: item1, formatversion: 2,
    } ).then( r => r.query.pages[ 0 ].entityterms );
  
  But with this version, the logged `entityterms` are actually always complete 
(at least every time I tried this). It looks like the purge + render gives the 
secondary data updates enough time to finish.
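
  (To be more confident than “every time I tried this”, the whole snippet 
could be wrapped in a loop that counts incomplete results – a sketch; 
`runOnce()` is a placeholder for the create–merge–purge–fetch–query sequence 
above, resolving to the final `entityterms` object:)

    // sketch: repeat the experiment and count incomplete results;
    // runOnce() is a placeholder for the whole create–merge–purge–
    // fetch–query sequence above, resolving to the final entityterms
    stale = 0;
    runs = 20;
    for ( i = 0; i < runs; i++ ) {
        terms = await runOnce();
        if ( !terms || !terms.label || !terms.description ) {
            stale++;
        }
    }
    console.log( `${stale} of ${runs} runs returned incomplete terms` );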
  
  In theory, I think both the merge and the page view could attempt to write 
terms to the term store. (Though I’m not sure if loading an item page that 
isn’t in the parser cache will actually cause the secondary data updates to run 
again.) However, the page view should always have the merged item state, thanks 
to MediaWiki’s ChronologyProtector, so there shouldn’t be any race condition 
there. The only possible race condition I can see at the moment is if someone 
//else// happens to view the item at the critical moment (ChronologyProtector 
only protects the chronology of requests from one source):
  
  1. Client A merges the items. Client A’s secondary data updates start.
  2. Client A purges the item.
  3. Client A loads the item.
  4. Client B loads the item (randomly?) and happens to get a database replica 
that hasn’t seen the merge yet.
  5. Client A’s secondary data updates finish, writing post-merge terms to the 
term store.
  6. Client B’s secondary data updates start, based on the pre-merge revision 
from the replica.
  7. Client B’s secondary data updates finish, writing pre-merge terms to the 
term store.
  
  This seems very unlikely, and I’m still not sure it’s even possible (does 
a pageview really run the secondary data updates again?). I think this still 
needs more investigation; I might just try merging enough duplicates on 
Wikidata with the network panel open, and if the bug happens, look at the 
network panel more closely to see which requests happened in what order.
