Lucas_Werkmeister_WMDE added a comment.
Proposed procedure, refined from @hoo’s comment:

- for each sitelink:
  - get the database lock, with the site + title as name
    - if we fail to acquire it, report conflict (with unknown other item)
  - getWithSetCallback the memcached “lock”, with the site + title as key, the item ID as the value, and a low TTL (60 seconds?)
    - if the returned value is not the item ID, load the other item (from the primary DB); if it indeed has a sitelink conflict, report conflict
- on transaction commit, free the database lock

Rationale:

- The database lock is the main protection. If the connection drops while we hold the lock, MediaWiki will not automatically reconnect – so if the connection drops right after we get the lock, then the save will fail, which is good because otherwise we might save a conflicting item. However, we don’t want this to happen during the secondary data updates (where `items_per_site` is updated), which is why we release the lock on transaction commit.
- The memcached “lock” bridges the gap between the commit of the main transaction and the secondary data update that writes the sitelinks to `items_per_site` (at which point the regular conflict detection can pick them up, see `SqlSiteLinkConflictLookup`). The case that two requests concurrently get the key from memcached, see that it’s free, and each write their item ID to memcached, is mostly prevented by the database lock.
- We write the item ID to memcached, and check the mentioned item before reporting a conflict, to avoid a scenario where a sitelink is quickly added, removed, and then added to another item (we want to allow it on the other item, not detect that as a conflict).
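The procedure above can be sketched as a small in-memory model. This is only an illustration in Python, not MediaWiki or Wikibase code: `FakeLocks`, `FakeCache`, `find_conflict` and the `saved_items` dict are hypothetical stand-ins for the named database lock, the memcached getWithSetCallback call, and loading the other item from the primary DB.

```python
import time


class FakeLocks:
    """Hypothetical stand-in for named database locks, keyed by lock name."""

    def __init__(self):
        self.held = {}  # lock name -> request that holds it

    def acquire(self, name, request):
        if self.held.get(name, request) != request:
            return False  # another request holds this lock
        self.held[name] = request
        return True

    def release_all(self, request):
        # Models "on transaction commit, free the database lock".
        self.held = {n: r for n, r in self.held.items() if r != request}


class FakeCache:
    """Hypothetical stand-in for memcached with get-or-set semantics."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (value, expiry time)

    def get_with_set(self, key, value, ttl):
        # Return the live value; store `value` only if the key is absent or expired.
        entry = self.data.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]
        self.data[key] = (value, now + ttl)
        return value


def find_conflict(item_id, sitelinks, request, locks, cache, saved_items, ttl=60):
    """Return None, or (site, title, other_item_id); other_item_id is None if unknown.

    saved_items maps item ID -> set of (site, title) pairs, modelling the
    items as stored in the primary DB (an assumption of this sketch).
    """
    for site, title in sitelinks:
        name = f"{site}:{title}"
        if not locks.acquire(name, request):
            return (site, title, None)  # conflict with an unknown other item
        holder = cache.get_with_set(name, item_id, ttl)
        if holder != item_id:
            # Only report a conflict if the other item really has this sitelink.
            other_links = saved_items.get(holder)
            if other_links is not None and (site, title) in other_links:
                return (site, title, holder)
    return None
```

In this model, a second request for the same sitelink is rejected with an unknown item while the first still holds the lock, and is rejected with the first item's ID after commit, as long as the cached entry has not expired and that item still has the sitelink.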
A tiny race condition is still possible with this scheme, I believe:

- request A gets the database lock
- request A’s database connection dies, releasing the lock
- request A gets the value from memcached, sees it’s empty
- request B gets the database lock
- request B gets the value from memcached, sees it’s empty
- request B writes its item ID to memcached
- request B proceeds to save the item and release the database lock
- request A writes its item ID to memcached
- request A tries to save the item, fails because the connection died (and won’t be automatically restored, because a lock was held)
- request C gets the database lock
- request C gets the value from memcached, sees request A’s item ID
- request C tries to get request A’s item, fails because request A never managed to save its item
- request C believes there is no conflict, saves the item
- undetected conflict between request B and C

I can’t think of a realistic way to solve this, and I think we can probably just ignore it: in practice, it should be very unlikely. (If we didn’t write the item ID to memcached, and instead assumed that the presence of any key means a conflict, that would prevent this issue, but block more legitimate edits than we want, I believe.)

TASK DETAIL
  https://phabricator.wikimedia.org/T291377