Smalyshev created this task.
Smalyshev added a project: Wikidata-Query-Service.
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Wikidata.

TASK DESCRIPTION
  If we kept a map of the latest revision IDs for all items we've recently 
updated (derived not from events but from the data actually fetched and sent 
to the database), we could eliminate a lot of stale updates - especially when 
we're catching up after a lag. The first mention of an item would fetch its 
latest revision, and all following events for that item would basically be 
ignored.
  
  Right now we do something like that within a batch, and we also match the 
revision IDs against the database after the fetches - but this way we could do 
it across batches and eliminate the unnecessary fetches. That would largely 
solve the problem of too many fetches (while the cache is active), since each 
item would be fetched only once per backlog. I think with a proper data 
structure (like SparseArray maybe?) we could keep a lot of history there 
relatively cheaply (we just need one 64-bit int per item). This probably won't 
work for changes that lack a revision ID - like deletes - but we could either 
ignore those (they are relatively rare) or also use timestamps (dangerous).
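  
  A minimal sketch of the cross-batch check described above. The class and 
method names (SeenRevisions, recordFetched, isStale) are hypothetical and not 
taken from the updater code; the sketch assumes entity IDs are strings and 
that events without a revision ID are passed through rather than filtered.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cross-batch filter; all names here are illustrative. */
final class SeenRevisions {
    // Latest revision per entity, recorded only from data actually fetched
    // and sent to the database (never from the events themselves).
    private final Map<String, Long> latest = new HashMap<>();

    /** Remember the revision we fetched and wrote for this entity. */
    void recordFetched(String entityId, long revision) {
        Long previous = latest.get(entityId);
        if (previous == null || revision > previous) {
            latest.put(entityId, revision);
        }
    }

    /**
     * True if the event refers to a revision we have already written.
     * Events without a revision ID (e.g. deletes) are never treated as stale.
     */
    boolean isStale(String entityId, Long eventRevision) {
        if (eventRevision == null) {
            return false;
        }
        Long known = latest.get(entityId);
        return known != null && eventRevision <= known;
    }
}
```

  With something like this, the first event for an item misses the map and 
triggers a fetch; once recordFetched has stored the fetched revision, older 
or equal events for the same item can be dropped without fetching.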
  
  It's a bit risky, since we'd be basing updates on non-database information 
(i.e. if the database somehow fails an update but we think it succeeded, we'd 
wrongly drop later updates). I think that's acceptable, though: the map would 
be ephemeral, so any bad state would be gone after a restart.
  
  We could optimize it by only keeping the map for Q-ids - we could probably 
then use integer keys, and 2G of integer space would last us for a while. Or 
it may be more efficient to use a regular HashMap and benefit from built-in 
cache eviction support.
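  
  A rough sketch of the Q-id variant with integer keys and bounded size. A 
plain java.util.HashMap does not evict entries by itself, so this sketch 
swaps in java.util.LinkedHashMap with removeEldestEntry to get 
least-recently-used eviction. The class name, the maxEntries limit and the 
parsing rule are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache keyed by the numeric part of a Q-id. */
final class QidRevisionCache {
    private final Map<Integer, Long> latest;

    QidRevisionCache(final int maxEntries) {
        // accessOrder=true keeps entries in least-recently-used order, and
        // removeEldestEntry drops the oldest one once the limit is exceeded.
        this.latest = new LinkedHashMap<Integer, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Long> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the numeric part of a Q-id, or -1 for anything else. */
    static int toKey(String entityId) {
        if (entityId == null || !entityId.matches("Q[1-9][0-9]*")) {
            return -1;
        }
        try {
            return Integer.parseInt(entityId.substring(1));
        } catch (NumberFormatException e) {
            return -1; // numeric part does not fit into an int
        }
    }

    /** Store the fetched revision; non-Q-ids are simply not cached. */
    void recordFetched(String entityId, long revision) {
        int key = toKey(entityId);
        if (key < 0) {
            return;
        }
        Long previous = latest.get(key);
        if (previous == null || revision > previous) {
            latest.put(key, revision);
        }
    }

    /** True if a cached revision for this Q-id is at least as new as the event. */
    boolean isStale(String entityId, long eventRevision) {
        int key = toKey(entityId);
        if (key < 0) {
            return false;
        }
        Long known = latest.get(key);
        return known != null && eventRevision <= known;
    }
}
```

  The trade-off is that an evicted item just costs one extra fetch the next 
time it shows up, so the limit mainly controls memory use rather than 
correctness.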

TASK DETAIL
  https://phabricator.wikimedia.org/T217925

To: Smalyshev
Cc: Gehel, Aklapper, Smalyshev, alaa_wmde, Nandana, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331