[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
hoo added a comment. I talked to Gabriel about plans to obsolete link tables and he told me, that there are no immediate plans to replace local link tables (which includes this table). Due to this, I will pick this up in the next days if @daniel is ok with that. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo Cc: jcrespo, Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Izno, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
hoo added a comment. In that case I think we should pick this up soon. If my above analysis is correct and per the IRC discussion I had with @daniel yesterday, I don't think is very hard to do. First step is to stop using `eu_touched`, then stop updating (touching) it and finally we can drop it. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo Cc: jcrespo, Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
jcrespo added a subscriber: jcrespo. jcrespo added a comment. As per @hoo request, I am sharing my own opinion here: This is blocking ROW based replication application, which should be rolled in soon. In my own opinion (database-focused) I think wikidata updates (this is one of them) are the worse mediawiki-related production cluster problem happening right now. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jcrespo Cc: jcrespo, Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
hoo added a comment. We talked about this on IRC some more and came to the conclusion that it would probably work without `eu_touched`. It's probably not worth implementing that, as dependency graphs are upcoming anyway. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo Cc: Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
hoo added a comment. In https://phabricator.wikimedia.org/T124737#1966530, @daniel wrote: > > I don't see why that would be the case. It seems to me you are under the > > misconception that we purge caches for individual languages, but we don't > > (and can't). > > True, we don't right now. But that is being worked on. Then again, once we > have proper dependency tracking of derived resources, that will itself > replace the entity_usage table (which is just tracking of entities by derived > resources). I see that is planned, but I don't think that we should try to cater for that right now (also eu_touched would be a poor mean to handle that). >> Once MediaWiki decides that the page caches need to get cleared, we throw >> away all entity usages rows (effectively, as we purge all that are older >> than wfTimestampNow()) and then insert the new usages of the (at that point) >> single new ParserCache entry. > > Yes, that was my initial plan. But that means we have to be sure that the > re-rendering and putting-into-the-parser-cache happens after the save & > purge. Which, surprisingly, it often doesn't. In particular, ApiStashEdit > stores the new rendered version into the parser cache before it gets saved. > And it will do so even for previews that never get saved. As discussed on IRC, I don't think that's so hard to get right and stashed edits aren't a problem here, as they don't trigger any of our hook handlers. Keep in mind that core also does post edit updates for all of its tracking tables and it works at least reasonably well. > Btw: conceptually, when tracking dependencies, we need to track the identity > of resource that depends. eu_cache_key would provide this, eu_touched is a > hacky way to get around that requirement. Not necessary (if I understand you correctly). We just need to treat all existing and valid cached versions of a page as a single entity and we're conceptually ok. That doesn't make for an exhaustive list of "dependencies" (across all cached versions that could theoretically exist)… which might not be a very nice way to handle that, but it's enough for our use case. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo Cc: Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
daniel added a comment. Btw: conceptually, when tracking dependencies, we need to track the identity of resource that depends. eu_cache_key would provide this, eu_touched is a hacky way to get around that requirement. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel Cc: Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
daniel added a comment. > I don't see why that would be the case. It seems to me you are under the > misconception that we purge caches for individual languages, but we don't > (and can't). True, we don't right now. But that is being worked on. Then again, once we have proper dependency tracking of derived resources, that will itself replace the entity_usage table (which is just tracking of entities by derived resources). > Once MediaWiki decides that the page caches need to get cleared, we throw > away all entity usages rows (effectively, as we purge all that are older than > wfTimestampNow()) and then insert the new usages of the (at that point) > single new ParserCache entry. Yes, that was my initial plan. But that means we have to be sure that the re-rendering and putting-into-the-parser-cache happens after the save & purge. Which, surprisingly, it often doesn't. In particular, ApiStashEdit stores the new rendered version into the parser cache before it gets saved. And it will do so even for previews that never get saved. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel Cc: Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
hoo added a comment. > So, as far as I can see, we can get rid of eu_touched only if we replace it > with eu_cache_key. That would effectively mean tracking usage for each target > language separately. I don't see why that would be the case. It seems to me you are under the misconception that we purge caches for individual languages, but we don't (and can't). Once something relevant for one page changes, we need to throw away all cache entries for that page anyway. What we already do right now (if you read my above analysis closely) is tracking all entity usages that are associated with a page (that can mean one ore more ParserCache entries with different cache keys and whatnot). We don't need to know which cache entry these keys belong to in the end, we just need to know which page we need to purge in case. Once MediaWiki decides that the page caches need to get cleared, we throw away all entity usages rows (effectively, as we purge all that are older than `wfTimestampNow()`) and then insert the new usages of the (at that point) single new ParserCache entry. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo Cc: Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched
daniel added a comment. Note that our primary use case for usage tracking is purging the parser cache. So we need to track usages for every rendered version of a page that gets stored in the parser cache. When a page is changed or not, and what the revision timestamp is, is really not relevant. So, we can add tracking entries whenever something is cached. How do we purge stale entries? E.g. we are tracking entries for a rendering in fr, and then the page is re-rendered in fr, without a new revision being created (because a template has changed). We can now compare the new usages for fr with the old usages recorded in the database. But for usage entries that aren't in the new rendering, we don't know if they are used by a rendering for a different language, e.g. de (entity_usage does not track the target language). We can only know that based on eu_touched: if it's larger or equal to page_touched (which gets updated when the page gets re-parsed), then the records are still needed. If touched is smaller than touched, the entries are stale, and no current rendering uses them. So, as far as I can see, we can get rid of eu_touched only if we replace it with eu_cache_key. That would effectively mean tracking usage for each target language separately. That means a lot more rows in this table on multi-lingual projects like commons. For other projects, it would not make a difference. I kind of like the idea of having eu_cache_key in the table: it makes explicit the fact that we are tracking parser cache entries. TASK DETAIL https://phabricator.wikimedia.org/T124737 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel Cc: Aklapper, StudiesWorld, JanZerebecki, aude, daniel, Lydia_Pintscher, hoo, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs