[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-02-01 Thread hoo
hoo added a comment.

I talked to Gabriel about plans to obsolete link tables, and he told me that 
there are no immediate plans to replace local link tables (which includes this 
table). Given that, I will pick this up in the next few days if @daniel is OK 
with that.


TASK DETAIL
  https://phabricator.wikimedia.org/T124737

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: hoo
Cc: jcrespo, Aklapper, StudiesWorld, JanZerebecki, aude, daniel, 
Lydia_Pintscher, hoo, Izno, Wikidata-bugs, Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-27 Thread hoo
hoo added a comment.

In that case I think we should pick this up soon. If my above analysis is 
correct, and per the IRC discussion I had with @daniel yesterday, I don't think 
it is very hard to do.

First step is to stop using `eu_touched`, then stop updating (touching) it and 
finally we can drop it.




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-27 Thread jcrespo
jcrespo added a subscriber: jcrespo.
jcrespo added a comment.

As per @hoo's request, I am sharing my own opinion here:

This is blocking the deployment of ROW-based replication, which should be 
rolled out soon. In my own (database-focused) opinion, Wikidata updates (this 
is one of them) are the worst MediaWiki-related problem on the production 
cluster right now.




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-26 Thread hoo
hoo added a comment.

We talked about this on IRC some more and came to the conclusion that it would 
probably work without `eu_touched`.

It's probably not worth implementing that, as dependency graphs are upcoming 
anyway.




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-26 Thread hoo
hoo added a comment.

In https://phabricator.wikimedia.org/T124737#1966530, @daniel wrote:

> > I don't see why that would be the case. It seems to me you are under the 
> > misconception that we purge caches for individual languages, but we don't 
> > (and can't).
>
> True, we don't right now. But that is being worked on. Then again, once we 
> have proper dependency tracking of derived resources, that will itself 
> replace the entity_usage table (which is just tracking of entities by derived 
> resources).


I see that this is planned, but I don't think that we should try to cater for 
it right now (also, eu_touched would be a poor means of handling that).

> > Once MediaWiki decides that the page caches need to get cleared, we throw 
> > away all entity usages rows (effectively, as we purge all that are older 
> > than wfTimestampNow()) and then insert the new usages of the (at that point) 
> > single new ParserCache entry.
>
> Yes, that was my initial plan. But that means we have to be sure that the 
> re-rendering and putting-into-the-parser-cache happens after the save & 
> purge. Which, surprisingly, it often doesn't. In particular, ApiStashEdit 
> stores the new rendered version into the parser cache before it gets saved. 
> And it will do so even for previews that never get saved.

As discussed on IRC, I don't think that's so hard to get right and stashed 
edits aren't a problem here, as they don't trigger any of our hook handlers.
Keep in mind that core also does post edit updates for all of its tracking 
tables and it works at least reasonably well.

> Btw: conceptually, when tracking dependencies, we need to track the identity 
> of the resource that depends. eu_cache_key would provide this, eu_touched is 
> a hacky way to get around that requirement.

Not necessarily (if I understand you correctly). We just need to treat all 
existing and valid cached versions of a page as a single entity and we're 
conceptually OK. That doesn't make for an exhaustive list of "dependencies" 
(across all cached versions that could theoretically exist), which might not be 
a very nice way to handle that, but it's enough for our use case.




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-26 Thread daniel
daniel added a comment.

Btw: conceptually, when tracking dependencies, we need to track the identity of 
the resource that depends. eu_cache_key would provide this, eu_touched is a 
hacky way to get around that requirement.




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-26 Thread daniel
daniel added a comment.

> I don't see why that would be the case. It seems to me you are under the 
> misconception that we purge caches for individual languages, but we don't 
> (and can't).

True, we don't right now. But that is being worked on. Then again, once we have 
proper dependency tracking of derived resources, that will itself replace the 
entity_usage table (which is just tracking of entities by derived resources).

> Once MediaWiki decides that the page caches need to get cleared, we throw 
> away all entity usages rows (effectively, as we purge all that are older than 
> wfTimestampNow()) and then insert the new usages of the (at that point) 
> single new ParserCache entry.

Yes, that was my initial plan. But that means we have to be sure that the 
re-rendering and putting-into-the-parser-cache happens after the save & purge. 
Which, surprisingly, it often doesn't. In particular, ApiStashEdit stores the 
new rendered version into the parser cache before it gets saved. And it will do 
so even for previews that never get saved.




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-26 Thread hoo
hoo added a comment.

> So, as far as I can see, we can get rid of eu_touched only if we replace it 
> with eu_cache_key. That would effectively mean tracking usage for each target 
> language separately.

I don't see why that would be the case. It seems to me you are under the 
misconception that we purge caches for individual languages, but we don't (and 
can't). Once something relevant for one page changes, we need to throw away all 
cache entries for that page anyway.

What we already do right now (if you read my above analysis closely) is track 
all entity usages that are associated with a page (that can mean one or more 
ParserCache entries with different cache keys and whatnot). We don't need to 
know which cache entry these usages belong to in the end, we just need to know 
which page we need to purge when needed.

Once MediaWiki decides that the page caches need to get cleared, we throw away 
all entity usages rows (effectively, as we purge all that are older than 
`wfTimestampNow()`) and then insert the new usages of the (at that point) 
single new ParserCache entry.
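
The page-level bookkeeping described above can be sketched roughly as follows. 
This is an illustration in Python, not the Wikibase code: the `UsageTable` 
class, its method names and the in-memory dict are hypothetical stand-ins for 
the entity_usage table, and the aspect value "label" is only a placeholder.

```python
from collections import defaultdict


class UsageTable:
    """Hypothetical in-memory stand-in for entity_usage, keyed by page only."""

    def __init__(self):
        # page_id -> set of (entity_id, aspect) pairs
        self._rows = defaultdict(set)

    def add_usages(self, page_id, usages):
        """Record usages from one more cached rendering of the same page."""
        self._rows[page_id] |= set(usages)

    def replace_usages(self, page_id, new_usages):
        """On a page cache clear: drop all rows for the page, then insert
        the usages of the single new rendering."""
        self._rows[page_id] = set(new_usages)

    def pages_using(self, entity_id):
        """Pages whose caches need purging when the given entity changes."""
        return sorted(
            page_id
            for page_id, usages in self._rows.items()
            if any(used == entity_id for used, _aspect in usages)
        )


table = UsageTable()
table.replace_usages(1, [("Q42", "label")])   # first rendering of page 1
table.add_usages(1, [("Q64", "label")])       # second cached rendering
purge_before = table.pages_using("Q64")       # page 1 is affected
table.replace_usages(1, [("Q42", "label")])   # caches cleared, page re-parsed
purge_after = table.pages_using("Q64")        # page 1 no longer affected
```

Under these assumptions the table never needs to know which cache entry a usage 
came from, only which page it belongs to.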




[Wikidata-bugs] [Maniphest] [Commented On] T124737: [RfC] Implement usage tracking without eu_touched

2016-01-26 Thread daniel
daniel added a comment.

Note that our primary use case for usage tracking is purging the parser cache. 
So we need to track usages for every rendered version of a page that gets 
stored in the parser cache. Whether a page has changed or not, and what the 
revision timestamp is, is not really relevant.

So, we can add tracking entries whenever something is cached. How do we purge 
stale entries? E.g. we are tracking entries for a rendering in fr, and then the 
page is re-rendered in fr without a new revision being created (because a 
template has changed). We can now compare the new usages for fr with the old 
usages recorded in the database. But for usage entries that aren't in the new 
rendering, we don't know if they are used by a rendering for a different 
language, e.g. de (entity_usage does not track the target language). We can 
only know that based on eu_touched: if it's larger than or equal to 
page_touched (which gets updated when the page gets re-parsed), then the 
records are still needed. If eu_touched is smaller than page_touched, the 
entries are stale, and no current rendering uses them.
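
The eu_touched comparison described above could look roughly like this. It is 
a sketch in Python under stated assumptions, not the actual implementation: the 
function names and the dict-based rows are hypothetical, and timestamps are 
assumed to be MediaWiki-style 14-digit strings (YYYYMMDDHHMMSS), which compare 
correctly as plain strings.

```python
def is_stale(eu_touched: str, page_touched: str) -> bool:
    """A usage row is stale if it was last touched before the page was."""
    return eu_touched < page_touched


def prune_stale(rows, page_touched):
    """Keep only the rows that some current rendering may still need."""
    return [
        row for row in rows
        if not is_stale(row["eu_touched"], page_touched)
    ]


# Hypothetical rows for one page; only the fields used here are shown.
rows = [
    {"eu_entity_id": "Q42", "eu_touched": "20160126120000"},
    {"eu_entity_id": "Q64", "eu_touched": "20160125120000"},
]
kept = prune_stale(rows, page_touched="20160126120000")
kept_ids = [row["eu_entity_id"] for row in kept]
```

Here the Q64 row is dropped because its eu_touched predates page_touched, while 
the Q42 row is kept because the two timestamps are equal.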

So, as far as I can see, we can get rid of eu_touched only if we replace it 
with eu_cache_key. That would effectively mean tracking usage for each target 
language separately. That means a lot more rows in this table on multilingual 
projects like Commons. For other projects, it would not make a difference.
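
To illustrate the trade-off, here is a rough Python sketch of tracking per 
cache key. Everything in it is an assumption made for illustration: the 
`KeyedUsageTable` class, its methods, and the use of a language code as the 
cache key are hypothetical, and eu_cache_key is only a proposed column.

```python
from collections import defaultdict


class KeyedUsageTable:
    """Hypothetical entity_usage variant with an eu_cache_key column."""

    def __init__(self):
        # (page_id, cache_key) -> set of entity ids
        self._rows = defaultdict(set)

    def replace_usages(self, page_id, cache_key, entity_ids):
        """Replace the rows of one rendering without touching the others."""
        self._rows[(page_id, cache_key)] = set(entity_ids)

    def row_count(self):
        """Total rows, counting an entity once per cache key that uses it."""
        return sum(len(ids) for ids in self._rows.values())

    def entities_for_page(self, page_id):
        """Union of entities used by any cached rendering of the page."""
        result = set()
        for (pid, _key), ids in self._rows.items():
            if pid == page_id:
                result |= ids
        return result


table = KeyedUsageTable()
table.replace_usages(1, "fr", {"Q42", "Q64"})
table.replace_usages(1, "de", {"Q42"})
rows_before = table.row_count()          # Q42 is stored once per language
table.replace_usages(1, "fr", {"Q42"})   # fr re-rendered, Q64 no longer used
rows_after = table.row_count()
still_used = table.entities_for_page(1)
```

Stale rows for one rendering can be removed without a timestamp, at the cost of 
storing the same entity once per cache key.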

I kind of like the idea of having eu_cache_key in the table: it makes explicit 
the fact that we are tracking parser cache entries.

