On 1/31/13, Max Semenik <maxsem.w...@gmail.com> wrote:
> A month ago, PageImages extension[1] was black-deployed, intended to
> automatically associate images with articles. It populates its data
> when LinksUpdate is run, i.e. when a page or a template it transcludes
> is edited or purged. Since then, most pages have been re-parsed; however,
> slightly less than a million English WP articles remain:
>
> select count(*), avg(page_len) from page where page_namespace=0 and
> page_is_redirect=0 and page_touched < '20121229000000';
> +----------+---------------+
> | count(*) | avg(page_len) |
> +----------+---------------+
> |   977568 |     3172.0948 |
> +----------+---------------+
> 1 row in set (5 min 59.55 sec)
[..]

You do realize that page_touched gets updated by a bunch of things,
many of which do not cause a LinksUpdate to happen? So running the
script as you proposed will skip pages that were touched after the
cutoff without a LinksUpdate, and will not populate the table for
every page.

Of course there really isn't any way to figure out when the last
LinksUpdate happened, so I suppose page_touched is as close as we can
get. I guess in most cases, if a page has had its page_touched
updated by a non-LinksUpdate event, that probably means people
actually look at the article, so someone has probably edited it
already or will edit it soon.

--bawolff

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
