Bjoern Hoehrmann wrote:
> When making http://katograph.appspot.com/, which renders the German
> Wikipedia category system as an interactive "treemap" based on information
> such as the number of articles in each category and requests during a 3-day
> period, I found that the proxy logs used for stats.grok.se are rather
> unreliable, with many of the "top" pages being implausible (articles on not
> very notable subjects that have existed only for a very short time show up
> in the top ten, for instance). On http://stats.grok.se/en/top you can see
> this as well: 40 million views for `Special:Export/Robert L. Bradley, Jr`
> is rather implausible, as far as human users are concerned.

Yes, the data is susceptible to manipulation, both intentional and
unintentional. As I said, this was a first-pass implementation on Domas'
part. As far as I know, this hasn't been touched by anyone in years. You're
absolutely correct that, at the end of the day, until the data itself is
better (more reliable), the resulting tools/graphs/scripts/everything that
rely on it will be bound by its limitations.

> MZMcBride wrote:
>> Is it worth a Toolserver user's time to try to create a database of
>> per-project, per-page page view statistics? Is it worth a grant from the
>> Wikimedia Foundation to have someone work on this? Is it worth trying to
>> convince Wikimedia Deutschland to assign resources? And, of course, it
>> wouldn't be a bad idea if Domas' first-pass implementation was improved on
>> Wikimedia's side, regardless.
> 
> The data that powers stats.grok.se is available for download; it should
> be rather trivial to feed it into toolserver databases and query it as
> desired, ignoring performance problems.

Not simply performance. It's a lot of data and it needs to be indexed. That
has a real cost. There are also edge cases and corner cases (different
encodings of requests, etc.) that need to be accounted for. It's not a
particularly small undertaking, if it's to be done properly.
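
To illustrate the encoding point, here is a minimal sketch, not a finished
importer. It assumes each line of the downloadable data has the form
"project title count bytes" separated by whitespace, and that requests are
percent-encoded UTF-8; both are assumptions on my part, and the function names
are made up for the example. It folds differently encoded requests for the
same page onto one key before summing:

```python
from collections import Counter
from urllib.parse import unquote


def normalize_title(raw_title):
    """Map differently encoded requests for one page onto a single key."""
    # Decode percent-escapes as UTF-8 (assumption). Bytes that are not valid
    # UTF-8, e.g. Latin-1 requests, become U+FFFD and are NOT recovered here.
    title = unquote(raw_title, encoding="utf-8", errors="replace")
    # Treat spaces and underscores as the same character.
    title = title.replace(" ", "_").strip("_")
    # Assumption: the wiki capitalizes the first letter of titles.
    return title[:1].upper() + title[1:]


def aggregate(lines):
    """Sum view counts per (project, normalized title)."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip malformed lines rather than guessing
        project, raw_title, count, _size = parts
        totals[(project, normalize_title(raw_title))] += int(count)
    return totals


if __name__ == "__main__":
    sample = [
        "de Z%C3%BCrich 3 100",
        "de Zürich 2 50",
        "de z%c3%bcrich 1 10",
        "not a valid line at all",
    ]
    for key, views in aggregate(sample).items():
        print(key, views)
```

Even this leaves out redirects, namespaces, and non-UTF-8 requests, and it
says nothing about indexing the result, which is where the real cost is.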

MZMcBride


