Bjoern Hoehrmann wrote:
> When making http://katograph.appspot.com/ which renders the German
> Wikipedia category system as an interactive "treemap" based on
> information like number of articles in them and requests during a 3 day
> period, I found that the proxy logs used for stats.grok.se are rather
> unreliable, with many of the "top" pages being implausible (articles on
> not very notable subjects that have existed only for a very short time
> show up in the top ten, for instance). On http://stats.grok.se/en/top
> you can see this as well, 40 million views for
> `Special:Export/Robert L. Bradley, Jr` is rather implausible, as far as
> human users are concerned.

Yes, the data is susceptible to manipulation, both intentional and
unintentional. As I said, this was a first-pass implementation on Domas'
part. As far as I know, it hasn't been touched by anyone in years. You're
absolutely correct that, at the end of the day, until the data itself is
better (more reliable), the resulting tools/graphs/scripts/everything that
relies on it will be bound by its limitations.

> MZMcBride wrote:
>> Is it worth a Toolserver user's time to try to create a database of
>> per-project, per-page page view statistics? Is it worth a grant from the
>> Wikimedia Foundation to have someone work on this? Is it worth trying to
>> convince Wikimedia Deutschland to assign resources? And, of course, it
>> wouldn't be a bad idea if Domas' first-pass implementation was improved on
>> Wikimedia's side, regardless.
>
> The data that powers stats.grok.se is available for download, it should
> be rather trivial to feed it into toolserver databases and query it as
> desired, ignoring performance problems.

It's not simply a matter of performance. It's a lot of data and it needs to
be indexed. That has a real cost. There are also edge cases and corner cases
(different encodings of requests, etc.) that need to be accounted for. It's
not a particularly small undertaking, if it's to be done properly.

MZMcBride

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
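
To illustrate the "different encodings of requests" point, here is a minimal
Python sketch of the kind of normalization that would be needed before
loading the data into a database. It assumes each log line has the form
`project title count bytes`, which is not stated in this thread, and the
names `normalize_title` and `aggregate` are hypothetical. The first-letter
capitalization rule is likewise a simplifying assumption, not MediaWiki's
exact behaviour.

```python
from collections import Counter
from urllib.parse import unquote


def normalize_title(raw_title):
    """Map differently encoded requests for the same page to one key."""
    # Decode percent-escapes (e.g. %C3%BC); invalid UTF-8 is replaced, not fatal.
    title = unquote(raw_title, encoding="utf-8", errors="replace")
    # Treat spaces and underscores as equivalent, as page titles usually are.
    title = title.replace(" ", "_").strip("_")
    # Assumption: the first letter of a title is case-insensitive.
    return title[:1].upper() + title[1:]


def aggregate(lines):
    """Sum view counts per (project, normalized title)."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip malformed lines instead of guessing
        project, raw_title, count, _size = parts
        totals[(project, normalize_title(raw_title))] += int(count)
    return totals


# Made-up sample lines, for illustration only.
sample = [
    "de M%C3%BCnchen 10 1000",
    "de München 5 500",
    "de m%c3%bcnchen 2 200",
    "de Special:Export/Foo 7 700",
    "garbage line",
]
totals = aggregate(sample)
```

Without the normalization step, the three München requests would be stored
as three separate pages. A real import would also have to decide what to do
with non-article requests such as `Special:Export/...`, which this sketch
simply counts like any other title.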