cc-ing our friends in research and wikitech (sorry I forgot initially)
> We're happy to announce a few improvements to Analytics data releases on
> dumps.wikimedia.org:
>
> * We are releasing a new dataset, an estimate of Unique Devices accessing
>   our projects [1]
> * We are officially making available a better Pageviews dataset [2]
> * We are deprecating two older pageview statistics datasets
> * We moved Analytics data from /other to /analytics [3]
>
> Details follow:
>
>
> *Unique Devices:* Since 2009, the Wikimedia Foundation used comScore to
> report data about unique web visitors. In January 2016, however, we
> decided to stop reporting comScore numbers [4] because of certain
> limitations in the methodology; these limitations translated into
> misreported mobile usage. We are now ready to replace comScore numbers with
> the Unique Devices dataset [5][1]. While the number of unique devices does
> not equal the number of unique visitors, it is a good proxy for that
> metric, meaning that a major increase in the number of unique devices is
> likely to come from an increase in distinct users. We understand that
> counting uniques raises fairly big privacy concerns, so we use a very
> privacy-conscious way to count unique devices: it does not include any
> cookie by which your browser history can be tracked [6].
>
> We invite you to explore this new dataset and hope it's helpful for the
> Wikimedia community in better understanding our projects. This data can
> help measure the reach of Wikimedia projects on the web.
>
> *Pageviews:* This [2] is the best quality data available for counting the
> number of pageviews our projects receive at the article and project level.
> We've upgraded from pagecounts-raw to pagecounts-all-sites, and now to
> pageviews, in order to filter out more spider traffic and measure something
> closer to what we think is a real user viewing content. A short history
> might be useful:
>
> * pagecounts-raw: was originally maintained by Domas Mituzas and later
>   taken over by the analytics team.
>   It was and still is the most used dataset, though it has some major
>   problems. It does not count access to the mobile site, it does not
>   filter out spider or bot traffic, and it suffers from unknown loss due
>   to logging infrastructure limitations.
> * pagecounts-all-sites: uses the same pageview definition as
>   pagecounts-raw, and so also does not filter out spider or bot traffic.
>   But it does include access to mobile and zero sites, and is built on a
>   more reliable logging infrastructure.
> * pagecounts-ez: is derived from the best data available at the time.
>   So until December 2015, it was based on pagecounts-raw and
>   pagecounts-all-sites, and now it's based on pageviews. This dataset is
>   great because it compresses very large files without losing any
>   information, still providing hourly page and project level statistics.
>
> So the new dataset, pageviews, is what's behind our pageview API and is
> now available in static files for bulk download back to May 2015. But
> having multiple ways to download pageview data is confusing for consumers,
> so we're keeping only pageviews and pagecounts-ez and deprecating the other
> two. If you'd like to read more about the current pageview definition,
> details are on the research page [7].
>
> *Deprecating:* We are deprecating the pagecounts-raw and
> pagecounts-all-sites datasets in May 2016 (discussion here:
> https://phabricator.wikimedia.org/T130656 ). This data suffers from many
> artifacts, lack of mobile data, and/or infrastructure problems, and so is
> not comparable to the new way we track pageviews. It will remain here
> because we have historical data that may be useful, but it will not be
> maintained or updated beyond May 2016.
>
> *Clean-up:* Analytics data on dumps was crammed into /other with
> unrelated datasets. We made a new page to receive current and future
> datasets [3] and linked to it from /other and /.
> Please let us know if anything there looks confusing or opaque and I'll be
> happy to clarify.
>
>
> [1] http://dumps.wikimedia.org/other/unique_devices
> [2] http://dumps.wikimedia.org/other/pageviews
> [3] http://dumps.wikimedia.org/analytics/
> [4] https://meta.wikimedia.org/wiki/ComScore/Announcement
> [5] https://meta.wikimedia.org/wiki/Research:Unique_Devices
> [6] https://meta.wikimedia.org/wiki/Research:Unique_Devices#How_do_we_count_unique_devices.3F
> [7] https://meta.wikimedia.org/wiki/Research:Page_view
> _______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
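
As a footnote to the Pageviews section above: for anyone who wants to try the hourly files from the bulk download, here is a minimal Python sketch that rolls page-level counts up to project-level totals. It rests on an assumption the announcement does not state, namely that each line of an hourly file is space-separated as "project page_title view_count response_bytes". Please check the README next to the files before relying on that. The names PageviewRecord, parse_line, views_per_project and SAMPLE are made up for illustration, and the sample lines are invented, not real data.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class PageviewRecord(NamedTuple):
    """One page-level row of an hourly file (hypothetical helper type)."""
    project: str
    title: str
    views: int


def parse_line(line: str) -> Optional[PageviewRecord]:
    """Parse one line, assuming 'project page_title view_count ...'.

    Returns None for lines that do not fit the assumed layout, so a
    malformed row is skipped instead of aborting the whole run.
    """
    parts = line.split()
    if len(parts) < 3 or not parts[2].isdigit():
        return None
    return PageviewRecord(parts[0], parts[1], int(parts[2]))


def views_per_project(lines: Iterable[str]) -> Counter:
    """Sum page-level view counts into one total per project."""
    totals: Counter = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            totals[record.project] += record.views
    return totals


# Invented sample rows in the assumed layout (not real data).
SAMPLE = [
    "en Main_Page 42 0",
    "en Albert_Einstein 7 0",
    "de Wikipedia:Hauptseite 5 0",
    "malformed line",
]

if __name__ == "__main__":
    for project, views in sorted(views_per_project(SAMPLE).items()):
        print(project, views)
```

For a real file you would pass an open file handle instead of SAMPLE, for example one returned by gzip.open(path, "rt", encoding="utf-8") if the files are gzip-compressed (also an assumption).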