Re: [Analytics] Request for analytics data

2017-03-06 Thread Dan Andreescu
The dump files add .m to the project short name, as specified in the documentation. So en is English Wikipedia and en.m is mobile web English Wikipedia. You're right that the numbers for app access aren't there, but those are relatively small. On Mon, Mar 6, 2017 at 9:50 PM, Jörg Jung wrote: > Dan, gu
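
The naming convention described in this message can be sketched in Python. This is a minimal illustration, not the dumps' official parser: the function name `split_project_code` and the labels it returns are hypothetical, and it only covers the bare code plus the `.m` and `.zero` suffixes that come up in this thread. Any other suffix is reported as "unknown".

```python
def split_project_code(code):
    """Split a dump project code such as 'en.m' into (language, site).

    Hypothetical helper: only the suffixes mentioned in this thread
    are recognized; everything else is labeled 'unknown'.
    """
    # 'en.m' -> language 'en', suffix 'm'; 'en' -> language 'en', suffix ''
    language, _, suffix = code.partition(".")
    sites = {
        "": "default",      # bare code, e.g. 'en' (label is an assumption)
        "m": "mobile web",  # e.g. 'en.m'
        "zero": "zero",     # e.g. 'de.zero'
    }
    return language, sites.get(suffix, "unknown")


for code in ("en", "en.m", "de.zero"):
    print(code, split_project_code(code))
```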

Re: [Analytics] Request for analytics data

2017-03-06 Thread Jörg Jung
Dan, guys, me again. I cross-checked the numbers from, for example, pagecounts-2017-02-views-ge-5-totals.bz2 against the tools at tools.wmflabs.org (here for the page "Falco"). It seems that the dump only has "Desktop" numbers for the platform, not "Mobile Web" and "Mobile App". Is

Re: [Analytics] web log data

2017-03-06 Thread Leila Zia
Hi Genevieve, This is Leila from Research. Thanks for reaching out. Access to non-public data through the Research team happens if we create a formal research collaboration with you and your team. Whether a formal collaboration can be created is a function of some requirements to be met [1] and o

Re: [Analytics] Request for analytics data

2017-03-06 Thread Jörg Jung
OK, guys, thanx a lot! On 06.03.2017 at 17:33, Dan Andreescu wrote: > Jorg, the project abbreviations are explained in depth > here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews > > On Mon, Mar 6, 2017 at 11:15 AM, Jörg Jung > wrote: > > Yea

Re: [Analytics] Request for analytics data

2017-03-06 Thread Dan Andreescu
Jorg, the project abbreviations are explained in depth here: https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageviews On Mon, Mar 6, 2017 at 11:15 AM, Jörg Jung wrote: > Yeah, Dan, that will work, thanx. > > Just out of curiosity: Why are there three projects for "de" and what is > the diffe

Re: [Analytics] Request for analytics data

2017-03-06 Thread Jörg Jung
Yeah, Dan, that will work, thanx. Just out of curiosity: why are there three projects for "de", and what is the difference between /de/, /de.m/ and /de.zero/? Cheers, JJ On 06.03.2017 at 15:45, Dan Andreescu wrote: > Jorg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/ > w

Re: [Analytics] Request for analytics data

2017-03-06 Thread Dan Andreescu
Jorg, take a look at https://dumps.wikimedia.org/other/pagecounts-ez/ which has compressed data without losing granularity. You can get monthly files here and download a lot less data. On Mon, Mar 6, 2017 at 5:40 AM, Jörg Jung wrote: > Marcel, > > thanx for ur quick answer. > My main issue with

Re: [Analytics] web log data

2017-03-06 Thread Dan Andreescu
Hi Genevieve & Jelena, We have a process for working with external researchers, and it starts here: https://meta.wikimedia.org/wiki/Research:Access_to_non-public_data It certainly sounds like the data we have could help you. We have some requirements listed there and your project should get the

[Analytics] web log data

2017-03-06 Thread Genevieve Bartlett
Hi All - Emanuele Rocca suggested we reach out to you guys and see if you guys would be willing to share web log/content access data. Jelena and I are network security researchers at University of Southern California's Information Sciences Institute. We're working on a project for application-lev

Re: [Analytics] Request for analytics data

2017-03-06 Thread Jörg Jung
Marcel, thanx for ur quick answer. My main issue with dumps (unless I am missing something) is: I need to download them first to be able to aggregate and filter. For the year 2016 that would be: 40MB (average) * 24h * 30d * 12m = about 350GB. As I am not sitting directly at DE-CIX but in my private off
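
The back-of-the-envelope download estimate in this message can be checked by multiplying it out. The sketch below uses the figures quoted in the message (40 MB per hourly file, 24 hours, 30 days, 12 months) and assumes decimal units (1 GB = 1000 MB). The function name `yearly_download_mb` is made up for illustration.

```python
def yearly_download_mb(mb_per_file=40, hours=24, days=30, months=12):
    """Rough yearly download volume in MB, one file per hour."""
    return mb_per_file * hours * days * months


total_mb = yearly_download_mb()
# Decimal units assumed: 1 GB = 1000 MB, 1 TB = 1000 GB.
print(f"{total_mb} MB = {total_mb / 1000:.1f} GB = {total_mb / 1_000_000:.2f} TB")
```

With these inputs the product is 345,600 MB, which is on the order of a few hundred gigabytes, not hundreds of terabytes.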

Re: [Analytics] Request for analytics data

2017-03-06 Thread Marcel Ruiz Forns
Hi Jörg, :] Do you mean top 250K most viewed *articles* in de.wikipedia.org? If so, I think you can get that from the dumps indeed. You can find 2016 hourly pageview stats by article for all wikis here: https://dumps.wikimedia.org/other/pageviews/2016/ Note that the wiki codes (first column) you
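
A minimal sketch of how the most viewed articles for one wiki could be aggregated from the hourly files. It assumes each line is space-separated with the wiki code in the first column, the page title in the second and the view count in the third; that layout should be checked against the dumps documentation. The function name `top_articles` and the sample lines are made up for illustration.

```python
from collections import Counter


def top_articles(lines, wiki_code, n=10):
    """Sum view counts per title for one wiki code and return the top n.

    Assumed line layout: '<wiki_code> <page_title> <view_count> ...'
    """
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Skip malformed lines and lines belonging to other wikis.
        if len(fields) < 3 or fields[0] != wiki_code:
            continue
        try:
            views = int(fields[2])
        except ValueError:
            continue
        totals[fields[1]] += views
    return totals.most_common(n)


# Made-up sample lines, only to show the call.
sample = [
    "de Falco 12 0",
    "de.m Falco 30 0",
    "de Berlin 5 0",
    "de Falco 3 0",
]
print(top_articles(sample, "de", n=2))
```

Because the function takes any iterable of lines, the hourly files can be streamed one at a time (for example from a text-mode file handle) and the counts merged, rather than loading everything into memory at once.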