Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Marcel Ruiz Forns
Just spoke with Jaime Crespo and he confirmed that: - m4-master (master EL database) only holds events for the last 45 days to avoid space problems. That's for all tables including Echo. - analytics-storage is the replica that keeps the historical data and is meant to apply the specif

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Dan Andreescu
> Just spoke with Jaime Crespo and he confirmed that: - m4-master (master EL database) only holds events for the last 45 days to avoid space problems. That's for all tables including Echo. - analytics-storage is the replica that keeps the historical data and is meant to ap

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Marcel Ruiz Forns
> Sure, it doesn't have space problems, but the problem remains that with a table this large, it's impossible to query and get results in our lifetime. I see, makes sense. > I think in this case moving all of the data to Hadoop and blacklisting it from the mysql inserter seems like the right

[Analytics] [Outage] Small data loss in raw_webrequest on 2015-12-15

2015-12-16 Thread Marcel Ruiz Forns
Hi Analytics, Yesterday, Dec 15, during the course of 1 hour (17h to 18h UTC) there was an irrecoverable raw_webrequest data loss of ~30%: 25.6% (misc), 19.5% (mobile), 19.1% (text), 39.1% (upload). This represents around 1% of the data for that day. The loss was due to the enabling of IPSec, whi

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Nuria Ruiz
>I think in this case moving all of the data to Hadoop and blacklisting it from the mysql inserter seems like the right thing to do. >I agree. We should implement partial auto-purging in Hadoop though. In the Echo schema some fields should still be purged. Right, being able to move all this data to

[Analytics] Goals of analytics team for next quarter

2015-12-16 Thread Nuria Ruiz
Hello! The final goals of the analytics team for next quarter are published: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q3_Goals#Analytics TL;DR: We are going to mostly work on replacing reports on stats.wikimedia.org, plus do operational maintenance and upgrades on the analytic

[Analytics] Top Edits/Views in 2015 per project?

2015-12-16 Thread Itzik - Wikimedia Israel
Hi, I see that the (amazing!) API still can't give us results for the whole of 2015. Is there any way we can get these page views per project? And also the most edited articles in 2015 per project? This could be great PR information for the communication representatives around the world to release to loca

Re: [Analytics] Top Edits/Views in 2015 per project?

2015-12-16 Thread Dan Andreescu
Itzik, The way we're computing top pageviews right now doesn't scale very well; we aren't even able to properly do monthly top pages. So we opened this issue: https://phabricator.wikimedia.org/T120113. When we fix that, it's possible we'll be able to get yearly top pages too, but I'm not promisi

Re: [Analytics] Top Edits/Views in 2015 per project?

2015-12-16 Thread Itzik - Wikimedia Israel
Thank you Dan. It's worth mentioning that the data, from a press perspective, is relevant only if we have it in the next week or so. :)) *Regards, Itzik Edri* Chairperson, Wikimedia Israel +972-(0)-54-5878078 | http://www.wikimedia.org.il Imagine a world in which every single human being ca

Re: [Analytics] Goals of analytics team for next quarter

2015-12-16 Thread Federico Leva (Nemo)
Nuria Ruiz, 16/12/2015 18:42: > TL;DR: We are going to mostly work on replacing reports on stats.wikimedia.org Do you mean traffic reports? See also https://phabricator.wikimedia.org/T107175#1498819 and edit the task summary there please. Is https://phabricator.wi

Re: [Analytics] Goals of analytics team for next quarter

2015-12-16 Thread Nuria Ruiz
> Do you mean traffic reports? Yes, as in what pertains to browser data; sorry, that should have been clearer. > See also https://phabricator.wikimedia.org/T107175#1498819 and edit the task summary there please. > Is https://phabricator.wikimedia.org/T118329 a duplicate? The parent task is much t

[Analytics] Inconsistent user IDs between EventLogging and main database

2015-12-16 Thread Neil P. Quinn
While doing some analysis, I found a strange inconsistency. The ServerSideAccountCreation event logs show a user with the ID *26048397* and the name *Dhava2nd* being created on the English Wikipedia at 2015-08-20 01:38:57. (Those l

Re: [Analytics] Inconsistent user IDs between EventLogging and main database

2015-12-16 Thread Federico Leva (Nemo)
Neil P. Quinn, 16/12/2015 21:40: Does anyone know what's going on? Is this issue documented anywhere? There is already at least one Phabricator report, IIRC. Nemo ___ Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/ma