Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Dan Andreescu
s Team at WMF and everybody who has > an interest in Wikipedia and analytics. > *Subject: *Re: [Analytics] Echo schema eventlogging > > > > On Wed, Mar 2, 2016 at 9:34 AM, Neil P. Quinn > wrote: > >> *Schema:Edit contains no useful information that isn't already i

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Dan Andreescu
K, I'll delete Schema:Edit :) just kidding. Ok, so we will just set the policy for Schema:Echo to purge after 90 days, so the data will delete itself and give y'all time to do any last queries you might want.

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Roan Kattouw
On Wed, Mar 2, 2016 at 9:34 AM, Neil P. Quinn wrote: > *Schema:Edit contains no useful information that isn't already in the >> database apart from which button people use to thank each other,* > > > I assume you mean Schema:Echo? :) > YES. Yes. ECHO, not Edit. I saw myself make this mistake in

Re: [Analytics] Echo schema eventlogging

2016-03-02 Thread Neil P. Quinn
> > *Schema:Edit contains no useful information that isn't already in the > database apart from which button people use to thank each other,* I assume you mean Schema:Echo? :) On Tue, Mar 1, 2016 at 11:58 PM, Roan Kattouw wrote: > [Reviving old thread] > > I was looking at our EventLogging dat

Re: [Analytics] Echo schema eventlogging

2016-03-01 Thread Roan Kattouw
[Reviving old thread] I was looking at our EventLogging data today, and discovered that Schema:Edit contains no useful information that isn't already in the database apart from which button people use to thank each other, and if we really care about that we can measure it separately without produc

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Nuria Ruiz
>I think in this case moving all of the data to Hadoop and blacklisting it from the mysql inserter seems like the right thing to do. >I agree. We should implement partial auto-purging in Hadoop though. In the Echo schema some fields should still be purged. Right, being able to move all this data to

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Marcel Ruiz Forns
> > Sure, it doesn't have space problems, but the problem remains that with a > table this large, it's impossible to query and get results in our lifetime. I see, makes sense. I think in this case moving all of the data to Hadoop and blacklisting it > from the mysql inserter seems like the right

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Dan Andreescu
> > Just spoke with Jaime Crespo and he confirmed that: > >- m4-master (master EL database) only holds events for the last 45 >days to avoid space problems. That's for all tables including Echo. > >- analytics-storage is the replica that keeps the historical data and >is meant to ap

Re: [Analytics] Echo schema eventlogging

2015-12-16 Thread Marcel Ruiz Forns
Just spoke with Jaime Crespo and he confirmed that: - m4-master (master EL database) only holds events for the last 45 days to avoid space problems. That's for all tables including Echo. - analytics-storage is the replica that keeps the historical data and is meant to apply the specif

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Jonathan Morgan
Hi Nuria, FWIW: Although I'm not using this right now, I could see it being useful for understanding the impact of new notification updates that are coming down the pike.[1][2] What are the costs involved in keeping this schema up? Best, J 1. https://meta.wikimedia.org/wiki/Research:Cross-w

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Andrew Otto
We could blacklist this schema from the mysql database, and still keep producing it. It would be available in Hadoop either way. > On Dec 15, 2015, at 12:22, Jonathan Morgan wrote: > > Hi Nuria, > > FWIW: Although I'm not using this right now, but I could see it being useful > for understan

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
>What are the costs involved in keeping this schema up? Well, usage of database space in a not so smart manner (huge tables that become unquery-able basically). This table is now 9G and it doesn't look like anyone is looking at this data. On Tue, Dec 15, 2015 at 9:22 AM, Jonathan Morgan wrote: >

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Aaron Halfaker
No! Please do not nuke old data. +1 to J-Mo. This will probably be useful for long-term studies of notifications. If I had the time, I'd pick it up right now based on this reminder! I'm happy with having historical data preserved (please make sure that it is) and the MySQL table dropped until

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
>maybe we could nuke data that's more than a year old (or 6 months old or something) from mysql? With eventlogging data we "normally" drop data that is older than 90 days, will this work? Thanks for the prompt response. On Tue, Dec 15, 2015 at 11:27 AM, Roan Kattouw wrote: > If the data is goi

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Madhumitha Viswanathan
I want to mention that data in Hadoop is only available from Aug 27th 2015. Older data is only available in mysql. On Tue, Dec 15, 2015 at 11:27 AM, Roan Kattouw wrote: > If the data is going to be retained but would just become harder to query > (i.e. still in Hadoop but not in mysql), maybe we

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Roan Kattouw
If the data is going to be retained but would just become harder to query (i.e. still in Hadoop but not in mysql), maybe we could nuke data that's more than a year old (or 6 months old or something) from mysql? On Tue, Dec 15, 2015 at 9:35 AM, Andrew Otto wrote: > We could blacklist this schema

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Jonathan Morgan
Hi Nuria! Speaking for *my own particular scenario*, that solution sounds like it will be fine, since I don't plan on immediately performing research with these data. But it's obviously still the Collab team's call here--they likely have needs I know nothing about. Cc'ing Joe Matazzoni in case he

Re: [Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
>We could blacklist this schema from the mysql database, and still keep producing it. It would be available in Hadoop either way. Right, but I would also like to drop the table if it is not being used. If the data is not going to be looked at soonish there is no point in storing it, as it will likely be d

[Analytics] Echo schema eventlogging

2015-12-15 Thread Nuria Ruiz
Roan: The data for the Echo schema (https://meta.wikimedia.org/wiki/Schema:Echo) is quite large and we are not sure it is even used. Can you confirm either way? If it is no longer used we will stop collecting it. Thanks, Nuria