[Analytics] purging old data from eventlogging db

2014-05-20 Thread Sean Pringle
Hi! I'd like to hear from stakeholders about purging old data from the eventlogging database. Yes, no, why [not], etc. I understand from Ori that there is a 90 day retention policy, and that purging has been discussed previously but not addressed for various reasons. Certainly there are many time

Re: [Analytics] purging old data from eventlogging db

2014-05-20 Thread Dario Taraborelli
On May 20, 2014, at 10:09 PM, Sean Pringle wrote:
> Hi!
>
> I'd like to hear from stakeholders about purging old data from the eventlogging database. Yes, no, why [not], etc.
>
> I understand from Ori that there is a 90 day retention policy, and that purging has been discussed previously

Re: [Analytics] purging old data from eventlogging db

2014-05-21 Thread Ori Livneh
On Tue, May 20, 2014 at 10:36 PM, Dario Taraborelli < dtarabore...@wikimedia.org> wrote: > On May 20, 2014, at 10:09 PM, Sean Pringle wrote: > > Hi! > > I'd like to hear from stakeholders about purging old data from the > eventlogging database. Yes, no, why [not], etc. > > I understand from Ori t

Re: [Analytics] purging old data from eventlogging db

2014-05-21 Thread Nuria Ruiz
> Not to hijack the thread, but: to do this in the schema itself confuses the structure of the data with the mechanics of its use. I think having a couple of helpers in JavaScript and PHP for simple random sampling is sufficient.
Much agree with Ori here. We would be bloating schema with prop

Re: [Analytics] purging old data from eventlogging db

2014-05-21 Thread Dario Taraborelli
> The motivation behind your proposal is (I think) a desire to have a unified configuration interface for data collection jobs. This makes total sense and it's worth pursuing. I just don't think we should stuff everything into the schema. The schema is just that: a schema. It's a data mode

Re: [Analytics] purging old data from eventlogging db

2014-05-27 Thread Sean Pringle
On Wed, May 21, 2014 at 5:03 PM, Ori Livneh wrote: > > On Tue, May 20, 2014 at 10:36 PM, Dario Taraborelli < > dtarabore...@wikimedia.org> wrote: > >> On May 20, 2014, at 10:09 PM, Sean Pringle >> wrote: >> > > >> *Existing schemas* would need to be audited on a case by case basis. >> > > By who

Re: [Analytics] purging old data from eventlogging db

2014-05-27 Thread Dario Taraborelli
On May 27, 2014, at 7:49 PM, Sean Pringle wrote: > On Wed, May 21, 2014 at 5:03 PM, Ori Livneh wrote: > > On Tue, May 20, 2014 at 10:36 PM, Dario Taraborelli > wrote: > On May 20, 2014, at 10:09 PM, Sean Pringle wrote: > > Existing schemas would need to be audited on a case by case basis.

Re: [Analytics] purging old data from eventlogging db

2014-05-27 Thread Nuria
I second Dario on NavigationTiming data. Before archiving it I would like us to have a project for processing it. Also, graphs directly query the EL data store in many instances. Removing the data would mean we would only show 90 days of data on dashboards, which will send many complaints o

Re: [Analytics] purging old data from eventlogging db

2014-05-28 Thread Dan Andreescu
I just announced this potential change in Scrum of Scrums and the Mobile team said they also would like to keep old data, but not for all of their schemas. They're cleaning up their graphs and we should check with them when we start deleting.
On Wed, May 28, 2014 at 2:56 AM, Nuria wrote:
> Sec

Re: [Analytics] purging old data from eventlogging db

2014-05-28 Thread Steven Walling
On Wed, May 28, 2014 at 10:50 AM, Dan Andreescu wrote:
> I just announced this potential change in Scrum of Scrums and the Mobile team said they also would like to keep old data, but not for all of their schemas. They're cleaning up their graphs and we should check with them when we start

Re: [Analytics] purging old data from eventlogging db

2014-05-29 Thread Aaron Halfaker
+1 to Dario's mention of the many schemas that just capture production DB stuff in a better way. Re. growth: Old growth experiment schemas continue to be a great resource for checking old work and sometimes even new hypotheses. When Dario and Kevin get around to us, I'll have a complete list of s

Re: [Analytics] purging old data from eventlogging db

2014-05-29 Thread Ori Livneh
On Wed, May 28, 2014 at 11:26 PM, Steven Walling wrote:
> My main question is what the rationale is. Is it to improve query performance on analytics dbs?
I imagine it will help, but it's probably not the primary reason. I imagine Sean would like to have the database in a state of equilibrium

Re: [Analytics] purging old data from eventlogging db

2014-05-29 Thread Sean Pringle
On Fri, May 30, 2014 at 3:28 PM, Ori Livneh wrote: > On Wed, May 28, 2014 at 11:26 PM, Steven Walling > wrote: > >> My main question is what the rationale is. Is it to improve query >> performance on analytics dbs? >> > > I imagine it will help, but it's probably not the primary reason. I > imag

Re: [Analytics] purging old data from eventlogging db

2014-05-29 Thread Nuria
I see, I thought the concern was privacy rather than capacity. In that case we should put an item in our backlog to sort out schemas and find the ones whose data can be deleted. I will file an item to this effect. In the future we will hopefully have this metadata about the schema available somewhere.

Re: [Analytics] purging old data from eventlogging db

2014-05-30 Thread Aaron Halfaker
Nuria, I believe that Dario already did that [1].
1. https://trello.com/c/F0DsiSXn/305-audit-historical-el-data-for-retention
On Fri, May 30, 2014 at 1:33 AM, Nuria wrote:
> I see, I thought concern was privacy rather than capacity. In that case we should put in our backlog an item to short o

Re: [Analytics] purging old data from eventlogging db

2014-05-30 Thread Steven Walling
On Thu, May 29, 2014 at 11:03 PM, Sean Pringle wrote: > On Fri, May 30, 2014 at 3:28 PM, Ori Livneh wrote: > >> On Wed, May 28, 2014 at 11:26 PM, Steven Walling >> wrote: >> >>> My main question is what the rationale is. Is it to improve query >>> performance on analytics dbs? >>> >> >> I imagi