Re: [Analytics] [Research-Internal] Revision history of deleted pages

2015-06-25 Thread Robert West
OK, thanks for clarifying and for the pointer, Aaron. Bob On Thu, Jun 25, 2015 at 3:20 PM, Aaron Halfaker wrote: > No way of searching the content of deleted pages. You can start with the > `archive` table. You might find that you can identify edits that add 'hoax' > templates by performing a rege

Re: [Analytics] [Research-Internal] Revision history of deleted pages

2015-06-25 Thread Aaron Halfaker
No way of searching the content of deleted pages. You can start with the `archive` table. You might find that you can identify edits that add 'hoax' templates by performing a regex match on `archive.ar_comment`. -Aaron On Thu, Jun 25, 2015 at 5:16 PM, Robert West wrote: > Thanks, Aaron! > > O
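
A minimal sketch of the regex approach suggested above, assuming the rows have already been fetched from the `archive` table into Python dictionaries. The pattern, the helper name `find_hoax_edits`, and the sample rows are illustrative assumptions, not anything taken from the thread.

```python
import re

# Assumed pattern: any edit summary that mentions the word "hoax",
# e.g. "Tagging with {{hoax}}". Tune it to the templates you care about.
HOAX_PATTERN = re.compile(r"\bhoax\b", re.IGNORECASE)


def find_hoax_edits(rows):
    """Return the rows whose ar_comment mentions 'hoax'.

    `rows` is an iterable of dicts with at least an `ar_comment` key,
    as one might get from a query against the `archive` table.
    """
    return [
        row for row in rows
        if row.get("ar_comment") and HOAX_PATTERN.search(row["ar_comment"])
    ]


# Made-up sample rows, for illustration only.
SAMPLE_ROWS = [
    {"ar_rev_id": 101, "ar_title": "Example_A", "ar_comment": "Tagging with {{hoax}}"},
    {"ar_rev_id": 102, "ar_title": "Example_A", "ar_comment": "fix typo"},
    {"ar_rev_id": 103, "ar_title": "Example_B", "ar_comment": "Suspected HOAX, nominating for deletion"},
    {"ar_rev_id": 104, "ar_title": "Example_C", "ar_comment": None},
]

matching_ids = [row["ar_rev_id"] for row in find_hoax_edits(SAMPLE_ROWS)]
print(matching_ids)
```

The same pattern could instead be applied in SQL on `archive.ar_comment`; filtering in Python simply keeps the sketch independent of any database connection.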

Re: [Analytics] [Research-Internal] Revision history of deleted pages

2015-06-25 Thread Robert West
Thanks, Aaron! On Thu, Jun 25, 2015 at 3:06 PM, Aaron Halfaker wrote: > Ahh yes. Sorry for not responding sooner. The best way to get deleted > article text is by getting the appropriate permission with a Wikimedia user > account and then using that account to hit the web API. E.g. > https://e

Re: [Analytics] [Research-Internal] Revision history of deleted pages

2015-06-25 Thread Aaron Halfaker
Ahh yes. Sorry for not responding sooner. The best way to get deleted article text is by getting the appropriate permission with a Wikimedia user account and then using that account to hit the web API. E.g. https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bdeletedrevisions The best
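
A rough sketch of how a request to the `deletedrevisions` module linked above might be put together. It only builds the URL. As the message says, the request has to be made with an account that holds the appropriate permission, and that authentication step is not shown. The helper name `build_deletedrevisions_url`, the page title, and the choice of `drvprop` fields are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the help link in the message above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_deletedrevisions_url(title):
    """Build a query URL for the deleted revisions of one page.

    Only the URL is constructed here; sending it requires a logged-in
    session whose account is allowed to view deleted revisions.
    """
    params = {
        "action": "query",
        "prop": "deletedrevisions",
        "titles": title,
        # Assumed selection of fields; adjust to what you need.
        "drvprop": "ids|timestamp|user|comment|content",
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# "Some deleted page" is a placeholder title, not a real example.
url = build_deletedrevisions_url("Some deleted page")
parsed_params = parse_qs(urlsplit(url).query)
print(url)
```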

Re: [Analytics] [Research-Internal] Revision history of deleted pages

2015-06-25 Thread Leila Zia
Aaron, any chance you know the answer to this question? I have a vague memory that we talked about deleted pages and their text some time back. This data should live somewhere, right? After all, deleted pages can be restored. Thanks, Leila On Wed, Jun 24, 2015 at 2:03 PM, Leila Zia wrote: > swi

Re: [Analytics] "If it didn't happen in HDFS, it didn't happen"

2015-06-25 Thread Dan Andreescu
Theoretically we should be able to request the Event Logging endpoint URI from anywhere. But I don't know how CORS is set up on that endpoint after this recent change. On Thu, Jun 25, 2015 at 1:01 PM, Oliver Keyes wrote: > Gotcha. And we can put EL on labs? > > On 25 June 2015 at 09:56, Dan And

Re: [Analytics] "If it didn't happen in HDFS, it didn't happen"

2015-06-25 Thread Oliver Keyes
Gotcha. And we can put EL on labs? On 25 June 2015 at 09:56, Dan Andreescu wrote: > Update on this: > > * Piwik is not finding a lot of love. The readership team is working on > puppetizing it and we theoretically have hardware to run it, but we haven't > decided it's a good idea for Analytics t

Re: [Analytics] "If it didn't happen in HDFS, it didn't happen"

2015-06-25 Thread Dan Andreescu
Update on this:
* Piwik is not finding a lot of love. The readership team is working on puppetizing it and we theoretically have hardware to run it, but we haven't decided it's a good idea for Analytics to support this yet.
* We're a (bit?) more optimistic about parallel Event Logging processors.

Re: [Analytics] Monthly compressed traffic delay

2015-06-25 Thread Dan Andreescu
FYI, the two places where people are discussing the new Pageview API that we are building are:
* Original Bugzilla (now Phabricator) ticket (yes, the title no longer applies): https://phabricator.wikimedia.org/T44259
* Analytics list thread: https://lists.wikimedia.org/pipermail/analytics/2

Re: [Analytics] Pageview API Status update

2015-06-25 Thread Dan Andreescu
Two quick updates: What Oliver said resonates with us. We are doing everything possible to focus and keep the project moving instead of satisfying all possible requirements at launch. We have been working on our goals (not yet finalized) to include "Pageview API by September". There is quite a bit

Re: [Analytics] analytics-store eventlogging UNION queries

2015-06-25 Thread Dan Andreescu
Sean, fully agreed this is a problem, and I don't think there's an easy solution. This query is made with a template that fills in different versions of an Event Logging schema. This is bad for reasons beyond performance: * when a new schema revision is implemented, ei