Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Gilles Dubuc
Right -- couldn't we just tag the URL? The event of the user actually viewing the image is completely disconnected from the URL hit in Media Viewer, which is why we need EL and can't rely on existing server logs. Eventlogging data currently does go to files, as well as to the DB. Great,

Re: [Analytics] Per-namespace pageview data from half a year ago

2015-01-07 Thread Oliver Keyes
On 8 January 2015 at 02:12, Gergo Tisza gti...@wikimedia.org wrote: On Wed, Jan 7, 2015 at 5:59 PM, Nuria Ruiz nu...@wikimedia.org wrote: Back when MediaViewer was launched, I added a namespace parameter to NavigationTiming to be able to track per-namespace pageviews, Navigation timing is

Re: [Analytics] Per-namespace daily edit numbers

2015-01-07 Thread Gergo Tisza
On Wed, Jan 7, 2015 at 11:15 PM, Federico Leva (Nemo) nemow...@gmail.com wrote: Then you probably want something like https://stats.wikimedia.org/EN/TablesWikipediaHU.htm#editor_activity_levels but with File namespace disaggregated from Other. I was looking for the number of edits; that's

Re: [Analytics] Per-namespace daily edit numbers

2015-01-07 Thread Oliver Keyes
On 8 January 2015 at 02:31, Gergo Tisza gti...@wikimedia.org wrote: On Wed, Jan 7, 2015 at 6:26 PM, Oliver Keyes oke...@wikimedia.org wrote: places to get edits? Well... the revision table? I'm sort of confused as to what you're looking for, I guess, that the db wouldn't have. There are a

Re: [Analytics] Per-namespace daily edit numbers

2015-01-07 Thread Gergo Tisza
On Wed, Jan 7, 2015 at 6:26 PM, Oliver Keyes oke...@wikimedia.org wrote: places to get edits? Well... the revision table? I'm sort of confused as to what you're looking for, I guess, that the db wouldn't have. There are a thousand or so wikis; it would be nice if there was a single table with

Re: [Analytics] Per-namespace daily edit numbers

2015-01-07 Thread Federico Leva (Nemo)
Gergo Tisza, 08/01/2015 02:52: Even better if it can be filtered by the editcount of the user at the time of the edit. Then you probably want something like https://stats.wikimedia.org/EN/TablesWikipediaHU.htm#editor_activity_levels but with File namespace disaggregated from Other. Nemo

[Analytics] A good Samza article

2015-01-07 Thread Andrew Otto
http://thenewstack.io/apache-samza-linkedins-framework-for-stream-processing/ ___ Analytics mailing list Analytics@lists.wikimedia.org

Re: [Analytics] A good Samza article

2015-01-07 Thread Aaron Halfaker
Stateful Stream Processing /me drools On Wed, Jan 7, 2015 at 5:30 PM, Andrew Otto ao...@wikimedia.org wrote: http://thenewstack.io/apache-samza-linkedins-framework-for-stream-processing/

Re: [Analytics] Pageviews update

2015-01-07 Thread Aaron Halfaker
That's great and it will serve most of my use cases. Any chance we can get that field added to the sampled logs hourly counts? On Wed, Jan 7, 2015 at 5:40 PM, Nuria Ruiz nu...@wikimedia.org wrote: I am not sure if this is quite what you are asking but just in case: For streaming, it is probably

[Analytics] Pageviews update

2015-01-07 Thread Oliver Keyes
I'm pleased to say we now have the prototype pageviews definition as a UDF! For those with cluster access: CREATE TEMPORARY FUNCTION pageview as 'org.wikimedia.analytics.refinery.hive.isPageviewUDF'; ...and then just apply it. It outputs a boolean, so you can easily go WHERE is.Pageview(fields)

Re: [Analytics] Pageviews update

2015-01-07 Thread Nuria Ruiz
I am not sure if this is quite what you are asking but just in case: For streaming, it is probably easier for you to use the newly created webrequest tables: https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29 Those include an isPageview field so requests are

Re: [Analytics] Pageviews update

2015-01-07 Thread Andrew Otto
I am not sure if this is quite what you are asking but just in case: For streaming, it is probably easier for you to use the newly created webrequest tables: For Hadoop Streaming, it’ll be a little annoying. This new data is in Parquet. Hadoop Streaming is still using the old MapReduce 1 API,

Re: [Analytics] Pageviews update

2015-01-07 Thread Aaron Halfaker
Great! On Wed, Jan 7, 2015 at 5:49 PM, Andrew Otto ao...@wikimedia.org wrote: I am not sure if this is quite what you are asking but just in case: For streaming, it is probably easier for you to use the newly created webrequest tables: For Hadoop Streaming, it’ll be a little annoying. This

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Nuria Ruiz
Incident documentation updated: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150107-EventLogging On Wed, Jan 7, 2015 at 10:58 AM, Nuria Ruiz nu...@wikimedia.org wrote: Team: Issues on event logging have been solved, outage of client side events (did not affect server side

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Leila Zia
Thanks everyone for chiming in. Your comments were very helpful. :-) Nuria, I checked the per-second pageview count for the pages WikiGrok will be live on, for 3 hours on 2015-01-07 (as a sample). We're talking about a total of ~170 events per sec for these pages. Of course major events can affect

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Ryan Kaldari
Who is actually maintaining the EventLogging Extension now? As far as I can tell, none of the members of the Analytics-EventLogging project in Phabricator are developers. This makes it hard to know who to ping when there is a problem. For example, this EL bug that I filed a month ago was never

[Analytics] Mail list in subscription

2015-01-07 Thread masssly
Please unsubscribe me from this mailing list. Thank you. -Masssly

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Dan Andreescu
Ryan - I'm sorry I was not aware of this. The Analytics team is responsible for Event Logging, and you can ping any of us if we're not paying attention to an issue. Christian has been largely taking care of EL by himself, and was kept quite busy with Event Logging reliability and the need to

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Aaron Halfaker
Leila, It might be worthwhile to merge that article set with the webrequest data we have in order to get a sense for how many pageloads/second to expect. -Aaron On Tue, Jan 6, 2015 at 7:50 PM, Ryan Kaldari rkald...@wikimedia.org wrote: The highest volume events we are going to log will be:

Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Dario Taraborelli
On Jan 7, 2015, at 6:42 AM, Gilles Dubuc gil...@wikimedia.org wrote: Right -- couldn't we just tag the URL? The event of the user actually viewing the image is completely disconnected from the URL hit in Media Viewer, which is why we need EL and can't rely on existing server logs.

Re: [Analytics] Beta Labs EventLogging logs

2015-01-07 Thread Nuria Ruiz
Ahem they are there: nuria@deployment-eventlogging02:/var/log/upstart$ ls eventlogging_*log eventlogging_processor-client-side-events.log eventlogging_processor-server-side-events.log On Wed, Jan 7, 2015 at 12:57 PM, Ryan Kaldari rkald...@wikimedia.org wrote: It seems the EventLogging

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Nuria Ruiz
Kaldari: Expanding a bit on what Dan said: We took up EL from ori's basically 6 months ago. The operational support Analytics provides is documented here: https://www.mediawiki.org/wiki/EventLogging/OperationalSupport EL has several parts and while we have not done much development on the mw

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Kevin Leduc
Hey Ryan, I put this bug on our agenda for our tasking meeting so we can scope it out and decide if we can commit to accomplishing it in the next sprint. On Wed, Jan 7, 2015 at 1:46 PM, Nuria Ruiz nu...@wikimedia.org wrote: Kaldari: Expanding a bit on what Dan said: We took up EL from ori's

Re: [Analytics] Beta Labs EventLogging logs

2015-01-07 Thread Ryan Kaldari
Ah, sorry, I was looking on the wrong server (deployment-bastion). Thanks! On Wed, Jan 7, 2015 at 1:21 PM, Nuria Ruiz nu...@wikimedia.org wrote: Ahem they are there: nuria@deployment-eventlogging02:/var/log/upstart$ ls eventlogging_*log eventlogging_processor-client-side-events.log

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Nuria Ruiz
Sorry, I sent it too soon, trying again: We're talking about a total of ~170 events per sec for these pages. This is too high to log at a 1:1 rate; we would need to do 1:10. At this time most events in EL log at a much lower rate; events over 1 per sec are the following, as you can see
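
The 1:10 sampling mentioned in this message can be sketched as follows. This is a minimal illustration, not the WikiGrok or EventLogging implementation: the function names, the use of a per-session token, and the hashing scheme are all assumptions made for the example. Hashing a stable token keeps the decision consistent for a given session, and at 1:10 the ~170 events per sec quoted above would come to roughly 17 logged events per sec.

```python
import hashlib


def in_sample(token: str, rate: int = 10) -> bool:
    """Return True for roughly 1 in `rate` tokens (hypothetical helper).

    The decision is deterministic per token, so every event from the
    same session is either always logged or never logged.
    """
    if rate < 1:
        raise ValueError("rate must be >= 1")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer bucket.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket % rate == 0


def expected_logged_rate(events_per_sec: float, rate: int) -> float:
    """Expected logged events per second under 1:`rate` sampling."""
    return events_per_sec / rate


if __name__ == "__main__":
    # ~170 events/sec is the figure quoted in the thread.
    print(expected_logged_rate(170, 10))
    print(in_sample("example-session-token"))
```

When analysing sampled data, counts need to be multiplied back by the sampling rate to estimate totals.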

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Ryan Kaldari
Thanks everyone for the research on this! I'll go ahead and create a card for implementing sampling on the high-throughput WikiGrok events. Kaldari On Wed, Jan 7, 2015 at 5:20 PM, Nuria Ruiz nu...@wikimedia.org wrote: Sorry, I sent it too soon, trying again: We're talking about a total of

[Analytics] Per-namespace pageview data from half a year ago

2015-01-07 Thread Gergo Tisza
I would like to graph the correlation between file namespace page views and MediaViewer image views. Back when MediaViewer was launched, I added a namespace parameter to NavigationTiming to be able to track per-namespace pageviews, but I messed up and it only got deployed around the time

[Analytics] Per-namespace daily edit numbers

2015-01-07 Thread Gergo Tisza
I want to check what effect MediaViewer had on file namespace edits. Aggregating the standard MediaWiki dumps over all wikis seems like a pain; is there a more convenient source for that data? Even better if it can be filtered by the editcount of the user at the time of the edit. I looked at the

Re: [Analytics] WikiGrok and EventLogging

2015-01-07 Thread Dario Taraborelli
agreed. Many of these articles will see spikes in traffic during the test (as the sample includes many celebrity articles) but the historical volume of traffic for the whole sample should give us a decent estimate of the throughput. I also wouldn’t worry about any events other than

Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Nuria Ruiz
I see. My main point was that, regardless of collection method, we might not need every single data point to calculate uniques. On Wed, Jan 7, 2015 at 10:38 AM, Toby Negrin tneg...@wikimedia.org wrote: Yes -- we disabled it because there wasn't a use case. We have one now :) On Wed, Jan 7,

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Nuria Ruiz
Team: Issues on event logging have been solved, outage of client side events (did not affect server side events) lasted about 12 hours. Please see: http://picpaste.com/Screen_Shot_2015-01-07_at_10.50.28_AM-NsMSPgHp.png Thanks, Nuria On Wed, Jan 7, 2015 at 3:57 AM, Christian Aistleitner

Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Toby Negrin
I think Gilles and Erik want to calculate page views for GLAM mainly (although there are some other good reasons too) -- sampling would probably be ok but we'd miss the long tail of views. On Wed, Jan 7, 2015 at 10:56 AM, Nuria Ruiz nu...@wikimedia.org wrote: I see. My main point was that

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Dan Andreescu
I talked about this at Scrum of Scrums, and added this image to the notes I just sent out. I said we're leaning towards not backfilling and are willing to be convinced otherwise. We'll see what people say. On Wed, Jan 7, 2015 at 1:58 PM, Nuria Ruiz nu...@wikimedia.org wrote: Team: Issues on

Re: [Analytics] Only parts of EventLogging events getting written to the database since 2015-01-07 ~1:55

2015-01-07 Thread Toby Negrin
Folks -- thanks for owning this. One concern -- this is the second deployment-related problem in the last couple of months. I'm concerned that we need to invest more resources in a testing environment as well as a deployment checklist. I'm also considering having EL added to Greg's deployment

Re: [Analytics] Setting up eventlogging-devserver

2015-01-07 Thread Nuria Ruiz
Roxana: You are correct, the devserver is broken in Vagrant at this time. However, that doesn't mean you cannot instrument your code and see events on the console. We shall try to have a patch for the devserver soon but, as I said, that should not block your development. Thanks, Nuria On Tue, Jan 6,

Re: [Analytics] Making EventLogging output to a log file instead of the DB

2015-01-07 Thread Toby Negrin
I'd also like us to consider routing this dataset to hadoop. I believe there is already an EL-Kafka pipeline and this would make it easy to integrate page views with our regular processing. Gilles -- are mobile page views included in your stream? -Toby On Wed, Jan 7, 2015 at 9:27 AM, Nuria Ruiz