Right -- couldn't we just tag the URL?
The event of the user actually viewing the image is completely disconnected
from the URL hit in Media Viewer, which is why we need EL and can't rely on
existing server logs.
Eventlogging data currently does go to files, as well as to the DB.
Great,
On 8 January 2015 at 02:12, Gergo Tisza gti...@wikimedia.org wrote:
On Wed, Jan 7, 2015 at 5:59 PM, Nuria Ruiz nu...@wikimedia.org wrote:
Back when MediaViewer was launched, I added a namespace parameter to
NavigationTiming to be able to track per-namespace pageviews,
Navigation timing is
On Wed, Jan 7, 2015 at 11:15 PM, Federico Leva (Nemo) nemow...@gmail.com
wrote:
Then you probably want something like
https://stats.wikimedia.org/EN/TablesWikipediaHU.htm#editor_activity_levels
but with File namespace disaggregated from Other.
I was looking for the number of edits; that's
On 8 January 2015 at 02:31, Gergo Tisza gti...@wikimedia.org wrote:
On Wed, Jan 7, 2015 at 6:26 PM, Oliver Keyes oke...@wikimedia.org wrote:
places to get edits? Well, the revision table? I'm sort of confused
as to what you're looking for, I guess, that the db wouldn't have.
There are a
On Wed, Jan 7, 2015 at 6:26 PM, Oliver Keyes oke...@wikimedia.org wrote:
places to get edits? Well, the revision table? I'm sort of confused
as to what you're looking for, I guess, that the db wouldn't have.
There are a thousand or so wikis; it would be nice if there was a single
table with
Gergo Tisza, 08/01/2015 02:52:
Even better if it can be filtered by the editcount of the user at the
time of the edit.
Then you probably want something like
https://stats.wikimedia.org/EN/TablesWikipediaHU.htm#editor_activity_levels
but with File namespace disaggregated from Other.
Nemo
http://thenewstack.io/apache-samza-linkedins-framework-for-stream-processing/
___
Analytics mailing list
Analytics@lists.wikimedia.org
Stateful Stream Processing
/me drools
On Wed, Jan 7, 2015 at 5:30 PM, Andrew Otto ao...@wikimedia.org wrote:
http://thenewstack.io/apache-samza-linkedins-framework-for-stream-processing/
That's great and it will serve most of my use cases. Any chance we can get
that field added to the sampled logs hourly counts?
On Wed, Jan 7, 2015 at 5:40 PM, Nuria Ruiz nu...@wikimedia.org wrote:
I am not sure if this is quite what you are asking but just in case:
For streaming it is probably
I'm pleased to say we now have the prototype pageviews definition as a UDF!
For those with cluster access:
CREATE TEMPORARY FUNCTION pageview as
'org.wikimedia.analytics.refinery.hive.isPageviewUDF';
...and then just apply it. It outputs a boolean, so you can easily go
WHERE pageview(fields)
I am not sure if this is quite what you are asking but just in case:
For streaming it is probably easier for you to use the newly created
webrequest tables:
https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Hive#Webrequest_Table.28s.29
Those include an isPageview field so requests are
I am not sure if this is quite what you are asking but just in case:
For streaming it is probably easier for you to use the newly created webrequest
tables:
For Hadoop Streaming, it’ll be a little annoying. This new data is in Parquet.
Hadoop Streaming is still using the old MapReduce 1 API,
Great!
On Wed, Jan 7, 2015 at 5:49 PM, Andrew Otto ao...@wikimedia.org wrote:
I am not sure if this is quite what you are asking but just in case:
For streaming it is probably easier for you to use the newly created
webrequest tables:
For Hadoop Streaming, it’ll be a little annoying. This
Incident documentation updated:
https://wikitech.wikimedia.org/wiki/Incident_documentation/20150107-EventLogging
On Wed, Jan 7, 2015 at 10:58 AM, Nuria Ruiz nu...@wikimedia.org wrote:
Team:
Issues on event logging have been solved, outage of client side events
(did not affect server side
Thanks everyone for chiming in. Your comments were very helpful. :-)
Nuria, I checked the per-second pageview count for the pages WikiGrok will
be live on, for 3 hours on 2015-01-07 (as a sample). We're talking about a
total of ~170 events per sec for these pages. Of course major events can
affect
Who is actually maintaining the EventLogging Extension now? As far as I can
tell, none of the members of the Analytics-EventLogging project in
Phabricator are developers. This makes it hard to know who to ping when
there is a problem. For example, this EL bug that I filed a month ago was
never
Please unsubscribe me from this mailing list.
Thank you.
-Masssly
Sent from Samsung Mobile
___
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Ryan - I'm sorry I was not aware of this. The Analytics team is
responsible for Event Logging, and you can ping any of us if we're not
paying attention to an issue.
Christian has been largely taking care of EL by himself, and was kept quite
busy with Event Logging reliability and the need to
Leila,
It might be worthwhile to merge that article set with the webrequest data
we have in order to get a sense for how many pageloads/second to expect.
-Aaron
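One way to read Aaron's suggestion, as a rough sketch only: filter request records down to the article set and divide the number of hits by the length of the sampled window. The record layout (title, timestamp pairs), the function name and the example titles below are assumptions for illustration, not the actual webrequest schema.

```python
def pageloads_per_second(requests, articles, window_seconds):
    """Average page loads per second for `articles` within one time window.

    `requests` is an iterable of (page_title, unix_timestamp) pairs; this
    layout is hypothetical and stands in for rows pulled from webrequest data.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    wanted = set(articles)
    # Count only requests whose title belongs to the article set.
    hits = sum(1 for title, _ts in requests if title in wanted)
    return hits / window_seconds


# Made-up sample: four loads of the selected articles within a 2-second window.
sample_requests = [
    ("Article_A", 0.1),
    ("Article_B", 0.4),
    ("Unrelated_page", 0.9),
    ("Article_A", 1.2),
    ("Article_B", 1.8),
]
rate = pageloads_per_second(sample_requests, {"Article_A", "Article_B"}, 2)
```

An average over a historical window will understate spikes, so a figure like this is a baseline rather than a peak estimate.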
On Tue, Jan 6, 2015 at 7:50 PM, Ryan Kaldari rkald...@wikimedia.org wrote:
The highest volume events we are going to log will be:
On Jan 7, 2015, at 6:42 AM, Gilles Dubuc gil...@wikimedia.org wrote:
Right -- couldn't we just tag the URL?
The event of the user actually viewing the image is completely disconnected
from the URL hit in Media Viewer, which is why we need EL and can't rely on
existing server logs.
Ahem they are there:
nuria@deployment-eventlogging02:/var/log/upstart$ ls eventlogging_*log
eventlogging_processor-client-side-events.log
eventlogging_processor-server-side-events.log
On Wed, Jan 7, 2015 at 12:57 PM, Ryan Kaldari rkald...@wikimedia.org
wrote:
It seems the EventLogging
Kaldari:
Expanding a bit on what Dan said:
We took up EL from ori's basically 6 months ago. The operational support
analytics provide is documented here:
https://www.mediawiki.org/wiki/EventLogging/OperationalSupport
EL has several parts and while we have not done much development on the mw
Hey Ryan, I put this bug on our agenda for our tasking meeting so we can
scope it out and decide if we can commit to accomplishing it in the next
sprint.
On Wed, Jan 7, 2015 at 1:46 PM, Nuria Ruiz nu...@wikimedia.org wrote:
Kaldari:
Expanding a bit on what Dan said:
We took up EL from ori's
Ah, sorry, I was looking on the wrong server (deployment-bastion). Thanks!
On Wed, Jan 7, 2015 at 1:21 PM, Nuria Ruiz nu...@wikimedia.org wrote:
Ahem they are there:
nuria@deployment-eventlogging02:/var/log/upstart$ ls eventlogging_*log
eventlogging_processor-client-side-events.log
Sorry, I sent it too soon, trying again:
We're talking about a total of ~170 events per sec for these pages.
This is too high to log at a 1:1 rate; we would need to do 1:10. At this time
most events in EL log at a much lower rate; events over 1 per sec
are the following, as you can see
Thanks everyone for the research on this! I'll go ahead and create a card
for implementing sampling on the high-throughput WikiGrok events.
Kaldari
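For reference, a minimal sketch of what 1:10 sampling could look like, assuming sampling is decided per session token so that a given session is either always logged or never logged. The function names and the token format are hypothetical, and Python is used here only as a stand-in; this is not the EventLogging or WikiGrok implementation.

```python
import hashlib

SAMPLE_RATE = 10  # keep 1 out of every 10 sessions (the 1:10 rate discussed above)


def in_sample(token: str, rate: int = SAMPLE_RATE) -> bool:
    """Return True for roughly 1 in `rate` tokens, deterministically.

    Hashing the token (instead of drawing a random number per event) means
    the same session always gets the same decision.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % rate
    return bucket == 0


def logged_events_per_sec(events_per_sec: float, rate: int = SAMPLE_RATE) -> float:
    """Expected logged volume after sampling at 1:`rate`."""
    return events_per_sec / rate


# With the ~170 events/sec estimate from this thread:
expected = logged_events_per_sec(170)

# Fraction of 10,000 made-up session tokens that fall into the sample.
tokens = [f"session-{i}" for i in range(10_000)]
fraction = sum(in_sample(t) for t in tokens) / len(tokens)
```

Any counts derived from sampled events need to be multiplied back by the sampling rate when estimating totals.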
On Wed, Jan 7, 2015 at 5:20 PM, Nuria Ruiz nu...@wikimedia.org wrote:
Sorry, I sent it too soon, trying again:
We're talking about a total of
I would like to graph the correlation between file namespace page views and
MediaViewer image views. Back when MediaViewer was launched, I added a
namespace parameter to NavigationTiming to be able to track per-namespace
pageviews, but I messed up and it only got deployed around the time
I want to check what effect MediaViewer had on file namespace edits.
Aggregating the standard MediaWiki dumps over all wikis seems like a pain;
is there a more convenient source for that data? Even better if it can be
filtered by the editcount of the user at the time of the edit.
I looked at the
Agreed. Many of these articles will see spikes in traffic during the test (as
the sample includes many celebrity articles) but the historical volume of
traffic for the whole sample should give us a decent estimate of the throughput.
I also wouldn’t worry about any events other than
I see. My main point was that -regardless of collection method- we might
not need every single data point to calculate uniques.
On Wed, Jan 7, 2015 at 10:38 AM, Toby Negrin tneg...@wikimedia.org wrote:
Yes -- we disabled it because there wasn't a use case. We have one now :)
On Wed, Jan 7,
Team:
Issues on event logging have been solved; the outage of client-side events (it
did not affect server-side events) lasted about 12 hours.
Please see:
http://picpaste.com/Screen_Shot_2015-01-07_at_10.50.28_AM-NsMSPgHp.png
Thanks,
Nuria
On Wed, Jan 7, 2015 at 3:57 AM, Christian Aistleitner
I think Gilles and Erik want to calculate page views for GLAM mainly
(although there are some other good reasons too) -- sampling would probably
be ok but we'd miss the long tail of views.
On Wed, Jan 7, 2015 at 10:56 AM, Nuria Ruiz nu...@wikimedia.org wrote:
I see. My main point was that
I talked about this at Scrum of Scrums, and added this image to the notes I
just sent out. I said we're leaning towards not backfilling and are
willing to be convinced otherwise. We'll see what people say.
On Wed, Jan 7, 2015 at 1:58 PM, Nuria Ruiz nu...@wikimedia.org wrote:
Team:
Issues on
Folks -- thanks for owning this. One concern -- this is the second
deployment-related problem in the last couple of months. I'm concerned that
we need to invest more resources in a testing environment as well as a
deployment checklist. I'm also considering having EL added to Greg's
deployment
Roxana: You are correct, the devserver is broken in vagrant at this time.
However that doesn't mean you cannot instrument your code and see events on
console. We shall try to have a patch for the devserver soon but, as I
said, that should not block your development.
Thanks,
Nuria
On Tue, Jan 6,
I'd also like us to consider routing this dataset to hadoop. I believe
there is already an EL-Kafka pipeline and this would make it easy to
integrate page views with our regular processing.
Gilles -- are mobile page views included in your stream?
-Toby
On Wed, Jan 7, 2015 at 9:27 AM, Nuria Ruiz