Re: [Analytics] [Offline-l] Fwd: Reasons you use the XML dumps or want to, but can't?

2015-02-24 Thread Emmanuel Engelhart
Hi Thank you Nemo for adverting that interesting page about how to improve Wikimedia dumping processes. This topic is of course a primary concern for the Kiwix developer team. Here is my contribution:

Re: [Analytics] [Offline-l] Fwd: Reasons you use the XML dumps or want to, but can't?

2015-02-24 Thread Andrew Otto
I also added some Hadoop-based use cases to that document. https://www.mediawiki.org/w/index.php?title=Wikimedia_MediaWiki_Core_Team%2FBacklog%2FImprove_dumps&diff=1422073&oldid=1421455 On Feb 21, 2015, at 05:03, Emmanuel Engelhart kel...@kiwix.org wrote: Hi Thank you Nemo for adverting

Re: [Analytics] more public discussions, new tags

2015-02-24 Thread Paul J. Weiss
I think this is a great idea! At 2015-02-23 08:33 AM, you wrote: Dear List, We, the WMF Analytics team, want to bring more of our internal discussions public. We benefit tremendously from everyone who participates on this list and want to have as much transparency as possible into what

[Analytics] [Technical] further pageviews QA work

2015-02-24 Thread Oliver Keyes
After the discovery of the duplication problem I reran the comparative analysis of pageviews implementations. The result: there really isn't a difference![0] Actually I had to generate a plot with jittered lines just to be able to /find/ one of them.[1] Thanks to Christian and Otto for the

Re: [Analytics] Provenance Params

2015-02-24 Thread Adam Baso
Hi Nemo - I think the concern was that it might be the case that the 'title' parameter may be at the end of the URL, and the 'title' parameter could in principle support a value with forward slashes potentially indistinguishable from the string in option #2. Of course, regular expressions can make

Re: [Analytics] Provenance Params

2015-02-24 Thread Gergo Tisza
On Tue, Feb 24, 2015 at 3:48 PM, Nuria Ruiz nu...@wikimedia.org wrote: 2. What about caching? Is this page: http://wikipedia.org/BarackObama?some_param=some-value being served from the cache as it should be? The file download

Re: [Analytics] Monthly compressed traffic delay

2015-02-24 Thread Erik Zachte
Michael, a quick heads-up: So I finally found the time to look into this. Sorry that it took so long. https://phabricator.wikimedia.org/T90230 Bug has been analyzed and fixed. The underlying problem is a record in an hourly pageview dump with an empty title. My script now patches such

Re: [Analytics] Provenance Params

2015-02-24 Thread Nuria Ruiz
If there’s no other objection, we can safely fold this under the discussion of long-term options and go ahead with the proposed implementation, per Dan. I think there are some technical issues to be ironed out, right? 1. How are we doing so a request like:

Re: [Analytics] Provenance Params

2015-02-24 Thread Dario Taraborelli
It sounds like we have consensus for a short-term solution based on a vanilla parameter, as long as it doesn’t clash with other internal parameters. I agree with Gergo that a shortener is appealing as a long-term solution; this is what the vast majority of platforms are using for analytics