Re: [Analytics] [Technical] parsing error in the pageview dumps

2015-03-10 Thread Oliver Keyes
Bah; belay that. Chalk it up to spending too long trying to turn the project names into something human ;). The files are MEANT to include en.zero et al (I'm not entirely sure why those are being split out - presumably it was a request at some point). On 11 March 2015 at 00:50, Oliver Keyes wrote

[Analytics] [Technical] parsing error in the pageview dumps

2015-03-10 Thread Oliver Keyes
Hey, This may be a known issue, but just in case it isn't; the pageview dumps at http://dumps.wikimedia.org/other/pagecounts-all-sites/ are meant to follow the spec set out at http://dumps.wikimedia.org/other/pagecounts-all-sites/README.txt Instead, it appears that for (presumably, zero-rated) requests

[Analytics] Fwd: [Engineering] Wikimedia REST content API is now available in beta

2015-03-10 Thread Dario Taraborelli
Cross-posting from wikitech-l, this will definitely be of interest to those of you on this list who work with our APIs. Begin forwarded message: > From: Gabriel Wicke > Date: March 10, 2015 at 15:23:03 PDT > To: Wikimedia developers , > wikitech-ambassd...@lists.wikimedia.org, Development and

Re: [Analytics] Provenance Params

2015-03-10 Thread Bernd Sitzmann
Sounds good to me. On Mar 10, 2015 5:58 PM, "Adam Baso" wrote: > wprov didn't seem to show up as a parameter in looking at the query field > on an hour of logs on en.m.wikipedia.org via Hadoop, so I think we're > okay there. > > As for that additional data point, that's a good idea. Bernd, Dmitry

Re: [Analytics] Provenance Params

2015-03-10 Thread Adam Baso
wprov didn't seem to show up as a parameter in looking at the query field on an hour of logs on en.m.wikipedia.org via Hadoop, so I think we're okay there. As for that additional data point, that's a good idea. Bernd, Dmitry, how about we do: sfi (image) and sft (text) ? -Adam On Tue, Mar 10, 20
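
The check described above (confirming that wprov does not already occur as a parameter in the query field of request logs) could be sketched roughly as follows. This is an illustration only: the function name count_param and the sample query strings are made up here, and the actual check was run on logs via Hadoop, not with this code.

```python
from urllib.parse import parse_qsl


def count_param(query_fields, name):
    """Count how many query strings contain the parameter `name`.

    `query_fields` is assumed to be an iterable of raw query strings,
    with or without a leading "?" (an assumption about the log format).
    """
    hits = 0
    for raw in query_fields:
        # Drop a leading "?" if present, then split into (key, value) pairs.
        pairs = parse_qsl(raw.lstrip("?"), keep_blank_values=True)
        if any(key == name for key, _ in pairs):
            hits += 1
    return hits


# Hypothetical sample of query fields; not real log data.
sample = ["?title=Example&action=raw", "?search=wprov", ""]
print(count_param(sample, "wprov"))
```

A count of zero on a representative sample suggests the name is not already in use, which is the conclusion reported in the message.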

Re: [Analytics] Provenance Params

2015-03-10 Thread Dario Taraborelli
On Mar 10, 2015, at 11:26 AM, Adam Baso wrote: > > We're going to use the following format: > > ?wprov=<3_char_feature> > > For the first version on iOS, this will be > > ?wprov=safi1 > > And Android: > > ?wprov=safa1 Thanks for closing the loop on this. Dan, Adam – have you guys consid

[Analytics] index.html for dumps.wikimedia.org [was: Re: [Technical] inaccuracy in our pageview dump documentation]

2015-03-10 Thread Christian Aistleitner
Hi Timo, On Tue, Mar 10, 2015 at 09:46:53PM +0100, Timo Tijhof wrote: > Is that in public version control somewhere? The real documentation is under revision control (through wikitech). As explained in the first section of the README, that README is just a pointer to the authoritative Documentatio

Re: [Analytics] [Technical] inaccuracy in our pageview dump documentation

2015-03-10 Thread Timo Tijhof
Is that in public version control somewhere? Assuming not, is there a path towards that? While I don't mind so much the README, I'm more concerned about the landing page at http://dumps.wikimedia.org/ which is quite dated and would benefit from being in public version control so that maintainer

Re: [Analytics] Provenance Params

2015-03-10 Thread Bernd Sitzmann
Is wprov only used by the apps? On Mar 10, 2015 12:59 PM, "Gergo Tisza" wrote: > On Tue, Mar 10, 2015 at 11:26 AM, Adam Baso wrote: > >> We're going to use the following format: >> >> ?wprov=<3_char_feature> >> > Don't forget to document this publicly once it is deployed. > https://www.mediawiki

Re: [Analytics] Provenance Params

2015-03-10 Thread Gergo Tisza
On Tue, Mar 10, 2015 at 11:26 AM, Adam Baso wrote: > We're going to use the following format: > > ?wprov=<3_char_feature> > Don't forget to document this publicly once it is deployed. https://www.mediawiki.org/wiki/Manual:Parameters_to_index.php is probably a good place for that (even though tech

Re: [Analytics] [Technical] inaccuracy in our pageview dump documentation

2015-03-10 Thread Christian Aistleitner
Hi, [ just to keep archives happy ] On Tue, Mar 10, 2015 at 05:16:30PM +0100, Christian Aistleitner wrote: > After the next rsync (in ~1 hour) the new README should be live. The new README is live now at: http://dumps.wikimedia.org/other/pagecounts-all-sites/README.txt Have fun, Christian

Re: [Analytics] Provenance Params

2015-03-10 Thread Adam Baso
We're going to use the following format: ?wprov=<3_char_feature> For the first version on iOS, this will be ?wprov=safi1 And Android: ?wprov=safa1 -Adam On Mon, Mar 9, 2015 at 1:39 PM, Adam Baso wrote: > Okay, we'll plan on wprov. > > On Wed, Mar 4, 2015 at 12:44 PM, Dan Garry wrote: >
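
For readers who want to see what tagging a link this way might look like, here is a minimal sketch that appends a wprov query parameter to a URL and reads it back. The values safi1 and safa1 come from the message above; the helper names add_wprov and get_wprov and the example URLs are hypothetical, and this is not the apps' actual implementation.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def add_wprov(url, value):
    """Return `url` with its wprov query parameter set to `value`."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any previous wprov so it is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "wprov"]
    query.append(("wprov", value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def get_wprov(url):
    """Return the wprov value from `url`, or None if it is absent."""
    values = parse_qs(urlsplit(url).query).get("wprov")
    return values[0] if values else None


# Hypothetical article URL, tagged with the iOS value from the message above.
tagged = add_wprov("https://en.m.wikipedia.org/wiki/Example", "safi1")
print(tagged)
print(get_wprov(tagged))
```

Replacing rather than appending an existing wprov value is a design choice made for this sketch; the thread does not say how repeated parameters should be handled.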

Re: [Analytics] [Technical] inaccuracy in our pageview dump documentation

2015-03-10 Thread Oliver Keyes
Yay; thank you! :) On 10 March 2015 at 12:16, Christian Aistleitner wrote: > Hi, > > On Tue, Mar 10, 2015 at 03:45:53AM -0400, Oliver Keyes wrote: >> [ Typo in secondary documentation of pagecounts-all-sites ] > > Thanks, fixed in HDFS. > > After the next rsync (in ~1 hour) the new README should

Re: [Analytics] [Technical] inaccuracy in our pageview dump documentation

2015-03-10 Thread Christian Aistleitner
Hi, On Tue, Mar 10, 2015 at 03:45:53AM -0400, Oliver Keyes wrote: > [ Typo in secondary documentation of pagecounts-all-sites ] Thanks, fixed in HDFS. After the next rsync (in ~1 hour) the new README should be live. Have fun, Christian -- quelltextlich e.U. \\ Christian Aistle

Re: [Analytics] [Cluster] Monitoring the impact Hive jobs have on the Analytics cluster

2015-03-10 Thread Andrew Otto
I just want to make sure it can be found. I see you added it to the ToC at https://wikitech.wikimedia.org/wiki/Analytics/Cluster, so I think it’ll be fine. > On Mar 9, 2015, at 18:51, Christian Aistleitner > wrote: > > Hi Andrew, > > On Mon, Mar 09, 2015 at 11:54:56AM -0400, Andrew Otto wro

[Analytics] [Technical] inaccuracy in our pageview dump documentation

2015-03-10 Thread Oliver Keyes
I think. Well, I hope. The whitelist at http://dumps.wikimedia.org/other/pagecounts-all-sites/README.txt claims that meta.mediawiki.org is whitelisted. As is usability.mediawiki.org. As is...you get the picture ;) Unless I've had a stroke and am hallucinating the *.mediawiki.org, we mean wikimedi