Thanks Felipe! Yes, I think this is a really interesting tool to explore. Another quick example: list articles attract between 2% and 3% of pageviews on the English Wikipedia:

SELECT SUM(requests)
FROM [fh-bigquery:wikipedia.pagecounts_20150715_14]
WHERE LEFT(TITLE, 8) = 'List_of_' AND language = 'en'

244091

SELECT SUM(requests)
FROM [fh-bigquery:wikipedia.pagecounts_20150715_14]
WHERE language = 'en'

8870277

(Caveats: this covers a single hour this Wednesday, uses the old pageview definition, i.e. it does not exclude spiders and bots, and relies on the article name instead of categories.)

I understand Felipe has already been talking to the WMF Analytics team, who are currently making major progress on https://phabricator.wikimedia.org/T44259.

On Thu, Jul 16, 2015 at 11:32 AM, Felipe Hoffa <felipe.ho...@gmail.com> wrote:
> Hi! I'm currently attending Wikimania (I have a session on Friday at
> 4:30pm).
>
> Tilman Bayer suggested sharing this tool and these techniques here, so I
> am following his advice :).
>
> I've been using Google BigQuery for a while to analyze Wikipedia's
> publicly available data. Its main advantages:
>
> - It's unbelievably fast (try it - operations that you might expect to
>   run in minutes or hours run in seconds).
> - It's secure, but you can also instantly share data (no need to download
>   and set up locally before being able to analyze - BigQuery is always on).
> - Everyone can use BigQuery with a free quota of 1TB of monthly analysis.
>
>
> Interesting links:
>
> - Quick getting started:
>   https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_50_billion_wikipedia_pageviews_in_5/
> - Analyzing the gender gap in Wikipedia (Freebase, and joining it with
>   pageviews): https://www.youtube.com/watch?v=lV5vk3higvA
> - Massive Geo-Ip geolocation from the changelog:
>   https://www.reddit.com/r/bigquery/comments/1zh7ty/massive_geoip_geolocation_with_google_bigquery/
> - Just for fun, the most popular numbers:
>   https://www.reddit.com/r/bigquery/comments/2p0vz4/query_of_the_day_the_most_popular_numbers_in/
> - Top Wikipedia Entries Which Are Most-Edited by Members of the U.S.
>   Congress: http://minimaxir.com/2014/07/caucus-needed/
> - Music recommendations:
>   http://apassant.net/2014/07/11/music-recommendations-300m-data-points-sql/
>
>
> I have a couple of other interesting examples I haven't written about, but
> the invitation here is for you to try your own :).
>
> My main challenge today: how to get more publicly available data into
> BigQuery. Let's work together :). I'm sitting around the big data analytics
> team today at the Wikimedia hackathon - and as said earlier, I'll do a
> session on this topic on Friday at 4:30pm.
>
> Thanks!
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>

-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB