The problem (as always) is that there is a difference between pages served
(by the web server) and pages actually wanted and read by the user. 

It would be interesting to have referrer statistics. I'm guessing that many
Wikipedia pages are reached via Google (and other general search
engines). If so, people may just be clicking through a list of search
results, which causes them to download a WP page but then immediately move
on to the next search result because it isn't what they are looking for. I
rather suspect the prominence of Facebook in the English Wikipedia results
is due to this effect, as I often find myself on the Wikipedia page for
Facebook instead of Facebook itself following a Google search. I think the
use of mobile devices (with small screens) probably encourages this sort of
behaviour.

Kerry


-----Original Message-----
From: wiki-research-l-boun...@lists.wikimedia.org
[mailto:wiki-research-l-boun...@lists.wikimedia.org] On Behalf Of Andrew G.
West
Sent: Sunday, 30 December 2012 2:06 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] 2012 top pageview list

The WMF aggregates them as (page,views) pairs on an hourly basis:

http://dumps.wikimedia.org/other/pagecounts-raw/

I've been parsing these and storing them in a queryable DB format (for 
en.wp exclusively, though the files are available for all projects, I 
think) for about two years. If you want to maintain such a fine 
granularity, it can quickly become a terabyte-scale task that eats up a 
lot of processing time.
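
For anyone who wants to script against these dumps, here is a minimal
Python sketch of the parsing step. It assumes a local copy of one hourly
gzipped file (the filename below is hypothetical) and the usual
space-separated "project page_title count_views total_bytes" line format:

import gzip

def parse_hourly(path, project="en"):
    # Return a dict of page -> views for one hourly pagecounts file.
    counts = {}
    with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            fields = line.rstrip("\n").split(" ")
            if len(fields) != 4 or fields[0] != project:
                continue  # skip other projects and malformed lines
            page, views = fields[1], fields[2]
            counts[page] = counts.get(page, 0) + int(views)
    return counts

# e.g. (hypothetical filename):
# hourly = parse_hourly("pagecounts-20121229-010000.gz")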

If you're looking for coarser-granularity reports (like top views for the 
day, week, or month), a lot of efficient aggregation can be done.
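
As a rough illustration of that kind of aggregation (reusing the
hypothetical parse_hourly() sketch above, and assuming one day's hourly
files sit in the working directory):

from collections import Counter
import glob

def top_n_for_day(filename_pattern, n=10):
    # Sum the per-hour counts across every matching file, then
    # return the n most-viewed pages for that day.
    totals = Counter()
    for path in sorted(glob.glob(filename_pattern)):
        totals.update(parse_hourly(path))  # Counter.update adds counts
    return totals.most_common(n)

# e.g. top_n_for_day("pagecounts-20121229-*.gz")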

See also: http://en.wikipedia.org/wiki/Wikipedia:5000

Thanks, -AW


On 12/28/2012 07:28 PM, John Vandenberg wrote:
> There is a steady stream of blogs and 'news' about these lists
>
>
> https://encrypted.google.com/search?client=ubuntu&channel=fs&q=%22Sean+hoyland%22&ie=utf-8&oe=utf-8#q=wikipedia+top+2012&hl=en&safe=off&client=ubuntu&tbo=d&channel=fs&tbm=nws&source=lnt&tbs=qdr:w&sa=X&psj=1&ei=GzjeUOPpAsfnrAeQk4DgCg&ved=0CB4QpwUoAw&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.1355534169,d.aWM&fp=4e60e761ee133369&bpcl=40096503&biw=1024&bih=539
>
> How does a researcher go about obtaining access logs with user agents
> in order to answer some of these questions?
>

-- 
Andrew G. West, Doctoral Candidate
Dept. of Computer and Information Science
University of Pennsylvania, Philadelphia PA
Website: http://www.andrew-g-west.com

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

