Hi Bruno,

I'm not going to answer your question directly; I'll leave that to others who
have developed tools to parse the pagecount files. But while we're on the
topic, I just wanted to point out redirects and title changes. This is
something that a good number of people who work with the viewership data
overlook. If the title of a page is changed, the history of the page is
moved under the new title and the old title (normally) becomes a redirect
page. But the viewership data stays split between the two titles. So if you
want to know, for example, the viewership of a page with current title B and
old title A, you have to add up the views of both titles within the period
under study. Just something to note... and sorry if you're already doing this!
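
As a rough illustration of the summation described above, here is a minimal
Python sketch. It assumes the pagecount lines have the four space-separated
fields seen in the raw dumps (project, URL-encoded title, view count, bytes)
and that you already know the old title(s) of the page; the function names
and the sample data are made up for the example.

```python
from urllib.parse import unquote


def normalize(title):
    """Make titles comparable: percent-decode, then use underscores for spaces."""
    return unquote(title).replace(" ", "_")


def total_views(lines, project, titles):
    """Sum the view counts of all given titles (current and old) in one project."""
    wanted = {normalize(t) for t in titles}
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed 4-field format
        proj, page, count, _size = parts
        if proj == project and normalize(page) in wanted:
            total += int(count)
    return total


# Hypothetical sample data: page B used to be called A.
sample = [
    "en Old_Title_A 10 1000",
    "en New_Title_B 25 2000",
    "en Other_Page 7 500",
    "fr New_Title_B 3 100",
]

views = total_views(sample, "en", ["New Title B", "Old Title A"])
```

In practice you would feed it the lines of every hourly file in the period
under study, and find the old titles separately (for example from the move
log); that lookup is not shown here.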

Good luck,
Taha

On Thu, Jul 28, 2016 at 9:00 PM, Bruno Goncalves <bgoncal...@gmail.com>
wrote:

> Hi,
>
> I've been trying to match edit activity with pagecounts but I've
> encountered a couple of problems. The amazing pagecounts dumps (
> https://dumps.wikimedia.org/other/pagecounts-raw/) use the page url to
> identify the individual page:
>
>       fr.b Special:Recherche/Achille_Baraguey_d%5C%27Hilliers 1 624
>
> while the stub-meta-history uses the "raw" title:
>
>   <page>
>     <title>Wikipedia:Community Portal</title>
>     <ns>4</ns>
>     <id>1270</id>
>
>
> so I need an easy way to map title to url. I imagine there are some rules
> on how this "translation" is done? My google-fu has failed to find them.
>
> Also, is the timezone used in the meta-history files:
>
> <timestamp>2006-02-18T19:29:10Z</timestamp>
>
>
> the same as the one used in the pagecount filenames:
>
> pagecounts-20140725-070000.gz
>
>
> Best,
>
> B
>
> *******************************************
> Bruno Miguel Tavares Gonçalves, PhD
> Homepage: www.bgoncalves.com
> Email: bgoncal...@gmail.com
> *******************************************
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>


-- 

==New Paper==
Wikipedia traffic data and electoral prediction: towards theoretically
informed models
<http://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-016-0083-3>
Taha Yasseri and Jonathan Bright
EPJ Data Science, 5:22 (2016).
=============

Dr Taha Yasseri
http://www.oii.ox.ac.uk/people/yasseri/
Research Fellow in Computational Social Science, Oxford Internet Institute,
Research Fellow in Humanities and Social Sciences, Wolfson College,
University of Oxford,
and
Faculty Fellow, Alan Turing Institute for Data Science.

Tel. +44-1865-287229
1 St. Giles
Oxford OX1 3JS
UK
