Hello Noah,

Thank you for reaching out to us :)

We have not backfilled the "top pageview per country" data because, to protect the privacy of our users, we use a filtering mechanism that removes pages seen by fewer than 1000 actors a day, and the data that allows us to do so is kept for only 90 days.

I have just created a task on our Phabricator board to investigate other filtering methods that could allow us to release historical data, even if less detailed (https://phabricator.wikimedia.org/T299627).

Sorry for not being able to help, and best of luck with your studies :)
Joseph, for the Data Engineering (ex-Analytics) team

On Tue, Jan 18, 2022 at 1:33 AM Noah Brunken Syrkis <n...@itu.dk> wrote:

> Hello,
>
>
> I noticed that the public api for daily top viewed pages per country[1]
> only goes back to Jan 1st, 2021. Could this be backfilled from other
> datasets to 2015, without too much effort on Your part? The research team
> encouraged me to ask here, when I spoke with them about my need for the
> data. I'm a data science student at the IT University of Copenhagen doing a
> thesis on predicting country level human value survey responses[2] based on
> the top read Wikipedia pages in the given country.
>
>
> Thanks!
> Noah
>
>
> [1]
> https://wikimedia.org/api/rest_v1/#/Pageviews%20data/get_metrics_pageviews_top_per_country__country___access___year___month___day_
>
> [2] http://www.europeansocialsurvey.org/downloadwizard/
> _______________________________________________
> Analytics mailing list -- analytics@lists.wikimedia.org
> To unsubscribe send an email to analytics-le...@lists.wikimedia.org
>

--
Joseph Allemandou (joal) (he / him)
Staff Data Engineer
Wikimedia Foundation
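To make the filtering rule described in the reply concrete, here is a minimal sketch of a per-day threshold filter. It is not the Wikimedia implementation, which the thread does not describe. The function name `filter_top_pages`, the `(country, page, actor_id)` input shape, and treating an "actor" as an opaque identifier are all assumptions made for illustration; only the 1000-actor threshold comes from the message.

```python
from collections import defaultdict

# Threshold quoted in the reply: pages seen by fewer than 1000 actors a day are removed.
MIN_ACTORS = 1000


def filter_top_pages(views, min_actors=MIN_ACTORS):
    """Return view counts for one day, keeping only well-attested pages.

    `views` is an iterable of (country, page, actor_id) tuples for a single
    day. This input shape is hypothetical, chosen for illustration.
    """
    actors = defaultdict(set)   # (country, page) -> distinct actor ids
    counts = defaultdict(int)   # (country, page) -> total views

    for country, page, actor_id in views:
        key = (country, page)
        actors[key].add(actor_id)
        counts[key] += 1

    # Drop every (country, page) pair seen by fewer than `min_actors` actors.
    return {
        key: counts[key]
        for key in counts
        if len(actors[key]) >= min_actors
    }


# Tiny example with a lowered threshold so the effect is visible.
sample = [
    ("DK", "Page_A", "actor-1"),
    ("DK", "Page_A", "actor-2"),
    ("DK", "Page_A", "actor-1"),
    ("DK", "Page_B", "actor-1"),
]
kept = filter_top_pages(sample, min_actors=2)
```

The sketch also shows why the retention window matters: the filter needs the per-actor records in order to count distinct actors, so once those records are deleted the same threshold can no longer be applied to older days.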