I've been using the monthly page view summaries from pagecounts-ez. Now on 
https://dumps.wikimedia.org/other/pagecounts-ez/ it says:

"NOTE: This dataset has had some problems and we are no longer generating new 
data, since September 2020. We are phasing it out in favor of Pageviews 
Complete... When it's finished we will announce it widely and explain how to 
migrate."

Is the announcement and explanation available somewhere? I'm having problems 
because:

1. The "totals" files, such as 
https://dumps.wikimedia.org/other/pagecounts-ez/merged/pagecounts-2020-08-views-ge-5-totals.bz2,
 which are of the order of 500 MB per month, seem to have no equivalents in the 
new Pageviews Complete dump archives. The monthly files at 
https://dumps.wikimedia.org/other/pageview_complete/monthly/2020/2020-08/ are 
10x larger (and I can't find any description of what the "automated", "user" and 
"spider" files represent, although I can guess).
2. If I download, say, 
https://dumps.wikimedia.org/other/pageview_complete/monthly/2020/2020-08/pageviews-202008-user.bz2
 and peek at the file using bzless, it seems to contain lots of binary 
characters. It's not clear to me what the format is or how to decode it. Is 
there any information online to help me?
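
In case it helps anyone reproduce what I'm seeing, here is a minimal Python 
sketch of how one could peek at the first few lines instead of using bzless. 
It assumes the file is ordinary bz2-compressed text, which I have not been able 
to confirm; the function name peek_bz2 and the local filename in the usage 
comment are placeholders of my own, not anything from the dumps documentation.

```python
import bz2
import sys
from itertools import islice


def peek_bz2(path, n=5, encoding="utf-8"):
    """Return the first n lines of a bz2-compressed file as text.

    Bytes that are not valid in the given encoding are replaced with
    U+FFFD rather than raising, so binary-looking content stays visible.
    """
    with bz2.open(path, mode="rt", encoding=encoding, errors="replace") as fh:
        # islice stops after n lines, so the whole archive is never decompressed.
        return [line.rstrip("\n") for line in islice(fh, n)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (hypothetical local copy): python peek.py pageviews-202008-user.bz2
    for line in peek_bz2(sys.argv[1]):
        print(line)
```

Because it streams, this only decompresses enough of the archive to produce 
the requested lines, so it should be safe to try on the multi-gigabyte monthly 
files.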

Thanks for any pointers that might help.
_______________________________________________
Xmldatadumps-l mailing list -- xmldatadumps-l@lists.wikimedia.org
To unsubscribe send an email to xmldatadumps-l-le...@lists.wikimedia.org
