Hi Michael,

The new data is available, but we found a small formatting bug that we have
to fix.  Because of that, we haven't announced it widely yet, and we
haven't rolled up the data to the monthly level.

The data: https://dumps.wikimedia.org/other/pageview_complete/
The bug: some rows have 6 columns and some have 5, with page_id missing
from the 5-column rows.  We are inserting "null" and rewriting the files,
but at almost 3 terabytes that will take a while.  If you want to download
and use the data in the meantime, you're welcome to; just make your
parsing robust to the inconsistency.
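
For anyone parsing the files before the rewrite lands, here is a minimal
sketch of one way to tolerate both row shapes.  It assumes the fields are
separated by single spaces and that page_id, when present, is the third
field; both are assumptions to check against the actual files, and
normalize_row is a hypothetical helper, not part of any Wikimedia tooling.

```python
from typing import List, Optional

EXPECTED_COLUMNS = 6  # full row width described in this thread
PAGE_ID_INDEX = 2     # ASSUMPTION: page_id is the third field when present


def normalize_row(line: str,
                  page_id_index: int = PAGE_ID_INDEX) -> Optional[List[str]]:
    """Return a 6-field row, inserting "null" where page_id is missing.

    Returns None for rows that have neither 5 nor 6 fields, so the caller
    can log or skip them instead of silently misaligning columns.
    """
    # ASSUMPTION: fields are separated by single spaces.
    fields = line.rstrip("\n").split(" ")
    if len(fields) == EXPECTED_COLUMNS:
        return fields
    if len(fields) == EXPECTED_COLUMNS - 1:
        # Pad the short row the same way the fix in this thread does.
        return fields[:page_id_index] + ["null"] + fields[page_id_index:]
    return None
```

Rows that come back as None are worth counting rather than dropping
quietly, since a large number of them would suggest the separator or
column-position assumption is wrong.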

Thanks for your patience.  Once we have this sorted out, we will make a
wide announcement explaining the history of this data and how, going
forward, there will be a single unified dataset with all the history we
have.

Good suggestion to post updates on the -ez page.  We will do that.

On Mon, Nov 16, 2020 at 9:46 AM Michael Tartre <mich...@predata.com> wrote:

> This was brought up in a previous thread (link here
> <https://lists.wikimedia.org/pipermail/wikitech-l/2020-October/093935.html>),
> but the aggregated hourly view dumps haven't been published since
> 2020-09-24 (see here
> <https://dumps.wikimedia.org/other/pagecounts-ez/merged/2020/>, also
> mirrored here
> <http://ftp.acc.umu.se/mirror/wikimedia.org/other/pagecounts-ez/merged/2020/>).
> The response to the previous thread by Dan suggested that the new data
> would be available in a week, but it's already a month past that expected
> deadline. Are there any updates on the status of that new dump, any new
> estimates of when it would become available? I would also suggest posting
> information about the pending change and new system to the information page
> (at https://dumps.wikimedia.org/other/pagecounts-ez/) -- from reading
> that page, there is no indication that data delivery has stopped or that a
> new pipeline will be available shortly.
>
> Thanks for any information,
>
> Michael
>
> --
> *Michael Tartre*
> Senior Machine Learning Engineer
>
> mich...@predata.com
> t: +1 415 857 0967
> 1 Liberty Plaza
> New York, NY 10006
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>