Re: [Wiki-research-l] Fwd: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with multiple processes

2018-07-23 Thread Shani Evenstein
Thanks for sharing!

Shani.

On Mon, 23 Jul 2018 23:38 Pine W wrote:

> Forwarding in case this is of interest to anyone on the Analytics or
> Research lists who doesn't subscribe to Wikitech-l or Xmldatadumps-l.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
> -- Forwarded message --
> From: Ariel Glenn WMF 
> Date: Fri, Jul 20, 2018 at 5:53 AM
> Subject: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with
> multiple processes
> To: Wikipedia Xmldatadumps-l ,
> Wikimedia developers 
>
>
> Good morning!
>
> The pages-meta-history dumps for hewiki currently take 70 hours, the
> longest runtime of any wiki not already dumped with parallel jobs. I plan
> to add hewiki to the list of 'big wikis' starting August 1st, meaning that
> six jobs will run in parallel, producing the usual numbered file output;
> see the frwiki dumps for an example.
>
> Please adjust any download/processing scripts accordingly.
>
> Thanks!
>
> Ariel


[Wiki-research-l] Fwd: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with multiple processes

2018-07-23 Thread Pine W
Forwarding in case this is of interest to anyone on the Analytics or
Research lists who doesn't subscribe to Wikitech-l or Xmldatadumps-l.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

-- Forwarded message --
From: Ariel Glenn WMF 
Date: Fri, Jul 20, 2018 at 5:53 AM
Subject: [Wikitech-l] hewiki dump to be added to 'big wikis' and run with
multiple processes
To: Wikipedia Xmldatadumps-l ,
Wikimedia developers 


Good morning!

The pages-meta-history dumps for hewiki currently take 70 hours, the
longest runtime of any wiki not already dumped with parallel jobs. I plan
to add hewiki to the list of 'big wikis' starting August 1st, meaning that
six jobs will run in parallel, producing the usual numbered file output;
see the frwiki dumps for an example.

Please adjust any download/processing scripts accordingly.

Thanks!

Ariel
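
For scripts that currently fetch a single hewiki pages-meta-history file:
after the switch, the output is split into six numbered parts whose
page-range suffixes (e.g. -p1p306) vary from run to run, so hard-coded
filenames will break. Below is a minimal sketch of one way to adapt,
assuming the dumpstatus.json layout served by dumps.wikimedia.org with a
job key of 'metahistorybz2dump' and a relative 'url' field per file entry
(assumptions to verify against a live run; the run date in the example is
hypothetical):

#!/usr/bin/env python3
# Sketch: enumerate hewiki pages-meta-history part files from
# dumpstatus.json instead of hard-coding a single filename.
# Assumptions: job key 'metahistorybz2dump' and a relative 'url'
# field per file entry in dumpstatus.json.
import requests

BASE = "https://dumps.wikimedia.org"

def history_file_urls(wiki, run):
    """Return download URLs for every pages-meta-history part in a run."""
    status = requests.get(
        "%s/%s/%s/dumpstatus.json" % (BASE, wiki, run), timeout=30
    ).json()
    job = status["jobs"]["metahistorybz2dump"]  # assumed job key
    if job["status"] != "done":
        raise RuntimeError("dump job not finished: %s" % job["status"])
    # With parallel jobs there are multiple numbered parts
    # (pages-meta-history1.xml-*.bz2 ... pages-meta-history6.xml-*.bz2),
    # so iterate over the file map rather than assuming one file.
    return sorted(BASE + meta["url"] for meta in job["files"].values())

if __name__ == "__main__":
    for url in history_file_urls("hewiki", "20180801"):  # hypothetical run date
        print(url)

Listing the parts from dumpstatus.json rather than guessing filenames also
keeps a script working if the number of parallel jobs changes again later.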
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l