On Tue, 24-04-2012 at 07:17 -0700, Napolitano, Diane wrote:
> Hello, I was wondering how the decision is reached to split enwiki 
> pages-meta-history into, say, N XML files.  How is N determined?  Is it based 
> on something like "let's try to have X many pages per XML file" or "Y many 
> revisions per XML file" or trying to keep the size (GB) of each XML file 
> roughly equivalent?  Or is N just an arbitrary number chosen because it 
> sounds nice? :)
> 
We have N = 27 because more than that overloads the CPUs on the box,
with the result that we wind up with a pile of truncated files.

We guess at the number of pages to go into each file, hoping to get
roughly the same execution time to produce each piece.
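For the curious, here is a minimal sketch in Python of that kind of
guessing, assuming per-page revision counts are available to use as a
rough proxy for how long a page's full history takes to dump. The
function and data here are hypothetical illustrations, not the actual
dump scripts:

    # Split page IDs into n_chunks contiguous ranges with roughly
    # equal total "work", using revision counts as a stand-in for
    # the time to dump each page's full history.
    # Hypothetical helper, not the real dump code.

    def split_pages(rev_counts, n_chunks):
        """rev_counts: list of (page_id, revision_count) in page-id order.
        Returns n_chunks lists of page_ids, contiguous in page-id
        order, each with roughly the same total revision count."""
        total = sum(count for _, count in rev_counts)
        target = total / n_chunks  # ideal work per chunk
        chunks, current, acc = [], [], 0
        for page_id, count in rev_counts:
            current.append(page_id)
            acc += count
            # close the chunk once it has its share of the work,
            # but keep at least one chunk open for remaining pages
            if acc >= target and len(chunks) < n_chunks - 1:
                chunks.append(current)
                current, acc = [], 0
        chunks.append(current)  # last chunk takes whatever is left
        return chunks

    # Example: 10 pages, heavy revision history on pages 4 and 8
    pages = list(enumerate([5, 8, 2, 900, 4, 7, 3, 600, 9, 1], 1))
    for i, chunk in enumerate(split_pages(pages, 3), 1):
        print("chunk %d: pages %s" % (i, chunk))

Since pages with long histories cluster unpredictably, any fixed
pages-per-file guess ends up with uneven runtimes, which is why the
split is a guess rather than an exact computation.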

Ariel


