Re: [Wikitech-l] downloading wikipedia database dumps

Robert Rohde Sat, 09 Jan 2010 02:37:54 -0800

On Fri, Jan 8, 2010 at 6:06 PM, Gregory Maxwell <gmaxw...@gmail.com> wrote:
<snip>


> No one wants the monolithic tarball. The way I got updates previously
> was via a rsync push.
>
> No one sane would suggest a monolithic tarball: it's too much of a
> pain to produce!

I know that you didn't want or use a tarball, but requests for an
"image dump" are not that uncommon and often the requester is
envisioning something like a tarball.  Arguably that is what the
originator of this thread seems to have been asking for.  I think you
and I are probably mostly on the same page about the virtue of
ensuring that images can be distributed and that monolithic approaches
are bad.

<snip>

> But I think producing subsets is pretty much worthless. I can't think
> of a valid use for any reasonably sized subset.  ("All media used on
> big wiki X" is a useful subset I've produced for people before, but
> it's not small enough to be a big win vs a full copy)

Wikipedia itself has gotten so large that increasingly people are
mirroring subsets rather than allocating the space for a full mirror
(e.g. 10000 pages on cooking, or medicine, or whatever).  Grabbing
images needed for such an application would be useful.  I can also see
virtues in having a way to grab all images in a category (or set of
categories).  For example, grab all images of dogs, or all images of
Barack Obama.  In case you think this is all hypothetical, I've
actually downloaded tens of thousands of images on more than one
occasion to support topical projects.
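
As a rough illustration of the "grab all images in a category" idea, here
is a minimal sketch in Python.  It assumes the standard MediaWiki web API
(action=query with generator=categorymembers and prop=imageinfo) and
assumes Commons as the endpoint; the helper names, the page size, and the
User-Agent string are made up for the example and are not an existing
tool.

```python
import json
import urllib.parse
import urllib.request

# Assumption: querying Wikimedia Commons; any MediaWiki api.php should work.
API_URL = "https://commons.wikimedia.org/w/api.php"


def build_params(category, cont=None):
    """Build query parameters listing the files in one category."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "categorymembers",
        "gcmtitle": f"Category:{category}",
        "gcmtype": "file",   # only files, not subcategories or articles
        "gcmlimit": "50",    # illustrative page size
        "prop": "imageinfo",
        "iiprop": "url",     # ask for the direct download URL
    }
    if cont:
        # Pass the server's continuation values back unchanged.
        params.update(cont)
    return params


def extract_urls(response):
    """Pull the file URLs out of one decoded API response."""
    pages = response.get("query", {}).get("pages", {})
    urls = []
    for page in pages.values():
        for info in page.get("imageinfo", []):
            urls.append(info["url"])
    return sorted(urls)


def iter_category_image_urls(category):
    """Yield file URLs for a category, following continuation."""
    cont = None
    while True:
        query = urllib.parse.urlencode(build_params(category, cont))
        request = urllib.request.Request(
            f"{API_URL}?{query}",
            # Hypothetical client name; use something that identifies you.
            headers={"User-Agent": "category-image-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(request) as resp:
            data = json.load(resp)
        yield from extract_urls(data)
        cont = data.get("continue")
        if not cont:
            break
```

Calling iter_category_image_urls("Dogs") would yield one URL per file in
that category, which a downloader could then fetch at a polite rate.  It
does not descend into subcategories; that would need a second query with
gcmtype set to subcat.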

<snip>

> If all is made available then everyone's wants can be satisfied. No
> subset is going to get us there. Of course, there are a lot of
> possibilities for the means of transmission, but I think it would be
> most useful to assume that at least a few people are going to want to
> grab everything.

Of course, strictly speaking we already provide HTTP access to
everything.  So the real question is how we can make access easier,
more reliable, and less burdensome.  You or someone else suggested an
API for grabbing files and that seems like a good idea.  Ultimately
the best answer may well be to take multiple approaches to accommodate
both people like you, who want everything, and people who want
only more modest collections.

-Robert Rohde

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
