William Pietri wrote:
> On 01/07/2010 01:40 AM, Jamie Morken wrote:
>> I have a
>> suggestion for wikipedia!!  I think that the database dumps including
>> the image files should be made available by a wikipedia bittorrent
>> tracker so that people would be able to download the wikipedia backups
>> including the images (which currently they can't do) and also so that
>> wikipedia's bandwidth costs would be reduced. [...]
>>    
> 
> Is the bandwidth used really a big problem? Bandwidth is pretty cheap 
> these days, and given Wikipedia's total draw, I suspect the occasional 
> dump download isn't much of a problem.

No, bandwidth is not really the problem here. I think the core issue is 
to have bulk access to images.

There have been a number of these requests in the past, and after talking 
back and forth it has usually turned out that a smaller subset of 
the data works just as well.

A good example of this was the Deutsche Fotothek archive made late last 
year.

http://download.wikipedia.org/images/Deutsche_Fotothek.tar ( 11GB )

This provided an easily retrievable high quality subset of our image 
data which researchers could use.

Now, if we were to snapshot image data and store it separately for each 
project, the amount of duplicate image data would become significant. 
That's because we re-use a ton of image data between projects, and 
rightfully so.
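
To make the duplication point concrete, here is a rough sketch in Python. 
It is only an illustration: the project names, content hashes and sizes 
are made up, and snapshot_sizes is a hypothetical helper, not part of any 
existing dump tooling. It compares the bytes needed when every project 
gets its own image snapshot against the bytes needed when each distinct 
image is stored once.

```python
from typing import Dict, List, Tuple

# (content hash, size in bytes) for one image; all values here are hypothetical.
Image = Tuple[str, int]


def snapshot_sizes(projects: Dict[str, List[Image]]) -> Tuple[int, int]:
    """Return (bytes with one snapshot per project, bytes with shared images stored once)."""
    per_project_total = 0
    unique_images: Dict[str, int] = {}
    for images in projects.values():
        # Within a single project, count each distinct image only once.
        project_images = dict(images)
        per_project_total += sum(project_images.values())
        # Across projects, an image with the same hash is kept a single time.
        unique_images.update(project_images)
    return per_project_total, sum(unique_images.values())


if __name__ == "__main__":
    # Made-up example: two projects that both use image "a".
    example = {
        "enwiki": [("a", 100), ("b", 200)],
        "dewiki": [("a", 100), ("c", 50)],
    }
    separate, shared = snapshot_sizes(example)
    print(f"per-project snapshots: {separate} bytes, stored once: {shared} bytes")
```

The gap between the two numbers grows with every project that re-uses 
the same files, which is the cost per-project snapshots would carry.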

If instead we package all of Commons into a tarball, then we get roughly 
6 TB of image data, which after numerous conversations has proven to be a 
bit more than most people want to process.

So what does everyone think of going down the collections route?

If we provide enough different and up-to-date collections, then we could 
easily give people a large but manageable amount of data to work with.

If there is already a page for this, then please feel free to point me to 
it; otherwise I'll create one.

--tomasz


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
