Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Anthony
On Sat, Jan 9, 2010 at 11:40 PM, Aryeh Gregor wrote: > On Sat, Jan 9, 2010 at 11:26 PM, Anthony wrote: > > Depends on the machine's securelevel. > Google informs me that securelevel is a BSD feature. Wikimedia uses Linux and Solaris. Well, Greg's comment wasn't specific to Linux or Sola

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Aryeh Gregor
On Sat, Jan 9, 2010 at 11:26 PM, Anthony wrote: > Depends on the machine's securelevel. Google informs me that securelevel is a BSD feature. Wikimedia uses Linux and Solaris. It might make sense to have backups be sent to a server that no one has remote access to, say, but the point is that the

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Anthony
On Sat, Jan 9, 2010 at 11:09 PM, Aryeh Gregor wrote: > On Fri, Jan 8, 2010 at 9:40 PM, Anthony wrote: > > Isn't that what the system immutable flag is for? > No, that's for confusing the real roots while providing only a speed bump to an actual hacker. Anyone with root access can always j

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Aryeh Gregor
On Fri, Jan 8, 2010 at 9:40 PM, Anthony wrote: > Isn't that what the system immutable flag is for? No, that's for confusing the real roots while providing only a speed bump to an actual hacker. Anyone with root access can always just unset the flag. Or, failing that, dd if=/dev/zero of=/dev/sda
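
The point about root being able to undo the flag can be shown with a minimal sketch (Python on a Linux host, run as root; the backup path is purely illustrative and nothing here reflects Wikimedia's actual setup):

import subprocess

DUMP = "/backups/enwiki-dump.sql.gz"  # hypothetical backup file

# Mark the file immutable: even root cannot write to it while the flag is set.
subprocess.run(["chattr", "+i", DUMP], check=True)
try:
    open(DUMP, "ab").close()
except PermissionError:
    print("write blocked while +i is set")

# ...but the same root access can simply clear the flag again, which is why
# it is only a speed bump against a compromised root account.
subprocess.run(["chattr", "-i", DUMP], check=True)
open(DUMP, "ab").close()
print("flag cleared, file writable again")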

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Dmitriy Sintsov
* Gregory Maxwell [Fri, 8 Jan 2010 21:06:11 -0500]: > No one wants the monolithic tarball. The way I got updates previously was via a rsync push. > No one sane would suggest a monolithic tarball: it's too much of a pain to produce! > Image dump != monolithic tarball. Why not extend

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Chad
On Sat, Jan 9, 2010 at 9:27 AM, Carl (CBM) wrote: > On Sat, Jan 9, 2010 at 8:50 AM, Anthony wrote: >> The original version of Instant Commons had it right. The files were sent straight from the WMF to the client. That version still worked last I checked, but my understanding is that it wa

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Carl (CBM)
On Sat, Jan 9, 2010 at 8:50 AM, Anthony wrote: > The original version of Instant Commons had it right. The files were sent straight from the WMF to the client. That version still worked last I checked, but my understanding is that it was deprecated in favor of the bandwidth-wasting "store
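
For reference, the "straight from the WMF to the client" behaviour amounts to asking the API for a file's original URL and fetching it directly, roughly as in this sketch (the file title and User-Agent string are illustrative, not anything from the thread):

import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:Example.jpg"  # example title

params = urllib.parse.urlencode({
    "action": "query",
    "titles": TITLE,
    "prop": "imageinfo",
    "iiprop": "url",
    "format": "json",
})
req = urllib.request.Request(f"{API}?{params}",
                             headers={"User-Agent": "instantcommons-sketch/0.1"})
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# The result is keyed by page id; take the single page we asked for.
page = next(iter(data["query"]["pages"].values()))
url = page["imageinfo"][0]["url"]

# Fetch the original bytes directly from upload.wikimedia.org.
file_req = urllib.request.Request(url, headers={"User-Agent": "instantcommons-sketch/0.1"})
with urllib.request.urlopen(file_req) as resp, open("Example.jpg", "wb") as out:
    out.write(resp.read())
print("fetched", url)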

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Anthony
On Sat, Jan 9, 2010 at 5:37 AM, Robert Rohde wrote: > I know that you didn't want or use a tarball, but requests for an "image dump" are not that uncommon and often the requester is envisioning something like a tarball. Arguably that is what the originator of this thread seems to have been

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Chad
On Sat, Jan 9, 2010 at 7:44 AM, Platonides wrote: > Robert Rohde wrote: >> Of course, strictly speaking we already provide HTTP access to everything. So the real question is how can we make access easier, more reliable, and less burdensome. You or someone else suggested an API for grabb

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Platonides
Robert Rohde wrote: > Of course, strictly speaking we already provide HTTP access to everything. So the real question is how can we make access easier, more reliable, and less burdensome. You or someone else suggested an API for grabbing files and that seems like a good idea. Ultimately
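
As a sketch of what "an API for grabbing files" can already look like from the client side, the current MediaWiki API can be paged through with list=allimages and compared against a local manifest so a mirror only re-fetches what changed; the manifest and batch size below are assumptions, not anything proposed in the thread:

import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"
HEADERS = {"User-Agent": "image-mirror-sketch/0.1"}

def api_get(params):
    query = urllib.parse.urlencode({**params, "format": "json"})
    req = urllib.request.Request(f"{API}?{query}", headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def changed_files(local_sha1_by_name, batch=50):
    """Yield (name, url) for files whose SHA-1 differs from the local copy."""
    cont = {"continue": ""}
    while True:
        data = api_get({
            "action": "query",
            "list": "allimages",
            "aiprop": "url|sha1",
            "ailimit": str(batch),
            **cont,
        })
        for img in data["query"]["allimages"]:
            if local_sha1_by_name.get(img["name"]) != img["sha1"]:
                yield img["name"], img["url"]
        cont = data.get("continue")
        if not cont:
            break

# Example: with an empty manifest everything looks changed; print the first hit.
for name, url in changed_files({}, batch=10):
    print("would fetch:", name, url)
    break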

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-09 Thread Robert Rohde
On Fri, Jan 8, 2010 at 6:06 PM, Gregory Maxwell wrote: > No one wants the monolithic tarball. The way I got updates previously was via a rsync push. > No one sane would suggest a monolithic tarball: it's too much of a pain to produce! I know that you didn't want or use a tarball, but requ

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Anthony
On Fri, Jan 8, 2010 at 9:06 PM, Gregory Maxwell wrote: > Yea, well, you can't easily eliminate all the internal points of failure. "someone with root loses control of their access and someone nasty wipes everything" is really hard to protect against with online systems. Isn't that what t

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Anthony
On Fri, Jan 8, 2010 at 10:56 AM, Aryeh Gregor wrote: > The sensible bandwidth-saving way to do it would be to set up an rsync daemon on the image servers, and let people use that. The bandwidth-saving way to do things would be to just allow mirrors to use hotlinking. Requiring a middle ma
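
Hotlinking works because Commons originals sit at predictable upload.wikimedia.org paths derived from the MD5 of the file name, so a mirror can link straight to the source instead of re-hosting the bytes. A small sketch (the file name is only an example):

import hashlib
import urllib.parse

def commons_url(name: str) -> str:
    """Build the direct upload.wikimedia.org URL for a Commons original."""
    name = name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return ("https://upload.wikimedia.org/wikipedia/commons/"
            f"{digest[0]}/{digest[:2]}/{urllib.parse.quote(name)}")

print(commons_url("Example.jpg"))
# -> https://upload.wikimedia.org/wikipedia/commons/<d>/<dd>/Example.jpg

A mirror that links to these URLs serves its own page text but leaves the image bytes (and the bandwidth bill) on Wikimedia's servers, which is the trade-off debated in the rest of the thread.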

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Gregory Maxwell
On Fri, Jan 8, 2010 at 8:25 PM, Robert Rohde wrote: > While I certainly can't fault your good will, I do find it disturbing that it was necessary. Ideally, Wikimedia should have internal backups of sufficient quality that we don't have to depend on what third parties happen to have saved fo

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread David Gerard
2010/1/9 Robert Rohde: > The general point I am trying to make is that if we think about what people really want, and how the files are likely to be used, then there may be better delivery approaches than trying to create huge image dumps. Whilst, I'd hope, not letting the quest for the p

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Robert Rohde
On Fri, Jan 8, 2010 at 2:37 PM, Gregory Maxwell wrote: > Er. I've maintained a non-WMF disaster recovery archive for a long time, though it's no longer completely current since the rsync went away and web fetching is lossy. > It saved our rear a number of times, saving thousands of images fro

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Platonides
Gregory Maxwell wrote: > Er. I've maintained a non-WMF disaster recovery archive for a long time, though it's no longer completely current since the rsync went away and web fetching is lossy. And the box ran out of disk space. We could try until it fills again, though. A sysadmin fixing images

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Gregory Maxwell
On Fri, Jan 8, 2010 at 3:55 PM, Robert Rohde wrote: > Can someone articulate what the use case is? > Is there someone out there who could use a 5 TB image archive but is disappointed it doesn't exist? Seems rather implausible. > If not, then I assume that everyone is really after only some

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Robert Rohde
Can someone articulate what the use case is? Is there someone out there who could use a 5 TB image archive but is disappointed it doesn't exist? Seems rather implausible. If not, then I assume that everyone is really after only some subset of the files. If that's the case we should try to figur

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Domas Mituzas
> Well, if there were an rsyncd you could just fetch the ones you wanted arbitrarily. rsyncd is fail for large file mass delivery, and it is fail when exposed to masses. Domas

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Aryeh Gregor
On Fri, Jan 8, 2010 at 3:28 PM, Bilal Abdul Kader wrote: > I think having access to them on the Commons repository is much easier to handle. A subset should be good enough. > Having 11 TB of images needs huge research capabilities in order to handle all of them and work with all of them. > May

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Bilal Abdul Kader
I think having access to them on the Commons repository is much easier to handle. A subset should be good enough. Having 11 TB of images needs huge research capabilities in order to handle all of them and work with all of them. Maybe a special API or advanced API functions would allow people enough a

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Robert Rohde
On Fri, Jan 8, 2010 at 8:24 AM, Gregory Maxwell wrote: > s/terabyte/several terabytes/ My copy is not up to date, but it's not smaller than 4. The top (most recent) versions of Commons files are about 4.9 TB; files on enwiki but not on Commons add another 200 GB or so. -Robert Rohde

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Tomasz Finc
William Pietri wrote: > On 01/07/2010 01:40 AM, Jamie Morken wrote: >> I have a suggestion for wikipedia!! I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Gregory Maxwell
On Fri, Jan 8, 2010 at 10:56 AM, Aryeh Gregor wrote: > On Fri, Jan 8, 2010 at 10:31 AM, Jamie Morken wrote: >> I am not sure about the cost of the bandwidth, but the wikipedia image dumps are no longer available on the wikipedia dump anyway. I am guessing they were removed partly because

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Aryeh Gregor
On Fri, Jan 8, 2010 at 10:31 AM, Jamie Morken wrote: > I am not sure about the cost of the bandwidth, but the wikipedia image dumps are no longer available on the wikipedia dump anyway. I am guessing they were removed partly because of the bandwidth cost, or else image licensing issues p

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Bryan Tong Minh
On Fri, Jan 8, 2010 at 4:31 PM, Jamie Morken wrote: > Bittorrent is simply a more efficient method to distribute files, especially if the much larger wikipedia image files were made available again. The last dump from english wikipedia including images is over 200GB but is understandab

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-08 Thread Jamie Morken
Hello, > Is the bandwidth used really a big problem? Bandwidth is pretty cheap these days, and given Wikipedia's total draw, I suspect the occasional dump download isn't much of a problem. I am not sure about the cost of the bandwidth, but the wikipedia image dumps are no longer ava

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread Pascal Martin
It's possible to download our Zeno (articles or images) dump with http://okawix.com

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread William Pietri
On 01/07/2010 01:40 AM, Jamie Morken wrote: > I have a suggestion for wikipedia!! I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups including the images (which

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread Bilal Abdul Kader
I have been using the dumps for a few months and I think this kind of dump is much better than a torrent. Yes, bandwidth can be saved, but I do not think the cost of bandwidth is higher than the cost of maintaining the torrents. If people are not hosting the files, then the value of torrents is limi

Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread Platonides
Jamie Morken wrote: > Hi, > I have a suggestion for wikipedia!! I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups including the images (which currently they

[Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread Jamie Morken
Hi, I have a suggestion for wikipedia!!  I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups including the images (which currently they can't do) and also so that wikiped