Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-27 Thread Samuel Klein
Thank you, Emijrp!

What about the dump of Commons images? [for those with 10TB to spare]

SJ

On Sun, Jun 26, 2011 at 8:53 AM, emijrp wrote:
> Hi all;
>
> Can you imagine a day when Wikipedia is added to this list?[1]
>
> WikiTeam have developed a script[2] to download all the Wikipedia dumps (
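The idea behind such a downloader, as a minimal Python sketch. This is not the WikiTeam script[2] itself; the dumps.wikimedia.org "latest" URL layout and the helper name are assumptions for illustration only.

    # Minimal sketch of a dump downloader (not the WikiTeam script[2]).
    # Assumed layout: /<wiki>/<date>/<wiki>-<date>-pages-articles.xml.bz2
    import shutil
    import urllib.request

    def download_dump(wiki="enwiki", date="latest",
                      base="https://dumps.wikimedia.org"):
        name = f"{wiki}-{date}-pages-articles.xml.bz2"
        url = f"{base}/{wiki}/{date}/{name}"
        with urllib.request.urlopen(url) as resp, open(name, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream to disk; dumps are large
        return name

    # download_dump("simplewiki")  # a small wiki is a sensible first test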

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-27 Thread emijrp
Hi SJ;

You know that that is an old item on our TODO list ; )

I heard that Platonides developed a script for that task a long time ago. Platonides, are you there?

Regards,
emijrp

2011/6/27 Samuel Klein
> Thank you, Emijrp!
>
> What about the dump of Commons images? [for those with 10TB to spare]

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-27 Thread Platonides
emijrp wrote:
> Hi SJ;
>
> You know that that is an old item on our TODO list ; )
>
> I heard that Platonides developed a script for that task a long time ago.
>
> Platonides, are you there?
>
> Regards,
> emijrp

Yes, I am. :)

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread emijrp
Can you share your script with us?

2011/6/27 Platonides
> emijrp wrote:
>> Hi SJ;
>>
>> You know that that is an old item on our TODO list ; )
>>
>> I heard that Platonides developed a script for that task a long time ago.
>>
>> Platonides, are you there?
>>
>> Regards,
>> emijrp
>
> Yes, I am. :)

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread emijrp
Hi;

@Derrick: I don't trust Amazon. Really, I don't trust the Wikimedia Foundation either. They can't and/or don't want to provide image dumps (which is worse?). The community donates images to Commons, the community donates money every year, and now the community needs to develop software to extract all the

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread Milos Rancic
On 06/28/2011 07:21 PM, emijrp wrote:
> @Milos: Instead of splitting the image dump using the first letter of filenames,
> I thought about splitting it using the upload date (YYYY-MM-DD). So, the first
> chunks (2005-01-01) will be tiny, and recent ones several GB (a single
> day).

That would be better, indeed
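A minimal Python sketch of the per-day chunking idea. It assumes the (filename, upload timestamp) pairs are already available, e.g. fetched via the MediaWiki API (list=allimages sorted by timestamp); the chunk_by_upload_day helper is hypothetical, purely for illustration.

    # Group Commons files into per-day chunks by upload date.
    from collections import defaultdict

    def chunk_by_upload_day(files):
        """files: iterable of (filename, ISO timestamp) pairs, e.g.
        ("Example.jpg", "2005-01-01T12:34:56Z"). Returns {day: [filenames]}."""
        chunks = defaultdict(list)
        for name, timestamp in files:
            day = timestamp[:10]  # the "YYYY-MM-DD" prefix of the ISO timestamp
            chunks[day].append(name)
        return chunks

    # Early days yield tiny chunks; recent days hold several GB worth of files.
    sample = [
        ("First_upload.png", "2005-01-01T09:00:00Z"),
        ("Recent_photo_1.jpg", "2011-06-27T10:00:00Z"),
        ("Recent_photo_2.jpg", "2011-06-27T11:30:00Z"),
    ]
    for day, names in sorted(chunk_by_upload_day(sample).items()):
        print(day, len(names), "files")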

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread Platonides
emijrp wrote:
> Hi;
>
> @Derrick: I don't trust Amazon.

I disagree. Note that we only need them to keep a redundant copy of a file. If they tried to tamper with the file, we could detect it with the hashes (which should be properly secured; that's no problem). I'd like to have the hashes for the XML dumps
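Checking a mirrored copy against a hash is straightforward. A minimal Python sketch, assuming the trusted checksum is kept outside the mirror; the filename and helper names are placeholders, not real dump values.

    # Detect tampering in a mirrored dump by comparing its hash against a
    # checksum published by the original source and stored elsewhere.
    import hashlib

    def sha1_of_file(path, block_size=1 << 20):
        """Stream the file so multi-GB dumps need not fit in memory."""
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(block_size), b""):
                h.update(block)
        return h.hexdigest()

    def verify(path, trusted_hash):
        return sha1_of_file(path) == trusted_hash.lower()

    # Usage (placeholder values):
    # verify("enwiki-20110620-pages-articles.xml.bz2", "0123abcd...")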

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread emijrp
2011/6/28 Platonides
> emijrp wrote:
>> Hi;
>>
>> @Derrick: I don't trust Amazon.
>
> I disagree. Note that we only need them to keep a redundant copy of a file.
> If they tried to tamper with the file, we could detect it with the hashes
> (which should be properly secured; that's no problem).

Re: [Foundation-l] [Wiki-research-l] Wikipedia dumps downloader

2011-06-28 Thread Platonides
emijrp wrote:
> I didn't mean security problems. I meant files just being deleted under weird
> terms of service. Commons hosts a lot of images which can be
> problematic, like nudes or copyrighted material in some jurisdictions.
> They can delete what they want and close any account they want, and
> we