Any chance that these archives could be served via BitTorrent, so that even 
partial downloaders can become seeders? Leveraging P2P would reduce the 
overall bandwidth load on the servers and improve download speeds.
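For what it's worth, building the .torrent metainfo for a dump file is simple enough to sketch by hand. The following is illustrative only (the tracker URL is hypothetical, and a real deployment would use web seeds pointing at the existing mirrors); it bencodes a minimal single-file metainfo dict per the BitTorrent spec:

```python
import hashlib
import os

def bencode(obj):
    # Minimal bencoding: ints, strings/bytes, lists, dicts (keys sorted).
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode("utf-8")
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        return (b"d"
                + b"".join(bencode(k) + bencode(v)
                           for k, v in sorted(obj.items()))
                + b"e")
    raise TypeError("cannot bencode %r" % (obj,))

def make_torrent(path, announce, piece_len=2 ** 20):
    """Return the bencoded metainfo for a single dump file."""
    pieces = b""
    size = 0
    # SHA-1 hash each fixed-size piece of the file.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(piece_len)
            if not chunk:
                break
            size += len(chunk)
            pieces += hashlib.sha1(chunk).digest()
    info = {"name": os.path.basename(path),
            "length": size,
            "piece length": piece_len,
            "pieces": pieces}
    return bencode({"announce": announce, "info": info})
```

Writing the returned bytes to `dump.torrent` would be enough for any standard client to seed the file.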


-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org 
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Mike Dupont
Sent: Saturday, June 02, 2012 1:28 AM
To: Wikimedia developers; wikiteam-disc...@googlegroups.com
Subject: Re: [Wikitech-l] [Xmldatadumps-l] XML dumps/Media mirrors update

I have the cron archiving running every 30 minutes now: 
http://ia700802.us.archive.org/34/items/wikipedia-delete-2012-06/
It is amazing how fast the stuff gets deleted on Wikipedia.
What about the proposed deletions, are there categories for those?
thanks
mike
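For reference, a crontab entry along these lines would do the every-30-minutes run (the script path and log location are made up for the example):

```shell
# min  hour dom mon dow  command        (runs every 30 minutes; paths hypothetical)
*/30   *    *   *   *    python /home/archiver/export_deleted.py >> /var/log/export_deleted.log 2>&1
```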

On Wed, May 30, 2012 at 6:26 AM, Mike  Dupont <jamesmikedup...@googlemail.com> 
wrote:
> https://github.com/h4ck3rm1k3/wikiteam code here
>
> On Wed, May 30, 2012 at 6:26 AM, Mike  Dupont 
> <jamesmikedup...@googlemail.com> wrote:
>> Ok, I merged the code from wikiteam and have a full-history dump 
>> script that uploads to archive.org; the next step is to fix the bucket 
>> metadata in the script.
>> mike
>>
>> On Tue, May 29, 2012 at 3:08 AM, Mike  Dupont 
>> <jamesmikedup...@googlemail.com> wrote:
>>> Well, I have now updated the script to include the XML dump in raw 
>>> format. I will have to add more information to the archive.org item, at 
>>> least a basic readme.
>>> The other thing is that the wikipybot does not seem to support the full 
>>> history, so I will have to move over to the wikiteam version and rework 
>>> it. I just spent 2 hours on this, so I am pretty happy with the first 
>>> version.
>>>
>>> mike
>>>
>>> On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia <ad...@alphacorp.tk> 
>>> wrote:
>>>> This is quite nice, though the item's metadata is a bit sparse :)
>>>>
>>>> On Tue, May 29, 2012 at 3:40 AM, Mike Dupont 
>>>> <jamesmikedup...@googlemail.com
>>>>> wrote:
>>>>
>>>>> The first version of the script is ready. It gets the versions, puts 
>>>>> them in a zip and uploads that to archive.org:
>>>>> https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
>>>>>
>>>>> here is an example output :
>>>>> http://archive.org/details/wikipedia-delete-2012-05
>>>>>
>>>>> http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip
>>>>>
>>>>> I will cron this, and it should give us a start on saving deleted data.
>>>>> Articles will be exported once a day, even if they were exported 
>>>>> yesterday, as long as they are in one of the categories.
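The zip-and-upload step described above can be sketched roughly like this (function and file names are illustrative, not the actual script; the archive.org upload itself is left as a comment):

```python
import datetime
import os
import zipfile

def bundle_exports(xml_by_title, out_dir="."):
    """Zip a batch of exported page XML into a timestamped archive,
    named like the archive2012-05-28T21:34:02.302183.zip items above."""
    stamp = datetime.datetime.now().isoformat()
    path = os.path.join(out_dir, "archive%s.zip" % stamp)
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        # One .xml member per exported page title.
        for title, xml in xml_by_title.items():
            zf.writestr(title + ".xml", xml)
    # Uploading `path` to an archive.org item would be the next step,
    # e.g. via the S3-compatible upload API.
    return path
```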
>>>>>
>>>>> mike
>>>>>
>>>>> On Mon, May 21, 2012 at 7:21 PM, Mike  Dupont 
>>>>> <jamesmikedup...@googlemail.com> wrote:
>>>>> > Thanks! And running it just once per day should do; they don't get deleted that quickly.
>>>>> > mike
>>>>> >
>>>>> > On Mon, May 21, 2012 at 9:11 PM, emijrp <emi...@gmail.com> wrote:
>>>>> >> Create a script that makes a request to Special:Export using this 
>>>>> >> category as a feed:
>>>>> >> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>>>>> >>
>>>>> >> More info:
>>>>> >> https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
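A sketch of that request, assuming the parameter names on the Manual page (`pages` as a newline-separated title list, `history` for all revisions); the category feed itself would come from somewhere like the API's `list=categorymembers`:

```python
import urllib.parse

EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"

def build_export_body(titles, full_history=True):
    """Form body for a POST to Special:Export.

    'pages' is a newline-separated list of titles; 'history' asks for
    every revision instead of just the current one.
    """
    fields = {"pages": "\n".join(titles)}
    if full_history:
        fields["history"] = "1"
    return urllib.parse.urlencode(fields).encode("utf-8")

# With titles fetched from Category:Candidates_for_speedy_deletion,
# the export would then be one request, e.g.:
#   urllib.request.urlopen(EXPORT_URL, data=build_export_body(titles))
```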
>>>>> >>
>>>>> >>
>>>>> >> 2012/5/21 Mike Dupont <jamesmikedup...@googlemail.com>
>>>>> >>>
>>>>> >>> Well, I would be happy for items like this:
>>>>> >>> http://en.wikipedia.org/wiki/Template:Db-a7
>>>>> >>> Would it be possible to extract them easily?
>>>>> >>> mike
>>>>> >>>
>>>>> >>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn 
>>>>> >>> <ar...@wikimedia.org>
>>>>> >>> wrote:
>>>>> >>> > There are a few other reasons articles get deleted: copyright 
>>>>> >>> > issues, personal identifying data, etc.  This makes maintaining 
>>>>> >>> > the sort of mirror you propose problematic, although a similar 
>>>>> >>> > mirror is here:
>>>>> >>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>>>>> >>> >
>>>>> >>> > The dumps contain only data publicly available at the time of 
>>>>> >>> > the run, without deleted data.
>>>>> >>> >
>>>>> >>> > The articles aren't permanently deleted, of course.  The 
>>>>> >>> > revision texts live on in the database, so a query on the 
>>>>> >>> > toolserver, for example, could be used to get at them, but that 
>>>>> >>> > would need to be for research purposes.
>>>>> >>> >
>>>>> >>> > Ariel
>>>>> >>> >
>>>>> >>> > On Thursday, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
>>>>> >>> >> Hi,
>>>>> >>> >> I am thinking about how to collect articles deleted based on 
>>>>> >>> >> the "not notable" criterion.
>>>>> >>> >> Is there any way we can extract them from the mysql binlogs? 
>>>>> >>> >> How are these mirrors working? I would be interested in 
>>>>> >>> >> setting up a mirror of deleted data, at least that which is 
>>>>> >>> >> not spam/vandalism, based on tags.
>>>>> >>> >> mike
>>>>> >>> >>
>>>>> >>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn 
>>>>> >>> >> <ar...@wikimedia.org> wrote:
>>>>> >>> >> > We now have three mirror sites, yay!  The full list is 
>>>>> >>> >> > linked to from http://dumps.wikimedia.org/ and is also 
>>>>> >>> >> > available at
>>>>> >>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>>>>> >>> >> >
>>>>> >>> >> > Summarizing, we have:
>>>>> >>> >> >
>>>>> >>> >> > C3L (Brazil) with the last 5 known good dumps,
>>>>> >>> >> > Masaryk University (Czech Republic) with the last 5 known 
>>>>> >>> >> > good dumps,
>>>>> >>> >> > Your.org (USA) with the complete archive of dumps, and
>>>>> >>> >> >
>>>>> >>> >> > for the latest version of uploaded media, Your.org with 
>>>>> >>> >> > http/ftp/rsync access.
>>>>> >>> >> >
>>>>> >>> >> > Thanks to Carlos, Kevin and Yenya respectively at the above 
>>>>> >>> >> > sites for volunteering space, time and effort to make this 
>>>>> >>> >> > happen.
>>>>> >>> >> >
>>>>> >>> >> > As people noticed earlier, a series of media tarballs 
>>>>> >>> >> > per-project (excluding commons) is being generated.  As soon 
>>>>> >>> >> > as the first run of these is complete we'll announce its 
>>>>> >>> >> > location and start generating them on a semi-regular basis.
>>>>> >>> >> >
>>>>> >>> >> > As we've been getting the bugs out of the mirroring setup, 
>>>>> >>> >> > it is getting easier to add new locations.  Know anyone 
>>>>> >>> >> > interested?  Please let us know; we would love to have them.
>>>>> >>> >> >
>>>>> >>> >> > Ariel
>>>>> >>> >> >
>>>>> >>> >> >
>>>>> >>> >> > _______________________________________________
>>>>> >>> >> > Wikitech-l mailing list
>>>>> >>> >> > Wikitech-l@lists.wikimedia.org 
>>>>> >>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >>
>>>>> >>> >
>>>>> >>> >
>>>>> >>> >
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> James Michael DuPont
>>>>> >>> Member of Free Libre Open Source Software Kosova http://flossk.org
>>>>> >>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>>>>> >>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>>>> >>>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
>>>>> >> Pre-doctoral student at the University of Cádiz (Spain)
>>>>> >> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>>>>> >> Personal website: https://sites.google.com/site/emijrp/
>>>>> >>
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> Xmldatadumps-l mailing list
>>>>> >> xmldatadump...@lists.wikimedia.org
>>>>> >> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> James Michael DuPont
>>>>> Member of Free Libre Open Source Software Kosova http://flossk.org 
>>>>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org 
>>>>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>>>>
>>>>> _______________________________________________
>>>>> Wikitech-l mailing list
>>>>> Wikitech-l@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Hydriz
>>>>
>>>> We've created the greatest collection of shared knowledge in history.
>>>> Help protect Wikipedia. Donate now: http://donate.wikimedia.org
>>>
>>>
>>>
>>> --
>>> James Michael DuPont
>>> Member of Free Libre Open Source Software Kosova http://flossk.org 
>>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org 
>>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>
>>
>>
>> --
>> James Michael DuPont
>> Member of Free Libre Open Source Software Kosova http://flossk.org 
>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org 
>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>
>
>
> --
> James Michael DuPont
> Member of Free Libre Open Source Software Kosova http://flossk.org 
> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org 
> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3






