Code is here: https://github.com/h4ck3rm1k3/wikiteam
On Wed, May 30, 2012 at 6:26 AM, Mike Dupont <jamesmikedup...@googlemail.com> wrote:
> Ok, I merged the code from wikiteam and have a full history dump script
> that uploads to archive.org,
> next step is to fix the bucket metadata in the script
> mike
>
> On Tue, May 29, 2012 at 3:08 AM, Mike Dupont
> <jamesmikedup...@googlemail.com> wrote:
>> Well, I have now updated the script to include the xml dump in raw
>> format. I will have to add more information to the archive.org item, at
>> least a basic readme.
>> other thing is that pywikipediabot does not support the full history it
>> seems, so I will have to move over to the wikiteam version and
>> rework it,
>> I just spent 2 hours on this so I am pretty happy with the first version.
>>
>> mike
>>
>> On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia <ad...@alphacorp.tk> wrote:
>>> This is quite nice, though the item's metadata is too little :)
>>>
>>> On Tue, May 29, 2012 at 3:40 AM, Mike Dupont <jamesmikedup...@googlemail.com> wrote:
>>>
>>>> first version of the Script is ready, it gets the versions, puts them
>>>> in a zip and puts that on archive.org
>>>> https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
>>>>
>>>> here is an example output :
>>>> http://archive.org/details/wikipedia-delete-2012-05
>>>>
>>>> http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip
>>>>
>>>> I will cron this, and it should give a start of saving deleted data.
>>>> Articles will be exported once a day, even if they were exported
>>>> yesterday as long as they are in one of the categories.
>>>>
>>>> mike
>>>>
>>>> On Mon, May 21, 2012 at 7:21 PM, Mike Dupont
>>>> <jamesmikedup...@googlemail.com> wrote:
>>>> > Thanks! and run that 1 time per day, they don't get deleted that quickly.
>>>> > mike
>>>> >
>>>> > On Mon, May 21, 2012 at 9:11 PM, emijrp <emi...@gmail.com> wrote:
>>>> >> Create a script that makes a request to Special:Export using this category
>>>> >> as feed
>>>> >> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>>>> >>
>>>> >> More info https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
>>>> >>
>>>> >> 2012/5/21 Mike Dupont <jamesmikedup...@googlemail.com>
>>>> >>>
>>>> >>> Well I would be happy for items like this :
>>>> >>> http://en.wikipedia.org/wiki/Template:Db-a7
>>>> >>> would it be possible to extract them easily?
>>>> >>> mike
>>>> >>>
>>>> >>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn <ar...@wikimedia.org>
>>>> >>> wrote:
>>>> >>> > There's a few other reasons articles get deleted: copyright issues,
>>>> >>> > personal identifying data, etc. This makes maintaining the sort of
>>>> >>> > mirror you propose problematic, although a similar mirror is here:
>>>> >>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>>>> >>> >
>>>> >>> > The dumps contain only data publicly available at the time of the run,
>>>> >>> > without deleted data.
>>>> >>> >
>>>> >>> > The articles aren't permanently deleted of course. The revision texts
>>>> >>> > live on in the database, so a query on toolserver, for example, could be
>>>> >>> > used to get at them, but that would need to be for research purposes.
>>>> >>> >
>>>> >>> > Ariel
>>>> >>> >
>>>> >>> > On Thu, 17-05-2012, at 13:30 +0200, Mike Dupont wrote:
>>>> >>> >> Hi,
>>>> >>> >> I am thinking about how to collect articles deleted based on the "not
>>>> >>> >> notable" criteria,
>>>> >>> >> is there any way we can extract them from the mysql binlogs? how are
>>>> >>> >> these mirrors working? I would be interested in setting up a mirror of
>>>> >>> >> deleted data, at least that which is not spam/vandalism based on tags.
>>>> >>> >> mike
>>>> >>> >>
>>>> >>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn <ar...@wikimedia.org>
>>>> >>> >> wrote:
>>>> >>> >> > We now have three mirror sites, yay! The full list is linked to from
>>>> >>> >> > http://dumps.wikimedia.org/ and is also available at
>>>> >>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>>>> >>> >> >
>>>> >>> >> > Summarizing, we have:
>>>> >>> >> >
>>>> >>> >> > C3L (Brazil) with the last 5 known good dumps,
>>>> >>> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
>>>> >>> >> > Your.org (USA) with the complete archive of dumps, and
>>>> >>> >> >
>>>> >>> >> > for the latest version of uploaded media, Your.org with http/ftp/rsync
>>>> >>> >> > access.
>>>> >>> >> >
>>>> >>> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
>>>> >>> >> > volunteering space, time and effort to make this happen.
>>>> >>> >> >
>>>> >>> >> > As people noticed earlier, a series of media tarballs per-project
>>>> >>> >> > (excluding commons) is being generated. As soon as the first run of
>>>> >>> >> > these is complete we'll announce its location and start generating them
>>>> >>> >> > on a semi-regular basis.
>>>> >>> >> >
>>>> >>> >> > As we've been getting the bugs out of the mirroring setup, it is getting
>>>> >>> >> > easier to add new locations. Know anyone interested? Please let us
>>>> >>> >> > know; we would love to have them.
>>>> >>> >> >
>>>> >>> >> > Ariel
>>>> >>> >> >
>>>> >>> >> > _______________________________________________
>>>> >>> >> > Wikitech-l mailing list
>>>> >>> >> > Wikitech-l@lists.wikimedia.org
>>>> >>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>> >>>
>>>> >>> --
>>>> >>> James Michael DuPont
>>>> >>> Member of Free Libre Open Source Software Kosova http://flossk.org
>>>> >>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>>>> >>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>>> >>
>>>> >> --
>>>> >> Emilio J. Rodríguez-Posada.
>>>> >> E-mail: emijrp AT gmail DOT com
>>>> >> Pre-doctoral student at the University of Cádiz (Spain)
>>>> >> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>>>> >> Personal website: https://sites.google.com/site/emijrp/
>>>> >>
>>>> >> _______________________________________________
>>>> >> Xmldatadumps-l mailing list
>>>> >> xmldatadump...@lists.wikimedia.org
>>>> >> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>>
>>> --
>>> Regards,
>>> Hydriz
>>>
>>> We've created the greatest collection of shared knowledge in history. Help
>>> protect Wikipedia.
>>> Donate now: http://donate.wikimedia.org
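For anyone following the thread, here is a rough Python sketch of the approach emijrp suggests: use a category as the feed, then request those pages from Special:Export. It is not the code in export_deleted.py. The function names are made up for illustration, the api.php endpoint and the use of `list=categorymembers` to enumerate the category are assumptions, and whether `history` returns the full history depends on the wiki's export settings. It only builds the requests and the timestamped zip name seen in the example output; it does not touch the network.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoints for English Wikipedia; adjust for other wikis.
API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"


def category_members_url(category, limit=500):
    """Build an API query URL that lists the pages in a category."""
    query = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(query)


def export_form(titles, full_history=False):
    """Build the form fields to POST to Special:Export for the given titles."""
    form = {"pages": "\n".join(titles), "action": "submit"}
    if full_history:
        form["history"] = "1"  # ask for all revisions (the wiki may cap this)
    else:
        form["curonly"] = "1"  # current revision only
    return form


def archive_name(now):
    """Name the zip after the run time, as in the example item on archive.org."""
    return "archive" + now.isoformat() + ".zip"


if __name__ == "__main__":
    print(category_members_url("Category:Candidates_for_speedy_deletion"))
    print(EXPORT_URL, export_form(["Example page"], full_history=True))
    print(archive_name(datetime.now()))
```

A daily cron job would fetch the first URL, collect the titles from the JSON response, POST the form to EXPORT_URL, and write the returned XML into the zip before uploading it.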