The code is here: https://github.com/h4ck3rm1k3/wikiteam
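For anyone following along, the approach suggested further down the thread (use a category as the feed for Special:Export) can be sketched roughly as below. This is only an illustration, not the script in the repository: the function names, the user agent string and the default limit are made up for the example, and it assumes the standard MediaWiki `list=categorymembers` API query and the `pages`, `history` and `curonly` fields of Special:Export. Wikis may cap or disable full-history export, so the `history` flag is not guaranteed to return every revision.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"
CATEGORY = "Category:Candidates_for_speedy_deletion"
# Hypothetical user agent; replace with something that identifies you.
USER_AGENT = "deleted-article-export-sketch/0.1"


def category_members_url(category, limit=50):
    """Build the API URL that lists the pages in `category`."""
    query = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(query)


def export_form(titles, full_history=True):
    """Build the POST fields for Special:Export (one title per line)."""
    form = {"pages": "\n".join(titles), "action": "submit"}
    # Ask for every revision, or only the current one.
    form["history" if full_history else "curonly"] = "1"
    return form


def fetch(url, form=None):
    """GET `url`, or POST `form` to it when a form is given."""
    data = urllib.parse.urlencode(form).encode("utf-8") if form else None
    request = urllib.request.Request(
        url, data=data, headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def export_category(category=CATEGORY):
    """Return the export XML (bytes) for the pages currently in `category`."""
    listing = json.loads(fetch(category_members_url(category)))
    titles = [member["title"] for member in listing["query"]["categorymembers"]]
    return fetch(EXPORT_URL, export_form(titles))
```

Calling `export_category()` once a day from cron would match the schedule discussed in the thread. Only the first batch of category members is requested here; a real run would need to follow the API's continuation values.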

On Wed, May 30, 2012 at 6:26 AM, Mike  Dupont
<jamesmikedup...@googlemail.com> wrote:
> OK, I merged the code from wikiteam and now have a full-history dump script
> that uploads to archive.org.
> The next step is to fix the bucket metadata in the script.
> mike
>
> On Tue, May 29, 2012 at 3:08 AM, Mike  Dupont
> <jamesmikedup...@googlemail.com> wrote:
>> Well, I have now updated the script to include the XML dump in raw
>> format. I will have to add more information to the archive.org item, at
>> least a basic readme.
>> The other thing is that pywikipediabot does not seem to support the full
>> history, so I will have to move over to the wikiteam version and
>> rework it.
>> I spent just 2 hours on this, so I am pretty happy with the first version.
>>
>> mike
>>
>> On Tue, May 29, 2012 at 1:52 AM, Hydriz Wikipedia <ad...@alphacorp.tk> wrote:
>>> This is quite nice, though the item has too little metadata :)
>>>
>>> On Tue, May 29, 2012 at 3:40 AM, Mike Dupont <jamesmikedup...@googlemail.com> wrote:
>>>
>>>> The first version of the script is ready. It gets the versions, puts them
>>>> in a zip, and puts that on archive.org:
>>>> https://github.com/h4ck3rm1k3/pywikipediabot/blob/master/export_deleted.py
>>>>
>>>> Here is an example output:
>>>> http://archive.org/details/wikipedia-delete-2012-05
>>>>
>>>> http://ia601203.us.archive.org/24/items/wikipedia-delete-2012-05/archive2012-05-28T21:34:02.302183.zip
>>>>
>>>> I will cron this, and it should give a start on saving deleted data.
>>>> Articles will be exported once a day, even if they were exported
>>>> yesterday, as long as they are in one of the categories.
>>>>
>>>> mike
>>>>
>>>> On Mon, May 21, 2012 at 7:21 PM, Mike  Dupont
>>>> <jamesmikedup...@googlemail.com> wrote:
>>>> > Thanks! I can run that once per day; they don't get deleted that quickly.
>>>> > mike
>>>> >
>>>> > On Mon, May 21, 2012 at 9:11 PM, emijrp <emi...@gmail.com> wrote:
>>>> >> Create a script that makes a request to Special:Export using this category
>>>> >> as feed:
>>>> >> https://en.wikipedia.org/wiki/Category:Candidates_for_speedy_deletion
>>>> >>
>>>> >> More info:
>>>> >> https://www.mediawiki.org/wiki/Manual:Parameters_to_Special:Export
>>>> >>
>>>> >>
>>>> >> 2012/5/21 Mike Dupont <jamesmikedup...@googlemail.com>
>>>> >>>
>>>> >>> Well, I would be happy with items like this:
>>>> >>> http://en.wikipedia.org/wiki/Template:Db-a7
>>>> >>> Would it be possible to extract them easily?
>>>> >>> mike
>>>> >>>
>>>> >>> On Thu, May 17, 2012 at 2:23 PM, Ariel T. Glenn <ar...@wikimedia.org>
>>>> >>> wrote:
>>>> >>> > There are a few other reasons articles get deleted: copyright issues,
>>>> >>> > personally identifying data, etc.  This makes maintaining the sort of
>>>> >>> > mirror you propose problematic, although a similar mirror is here:
>>>> >>> > http://deletionpedia.dbatley.com/w/index.php?title=Main_Page
>>>> >>> >
>>>> >>> > The dumps contain only data publicly available at the time of the run,
>>>> >>> > without deleted data.
>>>> >>> >
>>>> >>> > The articles aren't permanently deleted, of course.  The revision texts
>>>> >>> > live on in the database, so a query on toolserver, for example, could be
>>>> >>> > used to get at them, but that would need to be for research purposes.
>>>> >>> >
>>>> >>> > Ariel
>>>> >>> >
>>>> >>> > On Thu, 17-05-2012 at 13:30 +0200, Mike Dupont wrote:
>>>> >>> >> Hi,
>>>> >>> >> I am thinking about how to collect articles deleted based on the "not
>>>> >>> >> notable" criteria.
>>>> >>> >> Is there any way we can extract them from the MySQL binlogs? How are
>>>> >>> >> these mirrors working? I would be interested in setting up a mirror of
>>>> >>> >> deleted data, at least that which is not spam/vandalism based on tags.
>>>> >>> >> mike
>>>> >>> >>
>>>> >>> >> On Thu, May 17, 2012 at 1:09 PM, Ariel T. Glenn <ar...@wikimedia.org> wrote:
>>>> >>> >> > We now have three mirror sites, yay!  The full list is linked to from
>>>> >>> >> > http://dumps.wikimedia.org/ and is also available at
>>>> >>> >> >
>>>> >>> >> > http://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_Mirrors
>>>> >>> >> >
>>>> >>> >> > Summarizing, we have:
>>>> >>> >> >
>>>> >>> >> > C3L (Brazil) with the last 5 known good dumps,
>>>> >>> >> > Masaryk University (Czech Republic) with the last 5 known good dumps,
>>>> >>> >> > Your.org (USA) with the complete archive of dumps, and
>>>> >>> >> >
>>>> >>> >> > for the latest version of uploaded media, Your.org with http/ftp/rsync
>>>> >>> >> > access.
>>>> >>> >> >
>>>> >>> >> > Thanks to Carlos, Kevin and Yenya respectively at the above sites for
>>>> >>> >> > volunteering space, time and effort to make this happen.
>>>> >>> >> >
>>>> >>> >> > As people noticed earlier, a series of media tarballs per-project
>>>> >>> >> > (excluding commons) is being generated.  As soon as the first run of
>>>> >>> >> > these is complete we'll announce its location and start generating them
>>>> >>> >> > on a semi-regular basis.
>>>> >>> >> >
>>>> >>> >> > As we've been getting the bugs out of the mirroring setup, it is getting
>>>> >>> >> > easier to add new locations.  Know anyone interested?  Please let us
>>>> >>> >> > know; we would love to have them.
>>>> >>> >> >
>>>> >>> >> > Ariel
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> > _______________________________________________
>>>> >>> >> > Wikitech-l mailing list
>>>> >>> >> > Wikitech-l@lists.wikimedia.org
>>>> >>> >> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> --
>>>> >>> James Michael DuPont
>>>> >>> Member of Free Libre Open Source Software Kosova http://flossk.org
>>>> >>> Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
>>>> >>> Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
>>>> >> Pre-doctoral student at the University of Cádiz (Spain)
>>>> >> Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
>>>> >> Personal website: https://sites.google.com/site/emijrp/
>>>> >>
>>>> >>
>>>> >> _______________________________________________
>>>> >> Xmldatadumps-l mailing list
>>>> >> xmldatadump...@lists.wikimedia.org
>>>> >> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>>>> >>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Hydriz
>>>
>>> We've created the greatest collection of shared knowledge in history. Help
>>> protect Wikipedia. Donate now: http://donate.wikimedia.org
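The packaging step described in the thread above (put the exported versions in a zip before uploading) might look something like the sketch below. It is an assumption-laden illustration, not the code in export_deleted.py: the function names and the one-file-per-title layout are invented for the example, and the file name pattern is only inferred from the example output (archive2012-05-28T21:34:02.302183.zip), which looks like "archive" plus an ISO timestamp. The upload to archive.org is left out.

```python
import io
import zipfile
from datetime import datetime


def archive_name(now=None):
    """Name the zip after the time of the run, e.g. archive<ISO timestamp>.zip."""
    now = now or datetime.now()
    return "archive" + now.isoformat() + ".zip"


def build_zip(pages):
    """Pack a {title: xml_text} mapping into a zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for title, xml_text in pages.items():
            # Subpage titles contain "/", which would create folders in the zip.
            member = title.replace("/", "_") + ".xml"
            archive.writestr(member, xml_text)
    return buffer.getvalue()
```

Writing the bytes from `build_zip()` to a file named by `archive_name()` gives one archive per run, so a daily cron job would not overwrite the previous day's export.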



-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
