---------- Forwarded message ----------
From: jamesmikedup...@googlemail.com <jamesmikedup...@googlemail.com>
Date: Sun, Oct 18, 2009 at 3:39 AM
Subject: Re: [Foundation-l] Wikipedia meets git
To: Wikimedia Foundation Mailing List <foundatio...@lists.wikimedia.org>


See my new blog post on word-level blaming for Wikipedia via git and Perl:
http://fmtyewtk.blogspot.com/2009/10/mediawiki-git-word-level-blaming-one.html


The next step is ready:

1. I have a single script that pulls a given article and checks its
revisions into git.
It is not perfect, but it works.

http://bazaar.launchpad.net/~jamesmikedupont/+junk/wikiatransfer/revision/8
You run it like this, from inside a git repo:

perl GetRevisions.pl "Article_Name"

git blame Article_Name/Article.xml
git push origin master
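
As a rough sketch of the idea (this is not the actual GetRevisions.pl code),
each wiki revision can become one git commit whose author and date are taken
from the revision, so that git blame later reports the wiki editor. The
helper name commit_commands, the field names (user, timestamp, comment) and
the placeholder domain wiki.invalid are all assumptions for illustration.
The sketch only builds and prints the commands, it does not run git.

```perl
use strict;
use warnings;

# Hypothetical helper: build the git commands that would record one wiki
# revision as one commit. Field names and the email domain are assumptions.
sub commit_commands {
    my ($path, $rev) = @_;

    # git expects "Name <email>"; derive a placeholder address from the user.
    (my $local = $rev->{user}) =~ s/\W/_/g;
    my $author  = sprintf '%s <%s@wiki.invalid>', $rev->{user}, $local;
    my $message = $rev->{comment} || 'no edit summary';

    return (
        [ 'git', 'add', $path ],
        [ 'git', 'commit', '--quiet',
          "--author=$author",
          "--date=$rev->{timestamp}",
          '-m', $message ],
    );
}

# Made-up revision data, oldest first.
my @revisions = (
    { user => 'Alice', timestamp => '2008-02-17T15:00:00Z', comment => 'create article' },
    { user => 'Bob',   timestamp => '2008-02-17T16:30:00Z', comment => '' },
);

for my $rev (@revisions) {
    # A real script would write the revision text to the file here,
    # then run each command with system(@$cmd).
    for my $cmd (commit_commands('Article_Name/Article.xml', $rev)) {
        print join(' ', @$cmd), "\n";
    }
}
```

Passing the commands as lists rather than one shell string avoids quoting
problems with user names and edit summaries.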

The code that splits up the lines is in Process File. It replaces every
space with a backslash followed by a newline, so each word lands on its
own line.
That way we get a word-level blame.

    if ($insidetext)
    {
        ## replace each space with a backslash and a newline,
        ## so that every word ends up on its own line
        s/ /\\\n/g;

        print OUT $_;
    }
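
To see what this substitution does to a line of text, here is a small
standalone sketch. The function names split_words and join_words are
hypothetical and not part of the script; join_words is only an assumed
inverse, and it would not round-trip text that already contains a backslash
at the end of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: put each word on its own line, using the same
# substitution as above (space -> backslash + newline).
sub split_words {
    my ($text) = @_;
    (my $out = $text) =~ s/ /\\\n/g;
    return $out;
}

# Hypothetical inverse: turn backslash + newline back into a space.
sub join_words {
    my ($text) = @_;
    (my $out = $text) =~ s/\\\n/ /g;
    return $out;
}

my $sample = 'Kosovo declared independence';
print split_words($sample), "\n";
print join_words(split_words($sample)), "\n";
```

Because git blame works line by line, one word per line is what turns a
line-level blame into a word-level one.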


The article is here:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/article.xml


Here are the blame results:
http://github.com/h4ck3rm1k3/KosovoWikipedia/blob/master/Wiki/2008_Kosovo_declaration_of_independence/wordblame.txt


The problem is that GitHub does not like this much processor power
being used and kills the process, but you can run git blame locally.
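
For a local run, one possible way to get (author, word) pairs is to parse
the output of git blame --line-porcelain, which prints an "author" header
and then the line content prefixed by a tab. The sketch below is an
assumption about how one might do that, not how wordblame.txt was made.
The function name word_authors is hypothetical and the sample input is
hand-written and abbreviated (real output has full commit hashes and more
header lines).

```perl
use strict;
use warnings;

# Hypothetical helper: pair each blamed line (one word) with its author,
# given the text printed by `git blame --line-porcelain <file>`.
sub word_authors {
    my ($porcelain) = @_;
    my (@pairs, $author);
    for my $line (split /\n/, $porcelain) {
        if ($line =~ /^author (.*)$/) {
            $author = $1;                  # header line naming the author
        }
        elsif ($line =~ /^\t(.*)$/) {
            (my $word = $1) =~ s/\\$//;    # drop the trailing backslash marker
            push @pairs, [ $author, $word ];
        }
    }
    return @pairs;
}

# Abbreviated, made-up sample standing in for real blame output.
my $sample = join "\n",
    'aaaa 1 1 1', 'author Alice', 'filename Article.xml', "\tKosovo\\",
    'bbbb 2 2 1', 'author Bob',   'filename Article.xml', "\tdeclared\\";

printf "%-10s %s\n", @$_ for word_authors($sample);
```

On a real repository the input would come from something like
`git blame --line-porcelain Article_Name/Article.xml`.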

Now we have a tool to easily create a repository from Wikipedia, or
any other export-enabled MediaWiki.

mike

_______________________________________________
foundation-l mailing list
foundatio...@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
