Hi,
as Luca already mentioned, we (my colleagues Maribel Acosta and Felix Keppmann
and I) are also working on an algorithm for authorship detection. Our approach
is somewhat different from Luca and Michael's in that we rebuild authorship
information for words in paragraphs and sentences via MD5
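The snippet above breaks off, but the general idea of matching content across revisions by hashing sentences could look something like this rough sketch (all names and the matching policy are my guesses, not the authors' actual code):

```python
import hashlib

def sentence_key(sentence):
    # Normalize whitespace and case, then MD5-hash the sentence so that
    # identical sentences in different revisions map to the same key.
    normalized = " ".join(sentence.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def rebuild_authorship(prev_index, new_sentences, new_author):
    """prev_index: dict mapping sentence hash -> original author.
    Sentences already seen in earlier revisions keep their original
    author; sentences new in this revision are credited to new_author."""
    index = {}
    for s in new_sentences:
        key = sentence_key(s)
        index[key] = prev_index.get(key, new_author)
    return index
```

Hashing keeps the per-revision state small: you never store the sentences themselves, only their digests and authors.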
On 02/26/2013 02:29 AM, Luca de Alfaro wrote:
>- We need a way to poll the database for things like what are all
>revision_ids of a given page. We could use the API instead, but it's less
>efficient.
Yes, as others have said, Labs should allow that either now or shortly.
You should sig
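For concreteness, the kind of database poll being discussed (all revision_ids of a given page) is a simple query against MediaWiki's standard `revision` table, which the Labs replicas expose. A minimal sketch against a mock table (the real replicas are MySQL, not SQLite; schema column names are the standard ones):

```python
import sqlite3

# Mock of MediaWiki's `revision` table, just enough for the query below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER, rev_page INTEGER)")
conn.executemany("INSERT INTO revision VALUES (?, ?)",
                 [(101, 7), (102, 7), (103, 8)])

def revision_ids(conn, page_id):
    # The equivalent query on a Labs replica (MySQL) would be:
    #   SELECT rev_id FROM revision WHERE rev_page = %s ORDER BY rev_id
    rows = conn.execute(
        "SELECT rev_id FROM revision WHERE rev_page = ? ORDER BY rev_id",
        (page_id,))
    return [r[0] for r in rows]
```

Compared to paging through the API, a direct query like this avoids per-request overhead when enumerating many pages.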
Hi Luca,
we are working on somewhat related issues in Parsoid [1][2]. The
modified HTML DOM is diffed vs. the original DOM on the way in. Each
modified node is annotated with the base revision. We don't store this
information yet; right now we use it to selectively serialize modified
parts of the
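A toy analogue of that diff-and-annotate step, for readers unfamiliar with Parsoid's pipeline (this is my simplification, not Parsoid's actual implementation, which works on real HTML DOM):

```python
def annotate_modified(orig, new, base_rev):
    """Walk two parallel DOM-like trees (nested dicts with 'tag',
    'text', 'children') and tag nodes whose text differs from the
    original with the base revision they diverged from."""
    if orig is None or orig["text"] != new["text"]:
        new["data-base-rev"] = base_rev  # mark node as modified
    orig_children = orig["children"] if orig else []
    for i, child in enumerate(new["children"]):
        o = orig_children[i] if i < len(orig_children) else None
        annotate_modified(o, child, base_rev)
    return new
```

With the modified nodes marked, only those subtrees need to be re-serialized to wikitext; unmodified subtrees can be round-tripped untouched.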
Your site doesn't work:
http://blamemaps.wmflabs.org/mw/index.php/Main_Page -> the connection timed out
On Tue, Feb 26, 2013 at 5:52 PM, Bartosz Dziewoński wrote:
> I have briefly toyed with something similar. Unlike yours, it has a (very
> simple and rudimentary) interface, but no sophisticated
I have briefly toyed with something similar. Unlike yours, it has a (very
simple and rudimentary) interface, but no sophisticated algorithms inside :) –
just a standard LCS diff library. It also works in real time (but is awfully
slow).
It can be seen at http://wikiblame.heroku.com/ (source at
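The "standard LCS diff library" approach mentioned above can be sketched in a few lines with Python's `difflib` (an LCS-like matcher); this is my illustration, not the code behind wikiblame.heroku.com:

```python
from difflib import SequenceMatcher

def blame(old_words, old_authors, new_words, editor):
    """Word-level blame via a standard diff: words unchanged between
    revisions keep their original author; inserted or replaced words
    are credited to the current editor."""
    authors = []
    sm = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            authors.extend(old_authors[i1:i2])
        elif op in ("insert", "replace"):
            authors.extend([editor] * (j2 - j1))
        # 'delete': the word is gone, so nothing carries forward
    return authors
```

Running this revision-by-revision over a page's full history is exactly what makes the naive real-time approach "awfully slow" on long histories.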
On 02/25/2013 09:21 PM, Luca de Alfaro wrote:
> The problem is of putting together a bit of effort to get to that first
> running version.
How big are the wikis that you've tried this on? Would smaller academic
wikis be able to use this code?
I may have a use for your code since one of the wiki
It sounds like some of those things should be working in Labs soon with
DB replication. I doubt they'll let you store terabytes though.
Alex Monk
On 26/02/13 07:29, Luca de Alfaro wrote:
What we wrote can also work on Labs, but:
- We need a way to poll the database for things like what ar
On 02/25/2013 06:21 PM, Luca de Alfaro wrote:
> I am writing this message as we hope this might be of interest, and as we
> would be quite happy to find people willing to collaborate. Is anybody
> interested in developing a GUI for it and talk to us about what API we
> should have for retrieving t
I agree: in fact we don't do it in the write pipeline. The code we wrote
implements a simple queue, where page_ids are queued for processing. The
processing job then gets a page_id out of that table, and processes all the
missing revisions for that page_id. So this is useful also if (say) there
i
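The queue described above might be sketched as follows (the structure is my reading of the message; the real code lives in the Wikimedia repository and surely differs in detail):

```python
import queue

work = queue.Queue()   # page_ids waiting to be processed
processed = {}         # page_id -> set of revision ids already attributed

def enqueue_page(page_id):
    work.put(page_id)

def process_next(fetch_revision_ids):
    """Dequeue one page_id and process its missing revisions.
    fetch_revision_ids(page_id) returns all revision ids for the page,
    so re-enqueueing a page picks up only revisions not yet handled."""
    page_id = work.get()
    done = processed.setdefault(page_id, set())
    for rev_id in fetch_revision_ids(page_id):
        if rev_id not in done:
            done.add(rev_id)  # stand-in for the real attribution step
    work.task_done()
    return sorted(done)
```

Decoupling enqueueing from processing this way is what makes the system robust to interruptions: a page can simply be re-queued and only its missing revisions get processed.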
On 02/25/2013 09:21 PM, Luca de Alfaro wrote:
> I am writing this message as we hope this might be of interest, and as we
> would be quite happy to find people willing to collaborate. Is anybody
> interested in developing a GUI for it and talk to us about what API we
> should have for retrieving t
Dear All,
Michael Shavlovsky and I have been working on blame maps (authorship
detection) for the various Wikipedias.
We have code in the Wikimedia repository that has been written with the
goal of obtaining a production system capable of attributing all content (not
just a research demo). Here are s