Re: [Wiki-research-l] Tracking authorship of wiki content

2015-08-22 Thread Luca de Alfaro
Sorry, I meant to say: if there is interest in the code for the MediaWiki
extension, let me know, and _we_ will clean it up and put it on GitHub (you
won't have to clean it up :-).
Luca

On Sat, Aug 22, 2015 at 7:25 AM, Luca de Alfaro l...@dealfaro.com wrote:

 Thank you Federico.  Done.

 BTW, we also had code for a MediaWiki extension that computed this in real
 time.  That code has not yet been cleaned up, but it is available from
 here:
 https://sites.google.com/a/ucsc.edu/luca/the-wikipedia-authorship-project
 If there is interest, I don't think it would be hard to clean it up and
 post it to GitHub.
 The extension uses the edit hook to attribute the content of every new
 revision of a wiki page, using the earliest-plausible-attribution idea and
 algorithm we used in the paper.

 Luca

 On Sat, Aug 22, 2015 at 12:20 AM, Federico Leva (Nemo) nemow...@gmail.com
  wrote:

 Luca de Alfaro, 22/08/2015 01:51:

 So I got inspired, and I cleaned up some code that Michael Shavlovsky
 and I had written for this:

 https://github.com/lucadealfaro/authorship-tracking


 Great! It's always good when code behind a paper is published, it's never
 too late.
 If you can please add a link from wikipapers:
 http://wikipapers.referata.com/wiki/Form:Tool

 Nemo



_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Tracking authorship of wiki content

2015-08-22 Thread Luca de Alfaro
Thank you Federico.  Done.

BTW, we also had code for a MediaWiki extension that computed this in real
time.  That code has not yet been cleaned up, but it is available from
here:
https://sites.google.com/a/ucsc.edu/luca/the-wikipedia-authorship-project
If there is interest, I don't think it would be hard to clean it up and
post it to GitHub.
The extension uses the edit hook to attribute the content of every new
revision of a wiki page, using the earliest-plausible-attribution idea and
algorithm we used in the paper.
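To illustrate the kind of per-edit computation such an extension performs (this is not the actual extension code; the single-token matching and the function name `on_save` are my own simplification, and the real algorithm matches longer token sequences), here is a toy Python sketch with JSON-serialized state:

```python
import json

def on_save(stored_json, rev_id, new_text):
    """Toy per-edit attribution update: load the stored state, credit each
    token of the new revision to the earliest revision in which it ever
    appeared (even if it was deleted in between), and save the state back.
    Single-token matching only; the real algorithm matches N-token sequences."""
    state = json.loads(stored_json) if stored_json else {}
    attribution = []
    for tok in new_text.split():
        state.setdefault(tok, rev_id)  # first appearance wins
        attribution.append(state[tok])
    return json.dumps(state), attribution

state, attr = on_save(None, "rev0", "I like pasta")
state, attr = on_save(state, "rev1", "I like pasta with sauce")
print(attr)  # ['rev0', 'rev0', 'rev0', 'rev1', 'rev1']
```

The JSON round-trip stands in for whatever storage the extension would use between edits.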

Luca

On Sat, Aug 22, 2015 at 12:20 AM, Federico Leva (Nemo) nemow...@gmail.com
wrote:

 Luca de Alfaro, 22/08/2015 01:51:

 So I got inspired, and I cleaned up some code that Michael Shavlovsky
 and I had written for this:

 https://github.com/lucadealfaro/authorship-tracking


 Great! It's always good when code behind a paper is published, it's never
 too late.
 If you can please add a link from wikipapers:
 http://wikipapers.referata.com/wiki/Form:Tool

 Nemo



[Wiki-research-l] Tracking authorship of wiki content

2015-08-21 Thread Luca de Alfaro
Dear All,

I was yesterday at OpenSym (many thanks to Dirk for organizing this!), and
I was chatting with some people about attribution of content to its authors
in a wiki.
So I got inspired, and I cleaned up some code that Michael Shavlovsky and I
had written for this:

https://github.com/lucadealfaro/authorship-tracking

The way to use it is super simple (see below).  The attribution object can
also be serialized and de-serialized to/from JSON (see the documentation on
GitHub).

The idea behind the code is to attribute the content to the *earliest
revision* where the content was inserted, not the latest as diff tools
usually do.  So if some piece of text is inserted, then deleted, then
re-inserted (in a revert or a normal edit), we still attribute it to the
earliest revision.  This is somewhat similar to what we tried to do in
WikiTrust, but it's better done, and far more efficient.

The algorithm details can be found in
http://www2013.wwwconference.org/proceedings/p343.pdf
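A toy sketch of the earliest-attribution idea (my own simplification: it matches single tokens, whereas the actual algorithm matches sequences of up to N tokens; the function name is hypothetical):

```python
def attribute(revisions):
    """Credit each token of the newest revision to the earliest revision
    where it ever appeared, even if it was deleted and re-inserted in
    between.  revisions: list of (rev_id, token_list), oldest first."""
    earliest = {}      # token -> rev_id of its first appearance
    attribution = []
    for rev_id, tokens in revisions:
        attribution = []
        for tok in tokens:
            earliest.setdefault(tok, rev_id)
            attribution.append(earliest[tok])
    return attribution

revs = [
    ("rev0", "I like to eat pasta".split()),
    ("rev1", "I like to eat pasta with tomato sauce".split()),
    ("rev3", "I like to eat rice with tomato sauce".split()),
]
print(attribute(revs))
# ['rev0', 'rev0', 'rev0', 'rev0', 'rev3', 'rev1', 'rev1', 'rev1']
```

Note how "with tomato sauce" stays credited to rev1 and only the new word "rice" goes to rev3.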

I hope this might be of interest!

Luca

import authorship_attribution

a = authorship_attribution.AuthorshipAttribution.new_attribution_processor(N=4)
a.add_revision("I like to eat pasta".split(), revision_info="rev0")
a.add_revision("I like to eat pasta with tomato sauce".split(),
revision_info="rev1")
a.add_revision("I like to eat rice with tomato sauce".split(),
revision_info="rev3")
print a.get_attribution()

['rev0', 'rev0', 'rev0', 'rev0', 'rev3', 'rev1', 'rev1', 'rev1']


Re: [Wiki-research-l] Tracking authorship of wiki content

2015-08-21 Thread Luca de Alfaro
Dear Aaron,

sorry, sorry, thanks for helping clear up some misconceptions, and let me
see if I can do more.

The WikiWho API is very nice work, and was presented at WWW 2014.

The work of Michael and myself dates from one year before, WWW 2013 (see
http://www2013.wwwconference.org/proceedings/p343.pdf).

This is why in our work we don't give credit to WikiWho.  In fact, it is
them who most politely cite us.

Now, you say, why don't I give them more credit now?  Because I haven't
really done anything new.  I am not claiming anything new, I have just
taken code that was written in 2013, and made it better available on
github, with a moderate clean-up of its API.  We tried to make that code
available in 2013 to the community, by putting it into Gerrit (we were
told it was the proper place), but it didn't really work out.  Again, I am
not pushing a new result out.  I am simply making code available that dates
from some time back, and that I realized yesterday might be useful to
others.

There are many many ways to attribute content.  Even if you go for the
theory of earliest possible attribution, which is what we do in the paper
and code, it would certainly be better done using language models of
average text, to better distinguish casual from intentional repetition.

I put the code on github because I was inspired by our conversation
yesterday.  If you like, I'd be happy to give you access to the repo (write
access I mean) so you can both do the code reviews we had been mentioning,
and improve the README.md file with more considerations and references.
Let me know.

Again, what I wanted to do is make code written 2-3 years ago more readily
available, not really make any new claims.

Luca




On Fri, Aug 21, 2015 at 5:49 PM, Aaron Halfaker aaron.halfa...@gmail.com
wrote:

 Hey Luca!

 Welcome back to the content persistence tracking club!

 I feel like I should clear up some misconceptions.  First, yours is not the
 first Python library that is useful for determining the authorship of
 content in versioned text, and I don't think you have given fair treatment
 to the work we have been doing since you last worked on WikiTrust.  For
 example, it's hard to tell from your description whether you are doing
 anything different from the wikiwho api[2] with tracking content
 historically.  Further, the work I have been doing with diff-based content
 persistence (e.g. [1]) is not so simple as to not notice removals and
 re-additions under most circumstances.

 In my opinion, this is much better for measuring the productivity of a
 contribution (adding content that looks like content that was removed long
 ago is still productive, isn't it?), but maybe less useful for attributing
 a first contributor status to a particular sub-statement.  Regardless, it
 seems that a qualitative analysis is necessary to determine whether these
 differences matter and whether one strategy is better than the other.
 AFAICT, the only software that has received this kind of analysis is
 wikiwho (discussed in [3]).

 Regardless, it's great to have you working in this space again and I
 welcome you to help us develop an overview of content persistence measurement
 strategies that is complete and allows others to critically decide which
 strategy matches their needs.   See
 https://meta.wikimedia.org/wiki/Research:Content_persistence for such an
 overview.  I encourage you to use this description of persistence measures
 to differentiate your strategy from the work we have been doing over the
 last 5 years.  Edit boldly!

 1.
 https://pythonhosted.org/mediawiki-utilities/lib/persistence.html#mw-lib-persistence
 2. http://people.aifb.kit.edu/ffl/wikiwho/
 3. http://people.aifb.kit.edu/ffl/wikiwho/fp715-floeck.pdf

 -Aaron


 On Aug 21, 2015 4:52 PM, Luca de Alfaro l...@dealfaro.com wrote:

 Dear All,

 I was yesterday at OpenSym (many thanks to Dirk for organizing this!),
 and I was chatting with some people about attribution of content to its
 authors in a wiki.
 So I got inspired, and I cleaned up some code that Michael Shavlovsky and
 I had written for this:

 https://github.com/lucadealfaro/authorship-tracking

 The way to use it is super simple (see below).  The attribution object
 can also be serialized and de-serialized to/from json (see documentation on
 github).

 The idea behind the code is to attribute the content to the *earliest
 revision *where the content was inserted, not the latest as diff tools
 usually do.  So if some piece of text is inserted, then deleted, then
 re-inserted (in a revert or a normal edit), we still attribute it to the
 earliest revision.  This is somewhat similar to what we tried to do in
 WikiTrust, but it's better done, and far more efficient.

 The algorithm details can be found in
 http://www2013.wwwconference.org/proceedings/p343.pdf

 I hope this might be of interest!

 Luca

 import authorship_attribution

 a = 
 authorship_attribution.AuthorshipAttribution.new_attribution_processor(N=4)
 a.add_revision(I like

Re: [Wiki-research-l] Pageviews, mobile versus desktop

2014-12-15 Thread Luca de Alfaro
So Wikipedia gets 20M pageviews per day total?  I somehow expected more.
Or am I misreading the graph?
Luca

On Mon, Dec 15, 2014 at 9:57 AM, Oliver Keyes oke...@wikimedia.org wrote:

 Yep; same timeframe.

 On 15 December 2014 at 12:50, Federico Leva (Nemo) nemow...@gmail.com
 wrote:

 Oliver Keyes, 13/12/2014 21:15:

 http://ironholds.org/misc/pageviews_year_and_week.png - fascinating! It
 reveals a lot of seasonality in the desktop views - again, not
 replicated on mobile (at least, not so strongly)


 Does this graph also go from 2013-02-01 to 2014-12-01?

 Nemo





 --
 Oliver Keyes
 Research Analyst
 Wikimedia Foundation





Re: [Wiki-research-l] Fwd: FW: What works for increasing editor engagement?

2014-09-26 Thread Luca de Alfaro
Dear James,

very well argued, thanks for the insightful post.
Saving drafts on the other hand could help avoid many conflicts on
less-trafficked pages.
Right now, on a page that is edited infrequently, this happens:

- User A starts an edit
- User A saves so as not to lose work, though not quite done yet.  Resumes the edit.
- User B (typically an editor) sees the edit by A, and sets to work
polishing it.  Saves.
- User A saves -- conflict

The first edit by A woke up B, and led to the conflict.
If we allowed saving drafts, the following would be more likely:

- User A starts an edit
- User A saves a draft, and continues the edit.
- User A saves the edit.
- User B (typically an editor) sees the edit by A, and sets to work
polishing it.  Saves.

The conflict would occur only if A had second thoughts about the edit and
continued working after saving it, which might happen, but less frequently.

Of course saving drafts is also cumbersome to implement at scale (how long
would they persist?  there would be cleanup needed, etc.; maybe they could
persist for one week, then be mailed back to the author and deleted?).

Luca

On Fri, Sep 26, 2014 at 11:22 AM, James Forrester jforres...@wikimedia.org
wrote:

 [Re-sending as it bounced first time.]

 On 25 September 2014 22:45, Pine W wiki.p...@gmail.com wrote:

 FWIW there were sessions at Wikimania about concurrent editing. I think
 there is community support for the concept. If it helps us retain good
 faith new editors then that is another good reason to press forward on this
 subject. Perhaps James Forrester can provide an update on the outlook for
 concurrent editing capability.

 ​Hey.

 [This is a bit off-topic for wiki-research-l, but I've been asked to
 answer.]


 First things first: There aren't any plans right now to try to roll this
 out any time soon.


 Collaborative real-time editing is an interesting task in terms of
 engineering, but an exceptional challenge in terms of product. I think that
 it's reasonable to talk about it as a possible solution to issues, but the
 number of problems it raises is so great that people should be careful to
 not talk of it as some magic pixie dust. :-)

 For a couple of brief examples:

 If the objective is to prevent all edit conflicts by making parallel edits
 impossible, this means either:

 * everyone has to use the collaborative editor;
 * people who can't use the collaborative editor (e.g. old computer, slow
 network, no JavaScript, etc.) can't edit at all;
 * people who don't like the collaborative editor are unable to edit ever
 again; and
 * bots can't edit at all (because they can't react to prompts from other
 users)

 … or:

 * you have to choose to use the collaborative editor for each edit (how do
 newbies know, or is it opt-out somehow?)
 * as soon as someone wants to edit an article collaboratively, everyone
 else's edits die and they're told so (or they all have to wait for the
 collaborative edit session to end and then manually resolve the edit
 conflict);
 * for people who can't or don't want to use the collaborative editor, and
 all bots, the article is essentially locked from their editing until the
 collaborative edit is finished.

 Neither of these are great options.

 ​If instead ​we're happy to keep having edit conflicts, we can allow
 parallel edits, but then the benefit for newbies (and, frankly, the rest of
 us) goes away the second your collaborative edit conflicts with a
 non-collaborative edit. Whoops.




 ​Say that we've decided on a course of action for the above, maybe by
 biting the bullet and denying people with older computers *etc.* the
 ability to edit (which I think would be sad and a dereliction of
 our values); what do you do when there are too many parallel editors of an
 article?

 When you're editing in a real-time collaborative editor, that means you
 see the edits of each of the participants, alongside their
 cursors/selections and comments in the chat system if there is one (which
 there normally is). When there's two or three of these, it's relatively
 easy to see what's happening. But what if there are 1,000 people trying to
 edit the article at once (e.g. the article of a very famous individual just
 after they've died unexpectedly; think Michael Jackson or Robin Williams).
 Showing 1,000 cursors at once isn't just unhelpful – the level of traffic
 would probably kill most users' browsers. Consequently, there needs to be a
 limit somehow on the number of participants; maybe call it 10.

 So, what happens when you click edit on an article where 10 people are
 already editing?
 * Do you just get told tough?
 * Does the least-recently active editor get kicked out so you can join?
 * Does this mean that all I need is 11 bots requesting to edit an article
 to DoS it?

 If you're a special user (e.g. a sysop), can you get into a
 collaborative edit even if it's at the limit?
 * If yes, doesn't this go against our values to place some editors above
 others?
 * If yes, do we just let 

Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread Luca de Alfaro
Re. the edit conflicts happening when a new user is editing:

Can't one add some AJAX to the editor that signals that someone still has
the editing window open? Maybe editors could wait to modify work in
progress, if they had that indication, and if the content does not seem to
be vandalism?

Luca

On Thu, Sep 25, 2014 at 12:17 PM, James Salsman jsals...@gmail.com wrote:

 Aaron, would you please post the script you used to create

 https://commons.wikimedia.org/wiki/File:Desirable_newcomer_survival_over_time.png
 ?

 I would be happy to modify it to also collect the number of extant
 non-redirect articles each desirable user created.

  Aaron wrote:
  ... You'll find the hand-coded set of users here
   http://datasets.wikimedia.org/public-datasets/enwiki/rise-and-decline
  ...
   Categories:
  
 1. Vandals - Purposefully malicious, out to cause harm
 2. Bad-faith - Trying to be funny, not here to help or harm
 3. Good-faith - Trying to be productive, but failing
 4. Golden - Successfully contributing productively




Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread Luca de Alfaro
Better merging would be welcome.  But also less aggressive
editing/policing.

When I edit openstreetmap I have a better overall experience: the edits may
or may not go live immediately, but I don't have the impression that there
is someone aggressively vetting/refining my edits while I am still doing
them.  I feel welcome there.

To make Wikipedia more welcoming, we could do a few things.

We could allow users to save drafts.  In this way, people could work for a
while at their own pace, and then publish the changes.  Currently, saving
is the only way to avoid risking losing changes, but it has the very
undesired effect of inviting editors/vetters to the page before one is
really done.

If we also allowed a time window (even 30 minutes) before edits went live
after one is done editing (using the above AJAX mechanism to track when the
editor is open), experienced editors would not need to swoop in quite so
fast on the work of new users, and the whole editing atmosphere would be
more relaxed and welcoming.

The fact is that the Wikipedia editor, with its lack of ability to save
drafts, poor merging, and swooping editors, feels incredibly outdated and
unwelcoming - downright aggressive - to anyone used to WordPress / Google
Docs / Blogger / ...

Luca

On Thu, Sep 25, 2014 at 12:35 PM, James Salsman jsals...@gmail.com wrote:

 Luca wrote:
 
  Re. the edit conflicts happening when a new user is editing:
 
  Can't one add some AJAX to the editor that notifies that one
  still has the editing window open? Maybe editors could wait to
  modify work in progress, if they had that indication, and if the
  content does not seem vandalism?

 Instead of asking editors to wait, we could improve the merge
 algorithm to avoid conflicts:

 https://en.wikipedia.org/wiki/Merge_(revision_control)
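As a rough illustration of the idea, here is a naive line-based three-way merge; the names and behavior are my own sketch, not any particular tool's algorithm, and real merge tools handle many more cases:

```python
import difflib

def changes(base, other):
    """List of (i1, i2, replacement_lines): other replaces base[i1:i2]."""
    sm = difflib.SequenceMatcher(None, base, other)
    return [(i1, i2, other[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def three_way_merge(base, mine, theirs):
    """Apply both sides' non-overlapping changes to base; raise ValueError
    when the two sides changed overlapping regions (a conflict)."""
    all_changes = sorted(changes(base, mine) + changes(base, theirs))
    merged, pos = [], 0
    for i1, i2, repl in all_changes:
        if i1 < pos:
            raise ValueError("conflict near base line %d" % i1)
        merged.extend(base[pos:i1])  # unchanged region before this edit
        merged.extend(repl)          # the edited replacement
        pos = i2
    merged.extend(base[pos:])        # unchanged tail
    return merged

# One side edits line 2, the other appends a line; no conflict.
print(three_way_merge(["a", "b", "c"], ["a", "B", "c"], ["a", "b", "c", "d"]))
# ['a', 'B', 'c', 'd']
```

Treating template/category additions as separate non-overlapping regions, as Jonathan suggests later in this thread, would fit naturally into such a scheme.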




Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread Luca de Alfaro
Flagged revisions is different though, as it requires trusted editors to
flag things as approved.  I am simply advocating the ability to save drafts
visible only to oneself before publishing a change.  WordPress, Blogger,
etc. have it.  And so newcomers could edit to their heart's content, without
triggering the interest of editors and the consequent conflicts, then save
their changes.

Luca



On Thu, Sep 25, 2014 at 5:15 PM, Scott Hale computermacgy...@gmail.com
wrote:

 On Fri, Sep 26, 2014 at 5:14 AM, Luca de Alfaro l...@dealfaro.com wrote:

 Better merging would be welcome.  But also less aggressive
 editing/policing.

 When I edit openstreetmap I have a better overall experience: the edits
 may or may not go live immediately, but I don't have the impression that
 there is someone aggressively vetting/refining my edits while I am still
 doing them.  I feel welcome there.

 To make Wikipedia more welcoming, we could do a few things.

 We could allow users to save drafts.  In this way, people could work for
 a while at their own pace, and then publish the changes.  Currently, saving
 is the only way to avoid risking losing changes, but it has the very
 undesired effect of inviting editors/vetters to the page before one is
 really done.

 We could also allow a time window (even 30 minutes) before edits went
 live after one is done editing (using above Ajax mechanism to track when
 editor open), experienced editors would not need to swoop in quite so fast
 on the work of new users, and the whole editing atmosphere would be more
 relaxed and welcoming.

 The fact is that the Wikipedia editor, with its lack of ability to save
 drafts, poor merging, and swooping editors, feels incredibly outdated and
 unwelcoming - downright aggressive - to anyone used to WordPress / Google
 Docs / Blogger / ...

 Luca



 The technology exists to do this---[[:en:Wikipedia:Flagged_revisions]].
 The challenge is that many existing users don't want flagged revisions on
 by default.

 And that is the fundamental flaw with this whole email thread. The
 question needing to be answered isn't what increases new user retention.
 The real question is what increases new user retention and is acceptable
 to the most active/helpful existing users. The second question is much
 harder than the first.







Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread Luca de Alfaro
You are right about conflicts with fast-updated pages.  Not sure it would
be worse than the current situation though.
For many low-traffic articles, drafts visible only to the user would not
cause many conflicts -- basically, this would be true for all pages with
fewer than a couple of edits per day, and there are many such pages.
I think a more annoying issue would be how to clean up these drafts; a
policy would be required (one week?), cron jobs, etc.; otherwise these
drafts could grow uncontrollably in size due to abandoned edits.  But this
should be solvable, if with some pain.

I tend to think that with a bit of UI tweaking, Wikipedia could be made
more friendly.



On Thu, Sep 25, 2014 at 5:58 PM, Scott Hale computermacgy...@gmail.com
wrote:

 Yes, drafts visible only to the user are different. I was thinking of
 flagged revisions in reference to your idea that edits would first go live
 only after a set period of time. This is basically flagged revisions with a
 trivial extension: the flagged revision is always the latest revision that
 is at least X minutes old.
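The rule described here (show only the latest revision that is at least X minutes old) could be sketched in a few lines of Python (hypothetical names; a sketch, not MediaWiki code):

```python
import time

def visible_revision(revisions, min_age=30 * 60, now=None):
    """Return the newest revision at least `min_age` seconds old, or None.
    revisions: list of (timestamp, rev_id) pairs, oldest first."""
    now = time.time() if now is None else now
    for ts, rev_id in reversed(revisions):
        if now - ts >= min_age:
            return rev_id
    return None

# A one-hour-old edit is visible; a one-minute-old edit is not yet.
revs = [(1000.0, "r1"), (4540.0, "r2")]
print(visible_revision(revs, now=4600.0))  # 'r1' (r2 is only 60s old)
```

Once r2 ages past the window, the same call would start returning it, with no trusted-editor approval step involved.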

 We could also allow a time window (even 30 minutes) before edits went live
 after one is done editing (using above Ajax mechanism to track when editor
 open), experienced editors would not need to swoop in quite so fast on the
 work of new users, and the whole editing atmosphere would be more relaxed
 and welcoming.


 I think the challenge with drafts visible only to the user is that they
 are very likely to have a conflict and have to merge changes if they wait
 too long between starting the draft and later committing it.



 On Fri, Sep 26, 2014 at 9:20 AM, Luca de Alfaro l...@dealfaro.com wrote:

 Flagged revisions is different though, as it requires trusted editors
 to flag things as approved.  I am simply advocating the ability to save
 drafts visible only to oneself before publishing a change.  WordPress,
 Blogger, etc have it.  And so newcomers could edit to their heart content,
 without triggering the interest of editors and the consequent conflicts,
 then save their changes.

 Luca



 On Thu, Sep 25, 2014 at 5:15 PM, Scott Hale computermacgy...@gmail.com
 wrote:

 On Fri, Sep 26, 2014 at 5:14 AM, Luca de Alfaro l...@dealfaro.com
 wrote:

 Better merging would be welcome.  But also less aggressive
 editing/policing.

 When I edit openstreetmap I have a better overall experience: the edits
 may or may not go live immediately, but I don't have the impression that
 there is someone aggressively vetting/refining my edits while I am still
 doing them.  I feel welcome there.

 To make Wikipedia more welcoming, we could do a few things.

 We could allow users to save drafts.  In this way, people could work
 for a while at their own pace, and then publish the changes.  Currently,
 saving is the only way to avoid risking losing changes, but it has the very
 undesired effect of inviting editors/vetters to the page before one is
 really done.

 We could also allow a time window (even 30 minutes) before edits went
 live after one is done editing (using above Ajax mechanism to track when
 editor open), experienced editors would not need to swoop in quite so fast
 on the work of new users, and the whole editing atmosphere would be more
 relaxed and welcoming.

 The fact is that the Wikipedia editor, with its lack of ability to save
 drafts, poor merging, and swooping editors, feels incredibly outdated and
 unwelcoming - downright aggressive - to anyone used to WordPress / Google
 Docs / Blogger / ...

 Luca



 The technology exists to do this---[[:en:Wikipedia:Flagged_revisions]].
 The challenge is that many existing users don't want flagged revisions on
 by default.

 And that is the fundamental flaw with this whole email thread. The
 question needing to be answered isn't what increases new user retention.
 The real question is what increases new user retention and is acceptable
 to the most active/helpful existing users. The second question is much
 harder than the first.








 --
 Scott Hale
 Oxford Internet Institute
 University of Oxford
 http://www.scotthale.net/
 scott.h...@oii.ox.ac.uk



Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread Luca de Alfaro
 relaxed
 and welcoming.


 I think the challenge with drafts visible only to the user is that they
 are very likely to have a conflict and have to merge changes if they wait
 too long between starting the draft and later committing it.



 On Fri, Sep 26, 2014 at 9:20 AM, Luca de Alfaro l...@dealfaro.com wrote:

 Flagged revisions is different though, as it requires trusted editors
 to flag things as approved.  I am simply advocating the ability to save
 drafts visible only to oneself before publishing a change.  WordPress,
 Blogger, etc have it.  And so newcomers could edit to their heart content,
 without triggering the interest of editors and the consequent conflicts,
 then save their changes.

 Luca



 On Thu, Sep 25, 2014 at 5:15 PM, Scott Hale computermacgy...@gmail.com
 wrote:

 On Fri, Sep 26, 2014 at 5:14 AM, Luca de Alfaro l...@dealfaro.com
 wrote:

 Better merging would be welcome.  But also less aggressive
 editing/policing.

 When I edit openstreetmap I have a better overall experience: the edits
 may or may not go live immediately, but I don't have the impression that
 there is someone aggressively vetting/refining my edits while I am still
 doing them.  I feel welcome there.

 To make Wikipedia more welcoming, we could do a few things.

 We could allow users to save drafts.  In this way, people could work
 for a while at their own pace, and then publish the changes.  Currently,
 saving is the only way to avoid risking losing changes, but it has the very
 undesired effect of inviting editors/vetters to the page before one is
 really done.

 We could also allow a time window (even 30 minutes) before edits went
 live after one is done editing (using above Ajax mechanism to track when
 editor open), experienced editors would not need to swoop in quite so fast
 on the work of new users, and the whole editing atmosphere would be more
 relaxed and welcoming.

 The fact is that the Wikipedia editor, with its lack of ability to save
 drafts, poor merging, and swooping editors, feels incredibly outdated and
 unwelcoming - downright aggressive - to anyone used to WordPress / Google
 Docs / Blogger / ...

 Luca



 The technology exists to do this---[[:en:Wikipedia:Flagged_revisions]].
 The challenge is that many existing users don't want flagged revisions on
 by default.

 And that is the fundamental flaw with this whole email thread. The
 question needing to be answered isn't what increases new user retention.
 The real question is what increases new user retention and is acceptable
 to the most active/helpful existing users. The second question is much
 harder than the first.








 --
 Scott Hale
 Oxford Internet Institute
 University of Oxford
 http://www.scotthale.net/
 scott.h...@oii.ox.ac.uk







Re: [Wiki-research-l] FW: What works for increasing editor engagement?

2014-09-25 Thread Luca de Alfaro
This last message of yours, Jonathan, is very insightful and true.
I wonder how it would be possible to set up some kind of controlled study
of how different edit capabilities lead to different engagement.  One
could always set up controlled mirrors of Wikipedia for a small set of
pages on a coherent topic, and perhaps measure the difference in
engagement?  Do you think that there is a way to do this?

There are also pages that are very different.  The rapidly evolving page on
a current event requires rapid communication of edits.  Instead, a novice
who edits a page on a topic with little traffic is best left alone (no
tweaking that causes edit conflicts) until she/he is done.

Luca

On Thu, Sep 25, 2014 at 10:13 PM, WereSpielChequers 
werespielchequ...@gmail.com wrote:

 We have had endless discussions about this in the new page patrol
 community. Basically there is a divide between those who think it important
 to communicate with people as quickly as possible so they have a chance to
 fix things before they log off and people such as myself who think that
 this drives people away. So before we try to make people more aware that
 they are dealing with a newbie it would help if we had some neutral
 independent research that indicated which position is more grounded in
 reality. Simply making it clearer to patrollers that they are dealing with
 newbies is solving a non-problem; we know the difference between newbies
 and regulars, we just disagree as to the best way to handle newbies.
 Investing in software to tell patrollers when they are dealing with newbies
 is unlikely to help, in fact I would be willing to bet that one of the
 criticisms will be from patrollers saying that it isn't doing that job as
 well as they can because it doesn't spot which editors are obviously
 experienced even if their latest account is not yet auto confirmed.

 There is also the issue that some patrollers may not realise how many edit
 conflicts they cause by templating and categorising articles. After all, it
 isn't going to be the templater or categoriser who loses the edit conflict,
 that is almost guaranteed to be the newbie. Of course this could be
 resolved by changing the software so that adding a category or template is
 not treated as conflicting with changing the text.

 Regards

 Jonathan Cardy


 On 25 Sep 2014, at 23:23, Luca de Alfaro l...@dealfaro.com wrote:

 Re. the edit conflicts happening when a new user is editing:

 Can't one add some AJAX to the editor that signals that one still has the
 editing window open? Maybe editors could wait to modify work in progress,
 if they had that indication and the content does not seem to be vandalism.
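A minimal sketch of the server-side bookkeeping such an AJAX heartbeat could rely on; the function names, the heartbeat interval, and the timeout are all hypothetical illustrations, not part of MediaWiki:

```python
import time

# Hypothetical bookkeeping: the client-side AJAX would ping the server
# every ~30 s while an edit window is open; a patroller's tools could
# then check is_being_edited() before touching the page.
_heartbeats = {}  # page title -> timestamp of last heartbeat

def record_heartbeat(page_title, now=None):
    """Called when an open edit window pings the server."""
    _heartbeats[page_title] = now if now is not None else time.time()

def is_being_edited(page_title, timeout=90, now=None):
    """True if a heartbeat for this page arrived within `timeout` seconds."""
    now = now if now is not None else time.time()
    last = _heartbeats.get(page_title)
    return last is not None and (now - last) <= timeout
```

The timestamps are passed explicitly here only to make the sketch easy to test; a real extension would use the clock directly.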

 Luca

 On Thu, Sep 25, 2014 at 12:17 PM, James Salsman jsals...@gmail.com
 wrote:

 Aaron, would you please post the script you used to create

 https://commons.wikimedia.org/wiki/File:Desirable_newcomer_survival_over_time.png
 ?

 I would be happy to modify it to also collect the number of extant
 non-redirect articles each desirable user created.

  Aaron wrote:
  ... You'll find the hand-coded set of users here
   http://datasets.wikimedia.org/public-datasets/enwiki/rise-and-decline
  ...
   Categories:
  
 1. Vandals - Purposefully malicious, out to cause harm
 2. Bad-faith - Trying to be funny, not here to help or harm
 3. Good-faith - Trying to be productive, but failing
 4. Golden - Successfully contributing productively



___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] long in tooth: what outdated looks like

2012-05-03 Thread Luca de Alfaro
This is an EXCELLENT email, Steven.  +1 to it!

Luca

On Thu, May 3, 2012 at 11:17 AM, Steven Walling swall...@wikimedia.org wrote:

 On Thu, May 3, 2012 at 2:41 AM, Richard Jensen rjen...@uic.edu wrote:

 JSTOR reports there were about 300 articles on Shakespeare a year in
 scholarly journals in 1997 to 2006; none of them are cited, nor any since
 then and only one before then.  This is typical as well of political and
 military history. Wiki editors are not using scholarly journals. I assume
 that is because they are unaware of them.


 Not at all.

 Wikipedians are *very much* aware that these journals exist. They do not
 have access to them, because they are unaffiliated scholars. Dozens of
 editors want access to this content,[1] but can't have it because JSTOR
 locks it down. They just now started letting people access content that is
 in the public domain!

 If as an academic, you see a problem where peer reviewed content is not
 cited in Wikipedia, I would strongly encourage you to join the movement
 lobbying for openness in scholarly work. Otherwise, you're complaining
 about a problem that Wikipedians do not have the power to fix, because
 academics tacitly support a system in which knowledge is kept in the hands
 of the few who can pay for it.

 --
 Steven Walling
 https://wikimediafoundation.org/

 1. https://en.wikipedia.org/wiki/Wikipedia:Requests_for_JSTOR_access



___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Request for feedback on new data dump formats

2011-04-01 Thread Luca de Alfaro
Not quite... if I am reading Brion's proposal correctly, it would list all
the pages that changed in a specific interval.  If the interval is large,
say a month, and the full history of each page is provided, the dump could
be very large.
What I was suggesting is to include only the changes (the revisions) that
occur in a specific time span.

Luca

On Thu, Mar 31, 2011 at 5:33 PM, Yuvi Panda yuvipa...@gmail.com wrote:

 Would incremental dumps, as described by brion long time ago
 (http://leuksman.com/log/2007/10/14/incremental-dumps/) be what you're
 looking for?

 On Fri, Apr 1, 2011 at 5:01 AM, Aaron Halfaker aaron.halfa...@gmail.com
 wrote:
  If periodic update dumps are being considered, information that describes
  changes to old data (page deletes, user renames, etc) would be very
 useful
  to have along with new revisions.
 
  -Aaron
 
  On Mar 31, 2011 6:27 PM, Luca de Alfaro l...@dealfaro.org wrote:
  I think I would be very interested in 3, or even, in having every month
 a
  dump of that month's revisions. As I have built tools for the xml dumps,
  no
  change in format is good for me (and for WikiTrust).
 
  I would find incremental dumps (with occasional, yearly, full dumps)
 much
  easier to manage than full dumps.
 
  Luca
 
  On Thu, Mar 31, 2011 at 2:27 PM, Yuvi Panda yuvipa...@gmail.com
 wrote:
 
  Hi, I'm a student planning on doing GSoC this year on mediawiki.
  Specifically, I'd like to work on data dumps.
 
  I'm writing this to gauge what would be useful to the research
  community. Several ideas thrown about include:
  1. JSON Dumps
  2. Sqlite Dumps
  3. Daily dumps of revisions in last 24 hours
  4. Dumps optimized for very fast import into various external storage
  and smaller size (diffs)
  5. JSON/CSV for Special:Import and Special:Export
 
  Would any of these be useful? Or is there anything else that I'm
  missing, that you would consider much more useful?
 
  Feedback would be invaluable :)
 
  Thanks :)
  --
  Yuvi Panda T
  http://yuvi.in/blog
 
 
 



 --
 Yuvi Panda T
 http://yuvi.in/blog


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Most reverted pages in the en-wikipedia (enwiki-20100130 dump)

2010-08-13 Thread Luca de Alfaro
Thanks, this is great fun!  As an Italian, let me quote:

(0.42525520906166969, (7151, 3041, 59, 514, 63, 2519, 955), 'Penis')
(0.42516069788797062, (1089, 463, 29, 27, 16, 470, 84), 'Inner core')
(0.42490272373540855, (1285, 546, 11, 64, 27, 515, 122), 'Stuff')
(0.42477231329690346, (2745, 1166, 28, 110, 46, 1054, 341), 'Gun')
(0.42474916387959866, (2990, 1270, 37, 149, 23, 1190, 321), 'Monkey')
(0.42443438914027148, (1105, 469, 20, 21, 2, 427, 166), 'Incas')
(0.42433090024330899, (2055, 872, 39, 45, 15, 825, 259), 'Italian
Renaissance')
(0.42375950742484608, (2761, 1170, 34, 94, 24, 978, 461), 'Watermelon')
(0.42362613587191694, (2311, 979, 22, 121, 19, 937, 233), 'Puppy')
(0.4235686492495831, (1799, 762, 20, 83, 34, 669, 231), 'Crap')

It is absolutely great to see that Italian Renaissance (with Incas) is one
of the few cultural topics that makes it as high in the list as the usual
excrement-sex-infantile type of things!!

Luca

On Fri, Aug 13, 2010 at 1:12 PM, Dmitry Chichkov dchich...@gmail.com wrote:

 If anybody is interested, I've made a list of 'most reverted pages' in the
 english wikipedia based on the analysis of the enwiki-20100130 dump. Here is
 the list:
 http://wpcvn.com/enwiki-20100130.most.reverted.tar.bz
 http://wpcvn.com/enwiki-20100130.most.reverted.txt

 This list was calculated using the following sampling criteria:
 * All pages from the enwiki-20100130 dump;
 ** Filtered pages with more than 1000 revisions;
 ** Filtered pages with revert ratios > 0.3;
 * Sorted by revert ratio, in descending order.

 A page revision is considered a revert if there is a previous revision
 with a matching MD5 checksum.
 BTW, if anybody needs it, the python code that identifies reverts, revert
 wars, self-reverts, etc is available (LGPL).
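The MD5 criterion described above can be sketched in a few lines; Dmitry's actual (LGPL) implementation may differ in its details:

```python
import hashlib

def find_reverts(revision_texts):
    """Flag each revision whose full text matches some earlier revision.

    A minimal sketch of the MD5-matching revert criterion quoted above;
    returns one boolean per revision (True = revert of an earlier state).
    """
    seen = set()
    flags = []
    for text in revision_texts:
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        flags.append(digest in seen)
        seen.add(digest)
    return flags
```

The revert ratio used for the ranking would then simply be `sum(flags) / len(flags)` for each page.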

 -- Regards, Dmitry



___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Help to solve three doubts on Wikipedia research data

2010-04-11 Thread Luca de Alfaro
I guess that Wiki(pedia|media) could very well gather statistics on

(revision_id, clicked_link)

pairs without compromising the anonymity of the visitors.  It would be very
useful to have indications on which hyperlinks are most useful.  For
example, I am always curious whether the large editorial effort to curate
categories is worth it.  And also, if one had data on:

(revision_id, search terms used in next search),

one could infer which links are actually missing.
The problem is that many people use search engines rather than Wikipedia's
own search to navigate the Wikipedia... but perhaps the information could
still be reconstructed somehow from session information.

But as far as I know, there is no plan nor current infrastructure to have
such anonymously logged data.  I don't work there, however, so other
better-informed people might comment.

Luca


On Sun, Apr 11, 2010 at 3:19 PM, Ziko van Dijk zvand...@googlemail.com wrote:

 Hello,

 Gregory (? if I remember well) mentioned in August 2009 this:
 http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1446862
 All examined sites spy on their visitors, but Wikimedia and Wikipedia.

 Kind regards
 Ziko


 2010/4/11 Gregory Maxwell gmaxw...@gmail.com:
  On Sun, Apr 11, 2010 at 12:06 PM, Fuster, Mayo mayo.fus...@eui.eu
 wrote:
  * Does the site learn from the navigation and searches? That is, if a
  Wikipedia visitor who reads a Network entry then goes to the Manuel
 Castells
  entry, Will the system understand there is a connexion between them?
 Will
  next time put them together when presenting search results?
 
  No.
 
  Although that is an interesting area of research.
 
  Unfortunately, due to privacy concerns the data that would be required
  to invent such a system (search strings and search click through
  traces) is not available to the public.  (and in fact, the traces
  aren't really collected, currently, as far as I know)
 
 



 --
 Ziko van Dijk
 NL-Silvolde


___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Help to solve three doubts on Wikipedia research data

2010-04-11 Thread Luca de Alfaro
The first thing I proposed is innocuous (gathering stats on (revision_id,
clicked_link)), and in fact can be done easily with a minimum of
instrumentation.

The second is very different from the AOL search data.  The AOL search data
was problematic because it associated data on a per-user basis, so you could
use some queries to figure out who the user was, and then see the other
queries of the user.
I am suggesting here to instead gather anonymous statistics on:
(was on page A, did a search, landed on page B), keeping track only of
the (A, B) pairs, without user information.
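The aggregation proposed above can be sketched as follows; only pair counts are ever stored, never user or session identifiers, and the function names are illustrative:

```python
from collections import Counter

# Sketch of the anonymous aggregation described above: keep only
# (from_page, landed_page) counts, discarding any user identifier
# before the event is recorded.
pair_counts = Counter()

def log_search_transition(from_page, landed_page):
    """Record one anonymous "was on A, searched, landed on B" event."""
    pair_counts[(from_page, landed_page)] += 1

def top_missing_links(n=10):
    """The most frequent pairs suggest candidate missing links on page A."""
    return pair_counts.most_common(n)
```

For example, many transitions from "Network" to "Manuel Castells" (the pair mentioned earlier in this thread) would suggest adding that link to the "Network" page.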

But the problem is that gathering such anonymous logs takes effort: it is
difficult to do securely, difficult to keep anyone from tampering with the
logs and adding back information that should not be there, and difficult to
present the information to Wikipedia editors in a way that helps them
meaningfully improve pages.

So perhaps the first statistic is the only useful one.

I would also be curious to know, once a user enters, what % of next visits
are due to the visitor clicking on links, vs. doing a search.

Luca

On Sun, Apr 11, 2010 at 6:28 PM, Anthony wikim...@inbox.org wrote:

 On Sun, Apr 11, 2010 at 6:27 PM, Luca de Alfaro l...@dealfaro.org wrote:

 I guess that Wiki(pedia|media) could very well gather statistics on

 (revision_id, clicked_link)

 pairs without compromising the anonymity of the visitors.  It would be
 very useful to have indications on which hyperlinks are most useful.  For
 example, I am always curious whether the large editorial effort to curate
 categories is worth it.  And also, if one had data on:

 (revision_id, search terms used in next search),

 one could infer which links are actually missing.


 Seems to me like that (especially the latter) would need to be done
 extremely carefully to avoid compromising the anonymity of the visitors.
 Although it's not quite as bad, it seems reminiscent of the AOL search data
 scandal (http://en.wikipedia.org/wiki/AOL_search_data_scandal), especially
 with regard to the search terms used in the next search.



___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Study on Interfaces to Improving Wikipedia Quality

2008-11-19 Thread Luca de Alfaro
Dear All,

if you go to http://wiki-trust.cse.ucsc.edu/index.php/Main_Page and click on
*Random Page*, you can explore that Wikipedia demo further.  Our trust is a
function both of the text's age (how many times it has been revised) and
of the reputation of the revisors.

Let me also point out that our code base has much evolved since then.  In
particular, the latest release of WikiTrust adds:

   - Author tracking.  Hover over a word in the check text tab, and the
   author is displayed in a pop-up window.
   - Origin tracking.  Click on a (non-link) word, and you are sent to the
   diff where the word was introduced.
   - Vote button.  If you are logged in, and you agree with the information
   displayed in the check text tab, you can vote for its correctness, and the
   revision text will gain trust as a consequence.

We do not have a whole-wikipedia demo of that so far, but you can find
various examples linked from http://trust.cse.ucsc.edu/WikiTrust .  Some of
those examples are mirrors of existing wikis (from dumps), but visit our
very own Cookiwiki, which we set up to experiment:

   - www.cookiwiki.org

Register, and edit a page, browse around, etc.  Yes, there is not a lot of
content, but you can experiment with the interface.  Come on, everybody
knows at least one recipe, and we welcome even one for hard-boiled eggs!

Finally, do you have your own wiki?  Then just download our code
(http://trust.cse.ucsc.edu/WikiTrust) and give it a try.
If you follow the link for a tarball
(http://code.google.com/p/wikitrust/downloads/list), you can download a
tarball which contains a statically-linked executable for Linux (and the
source code for any OS), so installation is easy.

Luca

On Wed, Nov 19, 2008 at 12:12 PM, [EMAIL PROTECTED] wrote:

 Dear All,

 My name is Avanidhar Chandrasekaran
 (http://en.wikipedia.org/wiki/User_talk:Avanidhar).

 I work with GroupLens Research at the University of Minnesota, Twin Cities.
 As part of my research, I am involved in analyzing the usefulness and
 Necessity of author reputation in Wikipedia.

 In lieu of this, I have simulated an Interface to color words in an article
 based on their Age.

 Being experienced contributors to Wikipedia, I invite you to participate in
 this study, which involves the following.

 1. Please visit the following Instances of wikipedia and evaluate the
 interface components which have been incorporated into each of them. Each
 of these use their own algorithm to color text.

 a) The Wikitrust project

   http://wiki-trust.cse.ucsc.edu/index.php/Main_Page

 b) The Wiki-reputation project at Grouplens research

   http://wiki-reputation.cs.umn.edu/index.php/Main_Page

 2) Once you have evaluated the two interfaces, kindly complete this survey
 on Wikipedia quality

  http://www.surveymonkey.com/s.aspx?sm=hagN5S1JZHxH6pF9SmXkkA_3d_3d


 We hope to get your valuable feedback on these interfaces and how Wikipedia
 article quality can be improved.

 Thanks for your time

 Avanidhar Chandrasekaran,

 GroupLens Research, University of Minnesota




___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] WikiTrust v2 released: reputation and trust for your wiki in real-time!

2008-08-23 Thread Luca de Alfaro
As some of you might remember, we have been working on author
reputation and text trust systems for wikis; some of you may have seen
our demo at WikiMania 2007, or the on-line demo
http://wiki-trust.cse.ucsc.edu/

Since then, we have been busy at work to build a system that can be
deployed on any wiki, and display the text trust information.
And we finally made it!

We are pleased to announce the release of WikiTrust version 2!

With it, you can compute author reputation and text trust of your
wikis in real-time, as edits to the wiki are made, and you can display
text trust via a new trust tab.
The tool can be installed as a MediaWiki extension, and is released
open-source, under the BSD license; the project page is
http://trust.cse.ucsc.edu/WikiTrust

WikiTrust can be deployed both on new, and on existing, wikis.
WikiTrust stores author reputation and text trust in additional
database tables.  If deployed on an existing wiki, WikiTrust first
computes the reputation and trust information for the current wiki
content, and then processes new edits as they are made.  The
computation is scalable, parallel, and fault-tolerant, in the sense
that WikiTrust adaptively fills in missing trust or reputation
information.

On my MacBook, running Ubuntu under VMware, WikiTrust can analyze
some 10-20 revisions per second of a wiki; so with a little patience,
unless your wiki is truly huge, you can just deploy it and wait a
bit.
Go to http://trust.cse.ucsc.edu/WikiTrust for more information and for
the code!

Feedback, comments, etc are much appreciated!

Luca de Alfaro
(with Ian Pye and Bo Adler)
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] Three techreps: assigning trust to Wikipedia content, and reputation, contributions of authors

2008-05-28 Thread Luca de Alfaro
Dear All,

we have three new techreps available:

   - Robust Content-Driven Reputation
     (http://www.soe.ucsc.edu/%7Eluca/papers/08/ucsc-soe-08-09.html) shows
     that the content-driven reputation we proposed in a WWW 2007 paper can
     be made robust to Sybil (sock-puppet) and other coordinated attacks.  In
     WWW 2007, we proposed content-driven reputation for Wikipedia authors,
     where authors gain reputation if their contributions are preserved, and
     lose reputation if their contributions are quickly undone.  The original
     algorithms were very prone to attacks; we show here that they can be
     made resistant.

   - Assigning Trust to Wikipedia Content
     (http://www.soe.ucsc.edu/%7Eluca/papers/08/ucsc-soe-08-07.html) proposes
     computing the trust of Wikipedia text on the basis of the reputation of
     the author, and the reputation of the people who revised the text.  We
     display text trust by coloring the text background.  Many of you have
     seen the on-line demo for the English Wikipedia, at
     http://trust.cse.ucsc.edu/ .  This is an improved version of a November
     2007 techrep on the same topic; here, we show how the trust system can
     be made resistant to attacks.

   - Measuring Author Contributions to the Wikipedia
     (http://www.soe.ucsc.edu/%7Eluca/papers/08/ucsc-soe-08-08.html) defines
     and compares various ways of measuring the contribution of individual
     authors to the Wikipedia.  We have our own favorite; read more to find
     out :-)

In these months, we have been busy working on WikiTrust
(http://trust.cse.ucsc.edu/), an open-source tool for assigning reputation
to wiki authors and trust to
wiki content.  We already have a batch (or off-line) system, which can
compute reputation and trust based on wiki dumps, such as the Wikipedia
dumps made available by the Wikimedia Foundation.  We are developing an
on-line system, which can assign reputation and trust in real-time, as
edits are made.  One of our chief concerns in developing an on-line system
was to ensure that it was robust to attack, and we believe we have made
progress in this direction, as reported in the above techreps.  We are now
proceeding with the implementation; my guess is that we will have a
prototype in a month or so.

By the way, the batch part of WikiTrust (http://trust.cse.ucsc.edu/) can
be easily adapted to carry out various analysis tasks.  Basically, it walks
over all revisions of every page of a wiki, and it contains an efficient
text analysis engine that tells you precisely how text was changed between
versions. So, it is easy to use WikiTrust as a platform to write analysis
algorithms for wikis: you don't have to worry about the boring tasks of
reading and parsing markup language, and computing text diffs in a
reasonable way; you can concentrate on the details of the specific analysis
you want to do.  It is all open source, and we welcome developers or people
interested in it.

All the best,

Luca (with Ian, Bo, and the other wikitrusters).
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] The WikiTrust code has been released!

2008-02-18 Thread Luca de Alfaro
Dear All,

we have just released in open-source format  the code of WikiTrust, the tool
we use for the Wikipedia trust coloring.
The project homepage is http://trust.cse.ucsc.edu/ , and from there, you can
follow a link to a live demo.
The code itself is available from http://trust.cse.ucsc.edu/WikiTrust .

The code is suitable to the trust-coloring of a static dump of a wiki; the
code for the coloring of edits in real-time, as they happen, is under
development.

The code is extensible, and it provides a platform over which it is
(relatively) easy to write wiki analysis tools... for instance, we wrote
small analysis procedures that measure the inter-edit time distribution, and
the amount of text contributed by authors of various reputation ranges.
As the text analysis engine and the dump traversal engines are already
built, it is relatively easy to add other analysis modules.

We hope this will be of interest!
All the best,

Luca de Alfaro

(message sent on behalf also of Bo Adler and Ian Pye)
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Demo: coloring the text of the Wikipedia according to its trust

2007-07-30 Thread Luca de Alfaro
Dear Andre,

let me say that the algorithms still need tuning, so we are not sure these
are the best choices, but here is the idea:

When a user of reputation 10 (for example) edits the page, the text that is
added only gets trust 6 or so.  It is not immediately considered high trust,
because others have not yet had a chance to vet it.

When a user of reputation 10 edits the page, the trust of the text already
on the page rises a bit (over several edits, it would approach 10).  This
models the fact that the user, by leaving the text there, gave an implicit
vote of assent.

The combination of the two effects explains what you are seeing.
The goal is that even high-reputation authors can only lend part of their
reputation to the text they create; community vetting is still needed to
achieve high trust.

Now, as I said, we must still tune the various coefficients in the algorithms
via a learning approach, and there is a bit more to the algorithm than I
describe above, but that's the rough idea.
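The two effects described above can be sketched numerically; the coefficients here are made-up placeholders for illustration, not the tuned values from the actual algorithm:

```python
def trust_of_new_text(author_reputation, lending_factor=0.6):
    """Effect 1: new text inherits only part of its author's reputation.

    Toy sketch: a reputation-10 author's fresh text starts around
    trust 6, as in the example above.  The 0.6 is a placeholder, not
    the algorithm's tuned coefficient.
    """
    return lending_factor * author_reputation

def revised_trust(current_trust, editor_reputation, pull=0.25):
    """Effect 2: surviving text drifts toward the revising editor's reputation.

    Each edit that leaves the text in place is an implicit vote of
    assent; repeated reputation-10 edits make trust approach 10, while
    a low-reputation editor pulls trust down the same way.
    """
    return current_trust + pull * (editor_reputation - current_trust)
```

Combining the two: text added by a reputation-10 author starts at trust 6 and only approaches 10 after several further high-reputation revisions, which matches the behavior Luca describes.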

Another thing I am pondering is how much a reputation change should spill
over paragraph or bullet-point breaks.  I could change easily what I do, but
I will first set up the optimization/learning - I want to have some
quantitative measure of how well the trust algo behaves.

Thanks for your careful analysis of the results!

Luca

On 7/30/07, Andre Engels [EMAIL PROTECTED] wrote:

 2007/7/29, Luca de Alfaro [EMAIL PROTECTED]:

  We first analyze the whole English Wikipedia, computing the reputation
 of
  each author at every point in time, so that we can answer questions like
  what was the reputation of author with id 453 at 5:32 pm of March 14,
  2006.  The reputation is computed according to the idea of
 content-driven
  reputation.
 
  For new portions of text, the trust is equal to (a scaling function of)
 the
  reputation of the text author.
  Portions of text that were already present in the previous revision can
 gain
  reputation when the page is revised by higher-reputation authors,
 especially
  if those authors perform an edit in proximity of the portion of text.
  Portions of text can also lose trust, if low-reputation authors edit in
  their proximity.
  All the algorithms are still very preliminary, and I must still apply a
  rigorous learning approach to optimize the computation.
  Please see the demo page for more details.

 One thing I find peculiar is that adding text somewhere can lower
 the trust of the surrounding text while at the same time heightening
 that of far-away text. Why is that? See for example

 http://enwiki-trust.cse.ucsc.edu/index.php?title=Collation&diff=prev&oldid=102784135
 - trust:6 text is added between trust:8 text, causing the surrounding
 text to go down to trust:6 or even trust:5, but at the same time
 improving text elsewhere in the page from trust:8 to trust:9. Why
 would the author count as low-reputation for the direct environment,
 but high-reputation farther away?

 --
 Andre Engels, [EMAIL PROTECTED]
 ICQ: 6260644  --  Skype: a_engels

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] Demo: coloring the text of the Wikipedia according to its trust

2007-07-29 Thread Luca de Alfaro
Dear All:

I would like to tell you about a demo we set up, where we color the text of
Wikipedia articles according to a computed value of trust.  The demo is
available at http://trust.cse.ucsc.edu/

The trust value of each word of each revision is computed according to the
reputation of the original author of the text, as well as the reputation of
all authors that subsequently revised the text.

We have uploaded a few hundred pages; for each page, we display the most
recent 50 revisions (we analyzed them all, but we just uploaded the most
recent 50 to the server).

Of course, there are many other uses of text trust (for example, one could
have the option of viewing a recent high-trust version of each page upon
request), but I believe that this coloring gives an intuitive idea of how it
could work.

I will talk about this at Wikimania, for those of you who will be there.  I
am looking forward to Wikimania!

Details:

We first analyze the whole English Wikipedia, computing the reputation of
each author at every point in time, so that we can answer questions like
what was the reputation of author with id 453 at 5:32 pm of March 14,
2006.  The reputation is computed according to the idea of content-driven
reputation http://www.soe.ucsc.edu/%7Eluca/papers/07/wikiwww2007.html.

For new portions of text, the trust is equal to (a scaling function of) the
reputation of the text author.
Portions of text that were already present in the previous revision can gain
reputation when the page is revised by higher-reputation authors, especially
if those authors perform an edit in proximity of the portion of text.
Portions of text can also lose trust, if low-reputation authors edit in
their proximity.
All the algorithms are still very preliminary, and I must still apply a
rigorous learning approach to optimize the computation.
Please see the demo page for more details.

All the best,

Luca de Alfaro
http://www.soe.ucsc.edu/~luca
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wiki-research-l