[Wiki-research-l] Re: Protection log dataset

2022-11-09 Thread Morten Wang
Hi Giovanni,

Are you thinking of the page protection dataset that Hill and Shaw did for
the paper "Page Protection: Another Missing Dimension of Wikipedia
Research"? If so, they've documented their data gathering and the dataset
here: https://communitydata.science/wiki-protection/


Cheers,
Morten

On Tue, 8 Nov 2022 at 16:41, Giovanni Luca Ciampaglia 
wrote:

> Dear all,
>
> Some time ago someone, possibly on this list, posted an announcement about
> a researcher-friendly dataset covering all page protection log actions.
> Does anybody remember it? I understand that the logging table is also
> dumped as part of the regular database dumps, but since it is a snapshot
> it is hard to reconstruct when a page goes in and out of protection. I am
> pretty sure I didn't dream it, but since I cannot find it I thought it
> would be worth checking with my fellow wiki researchers.
>
> Cheers,
>
> *Giovanni Luca Ciampaglia* • Assistant Professor
> University of Maryland • College of Information Studies (iSchool)
> glciampaglia.com • ischool.umd.edu


Re: [Wiki-research-l] Interesting Wikipedia studies

2020-12-18 Thread Morten Wang
In the human-computer interaction field, I'd highlight three seminal papers:

Viégas and Wattenberg's 2004 paper established Wikipedia as an area of
study, and used novel visualization techniques to demonstrate how quickly
vandalism is removed from the encyclopedia. Back in 2004, the main research
question was probably "how does this thing even work?", particularly with
regards to combating vandalism, and this paper starts the path of answering
that question.

Priedhorsky et al's 2007 paper dug into authorship of content that is
viewed, giving us good insights into the "who writes Wikipedia?" question.
It asks some important questions around what "value" is in a
peer-production community like Wikipedia (is content that is viewed more
often more valuable?). There are also some cool methodological aspects to
this paper (it uses MD5 checksums for revert detection, and there are now
SHA1 checksums for all revisions in Wikipedia's API).
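
As an aside, here is a minimal sketch of that kind of checksum-based revert
detection using the API's rvprop=sha1; the function name is my own, and it
only fetches the first batch of revisions (longer histories need
continuation):

    import requests

    def find_identity_reverts(title, lang="en"):
        # A revision whose SHA1 matches that of an earlier revision of
        # the same page restores that earlier content, i.e. it is an
        # identity revert.
        r = requests.get(
            "https://%s.wikipedia.org/w/api.php" % lang,
            params={"action": "query", "prop": "revisions",
                    "rvprop": "ids|sha1", "rvlimit": "max",
                    "rvdir": "newer", "titles": title,
                    "format": "json", "formatversion": "2"})
        seen, reverts = set(), []
        for rev in r.json()["query"]["pages"][0]["revisions"]:
            sha1 = rev.get("sha1")
            if sha1 in seen:
                reverts.append(rev["revid"])
            if sha1:
                seen.add(sha1)
        return reverts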

Halfaker et al's 2013 paper digs deeply into answering why the Wikipedia
community started declining in 2007. They find that the quality assurance
processes that were created to deal with the firehose of content coming in
with the exponential growth around 2004–2005 also end up discarding
good-faith contributions. This highlights the problem of how to do quality
assurance while also being a welcoming community to newcomers who are
struggling to learn all of Wikipedia's various rules and conventions (see
also the Teahouse paper).

Another question that I find really interesting and that is perhaps often
overlooked is "why did Wikipedia succeed?" It's easy to think that there
were few or no other competitors in the online encyclopedia space at the
time it got started, but there were a bunch of them. Mako Hill's PhD thesis
has a chapter that looks at that, and he also gave a talk at the Berkman
Klein Center about this.

One thing I've noticed is that all the papers I'm referencing focus on the
English Wikipedia. When it comes to studies of other language editions, or
across multiple ones, I've struggled to come up with a key paper to point
to. Hopefully someone else chimes in and fills that hole, as it's important
to recognize that "Wikipedia" doesn't equal the English one.

Cited papers:

   - Viégas, F. B., Wattenberg, M., & Dave, K. (2004, April). Studying
   cooperation and conflict between authors with history flow visualizations.
   In *Proceedings of the SIGCHI conference on Human factors in computing
   systems* (pp. 575-582).
   - Priedhorsky, R., Chen, J., Lam, S. T. K., Panciera, K., Terveen, L., &
   Riedl, J. (2007, November). Creating, destroying, and restoring value in
   Wikipedia. In *Proceedings of the 2007 international ACM conference on
   Supporting group work* (pp. 259-268).
   - Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The
   rise and decline of an open collaboration system: How Wikipedia’s reaction
   to popularity is causing its decline. *American Behavioral Scientist*,
   *57*(5), 664-688.
   - Morgan, J. T., Bouterse, S., Walls, H., & Stierch, S. (2013,
   February). Tea and sympathy: crafting positive new user experiences on
   wikipedia. In *Proceedings of the 2013 conference on Computer supported
   cooperative work* (pp. 839-848).
   - Hill, Benjamin Mako. “Essays on Volunteer Mobilization in Peer
   Production.” Ph.D. Dissertation, Massachusetts Institute of Technology,
   2013.



On Fri, 18 Dec 2020 at 05:44, Eric Luth  wrote:

> Dear all,
>
> A Swedish professor is writing a piece on Wikipedia for Sweden's largest
> daily newspaper, for the upcoming 20 years anniversary. She asked me for
> "interesting and widespread studies" on Wikipedia – not necessarily within
> any certain focus.
>
> If you were to share 2 or 3 studies that have gained some attention and
> that you find interesting, which would these be?
>
> Would be very happy for any help!
>
> Best
> *Eric Luth*
> Projektledare engagemang och påverkan | Project Manager, Involvement and
> Advocacy
> Wikimedia Sverige
> eric.l...@wikimedia.se
> +46 (0) 765 55 50 95
>
> Support free knowledge, become a member of Wikimedia Sverige.
> Read more at blimedlem.wikimedia.se


Re: [Wiki-research-l] Standardization of Wikipedia articles according to the lexical constancy of their introductions and body texts

2019-09-27 Thread Morten Wang
Hi Ludovic,

This work sounds interesting, I'm looking forward to learning more about it
as your papers come out!

I read through the post on LinkedIn, and as I interpret it you are only
looking at two quality classes (Featured Articles vs other articles). This
seems somewhat odd to me, and I'd like to know more about why. The current
trend when it comes to predicting article quality in the English Wikipedia
does not limit the prediction problem to just FAs vs the rest; instead it
uses the whole quality scale[1]. See the list below for some papers along
this line of research.

I'm also really curious about what "standardize the cognitive accessibility
of Wikipedia" means. It might mean more than just "article quality", which
is why I'm asking.

All that being said, I think the approach sounds interesting and probably
adds some signal, so I'm curious to learn more about how it works and
performs.

References:

   - Warncke-Wang, M., Cosley, D., & Riedl, J. Tell me more: an actionable
   quality model for Wikipedia. OpenSym/WikiSym 2013. [We argue that metadata
   isn't useful because contributors can't change it]
   - Warncke-Wang, M., Ayukaev, V. R., Hecht, B., & Terveen, L. G. The
   success and failure of quality improvement projects in peer production
   communities. CSCW 2015. [See the Appendix for details of the improved model
   and how to get good training data]
   - https://www.mediawiki.org/wiki/ORES builds upon the 2015 paper and is
   a readily accessible API; reference datasets are available on figshare
   and in the GitHub repository. It is now the benchmark to compare
   against, as in the three other papers listed below.
   - Dang, Q. V., & Ignat, C. L. Measuring quality of collaboratively
   edited documents: the case of Wikipedia. CIC 2016. [Shows that adding
   readability features can improve predictions]
   - Dang, Q. V., & Ignat, C. L. An end-to-end learning solution for
   assessing the quality of Wikipedia articles. OpenSym 2017. [Shows the
   performance of RNNs, also contains an important discussion of performance,
   interpretability, etc]

I also came across this recent paper by Schmidt and Zangerle that reports
significant improvements, but haven't yet had the time to read the paper
closely:

   - Schmidt, M., & Zangerle, E. Article quality classification on
   Wikipedia: introducing document embeddings and content features. OpenSym
   2019.

Footnotes:

   1. Typically without A-class articles, due to how few of them there are.


Cheers,
Morten

On Mon, 23 Sep 2019 at 13:09, Ludovic Bocken  wrote:

> Hello,
>
> I am finishing my PhDs and I think that you could be interested in my last
> main work about the quality of Wikipedia :
>
> https://www.linkedin.com/pulse/standardization-wikipedia-articles-according-lexical-ludovic/
> and in a future collaboration.
>
> I would be very grateful for your feedbacks ! Several publications are in
> preparation... Let me know if you are interested in following this
> thread...
>
> Have a nice week,
>
> Ludovic BOCKEN
> lboc...@gmail.com
> www.ludovicbocken.com
> Skype: ludovic.bocken
> http://www.linkedin.com/in/ludovicbocken
>  Rue Hochelaga,
> Montréal, QC H2K 4N8
> +1 (514) 649 0755
>
> *Confidentiality notice*
>
> This message, transmitted by fax, is confidential, and its contents may be
> protected by professional secrecy. It is intended for the exclusive use of
> its addressee. Any other person is hereby advised that it is strictly
> forbidden to disseminate, distribute, or reproduce it. If the addressee
> cannot be reached or is unknown to you, please inform the sender
> immediately and destroy this message and any copy of it.


Re: [Wiki-research-l] No. of articles deleted over time

2019-08-16 Thread Morten Wang
A couple of lessons learned about article deletions from the ACTRIAL analysis:

   1. The logging table does not appear to contain correct page IDs of
   deleted pages until some time in 2014[1]. If you're looking at historical
   data and want to combine earlier deletions with other information,
   following Aaron's lead and using the archive table is probably the way to
   go.
   2. The article namespace doesn't just contain "articles"; it also
   contains redirects and disambiguation pages. Redirects in particular can
   affect measurements of the number of pages deleted[2], because there have
   been instances of cleanup of substantial numbers of redirects. There's no
   information about redirect status in the archive table, as far as I know,
   but the log comment can be used to identify a substantial number of such
   deletions.

The code I used in our analysis of deletion reasons, which also covers the
article namespace, is on GitHub:
https://github.com/nettrom/actrial/blob/master/python/deletionreasons.py
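
For the original question (deletions per month), something along these
lines should work against the replica databases; the host name and date
handling are a sketch to adapt, and the query itself should also run on
Quarry:

    import pymysql

    # Connection details per the Toolforge replica setup; adjust as needed.
    conn = pymysql.connect(host="enwiki.analytics.db.svc.wikimedia.cloud",
                           db="enwiki_p", read_default_file="~/.my.cnf")
    with conn.cursor() as cursor:
        # Page deletions in the main namespace, grouped by month
        # (log_timestamp is a YYYYMMDDHHMMSS string, so its first six
        # characters are YYYYMM).
        cursor.execute("""
            SELECT CAST(LEFT(log_timestamp, 6) AS CHAR) AS month,
                   COUNT(*) AS num_deletions
            FROM logging
            WHERE log_type = 'delete'
              AND log_action = 'delete'
              AND log_namespace = 0
            GROUP BY month ORDER BY month""")
        for month, num_deletions in cursor.fetchall():
            print(month, num_deletions)

Keep point 2 above in mind when interpreting the counts, since redirect
cleanups will show up as deletion spikes.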

Footnotes:

   1.
   
https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-29
   2.
   
https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-19#Improving_the_data_gathering


Cheers,
Morten

On Fri, 16 Aug 2019 at 05:31, Samuel Klein  wrote:

> Since bug 26122 has been fixed, any reason not to use the deletion log
> instead?
>
> On Thu, Aug 15, 2019 at 10:27 AM Aaron Halfaker 
> wrote:
>
> > Here's a related bit of work:
> > https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation
> >
> > In this research project, I used a mix of both the deletion log and the
> > archive table to get a sense for when pages were being deleted.
> >
> > Ultimately, I found that the easiest deletion event to operationalize was
> > to look at the most recent ar_timestamp for a page in the archive table.
> >  I could only go back to 2008 with this metric because the archive table
> > didn't exist before then.
> >
> > The archive table is available in Quarry. See
> > https://quarry.wmflabs.org/query/38414 for an example query that gets
> > the timestamp of an article's last revision.
> >
> > The logging table is also in Quarry. See
> > https://quarry.wmflabs.org/query/38415 for an example query that gets
> > deletion events.
> >
> > On Tue, Aug 13, 2019 at 2:51 PM Haifeng Zhang 
> > wrote:
> >
> > > Dear all,
> > >
> > > Is there an easy way to get the number of articles deleted over time
> > > (e.g., month) in Wikipedia?
> > >
> > > Can I use Quarry? What tables should I use?
> > >
> > >
> > > Thanks,
> > >
> > > Haifeng Zhang
>
>
> --
> Samuel Klein  @metasj   w:user:sj  +1 617 529 4266


Re: [Wiki-research-l] Question on article creation policy

2019-08-09 Thread Morten Wang
In addition to Aaron's helpful pointers, here are the timestamps (all in
UTC) I've found for when ACTRIAL started and ended, and when the permanent
restriction went into place:

ACTRIAL start: 2017-09-14 22:30:00 or thereabouts, per the first post by
Kaldari in the archived thread on the ACTRIAL talk page.
ACTRIAL end: 2018-03-14 17:30:00 or thereabouts, based on the timestamp of
the first article created by a non-autoconfirmed user after ACTRIAL ended.
ACPERM start: 2018-04-27 00:00:00 or thereabouts, based on the timestamp in
T192455#4162881.

You can also see the dates for these in the dashboard showing page
creations by non-autoconfirmed users on enwiki.


Cheers,
Morten


Re: [Wiki-research-l] Questions about SuggestBot

2019-06-26 Thread Morten Wang
As Stuart Yeates kindly pointed out, SuggestBot is alive and well! (And in
case it wasn't obvious, I know this because I'm the one maintaining it :)
It's currently serving up article suggestions in seven languages. It also
updates lists of open tasks (e.g. the one shown on the English Community
Portal) in a few languages. Those task list updates pick a random selection
of articles from a given set of categories; they're not personalized
recommendations.
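
In case it's useful, a condensed sketch of that kind of random selection
with Pywikibot follows; the function and the example category are my own
simplification, not SuggestBot's actual code:

    import random
    import pywikibot

    site = pywikibot.Site("en", "wikipedia")

    def sample_from_category(category_name, n=5):
        # List the articles in a maintenance category and pick a few at
        # random. Listing very large categories this way is slow; it is
        # just a sketch.
        category = pywikibot.Category(site, category_name)
        articles = list(category.articles(namespaces=0))
        return random.sample(articles, min(n, len(articles)))

    for page in sample_from_category("Category:All articles needing copy edit"):
        print(page.title())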

There is currently, as far as I know, no similar tool that does
personalized recommendations. Stuart mentioned some of the ways that
Wikipedias organize work lists and keep track of things that need to be
done. There are also some tools that provide topical suggestions for things
to do (e.g. Citation Hunt). I haven't dug into how those work.

When it comes to published research on how Wikipedia contributors work with
tasks, in addition to the two papers that have been published about
SuggestBot there's also this one: Krieger, M., Stark, E. M., & Klemmer, S.
R. "Coordinating tasks on the commons: designing for personal goals,
expertise and serendipity" CHI 2009.

Happy to answer any other questions you (or others) might have about
SuggestBot, of course!


Cheers,
Morten


On Mon, 24 Jun 2019 at 18:21, Haifeng Zhang  wrote:

> Thanks so much for answering my questions, Stuart.
>
> It seems redlinks are related to article creation only.
>
> Could you give me some detail about how "administrative groups" work in
> terms of task routing?
>
> I also found the following TASK CENTER page (
> https://en.wikipedia.org/wiki/Wikipedia:Task_Center).
>
> Are the links/lists (under "Do it!") used frequently by editors as routing
> tools?
>
>
> Thanks,
>
> Haifeng Zhang
> 
> From: Wiki-research-l  on
> behalf of Stuart A. Yeates 
> Sent: Sunday, June 23, 2019 11:37:38 PM
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Questions about SuggestBot
>
> (a) SuggestBot visited me in the last week.
>
> https://en.wikipedia.org/w/index.php?title=User_talk%3AStuartyeates&type=revision&diff=902456290&oldid=901462765
>
> (b) There are lots of different task routing approaches: lists of
> redlinks, administrative groups, etc.
>
> (c) Sentences containing the words 'bot' and 'documented' appear to
> mainly exist for comedic value. Bots are typically even less
> documented than usual.
>
> cheers
> stuart
> --
> ...let us be heard from red core to black sky
>
> On Mon, 24 Jun 2019 at 15:24, Haifeng Zhang 
> wrote:
> >
> > Hi all,
> >
> > Is SuggestBot still in use on Wikipedia?
> >
> > Are there similar task routing tools that have been deployed in
> Wikipedia?
> >
> > Where in Wikipedia is the use of such tools or bots documented?
> >
> >
> > Thanks,
> >
> > Haifeng Zhang
> >


Re: [Wiki-research-l] Content similarity between two Wikipedia articles

2019-05-04 Thread Morten Wang
Hi Haifeng,

Yes, you might want to look into some of the work done by Hecht et al. on
content similarity between languages, as well as work by Sen et al. on
semantic relatedness algorithms (which are implemented in the WikiBrain
framework, by the way; see the references below). Some papers to start with
could be:

   - Bao, P., Hecht, B., Carton, S., Quaderi, M., Horn, M., & Gergle, D.
   "Omnipedia: Bridging the Wikipedia Language Gap" CHI 2012
   - Hecht, B., & Gergle, D. "The Tower of Babel Meets Web 2.0:
   User-Generated Content and Its Applications in a Multilingual Context"
   CHI 2010
   - Sen, S., Swoap, A. B., Li, Q., Boatman, B., Dippenaar, I., Gold, R.,
   Ngo, M., Pujol, S., Jackson, B., & Hecht, B. "Cartograph: Unlocking
   Spatial Visualization Through Semantic Enhancement" IUI 2017
   - Sen, S., Johnson, I., Harper, R., Mai, H., Horlbeck Olsen, S.,
   Mathers, B., Souza Vonessen, L., Wright, M., & Hecht, B. "Towards
   Domain-Specific Semantic Relatedness: A Case Study in Geography" IJCAI
   2015
   - Sen, S., Lesicko, M., Giesel, M., Gold, R., Hillmann, B., Naden, S.,
   Russell, J., Wang, Z. "Ken", & Hecht, B. "Turkers, Scholars, "Arafat"
   and "Peace": Cultural Communities and Algorithmic Gold Standards"
   - Sen, S., Li, T. J.-J., Lesicko, M., Weiland, A., Gold, R., Li, Y.,
   Hillmann, B., & Hecht, B. "WikiBrain: Democratizing computation on
   Wikipedia" OpenSym 2014

You can of course also utilize similarity measures from the recommender
systems and information retrieval fields, e.g. use edit histories to
identify articles that have been edited by the same users, or apply search
engine techniques like TF/IDF and content vectors.
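
To illustrate the last of those, here is a minimal TF/IDF sketch using
scikit-learn and the TextExtracts API; plain_text is my own helper, and the
two article titles are just examples:

    import requests
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def plain_text(title, lang="en"):
        # Grab an article's plain-text extract via the MediaWiki API.
        r = requests.get(
            "https://%s.wikipedia.org/w/api.php" % lang,
            params={"action": "query", "prop": "extracts",
                    "explaintext": "1", "titles": title,
                    "format": "json", "formatversion": "2"})
        return r.json()["query"]["pages"][0]["extract"]

    texts = [plain_text("Pasteurization"), plain_text("Louis Pasteur")]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # Cosine similarity of the two TF/IDF vectors, between 0 and 1.
    print(cosine_similarity(tfidf[0], tfidf[1])[0, 0])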


Cheers,
Morten

On Sat, 4 May 2019 at 04:48, Haifeng Zhang  wrote:

> Dear folks,
>
> Is there a way to compute content similarity between two Wikipedia
> articles?
>
> For example, I can think of representing each article as a vector of
> likelihoods over possible topics.
>
> But I wonder whether there is other work people have already explored in
> the past.
>
>
> Thanks,
>
> Haifeng


Re: [Wiki-research-l] Ways thru which articles could attract editors

2019-04-28 Thread Morten Wang
Hi Haifeng,

In addition to the two you mention, WikiProjects might have a
"Collaboration of the Week", there's the WikiCup, and there's Wiki Ed. We
studied all of those
in our 2015 CSCW paper: The Success and Failure of Quality Improvement
Projects in Peer Production Communities
https://www-users.cs.umn.edu/~morten/publications/cscw2015-improvementprojects.pdf

I would also recommend looking at the research on article quality that has
been done by Kane & Ransbotham. Right now I don't have the time to look up
their work again, but if I remember correctly they also looked at the
virtuous cycle of traffic and quality.


Cheers,
Morten





On Sat, 27 Apr 2019 at 17:39, Haifeng Zhang  wrote:

> Thanks for your reply, Kerry. I meant any kind of quality improvement.
>
> Some mechanisms may target specific type of editors, and others might be
> quite general.
>
>
> Best,
>
> Haifeng Zhang
>
> Postdoctoral Research Fellow
> Human-Computer Interaction Institute
> Carnegie Mellon University
> 
> From: Wiki-research-l  on
> behalf of Kerry Raymond 
> Sent: Saturday, April 27, 2019 6:16:12 PM
> To: 'Research into Wikimedia content and communities'
> Subject: Re: [Wiki-research-l] Ways thru which articles could attract
> editors
>
> "Article quality" is quite a wide topic. I would imagine most good faith
> contributors believe they are improving the quality of an article with
> every edit. Do you have some specific type of quality improvement in mind?
> E.g. more citations, more content, fewer spelling errors?
>
> Kerry
>
> -Original Message-
> From: Wiki-research-l [mailto:wiki-research-l-boun...@lists.wikimedia.org]
> On Behalf Of Haifeng Zhang
> Sent: Sunday, 28 April 2019 7:53 AM
> To: Research into Wikimedia content and communities <
> wiki-research-l@lists.wikimedia.org>
> Subject: [Wiki-research-l] Ways thru which articles could attract editors
>
> Dear folks,
>
> I wonder what mechanisms/events (in Wikipedia or WikiProjects) may
> attract editors to improve article quality.
>
> One example is Today's articles for improvement. Within WikiProjects,
> GA/FA nominations seem useful too.
>
>
> Thanks,
>
> Haifeng Zhang


Re: [Wiki-research-l] Query user history edits

2019-03-27 Thread Morten Wang
Hi Haifeng,

In my experience, this depends on how many users you're looking to get
information about. Is it a few hundred? A few thousand? A million+?

If you are getting the edit history for a limited number of users (say a
few hundred to a few thousand), then using the API can work well. One thing
to keep in mind when using the API is that your requests might be throttled
and/or there might be database lag. Are you using a software library to
access the API? If not, I'd consider using one so that throttling/lag
doesn't become an issue; it's one of the reasons why I use Pywikibot for
API requests.
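
If you do want to stick with raw API requests, a minimal sketch with
explicit continuation and a polite delay looks like this; count_contribs is
my own helper, and the timestamps are ISO 8601 strings such as
"2019-01-01T00:00:00Z":

    import time
    import requests

    API_URL = "https://en.wikipedia.org/w/api.php"

    def count_contribs(user, start, end):
        # Count a user's edits between two timestamps using
        # list=usercontribs, following the continuation parameters so
        # that long edit histories don't stall the process.
        params = {"action": "query", "list": "usercontribs",
                  "ucuser": user, "uclimit": "max", "format": "json",
                  "ucstart": end, "ucend": start}  # newest to oldest
        count = 0
        session = requests.Session()
        while True:
            data = session.get(API_URL, params=params).json()
            count += len(data["query"]["usercontribs"])
            if "continue" not in data:
                return count
            params.update(data["continue"])
            time.sleep(1)  # stay well clear of rate limits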

If you're interested in querying a large number of users (say tens of
thousands or more), then getting an account on Toolforge so you can run SQL
queries against the replicated MediaWiki databases would make sense. I've
frequently used that approach for data gathering for research purposes.

Hope that helps! And if not, don't hesitate to ask questions :)


Cheers,
Morten

On Wed, 27 Mar 2019 at 07:22, Haifeng Zhang  wrote:

> Dear folks,
>
> Is there a good way to query a user's edit history, e.g., edit count
> during a period?
>
> My current solution is using usercontribs API (
> https://www.mediawiki.org/wiki/API:Usercontribs).
>
> But the process has stalled, maybe due to some query limit.
>
>
> Thanks,
>
> Haifeng Zhang


Re: [Wiki-research-l] Research around the use of MassMessages extension

2018-12-05 Thread Morten Wang
Found the paper on welcoming newcomers to WikiProjects that I'm certain is
also relevant here; it's this one:
Choi, Boreum, et al. "Socialization Tactics in Wikipedia and Their
Effects." CSCW, 2010.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.424.7966&rep=rep1&type=pdf


Cheers,
Morten

On Mon, 26 Nov 2018 at 10:02, Florence Devouard  wrote:

> Hello Morten
>
>
> Thanks for the infos. Will start from there !
>
> Cheers
>
> Flo
>
> Le 24/11/2018 à 16:47, Morten Wang a écrit :
> > Hi Florence,
> >
> > A paper by Zhu et al springs to mind, as well as the study of phrasing in
> > template messages by Geiger et al. Although these focus on one-to-one
> > communication on Wiki rather than mass communication, I think they'll be
> > relevant. I think there's also a paper about invitations to join
> > WikiProjects that looks at personalized vs templated messages, but I
> cannot
> > find it at the moment.
> >
> > Zhu, H., Kraut, R.E., & Kittur, A., (2013) Effects of Peer Feedback on
> > Contribution: A Field Experiment in Wikipedia. CHI, 2013.
> >
> > Defense Mechanism or Socialization Tactic? Improving Wikipedia's
> > Notifications to Rejected Contributors by Geiger, Halfaker, Pinchuk, and
> > Walling. ICWSM 2012.
> >
> >
> > Cheers,
> > Morten
> >
> >
> > On Thu, 22 Nov 2018 at 07:11, Florence Devouard 
> wrote:
> >
> >> Hello everyone,
> >>
> >>
> >> I was interested to know whether there has been any research done around
> >> the use of the Mass Message mediawiki extension and in particular about
> >> impact of using it.
> >>
> >> By extension, I am interested in any research that might be related to
> >> the impact of posting a "template" message (as opposed to an individual
> >> targetted) on a user talk page.
> >> I know the SignPost did a poll in 2017 to evaluate the interest of
> >> switching to the Newsletter extension system. And I remember reading
> >> about impact of notifications. But are there studies related to the
> >> measure of impact in terms of engagement to mass posting on user talk
> page
> >> ?
> >>
> >> Thanks for any insight you could provide
> >>
> >>
> >> Florence
> >>
> >>
> >>
> >>


Re: [Wiki-research-l] Research around the use of MassMessages extension

2018-11-24 Thread Morten Wang
Hi Florence,

A paper by Zhu et al springs to mind, as well as the study of phrasing in
template messages by Geiger et al. Although these focus on one-to-one
communication on Wiki rather than mass communication, I think they'll be
relevant. I think there's also a paper about invitations to join
WikiProjects that looks at personalized vs templated messages, but I cannot
find it at the moment.

Zhu, H., Kraut, R.E., & Kittur, A., (2013) Effects of Peer Feedback on
Contribution: A Field Experiment in Wikipedia. CHI, 2013.

Defense Mechanism or Socialization Tactic? Improving Wikipedia's
Notifications to Rejected Contributors by Geiger, Halfaker, Pinchuk, and
Walling. ICWSM 2012.


Cheers,
Morten


On Thu, 22 Nov 2018 at 07:11, Florence Devouard  wrote:

> Hello everyone,
>
>
> I was interested to know whether there has been any research done around
> the use of the Mass Message mediawiki extension and in particular about
> impact of using it.
>
> By extension, I am interested in any research that might be related to
> the impact of posting a "template" message (as opposed to an individually
> targeted one) on a user talk page.
> I know the SignPost did a poll in 2017 to evaluate the interest of
> switching to the Newsletter extension system. And I remember reading
> about impact of notifications. But are there studies related to the
> measure of impact in terms of engagement to mass posting on user talk page
> ?
>
> Thanks for any insight you could provide
>
>
> Florence
>
>
>
>


[Wiki-research-l] The Maintainers

2017-08-04 Thread Morten Wang
Here's an interdisciplinary group of researchers that work on things that I
thought would be relevant and interesting to a lot of us who study
Wikipedia: http://themaintainers.org


Cheers,
Morten


Re: [Wiki-research-l] Models for developing underserved topics on Wikipedia

2017-05-05 Thread Morten Wang
I was going to chime in here and mention our 2015 CSCW paper, but Aaron
beat me to it, thanks Aaron! :)

There are several related papers in our lit. review, such as the work
studying the Public Policy Initiative (Lampe et al), projects related to
the Wikipedia Education Program/APS Initiative (Farzan et al), and
WikiProjects' Collaboration of the Week (Zhu et al). We also add the
WikiCup in our study.

Not sure what other papers to recommend in this space at the moment, good
luck!


Cheers,
Morten



On 5 May 2017 at 08:24, Aaron Halfaker  wrote:

> Relevant to Gabriel's comment:
> https://wikiedu.org/blog/2016/08/31/academic-content/
>
> Kevin is around this mailing list sometimes.  Maybe he can give us an
> update.  :)
>
> On Fri, May 5, 2017 at 10:22 AM, Gabriel Mugar  wrote:
>
> > Hi Heather,
> > I imagine the Wiki Education Foundation has data on the impact of their
> > work on article quality. The pilot project for the foundation in 2010 was
> > aimed at improving public policy articles.
> > I hope this helps.
> > Gabe
> >
> > > On May 5, 2017, at 4:46 AM, Heather Ford  wrote:
> > >
> > > Thank you so much for your replies! I'm mostly interested in research
> > > that has been done to study the value/impact of different types of
> > > interventions. But this is all useful, thank you!
> > >
> > > On 5 May 2017 07:07, "Gerard Meijssen" 
> > wrote:
> > >
> > >> Hoi,
> > >> The study by Aaron is about the English Wikipedia and concentrates
> > >> on female scientists. Great study, but when you want to know about
> > >> the coverage of the English Wikipedia compared to missing knowledge,
> > >> there are other more relevant approaches. I blogged about one [1].
> > >> There are many categories with a definition for their content where
> > >> English is missing a substantial number of articles. I blogged about
> > >> that as well [2].
> > >>
> > >> As you need content relating to South Africa: in Wikidata we
> > >> included all the current parliamentarians of South Africa. Most
> > >> do/did not have an article. There are many places in SA that do not
> > >> have an article, and neither does their Mayor. In the Black Lunch
> > >> Table project, artists from the African Diaspora are documented, and
> > >> when they emigrate they are in focus. It follows that South African
> > >> artists could do with some tender loving care. It is easy to come up
> > >> with relevant subjects that are missing.
> > >>
> > >> My advice to you is: consider the subject in your curriculum. Google
> > >> for South African subjects relating to what is on topic, and write,
> > >> expand, and curate as needed. Talk in the classroom about how
> > >> Wikipedia is failing South Africa and discuss what can be done and
> > >> how you can make the biggest impact. IMHO it starts with
> > >> well-connected stubs.
> > >>
> > >> Do yourself a favour: get some friendly admins onboard and protect
> > >> yourself against deletionists. For them South Africa is not what they
> > >> know, so how can it be notable?
> > >> Thanks,
> > >> GerardM
> > >>
> > >>
> > >> [1]
> > >> http://ultimategerardm.blogspot.nl/2017/04/wikidata-user-stories-sum-of-all.html
> > >> [2]
> > >> http://ultimategerardm.blogspot.nl/2017/04/wikipedia-research-world-famous-in.html
> > >>
> > >> On 4 May 2017 at 23:37, Aaron Halfaker 
> > wrote:
> > >>
> > >>> Hi Heather!
> > >>>
> > >>> I've been working on methods for measuring content gaps and showing
> > >>> when they appeared and were closed.
> > >>>
> > >>> See https://blog.wikimedia.org/2017/03/07/the-keilana-effect/ for a
> > >>> summary and
> > >>> https://meta.wikimedia.org/wiki/Research:Interpolating_quality_dynamics_in_Wikipedia_and_demonstrating_the_Keilana_Effect
> > >>> for a long-form discussion of the methods.
> > >>>
> > >>> I've got a complete dataset of per-article quality assessments for
> > >>> all articles in English Wikipedia:
> > >>>
> > >>> Halfaker, Aaron; Sarabadani, Amir (2016): Monthly Wikipedia article
> > >>> quality predictions. figshare.
> > >>> https://doi.org/10.6084/m9.figshare.3859800.v3
> > >>>
> > >>> I'm working hard to get that dataset hosted on Quarry so that it
> > >>> would be easier to experiment with for arbitrary new cross-sections
> > >>> by anyone who is interested. But we've hit some technical hurdles.
> > >>> See https://phabricator.wikimedia.org/T146718
> > >>>
> > >>> On Thu, May 4, 2017 at 12:29 PM, Andrew Krizhanovsky <
> > >>> andrew.krizhanov...@gmail.com> wrote:
> > >>>
> > >>>> Great project! Thank you for the information.
> > >>>>
> > >>>> There is the discussion about the multilingual project name at
> > >>>> pages 33-34. I like the name Wikischool :)
> > >>>>
> > >>>> Best regards,
> > >>>> Andrew Krizhanovsky.
> > >>>>
> > >>>> On 4 May 2017 at 18:45, Ziko van Dijk  wrote:
> > > Hello,
> > >
> > > Does it have to be Wikipedia? Wikipedia is a reference work for
> > > "everybody", but not especial

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-27 Thread Morten Wang
 that, I wonder if that group of articles
> actually exists. Recently a newish Australian contributor expressed
> disappointment that all the new articles they had created were tagged (by
> others) as of Low Importance. My instinctive reply was "that's normal, I
> think of the thousands of articles I have started only a couple even rated
> as Mid importance, this is because the really important articles were all
> started long ago precisely because they were important". I suspect topics
> that are very important (for reasons other than being short-lived
> importance due in being "current" in the lifetime of Wikipedia) will
> generally show up as having started early in Wikipedia's life and that
> those that become more/less important over time will be largely linked to
> becoming or ceasing to be "current" topics). E.g. article Pasteurization
> started in May 2001 saying nothing more than " Pasteurization is the
> process of killing off bacteria in milk by quickly heating it to a near
> boiling temperature, then quickly cooling it again before the taste and
> other desirable properties are affected. The process was named after its
> inventor, French scientist Louis Pasteur. See also dairy products." The
> links in this very first version are still present in its lede paragraph
> today, suggesting our understanding of "non-current" topics is stable and
> hence initial importance determinations can probably be accurately made.
> For Pasteurization the Talk page shows it was not project-tagged until 2007
> when it was assigned High Importance as its first assessment.
>
> I suspect we will find that initial manual assessment of article
> importance will be pretty accurate for most articles. And I suspect if we
> plot initial importance assessments against time of assessment, we will
> find the higher importance articles commenced life on Wikipedia earlier
> than the lower importance articles. If I am correct, then there isn't a lot
> of value in machine-assessment of importance of topics because it relates
> to factors external to Wikipedia and often does not change over time and
> therefore can often be correctly assessed manually even on new stub
> articles (and any unassessed articles can probably be rated as Low
> Importance as statistically that's almost certainly going to be correct).
> If a topic becomes more important due to "current" events, then invariably
> that article will be updated by many people and one of them will sooner or
> later manually adjust its importance. What is less likely to happen is
> re-assessing downwards of Importance when an important "current" topic
> loses its importance when it is no longer current, e.g. are former American
> presidents like Barack Obama or George W Bush or further back less
> important now? These articles will not be updated frequently once the topic
> is no longer in the news and therefore it is less likely an editor will
> notice and manually downgrade the importance, so there may be a greater
> role for machine-assessment in downgrading importance rather than upgrading
> importance.
>
> Another area where there might be a role for machine-assessed importance
> is POV-pushing, where a POV-motivated editor might change the manually
> assessed importance of articles to be higher or lower based on their POV
> (e.g. my political party is Top Importance, other parties are of Low
> Importance). I suspect that often a page watcher would correct or at
> least question that kind of re-assessment. However, on articles with few
> active pagewatchers you might get away with POV-pushing the article's
> importance tag because nobody notices. In this situation, a machine
> assessment could be useful in spotting this kind of thing.
>
> This suggests that another metric of interest to importance might be
> number of pagewatchers, although I suspect that pagewatching may relate
> more to caring about the article than to caring about the topic. And one
> has to be careful to distinguish active pagewatchers (those who actually do
> review changes on their watchlists) from those who don't, as that may make
> a difference (although I am not sure we can really tell which pagewatchers
> are truly actively reviewing as a "satisfactory review" doesn't leave a
> trace whereas an "unsatisfactory" review is likely to lead to a relatively
> soon revert or some other change to the article, the article Talk or the
> User Talk of reviewed contributor which may be detectable).
>
> The other aspect of articles that occurs to me as being possibly linked to
> importance of the topic would be use of the article as the "main" article
> for a cat

Re: [Wiki-research-l] Project exploring automated classification of article importance

2017-04-20 Thread Morten Wang
Hi Pine,

These are great pointers to existing practices on enwiki, some of which
I've been looking for and/or missed, thanks!


Cheers,
Morten

On 19 April 2017 at 22:35, Pine W  wrote:

> Hi Nettrom,
>
> A few resources from English Wikipedia regarding article importance as
> ranked by humans:
>
> https://en.wikipedia.org/wiki/Wikipedia:Vital_articles
>
> https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Release_Version_Criteria#Priority_of_topic
>
> https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment#Statistics
>
> I infer from the ENWP Wikicup's scoring protocol that for purposes of the
> competition, an article's "importance" is loosely inferred from the number
> of language editions of Wikipedia in which the article appears:
> https://en.wikipedia.org/wiki/Wikipedia:WikiCup/Scoring#Bonus_points.
>
> HTH,
>
> Pine
>
>
> On Tue, Apr 18, 2017 at 4:17 PM, Morten Wang  wrote:
>
> > Hello everyone,
> >
> > I am currently working with Aaron Halfaker and Dario Taraborelli at the
> > Wikimedia Foundation on a project exploring automated classification of
> > article importance. Our goal is to characterize the importance of an
> > article within a given context and design a system to predict a relative
> > importance rank. We have a project page on meta[1] and welcome comments
> or
> > thoughts on our talk page. You can of course also respond here on
> > wiki-research-l, or send me an email.
> >
> > Before moving on to model-building I did a fairly thorough literature
> > review, finding a myriad of papers spanning several disciplines. We have
> a
> > draft literature review also up on meta[2], which should give you a
> > reasonable introduction to the topic. Again, comments or thoughts (e.g.
> > papers we’ve missed) on the talk page, mailing list, or through email are
> > welcome.
> >
> > Links:
> >
> >    1. https://meta.wikimedia.org/wiki/Research:Automated_classification_of_article_importance
> >2. https://meta.wikimedia.org/wiki/Research:Studies_of_Importance
> >
> > Regards,
> > Morten
> > [[User:Nettrom]] aka [[User:SuggestBot]]


[Wiki-research-l] Project exploring automated classification of article importance

2017-04-18 Thread Morten Wang
Hello everyone,

I am currently working with Aaron Halfaker and Dario Taraborelli at the
Wikimedia Foundation on a project exploring automated classification of
article importance. Our goal is to characterize the importance of an
article within a given context and design a system to predict a relative
importance rank. We have a project page on meta[1] and welcome comments or
thoughts on our talk page. You can of course also respond here on
wiki-research-l, or send me an email.

Before moving on to model-building I did a fairly thorough literature
review, finding a myriad of papers spanning several disciplines. We have a
draft literature review also up on meta[2], which should give you a
reasonable introduction to the topic. Again, comments or thoughts (e.g.
papers we’ve missed) on the talk page, mailing list, or through email are
welcome.

Links:

   1. https://meta.wikimedia.org/wiki/Research:Automated_classification_of_article_importance

   2. https://meta.wikimedia.org/wiki/Research:Studies_of_Importance

Regards,
Morten
[[User:Nettrom]] aka [[User:SuggestBot]]


Re: [Wiki-research-l] Patriotic editing hypothesis

2017-01-24 Thread Morten Wang
A couple of research papers that might be helpful:

1: Hecht, B. and Gergle, D. 2009. Measuring Self-Focus Bias in
Community-Maintained Knowledge Repositories. Proceedings of the 2009
International Conference on Communities and Technologies , pp. 11-19.
http://www.brenthecht.com/publications/bhecht_CommAndTech2009.pdf

In their paper, Hecht & Gergle study how content in some of the Wikipedia
editions is focused on certain countries, and those typically correspond to
where the languages are spoken.

2: Warncke-Wang, M., Uduwage, A., Dong, Z., and Riedl, J. "In Search of the
Ur-Wikipedia: Universality, Similarity, and Translation in the Wikipedia
Inter-language Link Network", in WikiSym 2012.
http://www-users.cs.umn.edu/~morten/publications/wikisym2012-urwikipedia.pdf

In this paper we wanted to study similarity based on distance, meaning that
we needed to see if we could tie a Wikipedia edition to a specific
country. It turns out that if you look at the statistics[1], a lot of the
language editions get the vast majority of edits from a single country.
While that's not helpful when it comes to the English edition, it arguably
solves the problem for quite a few other languages.


Footnotes:
1:
https://stats.wikimedia.org/wikimedia/squids/SquidReportPageEditsPerLanguageBreakdown.htm


Cheers,
Morten


On 24 January 2017 at 07:27, Peter Ekman  wrote:

> Regarding Kerry Raymond's "Patriotic editing hypothesis", I've done
> some very simple informal investigation regarding the quality of
> geographic articles, these are mostly on cities, towns, counties, etc.
> in en:Wikipedia.  Geographic articles have much lower average quality
> scores than other subjects (see
> https://en.wikipedia.org/wiki/User:Smallbones/Quality4by4 )
> With just a small bit of poking around it's obvious that the quality
> difference between geo articles and the rest is due to geo articles
> about countries where English is not the native language. A bit more
> poking and something that should have been really obvious jumps out.
> French geo articles on FR:Wiki are much better (at least longer) than
> the corresponding EN:Wiki article; Russian geo articles are much
> better on RU:Wiki than on EN:Wiki, etc.
>
> This is certainly consistent with the "Patriotic editing hypothesis"
> if we define patriotism by language rather than by borders.  It could
> be checked out with other language versions e.g. German vs. French;
> (Finnish, Estonian, Polish, German, or Hungarian, etc.) vs.Russian;
> Chinese vs. any language.
>
> The hypothesis even has a very practical implication: we should
> translate more geo articles from their native-language Wikipedias.
>
> Hope this helps,
> Pete Ekman
> 
> Date: Tue, 24 Jan 2017 11:12:58 +1000
> From: "Kerry Raymond" 
> To: "'Research into Wikimedia content and communities'"
> 
> Subject: [Wiki-research-l] regional KPIs
>
> As previously came up in discussion about chapters, it would be very useful
> to have national data about Wikipedia activities, which can be determined
> (generally) from IP addresses. Now I understand the privacy argument in
> relation to logged-in users (not saying I agree with it though in relation
> to aggregate data). However, can we find a proxy that does not have the
> privacy considerations?
>
>
>
> My hypothesis is that national content is predominantly written by users
> resident in that nation. And that therefore activity on national content
> can
> be used as a proxy for national user editing activity.
>
>
>
> In the case of Australia, we could describe Australian national content in
> either of two ways: articles within the closure of the
> [[Category:Australia]] and/or those tagged as  {{WikiProject Australia}}.
> There are arguments for/against either (neither is perfect, in my
> experience
> the category closure will tend to have false positives and the project will
> tend to have false negatives).
>
>
>
> I would like to know what correlation exists between national editor
> activity (as determined from IP addresses mapped to location) and national
> content edits and if/how it changes over time for various nations. This is
> research that only WMF can do because WMF has the IP addresses and the rest
> of us can't have them for privacy reasons.
>
>
>
> If we could establish that a strong-enough correlation existed between
> them,
> we could use national content activity (for which there is no privacy
> consideration) as a proxy for national editing activity. And we might even
> be able to come up with a multiplier for each nation to provide comparable
> data for national editing activity.
>
>
>
> Now, it may be that we need to restrict the edits themselves in some way to
> maximise the correlations between national content and same-nation editor
> activity.
>
>
>
> My second hypothesis is "semantic" edits (e.g. edits that add large amounts
> of content or citati

Re: [Wiki-research-l] Upcoming research newsletter (September 2016): new papers open for review

2016-10-14 Thread Morten Wang
I signed up to read Forte et al's paper "Privacy, Anonymity, and Perceived
Risk in Open Collaboration: A Study of Tor Users and Wikipedians". After
having read it, I'm clearly not qualified to give it a proper review.
Instead, I'd suggest that someone who's well-versed in the
security/anonymity/harassment literature review it so that a solid review
can be written, and encourage someone to volunteer.


Cheers,
Morten


On 12 October 2016 at 00:46,  wrote:

> Hi everybody,
>
>
> We’re preparing for the September 2016 research newsletter and looking for
> contributors. Please take a look at: https://etherpad.wikimedia.
> org/p/WRN201609 and add your name next to any paper you are interested in
> covering. The publication schedule is a bit mixed up currently - there is a
> chance we will already need to get out this issue in the next few days; but
> if you prefer to take more time, feel free to mark your contribution for
> the subsequent October issue instead, which should come out toward the end
> of this month. As usual, short notes and one-paragraph reviews are most
> welcome.
>
>
> Highlights from this month:
>
>
>
> ·   5000 people on Brexit & US Elections
>
> ·   A Smooth Transition to Modern mathoid-based Math Rendering in
> Wikipedia with Automatic Visual Regression Testing
>
> ·   Answering End-User Questions, Queries and Searches on Wikipedia
> and its History
>
> ·   Automated News Suggestions for Populating Wikipedia Entity Page
>
> ·   Content Disputes in Wikipedia Reflect Geopolitical Instability
>
> ·   Creating Causal Embeddings for Question Answering with Minimal
> Supervision
>
> ·   Cultural Differences in the Understanding of History on Wikipedia
>
> ·   Examining potential mechanisms underlying the Wikipedia gender
> gap through a collaborative editing task
>
> ·   Expanding Wikidata's Parenthood Information by 178%, or How To
> Mine Relation Cardinalities
>
> ·   Exploration on the Use of WDQS: Breakdown by Geography, User
> Agent and Referer Class
>
> ·   Finding News Citations For Wikipedia
>
> ·   Gender gap on Wikipedia: visible in all categories?
>
> ·   How do students trust Wikipedia? An examination across genders
>
> ·   Incorporating Relation Paths in Neural Relation Extraction
>
> ·   Memory Remains: Understanding Collective Memory in the Digital Age
>
> ·   Once You Step Over the First Line, You Become Sensitized to the
> Next: Towards a Gateway Theory of Online Participation
>
> ·   Privacy, Anonymity, and Perceived Risk in Open Collaboration: A
> Study of Tor Users and Wikipedians
>
> ·   Quality and Importance of Wikipedia Articles in Different
> Languages
>
> ·   Using Semantic Web Technologies for Explaining and Predicting
> Abnormal Expenses
>
> ·   Veni, Vidi, Vicipaedia: Using the Latin Wikipedia in an Advanced
> Latin Classroom
>
> ·   WikInfoboxer: A Tool to Create Wikipedia Infoboxes Using Dbpedia
>
> ·   Wikipedia and participatory culture: Why fans edit
>
> ·   Writing for Wikipedia in the classroom: challenging official
> knowledge (a case study in 12th grade)
>
>
> If you have any question about the format or process feel free to get in
> touch off-list.
>
>
> Masssly, Tilman Bayer and Dario Taraborelli
>
>
> [1] http://meta.wikimedia.org/wiki/Research:Newsletter
>


Re: [Wiki-research-l] Identifying Wikipedia stubs in various languages

2016-09-20 Thread Morten Wang
I don't know of a clean, language-independent way of grabbing all stubs.
Stuart's suggestion is quite sensible, at least for the English Wikipedia.
When I last checked a few years ago, the mean length of an English-language
stub (on a log scale) was around 1 kB (including all markup), and stubs are
much smaller than any other quality class.

I'd also see if the category system allows for some straightforward
retrieval. English has
https://en.wikipedia.org/wiki/Category:Stub_categories and
https://en.wikipedia.org/wiki/Category:Stubs with quite a lot of links to
other languages, which could be a good starting point. For some of the
research we've done on quality, exploiting regularities in the category
system using database access (in other words, LIKE-queries) is a quick way
to grab most articles.
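
As a sketch of that kind of LIKE-query, run against the Tool Labs replicas
(the host, credentials, and exact pattern are assumptions to adapt):

    import pymysql

    conn = pymysql.connect(host="enwiki.labsdb", db="enwiki_p",
                           read_default_file="~/.my.cnf")
    with conn.cursor() as cursor:
        # English stub categories follow the "..._stubs" naming
        # convention; the backslash keeps the underscore literal in
        # the LIKE pattern.
        cursor.execute(r"""
            SELECT DISTINCT page_id, page_title
            FROM page
            JOIN categorylinks ON cl_from = page_id
            WHERE page_namespace = 0
              AND cl_to LIKE '%\_stubs'""")
        stub_pages = cursor.fetchall()
    print(len(stub_pages), "stub articles found")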

A combination of both approaches might be a good way forward. If you're looking for
even more thorough classification, grabbing a set and training a classifier
might be the way to go.


Cheers,
Morten


On 20 September 2016 at 02:40, Stuart A. Yeates  wrote:

> en:WP:DYK has a measure of 1,500+ characters of prose, which is a useful
> cutoff. There is weaponised javascript to measure that at en:WP:Did you
> know/DYKcheck
>
> Probably doesn't translate to CJK languages which have radically different
> information content per character.
>
> cheers
> stuart
>
> --
> ...let us be heard from red core to black sky
>
> On Tue, Sep 20, 2016 at 9:26 PM, Robert West  wrote:
>
>> Hi everyone,
>>
>> Does anyone know if there's a straightforward (ideally
>> language-independent) way of identifying stub articles in Wikipedia?
>>
>> Whatever works is ok, whether it's publicly available data or data
>> accessible only on the WMF cluster.
>>
>> I've found lists for various languages (e.g., Italian or English), but the
>> lists are in different formats, so separate code is required for each
>> language, which doesn't scale.
>>
>> I guess in the worst case, I'll have to grep for the respective stub
>> templates in the respective wikitext dumps, but even this requires knowing
>> for each language what the respective template is. So if anyone could point
>> me to a list of stub templates in different languages, that would also be
>> appreciated.
>>
>> Thanks!
>> Bob
>>
>> --
>> Up for a little language game? -- http://www.unfun.me
>>


Re: [Wiki-research-l] How to get the exact date when an article get a quality promotion?

2016-06-15 Thread Morten Wang
Hi Shiyue,

Whether you choose to use a set time period (e.g. 6 months like Kittur &
Kraut) or use assessment changes as your criteria, there are additional
factors you'll have to consider. If you use a set time period there are at
least three issues you'll need to consider: 1) do articles change quality
at the same pace?  2) how long before the start of your time period did an
article get its assessment?  3) what happened to your article between its
assessment and the start of your time period?

If you instead choose to use rating changes, you have the issue that those
happen at different times, so you'll have to control for the time elapsed
between them if you're comparing articles to each other, as well as perhaps
trying to figure out if an article has an inherent probability for change.
As long as you consider these types of related issues and control for them,
your approach should be sane.

I've put the code up on Github:
https://github.com/nettrom/assessments/blob/master/clean-training-set.py It
uses a few support files that are all in the same repository (
https://github.com/nettrom/assessments): assessment.py, db.py, and
revisions.py

Since I have a Tool Labs[1] account and Pywikibot[2] already set up, the
code is written to use the replicated databases for fetching revisions and
such, and Pywikibot as the library to interact with Wikipedia's API.
Neither of those is a hard requirement: you can use the API instead of the
database access, and swap Pywikibot out for your favourite way of accessing
the API :)  It also uses mwparserfromhell[3] to parse the wikitext.  I
don't know of a better parser to use, but if you have one, feel free to use
that instead.


References:
1: https://tools.wmflabs.org
2: https://www.mediawiki.org/wiki/Manual:Pywikibot
3: http://mwparserfromhell.readthedocs.io/en/latest/


Cheers,
Morten


On 13 June 2016 at 09:03, Shiyue Zhang  wrote:

> Hi Morten,
>
> Thanks a lot for your reply!!! I have read your paper: Tell me more: An
> actionable quality model for Wikipedia. Thanks for introducing me to your
> other work from CSCW 2015; I will read it later.
>
> I saw your data. As you mentioned, it only has the revisions where the
> assessment changed. But I would prefer to get all of the revisions between
> 2 assessment changes, since I want to study what makes quality change and
> to predict quality change. Earlier, I considered adopting Kittur et al's
> formalization of quality changes over 6 months [1]. The problem is I
> cannot get the precise quality at the start and end points of the 6-month
> period. Now I think I can take the period between 2 assessment changes,
> though that is also not a perfect answer if articles are not regularly
> assessed, as Kerry and Andrew mentioned.
>
> I know you have a lot of experience in Wikipedia quality research. Could
> you give me some advice or references on studying quality change? And it
> would be great if you could give me your Python code for getting the data.
> I can modify it to get the data I need. Thanks a lot!
>
> References:
> Kittur, A., & Kraut, R. E. (2008). Harnessing the wisdom of crowds in
> Wikipedia: Quality through coordination. In Proceedings of the ACM
> Conference on Computer Supported Cooperative Work (pp. 37-46). ACM.
>
> Cheers,
> Shiyue
>
>
>
>
> 2016-06-10 23:20 GMT+08:00 Morten Wang :
>
>> Hi Shiyue,
>>
>> The issues around assessments that have been brought up are valid and
>> useful to keep in mind when trying to build machine learners that do
>> quality predictions. That being said, ORES quality classifier[1] is (AFAIK)
>> trained on a dataset[2] that I've gathered based on the method I used to
>> get a dataset to train the classifier used in our CSCW 2015 paper[3]. The
>> revisions that are in that dataset were gathered by taking a snapshot of
>> the quality assessment classes and then walking backwards through the talk
>> page revision history to find the time when the assessment changed, and
>> then grabbing the revision of the article at that timestamp. If you want
>> Python code instead of the dataset, let me know.
>>
>> The team behind ORES has also been working on writing scripts that'll do
>> assessment extractions (see for instance [4]), in case you want to process
>> a dump and get all of them. So far our experience with that is that it
>> leads to slightly lower performance. Although we're uncertain as to why, my
>> guess is that the dataset is noisier, perhaps due to changing quality
>> criteria as Andrew points to.
>>
>> Please do get in touch if you have any questions!
>>
>> References:
>> 1: https://meta.wikimedia.org/wiki/ORES/wp10
>> 2:
>> https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Da

Re: [Wiki-research-l] How to get the exact date when an article get a quality promotion?

2016-06-10 Thread Morten Wang
Hi Shiyue,

The issues around assessments that have been brought up are valid and
useful to keep in mind when trying to build machine learners that do
quality predictions. That being said, the ORES quality classifier[1] is (AFAIK)
trained on a dataset[2] that I've gathered based on the method I used to
get a dataset to train the classifier used in our CSCW 2015 paper[3]. The
revisions that are in that dataset were gathered by taking a snapshot of
the quality assessment classes and then walking backwards through the talk
page revision history to find the time when the assessment changed, and
then grabbing the revision of the article at that timestamp. If you want
Python code instead of the dataset, let me know.
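
In the meantime, here's a minimal sketch of the general idea (my
illustration here, not the actual script): walk the talk page's history
newest-first, extract the |class= rating from each revision, and note the
timestamps where consecutive revisions disagree. It assumes the requests
library and the English Wikipedia API, and for brevity only fetches the 50
most recent revisions; a real script would follow the API's continuation
parameters.

    import re
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    CLASS_RE = re.compile(r"\|\s*class\s*=\s*([^|}\n]+)", re.IGNORECASE)

    def talk_revisions(title):
        # Newest-first revisions of the talk page, with wikitext content.
        params = {"action": "query", "format": "json", "prop": "revisions",
                  "titles": title, "rvprop": "timestamp|content",
                  "rvlimit": 50}
        data = requests.get(API, params=params).json()
        page = next(iter(data["query"]["pages"].values()))
        return page.get("revisions", [])

    def assessment_changes(title):
        rated = []
        for rev in talk_revisions(title):
            match = CLASS_RE.search(rev.get("*", ""))
            rated.append((rev["timestamp"],
                          match.group(1).strip() if match else None))
        # The newer revision's timestamp is when the rating changed.
        return [(newer[0], older[1], newer[1])
                for newer, older in zip(rated, rated[1:])
                if newer[1] != older[1]]

    print(assessment_changes("Talk:Physics"))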

The team behind ORES has also been working on writing scripts that'll do
assessment extractions (see for instance [4]), in case you want to process
a dump and get all of them. So far our experience with that is that it
leads to slightly lower performance. Although we're uncertain as to why, my
guess is that the dataset is noisier, perhaps due to changing quality
criteria, as Andrew points out.

Please do get in touch if you have any questions!

References:
1: https://meta.wikimedia.org/wiki/ORES/wp10
2:
https://figshare.com/articles/English_Wikipedia_Quality_Asssessment_Dataset/1375406
3:
http://www-users.cs.umn.edu/~morten/publications/cscw2015-improvementprojects.pdf,
see Appendix A for info on the classifier
4:
https://github.com/wiki-ai/wikiclass/blob/master/wikiclass/extractors/enwiki.py

Cheers,
Morten


On 10 June 2016 at 00:59, Andrew Gray  wrote:

> Hi Shiyue,
>
> I agree with Kerry - these ratings probably won't do what you need, in
> that case. Sorry!
>
> We simply don't have the people (or the enthusiasm) required to do regular
> updates, and I'd guess many articles are well over five years 'stale' since
> their last rating - and most will only ever have been rated once.
>
> There's a second complicating factor for old ratings - not only are they
> stale, but the general standards for that rating might have changed. (See
> eg
> http://www.generalist.org.uk/blog/2010/quality-versus-age-of-wikipedias-featured-articles/
> for a demonstration of that last point - it would be interesting to use
> ORES to do a bigger sample)
>
> Andrew.
> On 10 Jun 2016 07:13, "Shiyue Zhang"  wrote:
>
>> Hi Kerry,
>>
>> Thanks a lot for your reply! Honestly, I was not aware of the problem you
>> mentioned, namely that many wikiprojects don't do regular quality
>> assessment. This problem really matters to me, because I want to get the
>> relatively true quality of a revision of an article. I know of Aaron's
>> automated quality assessment tool, but it is based on a machine learning
>> classifier, and building such a classifier to automatically predict
>> quality, especially quality change, is my goal as well. So I can't take
>> the results of this tool as my ground truth.
>>
>> 2016-06-10 12:16 GMT+08:00 Kerry Raymond :
>>
>>> If you are not aware of it, many wikiprojects don’t do any kind of
>>> regular quality assessment. Often an article is project-tagged and assessed
>>> when it’s new (which generally means the quality is assessed stub/start/C)
>>> and then it’s never re-assessed unless someone working on it is trying to
>>> get it to GA or similar and hence actively requests assessment.
>>>
>>>
>>>
>>> So it’s easy for an article to be much better quality (or even much
>>> worse quality, although that’s probably less likely) than its current
>>> assessment.
>>>
>>>
>>>
>>> I think you might do better to use Aaron’s automated quality assessment
>>> tool and apply it to different versions of a set of articles and see how
>>> the assessments change over time. Whatever the deficiencies of an automated tool, I
>>> suspect it’s still more reliable than the human processes that we actually
>>> have. But I guess it depends on whether the focus of your study is the
>>> quality of articles or is it the process of assessing the quality of
>>> articles? My sense is that you are interested in the former rather than the
>>> latter.
>>>
>>>
>>>
>>> Kerry
>>>
>>>
>>>
>>> *From:* Wiki-research-l [mailto:
>>> wiki-research-l-boun...@lists.wikimedia.org] *On Behalf Of *Shiyue Zhang
>>> *Sent:* Friday, 10 June 2016 12:42 PM
>>> *To:* Research into Wikimedia content and communities <
>>> wiki-research-l@lists.wikimedia.org>
>>> *Subject:* Re: [Wiki-research-l] How to get the exact date when an
>>> article get a quality promotion?
>>>
>>>
>>>
>>> Hi Pine,
>>>
>>>
>>>
>>> Thanks for your reply. Yes, it is English Wikipedia. Exactly: I want to
>>> get the timestamp of an article's quality rating change. I know
>>> the particular diffs shouldn't be considered the reason why the quality
>>> rating changed. I'm trying to predict quality change over a
>>> certain time period, so I need the start and end quality of the time
>>> period.
>>>
>>>
>>>
>>> I hope anyone with experience of this problem can give me some
>>> advice. Thanks a lot!!!
>>>
>>>
>>>
>>> 2016-06-10 9:47 GMT+08:00 Pine W 

Re: [Wiki-research-l] Any Norwegian academics writing about Wikipedia?

2015-10-22 Thread Morten Wang
Hi Laura,

The Norwegian Wikipedians maintain a list of research done by Norwegian
researchers and researchers based in Norway at
https://no.wikipedia.org/wiki/Wikipedia:Wikipedia-forskning

That list contains quite a few master's theses, a lot of papers studying
the utility of Wikipedia in education, and all of my published work. I do
not know of any Norwegian academics studying gender gap issues in
Wikipedia.  You might want to get in touch with Astrid Carlsen at Wikimedia
Norway (ast...@wikimedia.no) and ask if she knows anyone; tell her I said
hi.


Cheers,
Morten


On 22 October 2015 at 02:16, Laura Hale  wrote:

> Hey,
>
> I was wondering if any one on the list had any contacts with Norwegian
> academics doing research on Wikipedia, particularly from a gender gap
> perspective?
>
> Sincerely,
> Laura Hale
>
> --
> twitter: purplepopple
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Tool to find poorly written articles

2014-10-28 Thread Morten Wang
Apologies for being somewhat late to the party; our upcoming CSCW 2015
paper (coming soon to a research outlet near you!) took my attention, which
is kind of ironic, as in that paper our primary method of assessing quality
is a machine learner (we also use human assessments to confirm our results).

Earlier in the discussion, Aaron pointed to our WikiSym '13 paper[1].  Two
aspects of article quality that have been brought up in this discussion
were also on our minds when doing that work.  First, readability: Stvilia
et al[2] used Flesch-Kincaid[3] as part of one of their metrics.  In my
work I've found that it's not a particularly useful feature; it doesn't
really help discern the quality of an article.
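
For the curious: the Flesch-Kincaid grade level is 0.39 * (words/sentences)
+ 11.8 * (syllables/words) - 15.59, and a toy version (my own illustration,
with a deliberately crude syllable counter) is only a few lines:

    import re

    def naive_syllables(word):
        # Count runs of consecutive vowels; crude, but shows the idea.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def fk_grade(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(naive_syllables(w) for w in words)
        n_words = max(1, len(words))
        return (0.39 * (n_words / sentences)
                + 11.8 * (syllables / n_words) - 15.59)

    print(round(fk_grade("The cat sat on the mat. It was happy."), 2))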

Secondly, information about editors, e.g. edit counts, tenure, etc… These
features will typically help; for instance, having a diverse set of editors
working on an article is associated with higher quality.  But, as we argue
in our 2013 paper, that is not a feature that is easy to change, nor
something that it's easy to help someone change.  The same goes for a few
other features from the literature, e.g. number of edits or mean edits per
day ("you should stop using the preview button and save all changes, even
the small ones, because that'll increase the quality of the article").
Instead we argue for using features that editors can act upon, and then
feeding those back into SuggestBot's set of article suggestions to assist
editors in finding articles that they want to contribute to.

Lastly, I'd like to mention that determining whether an article is
high-quality or not is a reasonably simple task, as it's a binary
classification problem.  This is where features like word count or article
length have been shown to work well.  Nowadays I find the problem of
assessing quality on a finer-grained scale (e.g. English Wikipedia's
7-class assessment scale[4]) to be more interesting.

But, as James touched on earlier, "quality" is a many-faceted subject.
While computational approaches work well for measures like amount of
content, use of images, or citations, determining whether the sources used
are appropriate is a much harder task.

Footnotes:
1: Warncke-Wang, M., Cosley, D., & Riedl, J. (2013, August). Tell me more:
an actionable quality model for Wikipedia. In *Proceedings of the 9th
International Symposium on Open Collaboration* (p. 8). ACM.
http://opensym.org/wsos2013/proceedings/p0202-warncke.pdf
2: Stvilia, B., Twidale, M. B., Smith, L. C., & Gasser, L. (2005).
Assessing information quality of a community-based encyclopedia.
http://mitiq.mit.edu/ICIQ/Documents/IQ%20Conference%202005/Papers/AssessingIQofaCommunity-basedEncy.pdf
3: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
4: With the exception of A-class articles, as they're practically
nonexistent, and since they are by definition "complete", just like
Featured Articles, they shouldn't remain A-class for long.


Regards,
Morten


On 28 October 2014 18:07, Aileen Oeberst  wrote:

> I am currently on vacation and will not be able to answer your mail before
> November 10. But I will get back then as soon as possible.
>
> Best regards, Aileen Oeberst
>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Finding Quality Assessment Table Archive

2014-02-26 Thread Morten Wang
https://en.wikipedia.org/w/index.php?title=User:WP_1.0_bot/Tables/OverallArticles&action=history

I think you're interested in this one:
https://en.wikipedia.org/w/index.php?title=User:WP_1.0_bot/Tables/OverallArticles&oldid=588775788


Cheers,
Morten


On 25 February 2014 17:58, Ayukaev Vlad  wrote:

> Dear Peers,
>
> Do you know where I can find a quality assessment table (like this one:
> http://tools.wmflabs.org/enwp10/cgi-bin/table2.fcgi), but for other
> dates?
>
>
> Best,
> Vlad
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] bot-generated articles list?

2014-01-15 Thread Morten Wang
First two places I'd look are:
http://stats.wikimedia.org/EN/BotActivityMatrixCreates.htm
http://stats.wikimedia.org/EN/BotActivityMatrixEdits.htm


Cheers,
Morten



On 15 January 2014 14:15, Giovanni Luca Ciampaglia wrote:

> Hello everyone,
>
> is there a list of bot-generated articles or, simpler, a list of known bots
> that have been responsible for the creation/renomination/merging of
> articles on enwiki?
>
> Cheers,
>
> --
> Giovanni Luca Ciampaglia
>
> Postdoctoral fellow
> Center for Complex Networks and Systems Research
> Indiana University
>
> ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408
> ☞ http://cnets.indiana.edu/
> ✉ gciam...@indiana.edu
> ✆ 1-812-855-7261
>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Existitng Research on Article Quality Heuristics?

2013-12-15 Thread Morten Wang
Max,

With regards to quality assessment features, I recommend reading through
our paper from WikiSym this year:
http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf

The related work section covers quite a lot of the previous research on
predicting article quality, so there should be plenty of useful reading.
As James points out, amount of content and number of footnote references
are a good start.
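
To illustrate (a toy example of my own, not code from the paper),
extracting those two baseline features from wikitext can be as simple as:

    import re

    def baseline_features(wikitext):
        # Footnote references: <ref>...</ref>, <ref name=...>, <ref ... />.
        refs = len(re.findall(r"<ref[ >/]", wikitext))
        length = len(wikitext)  # raw wikitext length in characters
        return {"length": length, "refs": refs,
                "refs_per_kb": refs / max(length / 1024.0, 1.0)}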

There are a lot of dependencies when it comes to predicting article
quality.  If you're trying to predict High quality vs everything else, the
task isn't overly difficult.  Otherwise it can be more challenging; for
instance, there is quite a bit of difference between the FAs and GAs on
English Wikipedia, and in your case you'll probably find that the A-class
articles mess things up, because their length tends to be somewhere between
the other two and they're of high quality.  I'm currently of the opinion
that an A-class article is simply an FAC that hasn't been submitted for FA
review yet.

You might of course run into problems with different citation traditions if
you're working across language editions.  English uses footnotes heavily;
others might instead use bibliography sections and not really cite specific
claims in the article text.  (An issue we mention in our paper, from when we
tried to get our model to work on the Norwegian (bokmål) and Swedish
Wikipedias.)

My $.02; if you'd like to discuss this more, feel free to get in touch.


Cheers,
Morten




On 15 December 2013 07:15, Klein,Max  wrote:

>  Wiki Research Junkies,
>
> I am investigating the comparative quality of articles about Cote
> d'Ivoire and Uganda versus other countries. I want to answer the question
> of what makes a high-quality article. Can anyone point me to any existing
> research on heuristics of article quality? That is, determining an
> article's quality from its wikitext properties, without human ratings? I
> would also consider using data from the Article Feedback Tool, if there
> were dumps available for each article in the English, French, and Swahili
> Wikipedias. This is all the raw data I can seem to find:
> http://toolserver.org/~dartar/aft5/dumps/
>
> The heuristic technique that I am currently using is training a naive
> Bayesian filter based on:
>
>    - Per section:
>      - Text length in each section
>      - Infoboxes in each section
>        - Filled parameters in each infobox
>      - Images in each section
>    - Good Article, Featured Article?
>    - Then normalize on page views per population / speakers of native
>      language
>
> Can you also think of any other dimensions or heuristics to
> programmatically rate?
>
>
>  Best,
>   Maximilian Klein
> Wikipedian in Residence, OCLC
> +17074787023
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Best papers on Wikipedia and democracy (or Internet communities and democracy)

2013-09-28 Thread Morten Wang
I like Lam et al's work on deletion decisions in the English Wikipedia: The
Effects of Group Composition on Decision Quality in a Social Production
Community http://www.grouplens.org/node/450


Cheers,
Morten



On 28 September 2013 07:56, Piotr Konieczny  wrote:

>  Hi everyone,
>
> I am doing a lit review on the topic of democratic decision making on
> Wikipedia. I wonder - what are your favorite papers on this subject?
>
> So far the most extensive discussions I've found are
>
> Black, Laura, Ted Welser, Jocelyn DeGroot, and Daniel Cosley. 2008.
> "Wikipedia is not a democracy": Deliberation and policy-making in an online
> community.
> Hilbert, Martin. 2009. The Maturing Concept of E-Democracy: From E-Voting
> and Online Consultations to Democratic Value Out of Jumbled Online Chatter.
> Klemp, Nathaniel J. 2010. From Town-Halls to Wikis: Exploring Wikipedia's
> Implications for Deliberative Democracy.
> Reagle's 2010 book subchapter on "Polling and Voting".
> Firer-Blaess, Sylvain. 2011. Wikipedia: An Example for Electronic
> Democracy? Decision, Discipline and Discourse in the Collaborative
> Encyclopedia.
>
> What did I miss?
>
> In the broader scope, I'd also appreciate suggestions as to the best
> readings in the area of Internet communities and democracy. To be more
> precise, let me stress the word community here. The literature on
> e-democracy and related topics is of course very broad, but I am interested
> in studies of how online communities (like Wikipedia) make
> (quasi?)democratic decisions. Wikipedians vote, and Wikimedians in general
> do as well. How unique are they (are we...) in this? Who else has such
> votes? Redditors? Slashdotians? Other groups? What are the turnouts,
> trends? Would appreciate any information that comes to mind.
>
> --
> Piotr Konieczny, PhD
> http://hanyang.academia.edu/PiotrKonieczny
> http://scholar.google.com/citations?user=gdV8_AEJ
> http://en.wikipedia.org/wiki/User:Piotrus
>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] date formats in various WPs

2013-01-29 Thread Morten Wang
How about using the API for this?  It appears to always return timestamps
in UTC, and they're in a standardised format, e.g.:

http://en.wikipedia.org/w/api.php?action=query&titles=Physics&prop=info|revisions&rvlimit=50
http://no.wikipedia.org/w/api.php?action=query&titles=Fysikk&prop=info|revisions&rvlimit=50
http://ja.wikipedia.org/w/api.php?action=query&titles=%E7%89%A9%E7%90%86%E5%AD%A6&prop=info|revisions&rvlimit=50

For more documentation see http://en.wikipedia.org/w/api.php , specifically
the sections for "action=query" and "prop=revisions".
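
Since the timestamps are ISO 8601 in UTC, they can be parsed and compared
directly across language editions; a quick sketch:

    from datetime import datetime

    def parse_api_timestamp(ts):
        # MediaWiki API timestamps are UTC with a trailing Z.
        return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

    delta = (parse_api_timestamp("2013-01-29T10:15:00Z")
             - parse_api_timestamp("2013-01-28T09:00:00Z"))
    print(delta)  # 1 day, 1:15:00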


Cheers,
Morten



On 29 January 2013 02:15,  wrote:

> Hi @all,
>
> do you have any idea how to unify date formats in various WPs via URL?
>
> my aim is to compare revision dates/times across different WP versions,
> and it would be great to have the same date format for every version of WP
> that I am looking at.
>
> Does anyone know a solution for the Wikipedias that do not offer the
> format I
> consider most useful, namely the format starting with 2013-...?
>
> I am seeking a solution via URL, i.e. one that can be used (and
> replicated) by any user who has no extra rights or any particular
> database query expertise in the WP universe.
>
> for a previous exchange on this topic see
> http://meta.wikimedia.org/wiki/Wikimedia_Forum#Date_formats_in_various_Wikipedias
>
> thanks & cheers,
> Claudia
> koltzenb...@w4w.net
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Are there any stats on activity of editors compared to the population?

2012-05-10 Thread Morten Wang
The majority of registered accounts on en-WP are likely to never have
edited (some of them are perhaps autocreated by someone whose home
Wikipedia is not English), or they edited an article that has since
been deleted, and are therefore listed with no edits.  In October 2010
I gathered some data for en-WP, and found that of the users who
registered in January 2009, 68.15% were at that time listed with 0
edits (across all namespaces).  Unless you're also looking at edits to
deleted articles/pages, it might therefore be difficult to account for
more than a third of the user base.


Regards,
Morten

On 10 May 2012 10:35, Piotr Konieczny  wrote:
> Thanks for the link. The figure 4,058,477 you cite (from
> http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution), as
> you note, comes with the warning that "Only article edits are counted, not
> edits on discussion pages, etc". I assume this is why the magic word
> NUMBEROFUSERS at en Wikipedia returns 16,763,691 (numerous low activity
> editors apparently make their few edits outside article mainspace).
>
> The breakdown I could live with, for a while, but the fact that this stat
> covers only about a quarter of registered accounts is a problem. Is anybody
> familiar with a way to achieve a breakdown of all named accounts with 1+
> edit (for English Wikipedia), no matter which namespace they edited?
> Preferably with more flexible ranges than the ones in that table?
>
> In other words, the linked page provides "Distribution of article
> [namespace] edits over registered editors", whereas I am interested in
> "Distribution of [all] namespaces edits over registered editors".
>
> --
> Piotr Konieczny
>
> "To be defeated and not submit, is victory; to be victorious and rest on
> one's laurels, is defeat." --Józef Pilsudski
>
>
> On 5/10/2012 4:49 AM, WereSpielChequers wrote:
>
> I'm not sure that we have exactly what your asking for.
>
> For example we have the figure of 4,058,477, but that is for registered
> accounts on the English Wikipedia that have made at least one edit to an
> article. Different language versions of Wikipedia are also available, but
> of course registered accounts don't exactly tally with Wikipedians, not
> least because IP editors are excluded. Also, I believe that early edits -
> pre-2004 - may not be available, and I suspect that deleted edits may not
> be counted.
>
> That said, we have further stats: 1,614,938 registered accounts with >= 3
> article edits and 772,557 with >= 10.
>
> So http://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution is
> well worth looking at, but the ranges break at 32 and 100, not 50, which
> may be a problem for you.
>
> Hope that helps
>
> WSC
>
> On 9 May 2012 23:42, Piotr Konieczny  wrote:
>>
>> I was looking at official stats, but I seem to be unable to find an
>> answer to the following question:
>> * how many of Wikipedia editors have X edits (or fall within a range of
>> edits)
>> To be more precise, I am curious how many Wikipedians have:
>> * exactly 1 edit
>> * between 2-9 edits
>> * between 10-50 edits
>> I know that the total number of registered accounts is reported at
>> http://en.wikipedia.org/wiki/Wikipedia:Wikipedians
>>
>> Can anybody direct me to the right page/counter that would allow me to
>> obtain the above information? I hope it is obtainable without having to
>> download the dump...
>>
>> Incidentally, if anybody has those numbers, in addition to replying here
>> feel free to add the information and/or source the one present at
>> http://en.wikipedia.org/wiki/Wikipedia:Wikipedians
>>
>> Thanks,
>>
>> --
>> Piotr Konieczny
>> PhD Candidate
>> Dept of Sociology
>> Uni of Pittsburgh
>>
>> http://pittsburgh.academia.edu/PiotrKonieczny/
>> http://en.wikipedia.org/wiki/User:Piotrus
>>
>>
>> ___
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] List of spam words

2010-07-01 Thread Morten Wang
On Thu, Jul 1, 2010 at 12:03 PM, S. Nunes  wrote:
> I've been working on a vandalism detection tool for Wikipedia and I am
> currently looking for a list of spam words.
> Basically, I am looking for a list of terms typically associated with
> vandalism or spam.
> Is anybody aware of such a resource?

One place to look would be the ClueBot source, both for a list of
words and for some of the heuristics it uses to battle certain typical
vandalism cases: http://en.wikipedia.org/wiki/User:ClueBot/Source
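
As a trivial illustration of the word-list part (my own sketch, not
ClueBot's actual code; the words below are placeholders):

    import re

    BAD_WORDS = {"poop", "stupid", "hahaha"}  # placeholder entries

    def vandalism_score(added_text):
        # Count how many tokens in the added text hit the word list.
        tokens = re.findall(r"[a-z']+", added_text.lower())
        return sum(1 for t in tokens if t in BAD_WORDS)

    print(vandalism_score("This article is stupid hahaha"))  # 2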



Cheers,
Morten

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l