Apologies for being somewhat late to the party; our upcoming CSCW 2015
paper (coming soon to a research outlet near you!) took my attention. That
is kind of ironic, as in that paper our primary method of assessing quality
is a machine learner (we also use human assessments to confirm our results).

Earlier in the discussion, Aaron pointed to our WikiSym '13 paper[1].  Two
aspects of article quality that have been brought up in this discussion were
also on our minds when doing that work.  First, readability: Stvilia et
al.[2] used Flesch-Kincaid[3] as part of one of their metrics.  In my work
I've found that it's not a particularly useful feature; it doesn't really
help discern the quality of an article.
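
For the curious, the Flesch-Kincaid grade level is just a simple formula
over word, sentence, and syllable counts.  A minimal sketch in Python (the
vowel-group syllable counter is my rough approximation; proper
implementations use a dictionary or better heuristics):

    import re

    def flesch_kincaid_grade(text):
        # Flesch-Kincaid grade level:
        #   0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
        sentences = [s for s in re.split(r'[.!?]+', text) if s.strip()]
        words = re.findall(r"[A-Za-z']+", text)
        # Approximate syllables as runs of consecutive vowels.
        syllables = sum(
            max(1, len(re.findall(r'[aeiouy]+', w.lower())))
            for w in words)
        return (0.39 * len(words) / len(sentences)
                + 11.8 * syllables / len(words)
                - 15.59)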

Secondly, information about editors, e.g. edit counts, tenure, etc.  These
features will typically help; for instance, having a diverse set of editors
working on an article is associated with higher quality.  But, as we argue
in our 2013 paper, that is not a feature that is easy to change, nor
something that it's easy to help someone change.  The same goes for a few
other features from the literature, e.g. number of edits or mean edits per
day ("you should stop using the preview button and save all changes, even
the small ones, because that'll increase the quality of the article").
Instead we argue for using features that editors can act upon, and then
feed those back into SuggestBot's set of article suggestions to assist
editors in finding articles that they want to contribute to.
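
To make "actionable" a bit more concrete, here's a rough Python sketch of
the kind of features we mean; these regexes are simplified stand-ins, not
the actual feature extraction from the paper:

    import re

    def actionable_features(wikitext):
        # Features an editor can act on directly: write more text,
        # add section headings, references, or images.
        return {
            'num_words': len(re.findall(r'\w+', wikitext)),
            'num_headings': len(
                re.findall(r'^==[^=].*==\s*$', wikitext, re.M)),
            'num_references': len(re.findall(r'<ref[ >]', wikitext)),
            'num_images': len(
                re.findall(r'\[\[(?:File|Image):', wikitext)),
        }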

Lastly, I'd like to mention that determining whether an article is
high-quality or not is a reasonably simple task, as it's a binary
classification problem.  This is where features such as word count or
article length have been shown to work well.  Nowadays I find the problem
of assessing quality on a finer-grained scale (e.g. English Wikipedia's
7-class assessment scale[4]) to be more interesting.
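
As a toy illustration (invented data, not our actual pipeline), with
scikit-learn the two setups differ mostly in the labels you train on, but
the finer-grained problem is considerably harder to do well on:

    from sklearn.ensemble import RandomForestClassifier

    # Toy data: one feature per article, log10 of its word count.
    X = [[2.1], [2.3], [3.0], [3.4], [3.9], [4.3]]
    y_binary = ['low', 'low', 'low', 'high', 'high', 'high']
    # One example per assessment class, omitting A-class (footnote 4).
    y_multi = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']

    binary_clf = RandomForestClassifier(n_estimators=100, random_state=0)
    binary_clf.fit(X, y_binary)  # length alone separates these well

    multi_clf = RandomForestClassifier(n_estimators=100, random_state=0)
    multi_clf.fit(X, y_multi)    # same feature, six classes: much harder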

But, as James touched on earlier, "quality" is a many-faceted subject.
While computational approaches work well for measures like amount of
content, use of images, or citations, determining whether the sources used
are appropriate is a much harder task.

Footnotes:
1: Warncke-Wang, M., Cosley, D., & Riedl, J. (2013, August). Tell me more:
an actionable quality model for Wikipedia. In *Proceedings of the 9th
International Symposium on Open Collaboration* (p. 8). ACM.
http://opensym.org/wsos2013/proceedings/p0202-warncke.pdf
2: Stvilia, B., Twidale, M. B., Smith, L. C., & Gasser, L. (2005).
Assessing information quality of a community-based encyclopedia. In
*Proceedings of the International Conference on Information Quality*.
http://mitiq.mit.edu/ICIQ/Documents/IQ%20Conference%202005/Papers/AssessingIQofaCommunity-basedEncy.pdf
3: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests
4: With the exception of A-class articles: they're practically
nonexistent, and since they are by definition "complete", just like
Featured Articles, they shouldn't stay A-class for long.


Regards,
Morten

