Hi!
Oliver already mentioned my dissertation [3] on analyzing and predicting
quality flaws in Wikipedia. Instead of classifying articles into some
quality grading scheme (e.g. featured vs. non-featured), the main
idea is to investigate specific quality flaws and thereby indicate the
respects in which low-quality content needs improvement. We proposed
this idea in [1] and developed it further in [2].
The second paper includes a list of more than 100 article features
(heuristics) that have been used in previous research on automated
quality assessment in Wikipedia. An in-depth description and
implementation details of these features can be found in my dissertation
[3] (Appendix B).
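If anyone wants to experiment with the basic setup, the shape of the
approach is roughly: one classifier per flaw, trained on a vector of
article features. Below is a minimal sketch; the features, helper
functions and the use of scikit-learn are my own illustration, not the
exact pipeline from the papers (which frame the task as a one-class
classification problem, whereas the sketch simplifies it to an
ordinary binary classifier).

    # Rough sketch: one binary classifier per quality flaw, trained on
    # simple article feature vectors. The features below are a tiny,
    # hypothetical subset of the ~100 features listed in [2]/[3].
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def extract_features(wikitext):
        # Placeholder features: length, reference count, wiki-link count.
        return [len(wikitext), wikitext.count("<ref"), wikitext.count("[[")]

    def train_flaw_classifier(flawed, clean):
        """Train a classifier for one flaw (e.g. 'unreferenced'), given
        the wikitext of flawed and presumably flaw-free articles."""
        X = np.array([extract_features(t) for t in flawed + clean])
        y = np.array([1] * len(flawed) + [0] * len(clean))
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        print("cross-validated F1:",
              cross_val_score(clf, X, y, cv=5, scoring="f1").mean())
        return clf.fit(X, y)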
Best regards,
Maik
[1] Maik Anderka, Benno Stein, and Nedim Lipka. Towards Automatic
Quality Assurance in Wikipedia. In Proceedings of the 20th International
Conference on World Wide Web (WWW 2011), Hyderabad, India, pages 5-6,
2011. ACM.
http://www.uni-weimar.de/medien/webis/publications/papers/stein_2011d.pdf
[2] Maik Anderka, Benno Stein, and Nedim Lipka. Predicting Quality Flaws
in User-generated Content: The Case of Wikipedia. In Proceedings of the
35th International ACM SIGIR Conference on Research and Development in
Information Retrieval (SIGIR 2012), Portland, USA, pages 981-990, 2012. ACM.
http://www.uni-weimar.de/medien/webis/publications/papers/stein_2012i.pdf
[3] Maik Anderka. Analyzing and Predicting Quality Flaws in
User-generated Content: The Case of Wikipedia. Dissertation,
Bauhaus-Universität Weimar, June 2013.
http://www.uni-weimar.de/medien/webis/publications/papers/anderka_2013.pdf
On 15.12.2013 20:22, Oliver Ferschke wrote:
Hello everybody,
I've been doing quite a bit of work on article quality in Wikipedia -
many heuristics have been mentioned here already.
In my opinion, there is no set of universal quality indicators that
works for all of Wikipedia.
This is mainly because the perception of quality differs so much
across WikiProjects and subject areas within a single Wikipedia, and
even more so across different Wikipedia language versions.
On a theoretical level, some universals can be identified. But as soon
as you derive concrete heuristics, they will always be biased towards
the articles you used to derive them.
This aspect aside, having an abstract quality score that tells you how
good an article is according to your heuristics doesn't help a lot in
most cases.
I much prefer the approach of identifying quality problems, which also
gives you an idea of an article's overall quality.
I have done some work on this [1], [2] and there was a recent
dissertation on the same topic [3].
I'm currently writing my dissertation on language technology methods
to assist quality management in collaborative environments like
Wikipedia. There, I start with a theoretical model, but as soon as the
concrete heuristics come into play, the model has to be grounded
according to the concrete quality standards that have been established
in a particular sub-community of Wikipedia. I'm still wrapping up my
work, but if anybody wants to talk, I'll be happy to.
Regards,
Oliver
[1] The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Oliver Ferschke and Iryna Gurevych and Marc Rittberger
In: Proceedings of the 51st Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers). p. 721-730, August
2013. Sofia, Bulgaria.
[2] FlawFinder: A Modular System for Predicting Quality Flaws in
Wikipedia - Notebook for PAN at CLEF 2012
Oliver Ferschke and Iryna Gurevych and Marc Rittberger
In: CLEF 2012 Labs and Workshop, Notebook Papers, n. pag. September
2012. Rome, Italy.
[3] Analyzing and Predicting Quality Flaws in User-generated Content:
The Case of Wikipedia.
Maik Anderka
Dissertation, Bauhaus-Universität Weimar, June 2013
--
-------------------------------------------------------------------
Oliver Ferschke, M.A.
Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TU DA)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
fersc...@cs.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------
------------------------------------------------------------------------
*From:* wiki-research-l-boun...@lists.wikimedia.org
[wiki-research-l-boun...@lists.wikimedia.org] on behalf of
WereSpielChequers [werespielchequ...@gmail.com]
*Sent:* Sunday, 15 December 2013 14:27
*To:* Research into Wikimedia content and communities
*Subject:* Re: [Wiki-research-l] Existing Research on Article Quality
Heuristics?
Re Laura's comment.
I don't dispute that there are plenty of high-quality articles which
have had only one or two contributors. However, my assumption and
experience is that, in general, the more editors the better the
quality, and I'd love to see that assumption tested by research. There
may be some maximum above which quality does not rise, and there are
clearly a number of gifted members of the community whose work is as
good as our best crowdsourced work, especially when the crowdsourcing
element is to address the minor imperfections that come from their own
blind spots. It would be well worthwhile to learn whether Women's
football is an exception to this, or indeed whether my own confidence
in crowdsourcing is mistaken.
I should also add that while I wouldn't filter out minor edits, you may
well want to filter out reverted edits and their reversions. Some of
our articles are notorious vandal targets and their quality is usually
unaffected by a hundred vandalisms and reversions of vandalism per
annum. Beaver, before it was semi-protected in autumn 2011
<https://en.wikipedia.org/w/index.php?title=Beaver&offset=20111211084232&action=history>,
is a case in point. This also feeds into Kerry's point that many
assessments are outdated. An article that has been a vandalism target
might have been edited a hundred times since it was assessed, and yet
it is likely to have changed less than one with only half a dozen
edits, all of which added content.
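To make that concrete, here is a rough sketch of the filtering I have
in mind, assuming you have the page's revisions oldest-first with
their SHA1 hashes and usernames (e.g. from the MediaWiki API with
rvprop=sha1|user); the bot flag is a placeholder and would in practice
come from a user-group lookup. An edit that restores an earlier
revision's exact text (an identity revert) is discarded together with
everything it undid, and the surviving revisions are used to count
distinct non-bot editors.

    # Rough, untested sketch: drop identity reverts plus the edits they
    # undid, then count distinct non-bot editors among what remains.
    def effective_revisions(revisions):
        """revisions: list of dicts with 'sha1' and 'user', oldest first."""
        kept = []
        for rev in revisions:
            # Does this revision restore a state we have already kept?
            match = next((i for i, r in enumerate(kept)
                          if r["sha1"] == rev["sha1"]), None)
            if match is not None:
                # Identity revert: discard the undone edits and the revert.
                kept = kept[:match + 1]
            else:
                kept.append(rev)
        return kept

    def distinct_non_bot_editors(revisions):
        # 'bot' is a hypothetical flag; in practice check the user's groups.
        return {r["user"] for r in effective_revisions(revisions)
                if not r.get("bot")}

On a notorious vandal target like Beaver this should leave far fewer
effective edits, and far fewer distinct editors, than the raw history
suggests.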
Jonathan
On 15 December 2013 09:44, Laura Hale <la...@fanhistory.com> wrote:
On Sun, Dec 15, 2013 at 9:53 AM, WereSpielChequers
<werespielchequ...@gmail.com> wrote:
Re other dimensions or heuristics:
Very few articles are rated as Featured, and not that many as
Good, if you are going to use that rating system
<https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Assessment>.
I'd suggest also including the lower levels, and indeed
whether an article has been assessed and typically how long it
takes for a new article to be assessed. Uganda, for example, has
1 Featured article, 3 Good Articles and nearly 400 unassessed
articles on the English-language Wikipedia
<https://en.wikipedia.org/wiki/Wikipedia:UGANDA#Recognized_content>.
For a crowdsourced project like Wikipedia the size of the
crowd is crucial and varies hugely per article. So I'd suggest
counting the number of different editors other than bots who
have contributed to the article.
Except why would this be an indicator of quality? I've done an
analysis recently of football player biographies where I looked at the
total volume of edits, date created, total number of citations and
total number of pictures, and none of these factors correlated with
article quality. You can have an article with 1,400 editors and still
have it be assessed as Start-class. Indeed, some of the lesser-known
articles may actually attract specialist contributors who almost
exclusively write on one topic and then take the article to DYK, GA, A
or FA. The end result is that you have really great articles with low
page views that are maintained by one or two writers.
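For anyone who wants to test this on their own set of articles, a
check along these lines could look roughly as follows; the CSV file,
the column names and the class ordering are made up for illustration
and are not my actual analysis.

    # Hypothetical layout: one row per article with its metrics and its
    # assessment class; Spearman rank correlation against quality.
    import pandas as pd
    from scipy.stats import spearmanr

    CLASS_ORDER = {"Stub": 0, "Start": 1, "C": 2, "B": 3,
                   "GA": 4, "A": 5, "FA": 6}

    df = pd.read_csv("football_biographies.csv")  # made-up filename
    df["quality_rank"] = df["assessment"].map(CLASS_ORDER)

    for metric in ["num_edits", "num_editors", "num_citations", "num_images"]:
        rho, p = spearmanr(df[metric], df["quality_rank"])
        print(f"{metric}: rho={rho:.2f}, p={p:.3f}")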
> Whether or not a Wikipedia article has references is a quality
> dimension you might want to look at. At least on EN it is widely
> assumed to be a measure of quality, though I don't recall ever seeing
> a study of the relative accuracy of cited and uncited Wikipedia
> information.
Yeah, I'd be skeptical of this overall, though a lack of references
may well be a bad sign. The problem is you could get, say, one
contentious section of the article that ends up fully cited or
over-cited while the rest of the article ends up poorly cited. At the
same time, you can get B articles that really should be GAs, but
people have been burned by that process so they just take it to B and
leave it there. I have heard quite a few times from female Wikipedians
operating in certain areas that the process actually puts them off.
--
twitter: purplepopple
blog: ozziesport.com
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l