Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-16 Thread Klein,Max
Hello Everybody,

Thanks so much for the fantastic suggestions.



Morten,

Thank you for the "Tell Me More" paper; those kinds of features were exactly what 
I was looking for. I will report my results to let you know how they compare.
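
To make that concrete, the per-section feature extraction I have in mind is
roughly the sketch below (Python with mwparserfromhell; untested, and the
feature names are just placeholders):

import mwparserfromhell

def section_features(wikitext):
    # Rough per-section features from raw wikitext: text length, image count,
    # and number of filled infobox parameters. Illustrative only.
    code = mwparserfromhell.parse(wikitext)
    features = {}
    for i, section in enumerate(code.get_sections(include_lead=True, flat=True)):
        plain = section.strip_code()
        infoboxes = [t for t in section.filter_templates()
                     if str(t.name).strip().lower().startswith("infobox")]
        images = [l for l in section.filter_wikilinks()
                  if str(l.title).strip().lower().startswith(("file:", "image:"))]
        features["section_%d_length" % i] = len(plain)
        features["section_%d_images" % i] = len(images)
        features["section_%d_infobox_params" % i] = sum(len(t.params) for t in infoboxes)
    return features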


Maik,

Thanks for introducing the idea of flaw-based assessment. I will see which 
clean-up tags I come across most frequently in the different languages; 
I hadn't thought of using flaws as actionable features.
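
As a first pass I will probably just tally cleanup templates per article,
something like the sketch below (the template prefixes are only illustrative;
every language edition names its cleanup templates differently):

import mwparserfromhell
from collections import Counter

# Illustrative prefixes only; adjust per language edition.
CLEANUP_PREFIXES = ("citation needed", "refimprove", "unreferenced", "cleanup")

def cleanup_tag_counts(wikitext):
    counts = Counter()
    for template in mwparserfromhell.parse(wikitext).filter_templates():
        name = str(template.name).strip().lower()
        if name.startswith(CLEANUP_PREFIXES):
            counts[name] += 1
    return counts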


Laura,

Your conclusions about football player biographies are instructive; I will see 
if diversity of editorship relates to quality in country articles.


Oliver,

Thanks for the warning about topic bias; I see that the problem affects the 
Wikipedias as a whole. Since I am only looking at articles in a specific 
category at a time, my guiding assumption is that topic bias is not an 
issue inter-category, but it is good to keep in mind.


Maximilian Klein
Wikipedian in Residence, OCLC
+17074787023



From: wiki-research-l-boun...@lists.wikimedia.org on behalf of Maik Anderka
Sent: Monday, December 16, 2013 12:33 AM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

Hi!

Oliver already mentioned my dissertation [3] on analyzing and predicting 
quality flaws in Wikipedia. Instead of classifying articles into some quality 
grading scheme (e.g. featured, non-featured, etc.), the main idea is to 
investigate specific quality flaws and thus provide indications of the 
respects in which low-quality content needs improvement. We proposed this idea 
in [1] and pushed it further in [2]. The second paper comprises a listing of 
more than 100 article features (heuristics) that have been used in previous 
research on automated quality assessment in Wikipedia. An in-depth description 
and implementation details of these features can be found in my dissertation 
[3] (Appendix B).

Best regards,
Maik

[1] Maik Anderka, Benno Stein, and Nedim Lipka. Towards Automatic Quality 
Assurance in Wikipedia. In Proceedings of the 20th International Conference on 
World Wide Web (WWW 2011), Hyderabad, India, pages 5-6, 2011. ACM.
http://www.uni-weimar.de/medien/webis/publications/papers/stein_2011d.pdf

[2] Maik Anderka, Benno Stein, and Nedim Lipka. Predicting Quality Flaws in 
User-generated Content: The Case of Wikipedia. In Proceedings of the 35th 
International ACM SIGIR Conference on Research and Development in Information 
Retrieval (SIGIR 2012), Portland, USA, pages 981-990, 2012. ACM.
http://www.uni-weimar.de/medien/webis/publications/papers/stein_2012i.pdf

[3] Maik Anderka. Analyzing and Predicting Quality Flaws in User-generated 
Content: The Case of Wikipedia. Dissertation, Bauhaus-Universität Weimar, June 
2013.
http://www.uni-weimar.de/medien/webis/publications/papers/anderka_2013.pdf



Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-16 Thread Maik Anderka

Hi!

Oliver already mentioned my dissertation [3] on analyzing and predicting 
quality flaws in Wikipedia. Instead of classifying articles into some 
quality grading scheme (e.g. featured, non-featured, etc.), the main 
idea is to investigate specific quality flaws and thus provide 
indications of the respects in which low-quality content needs 
improvement. We proposed this idea in [1] and pushed it further in [2]. 
The second paper comprises a listing of more than 100 article features 
(heuristics) that have been used in previous research on automated 
quality assessment in Wikipedia. An in-depth description and 
implementation details of these features can be found in my dissertation 
[3] (Appendix B).


Best regards,
Maik

[1] Maik Anderka, Benno Stein, and Nedim Lipka. Towards Automatic 
Quality Assurance in Wikipedia. In Proceedings of the 20th International 
Conference on World Wide Web (WWW 2011), Hyderabad, India, pages 5-6, 
2011. ACM.

http://www.uni-weimar.de/medien/webis/publications/papers/stein_2011d.pdf

[2] Maik Anderka, Benno Stein, and Nedim Lipka. Predicting Quality Flaws 
in User-generated Content: The Case of Wikipedia. In Proceedings of the 
35th International ACM SIGIR Conference on Research and Development in 
Information Retrieval (SIGIR 2012), Portland, USA, pages 981-990, 2012. ACM.

http://www.uni-weimar.de/medien/webis/publications/papers/stein_2012i.pdf

[3] Maik Anderka. Analyzing and Predicting Quality Flaws in 
User-generated Content: The Case of Wikipedia. Dissertation, 
Bauhaus-Universität Weimar, June 2013.

http://www.uni-weimar.de/medien/webis/publications/papers/anderka_2013.pdf


On 15.12.2013 20:22, Oliver Ferschke wrote:

Hello everybody,

I've been doing quite some work on article quality in Wikipedia - many 
heuristics have been mentioned here already.
In my opinion, a set of universal indicators for quality that works 
for all of Wikipedia does not exist.
This is mainly because the perception of quality is so different 
across various WikiProjects and subject areas in a single Wikipedia 
and even more so across different Wikipedia language versions.
On a theoretical level, some universals can be identified. But as soon 
as concrete heuristics are to be identified, you will always have a 
bias towards the articles you used to identify these heuristics.


This aspect aside, having an abstract quality score that tells you how 
good an article is according to your heuristics doesn't help a lot in 
most cases.
I much prefer the approach of identifying quality problems, which also 
gives you an idea of the quality of an article.
I have done some work on this [1], [2] and there was a recent 
dissertation on the same topic [3].


I'm currently writing my dissertation on language technology methods 
to assist quality management in collaborative environments like 
Wikipedia. There, I start with a theoretical model, but as soon as the 
concrete heuristics come into play, the model has to be grounded 
according to the concrete quality standards that have been established 
in a particular sub-community of Wikipedia. I'm still wrapping up my 
work, but if anybody wants to talk, I'll be happy to.


Regards,
Oliver


[1] The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Oliver Ferschke and Iryna Gurevych and Marc Rittberger
In: Proceedings of the 51st Annual Meeting of the Association for 
Computational Linguistics (Volume 1: Long Papers). p. 721-730, August 
2013. Sofia, Bulgaria.


[2] FlawFinder: A Modular System for Predicting Quality Flaws in 
Wikipedia - Notebook for PAN at CLEF 2012

Oliver Ferschke and Iryna Gurevych and Marc Rittberger
In:  CLEF 2012 Labs and Workshop, Notebook Papers, n. pag. September 
2012. Rome, Italy.


[3] Analyzing and Predicting Quality Flaws in User-generated Content: 
The Case of Wikipedia.

Maik Anderka
Dissertation, Bauhaus-Universität Weimar, June 2013

--
---
Oliver Ferschke, M.A.
Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TU DA)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
fersc...@cs.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
---


Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-15 Thread Oliver Ferschke
Hello everybody,

I've been doing quite some work on article quality in Wikipedia - many 
heuristics have been mentioned here already.
In my opinion, a set of universal indicators for quality that works for all of 
Wikipedia does not exist.
This is mainly because the perception of quality is so different across various 
WikiProjects and subject areas in a single Wikipedia and even more so across 
different Wikipedia language versions.
On a theoretical level, some universals can be identified. But as soon as 
concrete heuristics are to be identified, you will always have a bias towards 
the articles you used to identify these heuristics.

This aspect aside, having an abstract quality score that tells you how good an 
article is according to your heuristics doesn't help a lot in most cases.
I much prefer the approach of identifying quality problems, which also gives 
you an idea of the quality of an article.
I have done some work on this [1], [2] and there was a recent dissertation on 
the same topic [3].

I'm currently writing my dissertation on language technology methods to assist 
quality management in collaborative environments like Wikipedia. There, I start 
with a theoretical model, but as soon as the concrete heuristics come into 
play, the model has to be grounded according to the concrete quality standards 
that have been established in a particular sub-community of Wikipedia. I'm 
still wrapping up my work, but if anybody wants to talk, I'll be happy to.

Regards,
Oliver


[1] The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Oliver Ferschke and Iryna Gurevych and Marc Rittberger
In: Proceedings of the 51st Annual Meeting of the Association for Computational 
Linguistics (Volume 1: Long Papers). p. 721-730, August 2013. Sofia, Bulgaria.

[2] FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia - 
Notebook for PAN at CLEF 2012
Oliver Ferschke and Iryna Gurevych and Marc Rittberger
In:  CLEF 2012 Labs and Workshop, Notebook Papers, n. pag. September 2012. 
Rome, Italy.

[3] Analyzing and Predicting Quality Flaws in User-generated Content: The Case 
of Wikipedia.
Maik Anderka
Dissertation, Bauhaus-Universität Weimar, June 2013

--
---
Oliver Ferschke, M.A.
Doctoral Researcher
Ubiquitous Knowledge Processing Lab (UKP-TU DA)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
fersc...@cs.tu-darmstadt.de
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
---


Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-15 Thread WereSpielChequers
Re Laura's comment.

I don't dispute that there are plenty of high quality articles which have
had only one or two contributors. However my assumption and experience is
that in general the more editors the better the quality, and I'd love to
see that assumption tested by research. There may be some maximum above
which quality does not rise, and there are clearly a number of gifted
members of the community whose work is as good as our best crowdsourced
work, especially when the crowdsourcing element is to address the minor
imperfection that comes from their own blind spot. It would be well
worthwhile to learn if Women's football is an exception to this, or indeed
if my own confidence in crowdsourcing is mistaken.

I should also add that while I wouldn't filter out minor edits you might as
well filter out reverted edits and their reversion. Some of our articles
are notorious vandal targets and their quality is usually unaffected by a
hundred vandalisms and reversions of vandalism per annum. Beaver before it
was semi-protected in Autumn 2011
<https://en.wikipedia.org/w/index.php?title=Beaver&offset=20111211084232&action=history>
being a case in point. This also feeds into Kerry's point that many
assessments are outdated. An article that has been a vandalism target might
have been edited a hundred times since it was assessed, and yet it is
likely to have changed less than one with only half a dozen edits all of
which added content.
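
For what it's worth, identity reverts are fairly easy to spot in the revision
history by looking for repeated content hashes. A rough Python sketch against
the API (untested, pagination omitted, and only a crude heuristic):

import requests

API = "https://en.wikipedia.org/w/api.php"

def non_reverted_revisions(title):
    # A revision whose SHA1 matches an earlier revision restores that earlier
    # state; everything in between is treated as reverted.
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "ids|sha1", "rvlimit": "max", "rvdir": "newer",
              "format": "json"}
    resp = requests.get(API, params=params).json()
    page = next(iter(resp["query"]["pages"].values()))
    first_seen = {}   # sha1 -> position of first revision with that content
    kept = []
    for pos, rev in enumerate(page.get("revisions", [])):
        sha1 = rev.get("sha1")
        if sha1 in first_seen:
            kept = [(p, r) for p, r in kept if p <= first_seen[sha1]]
        else:
            first_seen[sha1] = pos
            kept.append((pos, rev["revid"]))
    return [revid for _, revid in kept]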

Jonathan


On 15 December 2013 09:44, Laura Hale  wrote:

>
> On Sun, Dec 15, 2013 at 9:53 AM, WereSpielChequers <
> werespielchequ...@gmail.com> wrote:
>
>> Re other dimensions or heuristics:
>>
>> Very few articles are rated as Featured, and not that many as Good, if
>> you are going to use that rating system, I'd
>> suggest also including the lower levels, and indeed whether an article
>> has been assessed and typically how long it takes for a new article to be
>> assessed. Uganda for example has 1 Featured article, 3 Good Articles and
>> nearly 400 unassessed on the English language Wikipedia.
>>
>> For a crowd sourced project like Wikipedia the size of the crowd is
>> crucial and varies hugely per article. So I'd suggest counting the number
>> of different editors other than bots who have contributed to the article.
>>
>
> Except why would this be something that would be an indicator of quality?
>  I've done an analysis recently of football player biographies where I
> looked at the total volume of edits, date created, total number of
> citations and total number of pictures and none of these factors correlates
> to article quality.  You can have an article with 1,400 editors and still
> have it be assessed as a start.  Indeed, some of the lesser known articles
> may actually attract specialist contributors who almost exclusively write
> to one topic and then take the article to DYK, GA, A or FA.  The end result
> is you have articles with low page views that are really great that are
> maintained by one or two writers.
>
>
>
>> Whether or not a Wikipedia article has references is a quality dimension
>> you might want to look at. At least on EN it is widely assumed to
>> be a measure of quality, though I don't recall ever seeing a study of the
>> relative accuracy of cited and uncited Wikipedia information.
>
> Yeah, I'd be skeptical of this overall though it might be bad.  The
> problem is you could get, say, one contentious section of the article that
> ends up fully cited or overcited while the rest of the article ends up
> poorly cited.  At the same time, you can get B articles that really should
> be GAs, but people have been burned by that process so they just take it to
> B and leave it there.  I have heard this quite a few times from female
> Wikipedians operating in certain places that the process actually puts them
> off.
>
> --
> twitter: purplepopple
> blog: ozziesport.com
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-15 Thread Morten Wang
Max,

With regards to quality assessment features, I recommend reading through
our paper from WikiSym this year:
http://www-users.cs.umn.edu/~morten/publications/wikisym2013-tellmemore.pdf

The related work section contains quite a lot of the previous research on
predicting article quality, so there should be plenty of useful reading.
 As James points out, content and number of footnote references are a good
start.

There are a lot of dependencies when it comes to predicting article
quality.  If you're trying to predict High quality vs everything else, the
task isn't overly difficult.  Otherwise it could be more challenging; for
instance, there is quite a bit of difference between the FAs and GAs on
English Wikipedia, and in your case you'll probably find the A-class
articles mess things up because their length tends to be somewhere between
the other two and they're of high quality.  I'm currently of the opinion
that an A-class article is simply an FAC that hasn't been submitted for FA
review yet.
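
To illustrate the difference between the two tasks, a toy scikit-learn setup
could look like the sketch below (X and y stand for whatever per-article
features and assessment labels you extract; nothing here is real data):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_tasks(X, y):
    # X: per-article features (length, refs, images, ...);
    # y: assessment labels such as "FA", "GA", "A", "B", "Start".
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    all_classes = cross_val_score(clf, X, y, cv=5, scoring="f1_macro").mean()
    y_binary = np.where(np.isin(y, ["FA", "GA"]), "high", "rest")
    high_vs_rest = cross_val_score(clf, X, y_binary, cv=5, scoring="f1_macro").mean()
    return {"all_classes_f1": all_classes, "high_vs_rest_f1": high_vs_rest}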

You might of course run into problems with different citation traditions if
you're working across language editions.  English uses footnotes heavily;
others might instead use bibliography sections and not really cite specific
claims in the article text. (An issue we mention in our article when we
tried to get our model to work on Norwegian (bokmål) and Swedish Wikipedia).

My $.02; if you'd like to discuss this more, feel free to get in touch.


Cheers,
Morten




On 15 December 2013 07:15, Klein,Max  wrote:

>  Wiki Research Junkies,
>
> I am investigating the comparative quality of articles about Cote
> d'Ivoire and Uganda versus other countries. I wanted to answer the question
> of what makes a high-quality article. Can anyone point me to any existing
> research on heuristics of Article Quality? That is, determining an article's
> quality by its wikitext properties, without human rating? I would also
> consider using data from the Article Feedback Tools, if there were dumps
> available for each Article in English, French, and Swahili Wikipedias.
> This is all the raw data I can seem to find:
> http://toolserver.org/~dartar/aft5/dumps/
>
> The heuristic technique that I am currently using is training a naive
> Bayesian filter based on:
>
>    - Per Section:
>      - Text length in each section
>      - Infoboxes in each section
>        - Filled parameters in each infobox
>      - Images in each section
>    - Good Article, Featured Article?
>    - Then Normalize on Page Views per population / speakers of native
>      language
>
> Can you also think of any other dimensions or heuristics to
> programmatically rate?
>
>
>  Best,
>   Maximilian Klein
> Wikipedian in Residence, OCLC
> +17074787023
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-15 Thread James Salsman
Maximilian Klein wrote:

>... Can you also think of any other dimensions or heuristics
> to programatically rate?

Ref tags per article text bytes works pretty well, even by itself.
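
For example (plain string counting on the raw wikitext; just a sketch):

def ref_density(wikitext):
    # <ref ...> openings per byte of wikitext; crude, but a useful single feature.
    refs = wikitext.count("<ref")
    return refs / max(len(wikitext.encode("utf-8")), 1)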

Also, please consider readability metrics. I would say that at this point
on enwiki, about a third of our real reader-impeding quality issues have
more to do with overly technical, jargon-laden articles, which usually also
have word and sentence length issues, than with underdeveloped exposition.
This is especially true of our math articles, many of which are almost useless
for undergraduates, let alone students at the earlier grade levels where the
corresponding concepts are introduced.
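
A readability score is also cheap to approximate once the markup is stripped,
e.g. Flesch reading ease (rough sketch; the syllable counter below is a crude
vowel-group heuristic):

import re
import mwparserfromhell

def flesch_reading_ease(wikitext):
    # 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word), approximated.
    text = mwparserfromhell.parse(wikitext).strip_code()
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(len(re.findall(r"[aeiouy]+", w.lower())), 1) for w in words)
    n_words = max(len(words), 1)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)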

The good news is that doesn't seem to be happening in other topic areas
like biology, physics, or medicine. But math is kind of a disaster area
that way and it's not getting better with time.
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-15 Thread Laura Hale
On Sun, Dec 15, 2013 at 9:53 AM, WereSpielChequers <
werespielchequ...@gmail.com> wrote:

> Re other dimensions or heuristics:
>
> Very few articles are rated as Featured, and not that many as Good, if you
> are going to use that rating system, I'd
> suggest also including the lower levels, and indeed whether an article
> has been assessed and typically how long it takes for a new article to be
> assessed. Uganda for example has 1 Featured article, 3 Good Articles and
> nearly 400 unassessed on the English language Wikipedia.
>
> For a crowd sourced project like Wikipedia the size of the crowd is
> crucial and varies hugely per article. So I'd suggest counting the number
> of different editors other than bots who have contributed to the article.
>

Except why would this be something that would be an indicator of quality?
 I've done an analysis recently of football player biographies where I
looked at the total volume of edits, date created, total number of
citations and total number of pictures and none of these factors correlates
to article quality.  You can have an article with 1,400 editors and still
have it be assessed as a start.  Indeed, some of the lesser known articles
may actually attract specialist contributors who almost exclusively write
to one topic and then take the article to DYK, GA, A or FA.  The end result
is you have articles with low page views that are really great that are
maintained by one or two writers.



> Whether or not a Wikipedia article has references is a quality dimension
> you might want to look at. At least on EN it is widely assumed to
> be a measure of quality, though I don't recall ever seeing a study of the
> relative accuracy of cited and uncited Wikipedia information.

Yeah, I'd be skeptical of this overall though it might be bad.  The problem
is you could get, say, one contentious section of the article that ends up
fully cited or overcited while the rest of the article ends up poorly
cited.  At the same time, you can get B articles that really should be GAs,
but people have been burned by that process so they just take it to B and
leave it there.  I have heard this quite a few times from female Wikipedians
operating in certain places that the process actually puts them off.

-- 
twitter: purplepopple
blog: ozziesport.com
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Existing Research on Article Quality Heuristics?

2013-12-15 Thread WereSpielChequers
Re other dimensions or heuristics:

Very few articles are rated as Featured, and not that many as Good, if you
are going to use that rating system, I'd
suggest also including the lower levels, and indeed whether an article
has been assessed and typically how long it takes for a new article to be
assessed. Uganda for example has 1 Featured article, 3 Good Articles and
nearly 400 unassessed on the English language Wikipedia.

For a crowd sourced project like Wikipedia the size of the crowd is crucial
and varies hugely per article. So I'd suggest counting the number of
different editors other than bots who have contributed to the article. It
might also be worth getting some measure of local internet speed or usage
level as context. There was a big upgrade to East Africa's Internet
connection a few years ago. For Wikipedia the crucial metric is the size of
the Internet-comfortable population with some free time and ready access to
PCs. I'm not sure we've yet measured how long it takes from people getting
internet access to their being sufficiently confident to edit Wikipedia
articles; I suspect the answer is age-related, but it would be worth
checking the various editor surveys to see if this has been collected yet.
My understanding is that in much of Africa many people are bypassing the
whole PC thing and going straight to smartphones, and of course for
mobile phone users Wikipedia is essentially a queryable medium rather than an
interactively editable one.
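
Counting distinct non-bot editors is easy enough from the API. A rough sketch
(untested; only the most recent batch of revisions, and only a crude
name-based bot filter rather than the actual bot user group):

import requests

API = "https://en.wikipedia.org/w/api.php"

def distinct_non_bot_editors(title):
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "user", "rvlimit": "max", "format": "json"}
    resp = requests.get(API, params=params).json()
    page = next(iter(resp["query"]["pages"].values()))
    users = {rev["user"] for rev in page.get("revisions", []) if "user" in rev}
    return len({u for u in users if not u.lower().endswith("bot")})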

Whether or not a Wikipedia article has references is a quality dimension
you might want to look at. At least on EN it is widely assumed to be a
measure of quality, though I don't recall ever seeing a study of the
relative accuracy of cited and uncited Wikipedia information.

Thankfully the Article Feedback tool has been almost eradicated from the
English language Wikipedia; I don't know if it is still on French or
Swahili. I don't see it as being connected to the quality of an article,
though it should be an interesting measure of how loved or hated a given
celebrity was during the time the tool was deployed. So I'd suggest
ignoring it in your research on article quality.

Hope that helps

Jonathan


On 15 December 2013 06:15, Klein,Max  wrote:

>  Wiki Research Junkies,
>
> I am investigating the comparative quality of articles about Cote
> d'Ivoire and Uganda versus other countries. I wanted to answer the question
> of what makes a high-quality article. Can anyone point me to any existing
> research on heuristics of Article Quality? That is, determining an article's
> quality by its wikitext properties, without human rating? I would also
> consider using data from the Article Feedback Tools, if there were dumps
> available for each Article in English, French, and Swahili Wikipedias.
> This is all the raw data I can seem to find:
> http://toolserver.org/~dartar/aft5/dumps/
>
> The heuristic technique that I am currently using is training a naive
> Bayesian filter based on:
>
>    - Per Section:
>      - Text length in each section
>      - Infoboxes in each section
>        - Filled parameters in each infobox
>      - Images in each section
>    - Good Article, Featured Article?
>    - Then Normalize on Page Views per population / speakers of native
>      language
>
> Can you also think of any other dimensions or heuristics to
> programmatically rate?
>
>
>  Best,
>   Maximilian Klein
> Wikipedian in Residence, OCLC
> +17074787023
>
> ___
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l