Re other dimensions or heuristics:

Very few articles are rated as Featured, and not that many as Good. If you
are going to use that rating system
<https://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Assessment>,
I'd suggest also including the lower levels, and indeed whether an article
has been assessed at all and how long it typically takes for a new article
to be assessed. Uganda, for example, has 1 Featured Article, 3 Good Articles
and nearly 400 unassessed articles on the English-language Wikipedia
<https://en.wikipedia.org/wiki/Wikipedia:UGANDA#Recognized_content>.
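
To make the point about the lower levels concrete, here is a rough Python sketch of tallying every assessment class, including articles with no rating at all. The input format (a dict of title to class, with None for unassessed) and the sample titles and ratings are made up for illustration; only the class names come from the assessment scale linked above.

```python
from collections import Counter

# English Wikipedia assessment classes, best to worst. "Unassessed" is a
# label chosen here for articles that have no rating yet (an assumption).
CLASSES = ["FA", "A", "GA", "B", "C", "Start", "Stub", "Unassessed"]


def tally_assessments(ratings):
    """Count articles per class. `ratings` maps title -> class or None."""
    return Counter(cls or "Unassessed" for cls in ratings.values())


def assessed_share(tally):
    """Fraction of articles that carry any rating at all."""
    total = sum(tally.values())
    if total == 0:
        return 0.0
    return 1 - tally["Unassessed"] / total


if __name__ == "__main__":
    # Hypothetical sample, not real ratings.
    sample = {
        "Article one": "FA",
        "Article two": "Stub",
        "Article three": None,
        "Article four": None,
    }
    tally = tally_assessments(sample)
    for cls in CLASSES:
        print(f"{cls:<11}{tally[cls]}")
    print("assessed share:", assessed_share(tally))
```

Reporting the unassessed count next to the rated classes keeps a large unrated backlog from being hidden by a handful of Featured or Good articles.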

For a crowdsourced project like Wikipedia, the size of the crowd is crucial
and varies hugely per article, so I'd suggest counting the number of
different editors, other than bots, who have contributed to the article. It
might also be worth getting some measure of local Internet speed or usage
level as context; there was a big upgrade to East Africa's Internet
connection a few years ago. For Wikipedia, the crucial metric is the size of
the Internet-comfortable population with some free time and ready access to
PCs. I'm not sure we've yet measured how long it takes from people getting
Internet access to their being sufficiently confident to edit Wikipedia
articles. I suspect the answer is age related, but it would be worth
checking the various editor surveys to see if this has been collected yet.
My understanding is that in much of Africa many people are bypassing the
whole PC thing and going straight to smartphones, and of course for
mobile phone users Wikipedia is essentially a queryable medium rather than
an interactive, editable one.
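
The editor count suggested above could be sketched roughly as follows. This assumes each revision is available as a dict with a "user" key and that you can supply a list of known bot accounts; both are assumptions for illustration, as are the sample usernames. The "name ends in bot" check is only a crude fallback and will miss some bots.

```python
def count_human_editors(revisions, known_bots=frozenset()):
    """Count distinct non-bot usernames in an article's revision history.

    `revisions` is an iterable of dicts with a "user" key (assumed shape).
    `known_bots` is an optional set of bot account names.
    """
    editors = set()
    for rev in revisions:
        user = rev.get("user")
        if not user:
            continue  # skip revisions with a hidden or missing username
        # Crude heuristic: explicit bot list first, then a name suffix check.
        if user in known_bots or user.lower().endswith("bot"):
            continue
        editors.add(user)
    return len(editors)


if __name__ == "__main__":
    # Hypothetical revision history.
    history = [
        {"user": "Alice"},
        {"user": "ExampleBot"},
        {"user": "Alice"},
        {"user": "Bob"},
        {"user": "Tidy Helper"},  # a bot whose name does not end in "bot"
    ]
    print(count_human_editors(history))
    print(count_human_editors(history, known_bots={"Tidy Helper"}))
```

The second call shows why an explicit bot list matters: the suffix check alone counts "Tidy Helper" as a human editor.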

Whether or not a Wikipedia article has references is a quality dimension
you might want to look at. At least on EN it is widely assumed to be a
measure of quality, though I don't recall ever seeing a study of the
relative accuracy of cited and uncited Wikipedia information.
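
As a rough illustration of the references dimension, the sketch below counts `<ref>` tags in an article's wikitext. It is a simple regular-expression heuristic under the assumption that citations use `<ref>` tags; it ignores citation templates that do not, and the sample wikitext is invented.

```python
import re

# Matches an opening <ref>, <ref name="...">, or self-closing <ref ... />.
# It deliberately does not match </ref> or the <references /> list tag.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def count_ref_tags(wikitext):
    """Return the number of <ref> citation tags in the wikitext."""
    return len(REF_TAG.findall(wikitext))


def has_references(wikitext):
    """True if the article uses at least one <ref> tag."""
    return count_ref_tags(wikitext) > 0


if __name__ == "__main__":
    sample = (
        "First claim.<ref>Source A</ref> "
        'Second claim.<ref name="b">Source B</ref> '
        'Third claim.<ref name="b" />\n'
        "<references />"
    )
    print(count_ref_tags(sample), has_references(sample))
```

Dividing the count by text length would give a citation density, which may compare articles of different sizes more fairly than a yes/no flag.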

Thankfully the Article Feedback tool has been almost eradicated from the
English-language Wikipedia; I don't know if it is still on French or
Swahili. I don't see it as being connected to the quality of an article,
though it should be an interesting measure of how loved or hated a given
celebrity was during the time the tool was deployed. So I'd suggest
ignoring it in your research on article quality.

Hope that helps

Jonathan


On 15 December 2013 06:15, Klein,Max <kle...@oclc.org> wrote:

>  Wiki Research Junkies,
>
> I am investigating the comparative quality of articles about  Cote
> d'Ivoire and Uganda versus other countries. I wanted to answer the question
> of what makes high-quality articles? Can anyone point me to any existing
> research on heuristics of Article Quality? That is, determining an article's
> quality by the wikitext properties, without human rating? I would also
> consider using data from the Article Feedback Tools, if there were dumps
> available for each Article in English, French, and Swahili Wikipedias.
> This is all the raw data I can seem to find
> http://toolserver.org/~dartar/aft5/dumps/
>
> The heuristic technique that I am currently using is training a naive
> Bayesian filter based on:
>
>    - Per Section.
>       - Text length in each section
>       - Infoboxes in each section.
>          - Filled parameters in each infobox
>       - Images in each section
>    - Good Article, Featured Article?
>    - Then Normalize on Page Views per on population / speakers of native
>      language
>
> Can you also think of any other dimensions or heuristics to
> programmatically rate?
>
>
>  Best,
>   Maximilian Klein
> Wikipedian in Residence, OCLC
> +17074787023
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>