Re: [Wikimedia-l] Metrics - accuracy of Wikipedia articles

edward Thu, 08 May 2014 14:44:19 -0700

On 08/05/2014 22:29, Andrew Gray wrote:

Section 3.3 of the report covers article selection. They went about it
backwards (at least, backwards to the way you might expect) -
recruiting reviewers and then manually identifying relevant articles,
as the original goal was to use relevant topics for individual
specialists.

Even this selective method didn't work as well as might be hoped,
because the mechanism of the study required a minimum level of content
- the articles had to be substantial enough to be useful for a
comparison, and of sufficient length and comparable scope in both sets
of sources - which ruled out many of the initial selections.

After it was published I emailed both the EPIC and the Oxford team to understand why they chose the articles they did. I was unable to get a satisfactory answer.

The method of selecting the most notable philosopher-theologians from a certain period is a good one. There is no reason it has to be random, so long as there is a clearly defined selection method. However, they were unable to explain why, of the most notable subjects, they chose Aquinas and Anselm. I suspect there was a selection bias, as those were the articles which 'looked' the best. (The ones on Ockham and Scotus were so obviously vandalised that even a novice would have spotted the problem.)

Even then, as I have already pointed out above, they missed the fact that the Anselm article was plagiarised from Britannica 1911, so that instead of comparing Britannica to Wikipedia, they were comparing Britannica 2011 with Britannica 1911. And they missed some bad errors that had been introduced by Wikipedia editors when they attempted to modernise the old Britannica prose.

To give a simple example that even Geni will have to concede is not 'subjectively wrong', the Wikipedia article on Anselm said

"Anselm wrote many proofs within Monologion and Proslogion. In the first proof, Anselm relies on the ordinary grounds of realism, which coincide to some extent with the theory of Augustine."


This is a mangled version of the B1911 text, which reads

"This demonstration is the substance of the Monologion and Proslogion. In the first of these the proof rests on the ordinary grounds of realism"

You see what went wrong? 'first of these' should refer to the first book, namely Monologion. But one editor removed "This demonstration is the substance of the Monologion and Proslogion" as being too difficult for ordinary readers, leaving 'first of these'. Another editor came along and thought it referred to the first proof. This is quite incorrect.

I am still amazed the Oxford team didn't spot this. Even if you don't know the article was lifted from B1911, the oddity of the assertion should have rung alarm bells. There are about 9 other mistakes of differing severity.



On 08/05/2014 22:29, Andrew Gray wrote:

On 8 May 2014 01:56, Andreas Kolbe <jayen...@gmail.com> wrote:

(However, this study does not seem to have been based on a random sample –
at least I cannot find any mention of the sample selection method in the
study's write-up. The selection of a random sample is key to any such
effort, and the method used to select the sample should be described in
detail in any resulting report.)

https://meta.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf

Section 3.3 of the report covers article selection. They went about it
backwards (at least, backwards to the way you might expect) -
recruiting reviewers and then manually identifying relevant articles,
as the original goal was to use relevant topics for individual
specialists.

Even this selective method didn't work as well as might be hoped,
because the mechanism of the study required a minimum level of content
- the articles had to be substantial enough to be useful for a
comparison, and of sufficient length and comparable scope in both sets
of sources - which ruled out many of the initial selections.

(This is a key point to remember: the study effectively assesses the
quality of a subset of "developed" articles in Wikipedia, rather than
the presumably less-good fragmentary ones. It's a valid question to
ask, but not always the one people think it's answering...)

"Thus the selection of articles was constrained by two important
factors: one, the need to find topics appropriate for the academics
whom we were able to recruit to the project; secondly, that articles
from different online encyclopaedias were of comparable substance and
focus. (Such factors would need to be taken carefully into account
when embarking on a future large-scale study, where the demands of
finding large numbers of comparable articles are likely to be
considerable.)"

You'd need to adopt a fairly different methodology if you wanted a
random sampling; I suppose you could prefilter a sample by "likely to
be suitable" metrics (eg minimum size, article title matching a title
list from the other reference works) and randomly select from within
*those*, but of course you would still have the fundamental issue that
you're essentially reviewing a selected portion of the project.
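
The prefilter-then-sample idea above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not anything the
study did: the function name, the size threshold, and the toy article
titles and sizes are all hypothetical, and real title matching between
reference works would need far more care than a case-insensitive
comparison.

```python
import random

def prefiltered_sample(articles, reference_titles, min_size, k, seed=None):
    """Randomly pick up to k titles from the 'likely to be suitable' pool.

    articles: dict mapping article title -> size in bytes (hypothetical input).
    reference_titles: titles found in the other reference work.
    min_size: minimum article size, an arbitrary cut-off chosen by the reviewer.
    """
    reference = {title.casefold() for title in reference_titles}
    # Keep only articles that are long enough AND have a counterpart title.
    eligible = sorted(
        title
        for title, size in articles.items()
        if size >= min_size and title.casefold() in reference
    )
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(eligible, min(k, len(eligible)))


# Toy data, invented purely for illustration.
articles = {"Topic A": 30000, "Topic B": 52000, "Topic C": 800, "Topic D": 41000}
reference_titles = ["topic a", "Topic B", "Topic C"]
picked = prefiltered_sample(articles, reference_titles, min_size=1000, k=2, seed=0)
```

Note that "Topic C" (too short) and "Topic D" (no counterpart title) never
enter the draw, which is exactly the limitation described above: the
random sample is drawn from a selected portion of the project only.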



