Re: [WikiEN-l] Rating the English wikipedia
{{sofixit}} :)

On Sun, Feb 13, 2011 at 10:16 PM, Ian Woollard wrote:
> This encyclopedia has been rated as C-Class on the project's quality scale.
>
> This encyclopedia has been checked against the following criteria for
> B-Class status:
>
> 1. Referencing and citation: criterion not met (many common
>    articles are not adequately referenced)
> 2. Coverage and accuracy: criterion not met (currently 3.5 million
>    of an estimated 4.4 million articles)
> 3. Structure: criterion met (seems to be reasonably well structured)
> 4. Grammar and style: criterion met (mostly good enough, but would
>    not please a purist)
> 5. Supporting materials: criterion met (multiple wikis surround and
>    support it)
>
> I therefore award the Wikipedia class C:
>
> The Wikipedia is substantial, but is still missing important content
> or contains a lot of irrelevant material. The Wikipedia should have
> references to reliable sources, but may still have significant issues
> or require substantial cleanup.
>
> The Wikipedia is better developed in style, structure and quality than
> Start-Class, but fails one or more of the criteria for B-Class. It may
> have some gaps or missing elements; need editing for clarity, balance
> or flow; or contain policy violations such as bias or original
> research.
>
> Useful to a casual reader, but would not overall provide a complete
> picture for even a moderately detailed study. Considerable editing is
> needed to close gaps in content and address cleanup issues.
>
> --
> -Ian Woollard

___
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] Rating the English wikipedia
On Sun, Feb 13, 2011 at 8:16 PM, Ian Woollard wrote:
> I therefore award the Wikipedia class C:

Considering that 55% of articles are stubs and 21% are start, awarding Wikipedia a C overall is quite generous.

--
Brian Mingus
Graduate student
Computational Cognitive Neuroscience Lab
University of Colorado at Boulder
Re: [WikiEN-l] Rating the English wikipedia
I say it's start class at best.

On Sun, Feb 13, 2011 at 10:23 PM, Brian J Mingus wrote:
> Considering that 55% of articles are stubs and 21% are start, awarding
> Wikipedia a C overall is quite generous.

--
Faith is about what you really truly believe in, not about what you are taught to believe.
Re: [WikiEN-l] Rating the English wikipedia
On 14/02/2011, Newyorkbrad wrote:
> {{sofixit}} :)

fixin' the Wikipedia - brb

--
-Ian Woollard
Re: [WikiEN-l] Rating the English wikipedia
Can we at least agree it's High-importance?

Newyorkbrad

On Sun, Feb 13, 2011 at 10:16 PM, Ian Woollard wrote:
> This encyclopedia has been rated as C-Class on the project's quality scale.
Re: [WikiEN-l] Rating the English wikipedia
On 14/02/2011, Brian J Mingus wrote:
> Considering that 55% of articles are stubs and 21% are start, awarding
> Wikipedia a C overall is quite generous.

I don't think you can take the simple percentages of articles; a lot of the most important and well-visited articles are pretty well sorted, whereas the stubs are mostly articles few people go to.

I would think that percentages of FA/GA/A/B/C/Start/Stub with respect to page hits would be much more illuminating.

--
-Ian Woollard
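A rough sketch of what weighting by page hits could look like, as opposed to counting articles. The function name and the sample figures below are made up purely for illustration; no real page-view data is used.

```python
from collections import defaultdict


def class_shares(articles, weight_by_hits=False):
    """Return each quality class's share of the total, as a fraction.

    articles: iterable of (quality_class, page_hits) pairs.
    With weight_by_hits=False every article counts once; with
    weight_by_hits=True every article counts once per page hit.
    """
    totals = defaultdict(float)
    for quality_class, page_hits in articles:
        totals[quality_class] += page_hits if weight_by_hits else 1
    grand_total = sum(totals.values())
    return {cls: value / grand_total for cls, value in totals.items()}


# Hypothetical sample: three rarely visited stubs, one Start, one busy B.
SAMPLE = [
    ("Stub", 10), ("Stub", 20), ("Stub", 30),
    ("Start", 140),
    ("B", 800),
]

by_count = class_shares(SAMPLE)                       # share of articles
by_hits = class_shares(SAMPLE, weight_by_hits=True)   # share of page hits
```

In this invented sample the stubs are 60% of the articles but only 6% of the page hits, which is the kind of gap the two measures could reveal.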
Re: [WikiEN-l] Rating the English wikipedia
> I would think that percentages of FA/GA/A/B/C/Start/Stub with respect
> to page hits would be much more illuminating.

Ooh, I'd like to see that. And to get a list of pages that are well below par considering their popularity.

Steve
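One way such a list might be produced, sketched under assumptions: the class ordering is simply the order in which the classes are named in this thread, reversed, and the page titles, hit counts and threshold are hypothetical.

```python
# Quality classes from lowest to highest. Assumption: this follows the
# order "FA/GA/A/B/C/Start/Stub" as written in the thread, reversed.
CLASS_ORDER = ["Stub", "Start", "C", "B", "A", "GA", "FA"]


def below_par(pages, min_hits, max_class="Start"):
    """Return titles of popular pages rated at or below max_class.

    pages: iterable of (title, quality_class, page_hits) tuples.
    The result is sorted with the most-visited page first.
    """
    cutoff = CLASS_ORDER.index(max_class)
    flagged = [
        (hits, title)
        for title, quality_class, hits in pages
        if hits >= min_hits and CLASS_ORDER.index(quality_class) <= cutoff
    ]
    return [title for hits, title in sorted(flagged, reverse=True)]


# Hypothetical data for illustration only.
PAGES = [
    ("Page A", "Stub", 5000),
    ("Page B", "FA", 90000),
    ("Page C", "Start", 12000),
    ("Page D", "Stub", 40),
]

worst = below_par(PAGES, min_hits=1000)
```

Here "Page D" is a stub but too rarely visited to be flagged, while "Page B" is popular but already highly rated.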
Re: [WikiEN-l] Rating the English wikipedia
On 14/02/2011 03:35, Ian Woollard wrote:
> I think you can't take the simple percentages of articles, a lot of
> the most important and well visited articles are pretty well sorted,
> whereas the stubs are mostly articles few people go to.

While this discussion is worth having, I wish to record a view, now long held, by means of a metaphor. Wikipedia is an omelette, not scrambled eggs. Because of the intrinsic use of hypertext, taking WP to be (in the large) a collection of articles is always a distortion. If the "few people" who go to a stub are just those who would refer to a corresponding footnote in a book, the system as a whole is functioning as it should.

Charles
Re: [WikiEN-l] Rating the English wikipedia
I think Charles is right about this. There is a common conception, or misconception, that stubship or start-class-ship is just a way station on the way to articlehood. But some articles are probably destined to remain short, or at least, can remain short without their shortness reflecting poorly on the project. I don't know if there are any statistics, but I am sure that the Britannica (for example) has at least as many one- or two- or three-paragraph articles as lengthier ones.

It may be that the wording of the stub template fosters this reading. "This article is a stub. You can help Wikipedia by expanding it." Often, of course, but perhaps not always.

Newyorkbrad

On Mon, Feb 14, 2011 at 4:23 AM, Charles Matthews <charles.r.matth...@ntlworld.com> wrote:
> While this discussion is worth having, I wish to record a view, now long
> held, by means of a metaphor. Wikipedia is an omelette, not scrambled
> eggs. Because of the intrinsic use of hypertext, taking WP to be (in
> the large) a collection of articles is always a distortion. If the "few
> people" who go to a stub are just those who would refer to a
> corresponding footnote in a book, the system as a whole is functioning
> as it should.
>
> Charles
Re: [WikiEN-l] Rating the English wikipedia
I think not. There's a difference between a stub (which may not have many or even any references at all) and a very short article. Something can be a valid C-class, and still only be 2 or 3 paragraphs.

On 14/02/2011, Newyorkbrad wrote:
> I think Charles is right about this. There is a common conception, or
> misconception, that stubship or start-class-ship is just a way station on
> the way to articlehood. But some articles are probably destined to remain
> short, or at least, can remain short without their
> shortness reflecting poorly on the project. I don't know if there are any
> statistics, but I am sure that the Britannica (for example) has at least as
> many one- or two- or three-paragraph articles as lengthier ones.
>
> It may be that the wording of the stub template fosters this reading. "This
> article is a stub. You can help Wikipedia by expanding it." Often, of
> course, but perhaps not always.
>
> Newyorkbrad

--
-Ian Woollard
Re: [WikiEN-l] Rating the English wikipedia
True, but how well is the distinction understood by people who apply the templates or rate the articles?

Newyorkbrad

On Mon, Feb 14, 2011 at 11:30 AM, Ian Woollard wrote:
> I think not. There's a difference between a stub (which may not have
> many or even any references at all) and a very short article.
> Something can be a valid C-class, and still only be 2 or 3 paragraphs.
Re: [WikiEN-l] Rating the English wikipedia
On 14/02/2011, Newyorkbrad wrote:
> True, but how well is the distinction understood by people who apply the
> templates or rate the articles?

I'm certain that the rating system is imperfectly applied.

It is to be hoped, and is likely, that over time both the ratings and the way that they are applied will improve.

--
-Ian Woollard
Re: [WikiEN-l] Rating the English wikipedia
It would be nice if the consistency of the ratings were to improve over time whilst the criteria remained the same; if that were to happen we would be able to use this to monitor improvement over time. But standards inflation has got the better of us; that's why at http://en.wikipedia.org/wiki/Wikipedia:Featured_article_review we can't simply revert to the version that originally passed FA. The current version of an old FA may well be better than when the article passed FA, but still not meet current FA standards.

It would be great to have an accurate measure of the change in quality of the pedia. But the ratings won't give us that.

WereSpielChequers

On 14 February 2011 17:04, Ian Woollard wrote:
> I'm certain that the rating system is imperfectly applied.
>
> It is to be hoped and likely that over time both the ratings and the
> way that they are applied will improve.
Re: [WikiEN-l] Rating the English wikipedia
On 14 February 2011 20:04, Fences&Windows wrote:
> From: Ian Woollard
>> 2. Coverage and accuracy: criterion not met (currently 3.5 million
>> of an estimated 4.4 million articles)
> You think there are only 4.4 million possible topics? Based on what criteria?

I recall someone (Ray Saintonge?) working out there'd be at least 20 million, just going on placenames and politicians that are currently in all the large WPs. Anyone got a link on hand to that?

- d.
Re: [WikiEN-l] Rating the English wikipedia
On 14 February 2011 20:04, Fences&Windows wrote:
> Date: Mon, 14 Feb 2011 03:16:12 +
> From: Ian Woollard
> Subject: [WikiEN-l] Rating the English wikipedia
>
>> This encyclopedia has been rated as C-Class on the project's quality scale.
>> This encyclopedia has been checked against the following criteria for
>> B-Class status:
>
>> 2. Coverage and accuracy: criterion not met (currently 3.5 million
>> of an estimated 4.4 million articles)
>
> You think there are only 4.4 million possible topics? Based on what criteria?
> Stevertigo also thought this in the essay Wikipedia:Concept limit, which I
> tagged as [citation needed]. There are probably tens of millions of
> potentially notable topics, if not hundreds of millions. However, we're
> better at deleting new articles than writing them and writing a new article
> that will survive these days requires more detailed research than in years
> gone by.

I agree. There are far more than 4.4 million possible topics. Consider all the human settlements that we could write articles about. There could well be millions of those (I really don't know how many there are).
Re: [WikiEN-l] Rating the English wikipedia
On Mon, Feb 14, 2011 at 3:17 PM, David Gerard wrote:
> I recall someone (Ray Saintonge?) working out there'd be at least 20
> million, just going on placenames and politicians that are currently
> in all the large WPs. Anyone got a link on hand to that?

Perhaps http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test

--
gwern
http://www.gwern.net
Re: [WikiEN-l] Rating the English wikipedia
On 14 February 2011 20:48, Gwern Branwen wrote:
> On Mon, Feb 14, 2011 at 3:17 PM, David Gerard wrote:
>> I recall someone (Ray Saintonge?) working out there'd be at least 20
>> million, just going on placenames and politicians that are currently
>> in all the large WPs. Anyone got a link on hand to that?
> Perhaps
> http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test

That's the one!

There's a *heck* of a lot still to be written.

- d.
Re: [WikiEN-l] Rating the English wikipedia
On 14 February 2011 20:48, Gwern Branwen wrote:
> Perhaps
> http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test

I think that page is more a test of how good we are at interwiki linking than anything else. The trend it shows is far too fast to be explained by new articles being written; it must be explained by old articles being linked to.
Re: [WikiEN-l] Rating the English wikipedia
There are two approaches to predicting the size of Wikipedia: one based on working out how many articles would meet the general notability guideline, the other charting how we have grown and extrapolating the curve.

I'm not totally convinced by the 20 million theory based on articles in different Wikipedias that aren't interwiki linked. I suspect that a bit more work at finding intrawiki links would chip away at that; I know from the death anomalies project http://meta.wikimedia.org/wiki/Death_anomalies_table that we are still adding intrawiki links, and I'm pretty sure that we've added a lot in the 18 months since the 20 million prediction was made. So the potential size of the pedia might be less than twenty million, but I'm pretty sure it is many millions more than the 3.55 million we currently have. Provided we keep our notability policy and if we can rein in the deletionists, there are a lot of notable topics that don't have articles yet.

There was an extrapolation of the trend done in 2007 that predicted we'd peak at 3.5 million:
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%27s_growth#Logistic_model_for_growth_in_article_count_of_Wikipedia
We are currently 1% above that and still growing.

The 4.4 million prediction comes from the Gompertz model:
http://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia
But the vulnerability of that model, as with any extrapolation, is that the thing you are modelling can change. If something like WYSIWYG editing were to bring in a new wave of editors then the model would break and it would be possible to think in terms of how many potential articles qualify.

WereSpielChequers

On 14 February 2011 21:54, David Gerard wrote:
> That's the one!
>
> There's a *heck* of a lot still to be written.
>
> - d.
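For readers unfamiliar with the two curve families named above, here is a minimal sketch of their general shapes. Only the two ceilings (3.5 million and 4.4 million) come from this thread; the rate, midpoint and displacement values are arbitrary placeholders, not the parameters fitted on the linked pages.

```python
import math


def logistic(t, ceiling, rate, midpoint):
    """Logistic growth: symmetric S-curve that levels off at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def gompertz(t, ceiling, displacement, rate):
    """Gompertz growth: asymmetric S-curve that levels off at `ceiling`."""
    return ceiling * math.exp(-displacement * math.exp(-rate * t))


LOGISTIC_CEILING = 3_500_000   # peak predicted by the 2007 extrapolation
GOMPERTZ_CEILING = 4_400_000   # peak predicted by the Gompertz model

# Placeholder shape parameters (assumptions, chosen only to draw a curve).
for year in (0, 5, 10, 20, 40):
    print(
        year,
        round(logistic(year, LOGISTIC_CEILING, rate=0.5, midpoint=5)),
        round(gompertz(year, GOMPERTZ_CEILING, displacement=5.0, rate=0.3)),
    )
```

Both functions can only approach their ceiling from below, which is why an article count already above 3.5 million cannot be reconciled with the logistic fit, and why any change in editing behaviour invalidates either extrapolation.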
Re: [WikiEN-l] Rating the English wikipedia
On Mon, Feb 14, 2011 at 3:16 AM, Ian Woollard wrote:
> I therefore award the Wikipedia class C:

I award it an F minus, based on using it to do some research today on the topic of the Nebra sky disc (i.e. as a starting point to looking elsewhere, but I was hoping that the Wikipedia article would be a good starting point):

http://en.wikipedia.org/wiki/Nebra_sky_disk

Different bits of text within the article contradict each other, there is a struck-out bit (using tags) down in the references section, and when you look in the article history, you find lots of recent changes in January 2011.

From what I can tell, someone in January 2011 has made lots of changes. These are the changes since 4 December 2010:

http://en.wikipedia.org/w/index.php?title=Nebra_sky_disk&diff=413679667&oldid=400465808

Some of the removal edits:

http://en.wikipedia.org/w/index.php?title=Nebra_sky_disk&diff=prev&oldid=410561429
http://en.wikipedia.org/w/index.php?title=Nebra_sky_disk&diff=410525404&oldid=409950734
http://en.wikipedia.org/w/index.php?title=Nebra_sky_disk&diff=411978495&oldid=411480834
http://en.wikipedia.org/w/index.php?title=Nebra_sky_disk&diff=413984194&oldid=413679667

Essentially, the article is a mess, so I gave up and went elsewhere to look for information on this object. And back on Wikipedia, I've asked some other editors to have a look at the article.

I'm tempted to ask whether the "system" worked here or not. I understand that there is always a chance that you come across an article in a poor state during editing, but quite why there wasn't a proper reaction here, I don't know.

Carcharoth
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 01:17, Carcharoth wrote:
> I award it an F minus, based on using it to do some research today on
> the topic of the Nebra sky disc (i.e. as a starting point to looking
> elsewhere, but I was hoping that the Wikipedia article would be a good
> starting point):
>
> http://en.wikipedia.org/wiki/Nebra_sky_disk

I'm not sure that judging a project with 3 million articles based on a sample of just one article is a great idea.

> I'm tempted to ask whether the "system" worked here or not. I
> understand that there is always a chance that you come across an
> article in a poor state during editing, but quite why there wasn't a
> proper reaction here, I don't know.

I'd say it's hit the wall-of-text problem: beyond a certain size, unless there is an individual really prepared to look after the article, there is a tendency towards messiness.

--
geni
Re: [WikiEN-l] Rating the English wikipedia
On 14/02/2011, David Gerard wrote:
> On 14 February 2011 20:48, Gwern Branwen wrote:
>> Perhaps
>> http://en.wikipedia.org/wiki/User:Piotrus/Wikipedia_interwiki_and_specialized_knowledge_test

Oh right. So back in 2006, Piotrus claims that there should be 400 million articles. It turns out he based this essentially only on biographies. In Poland.

Quick sanity check: that's about one bio article for every twentieth person alive on the entire planet. And these would be encyclopedically *notable* people, would they? We can easily see that that's not going to happen, even allowing for the fact that lots of people have died already: most people just aren't that notable, and the current population completely swamps historical populations.

OK, so how did this happen? I checked back through the history of the article. The first claim was that it essentially needs 400 million biographies of people. It turns out that the 400 million was based on dividing 30 by 1000 to get 0.3% and then dividing that into the biographies in the English Wikipedia. But... 30 in 1000 is 3%. So he's already out by a factor of 10. That's bad enough. So now we're down to 40 million.

His next error is assuming that the English Wikipedia is off by a factor of 33 on its biographies *worldwide*, as opposed to having a blind patch on Poland. So let's look at this. The biographical encyclopedia that he mentions has 25,000 entries. Poland has 38 million people. So less than 1 person in a thousand is notable in Poland according to this encyclopedia. I then checked the British biography 'Who's who'. They have about 30,000 entries, but that's only about 1 person in 2000 in Great Britain, so even less. But again, roughly 1 person in 1000.

The world population is currently about 7 billion. So if it's as high as 1 in 1000 then that's about 7 million articles, and to be honest in reality it's probably a *lot* less; a lot of people globally do things like subsistence-level farming, and are thus far less likely to be notable. So even that is excessively favourable. I would guess we're looking at a few million biographies needed, worldwide, at the very most. And sure, there are probably other biographical encyclopedias out there, and they may list a few more that Who's who misses, but whether those would survive AFDs in a general encyclopedia depends on notability.

Anyway, so I stop there. Even 40 million appears completely unsupportable. It looks like it's off again by about another order of magnitude.

So, to sum up, this article's claim of 400 million is just based on simple and obvious arithmetic and logical errors, and seems to be two orders of magnitude too high.

--
-Ian Woollard
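The arithmetic above can be re-run as a quick check. Every figure below is one quoted in this message; the variable names are only for illustration.

```python
WORLD_POPULATION = 7_000_000_000   # "about 7 billion"
CLAIMED_ARTICLES = 400_000_000     # the 400 million claim

# Sanity check: how many living people per claimed article?
people_per_claimed_article = WORLD_POPULATION / CLAIMED_ARTICLES

# The slip: 30 in 1000 was read as 0.3% (3 per 1000) instead of 3%.
stated_per_thousand = 3
actual_per_thousand = 30
overstatement = actual_per_thousand // stated_per_thousand
corrected_claim = CLAIMED_ARTICLES // overstatement

# Polish biographical encyclopedia: 25,000 entries for 38 million people.
POLAND_POPULATION = 38_000_000
POLISH_ENTRIES = 25_000
people_per_polish_entry = POLAND_POPULATION // POLISH_ENTRIES

# Generous upper bound: 1 notable person per 1000 of world population.
upper_bound = WORLD_POPULATION // 1000

print(people_per_claimed_article)  # 17.5, i.e. roughly every twentieth person
print(overstatement)               # 10
print(corrected_claim)             # 40000000
print(people_per_polish_entry)     # 1520, so fewer than 1 in 1000
print(upper_bound)                 # 7000000
```

The 7 million upper bound is already nearly six times below the corrected 40 million, and the "few million at most" guess widens the gap further.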
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 04:00, Ian Woollard wrote:
> Anyway, so I stop there. Even 40 million appears completely
> unsupportable. It looks like it's off again by about another order of
> magnitude.

Oh really? People have been keeping records for a long time. Western Europe has very comprehensive records going back 200 years. More patchy records stretch back about 8000 years. When you consider the number of politicians, military leaders, aristocracy, industrialists, sportspeople, scientists, writers, artists, musicians, performers and general hangers-on there have been in that time, it's quite a lot of people.

How many is probably impossible to calculate. There are various lines of attack: asking "how many people does it take to make a person notable", or random sampling of the electoral roll, would be one way to make a start, but as far as I'm aware we haven't done so. We can establish a lower bound, since Thomson-Gale's Biography Resource Center contains over 1,335,000 biographies.

--
geni
Re: [WikiEN-l] Rating the English wikipedia
On Tue, Feb 15, 2011 at 3:03 AM, geni wrote:
> I'm not sure that judging a project with 3 million articles based on a
> sample of just one article a great idea.

That was tongue-in-cheek, but a reminder to be wary of the state of an article. I wonder whether the recent editing history should be more visible to readers, or at least an indication of when the article was last edited? The "This page was last modified on 15 February 2011 at 01:35." is right at the bottom of the page; arguably (like on other sites) it should be up at the top.

>> I'm tempted to ask whether the "system" worked here or not. I
>> understand that there is always a chance that you come across an
>> article in a poor state during editing, but quite why there wasn't a
>> proper reaction here, I don't know.
>
> I'd say it's hit the wall of text problem beyond a certain size unless
> there is an individual really prepared to look after the article there
> is a tendency towards messiness.

I've just discovered a talk page section where the editors discussed things. I missed it because it was stuck at the top of the talk page, rather than the bottom of the talk page (a common misplacement done by editors not familiar with talk page conventions).

So the system was working here. It was just that the discussion was slightly hidden away. And the talk page is almost as confusing as the article.

I wonder if there is a tool that shows, when reading an article, whether there has been recent talk page activity? I know you can just click the talk page tab, but some of this information should be visible immediately, and not just a few clicks away.

Carcharoth
Re: [WikiEN-l] Rating the English wikipedia
On Tue, Feb 15, 2011 at 4:33 AM, geni wrote:
> We can establish a lower
> bound since the Thomson-Gale's Biography Resource Center contains over
> 1,335,000 biographies.

The 2007 edition of the ODNB (British biographical history) has "50,113 biographical articles covering 54,922 lives". What criteria are used for the Thomson-Gale's Biography Resource Center? We don't have an article on that, though we do have this:

http://en.wikipedia.org/wiki/Biography_and_Genealogy_Master_Index

"The Biography and Genealogy Master Index (BGMI) was a printed reference index, and is currently a proprietary database published by the Gale Research Company. The database indexes more than 15 million individuals, living and deceased, covered in more than 1700 biographical reference sources."

Is that something different?

http://www.gale.cengage.com/servlet/BrowseSeriesServlet?region=9&imprint=000&titleCode=BDMI&edition

Carcharoth
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 11:22, Carcharoth wrote:
> On Tue, Feb 15, 2011 at 4:33 AM, geni wrote:
>
>> We can establish a lower
>> bound since the Thomson-Gale's Biography Resource Center contains over
>> 1,335,000 biographies.
>
> The 2007 edition of the ODNB (British biographical history) has
> "50,113 biographical articles covering 54,922 lives". What criteria
> are used for the Thomson-Gale's Biography Resource Center? We don't
> have an article on that, though we do have this:
>
> http://en.wikipedia.org/wiki/Biography_and_Genealogy_Master_Index
>
> "The Biography and Genealogy Master Index (BGMI) was a printed
> reference index, and is currently a proprietary database published by
> the Gale Research Company. The database indexes more than 15 million
> individuals, living and deceased, covered in more than 1700
> biographical reference sources."
>
> Is that something different?
>
> http://www.gale.cengage.com/servlet/BrowseSeriesServlet?region=9&imprint=000&titleCode=BDMI&edition
>
> Carcharoth

It's something listed at:

http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons

--
geni
Re: [WikiEN-l] Rating the English wikipedia
On 14/02/2011 22:31, WereSpielChequers wrote:
> If something like WYSIWYG
> editing were to bring in a new wave of editors then the model would
> break and it would be possible to think in terms of how many potential
> articles qualify.

I think there is a point here. There are certainly a number of valid topics without articles in enWP (a million is a good enough figure), but the question is how many people will (a) think they should be written, and then (b) do something about it. The demographics of "new editors" have something to do with (a). We certainly also need new editors upgrading our older articles where that has not been done (which is on-topic for the thread).

Much of this discussion still seems to work with a rather primitive model of how editors assign themselves to tasks. Among those tasks is seeing what the encyclopedia needs by direct inspection of existing content.

Charles
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 04:33, geni wrote:
> On 15 February 2011 04:00, Ian Woollard wrote:
> > Anyway, so I stop there. Even 40 million appears completely
> > unsupportable. It looks like it's off again by about another order of
> > magnitude.
>
> Oh really?
>

Yeah, really. That page claims we only have 3% of notable Poles. Are you really, seriously, telling me we only have 3% of ALL notable biographies??? Because that's what that page is assuming to calculate that 40 million.

> People have been keeping records for a long time. Western Europe has
> very comprehensive records going back 200 years. More patchy records
> stretch back about 8000 years.
>

Yup.

> When you consider the number of politicians, military leaders,
> aristocracy, industrialists, sportspeople, scientists, writers,
> artists, musicians, performers and general hangers on there have been
> in that time it's quite a lot of people.
>
> How many is probably impossible to calculate.

It's not impossible to calculate: you look at the counts from encyclopedias of famous people. And they very typically list historical people as well as living people.

> --
> geni
>

--
-Ian Woollard
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 16:19, Ian Woollard wrote:
> Yeah, really. That page claims we only have 3% of notable Poles. Are you
> really, seriously, telling me we only have 3% of ALL notable biographies???
> Because that's what that page is assuming to calculate that 40 million.

It's possible. Our coverage of, say, British MPs starts to fall apart pre-20th century.

> It's not impossible to calculate, you look at the counts from
> encyclopedias of famous people. And they very typically list historical
> people as well as living people.

But they all hit dead tree limitations. Sure, you can choose a very narrow focus book like the alphabet of the saints. So it seems pretty likely that up until 1992 Southampton FC had a bit over 700 players about which it would be possible to write something. But such books don't really exist for far areas.

Still, assuming players play for an average of 2 clubs (remember players didn't used to move around as much), you are looking at about 28,000 English male football bios up until 1992.

But how many captains of the Royal Navy are notable? How many knights? Mayors?

So while yes, it may be possible for some individual areas to say how many bios there could be, more generally I don't think it can be done.

--
geni
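As a rough check on the football estimate in the message above, the arithmetic can be sketched in a few lines of Python. The figures of about 700 players per club and an average of 2 clubs per player come from the message; the number of clubs is not stated there, so this sketch backs out the count that the 28,000 total would imply rather than assuming one. The function and constant names are purely illustrative.

```python
# Back-of-the-envelope check of the football biography estimate.
# From the message: ~700 players per club up to 1992, each player
# appearing for an average of 2 clubs, ~28,000 biographies in total.
# NOT from the message: the number of clubs, which is derived here.

PLAYERS_PER_CLUB = 700
AVG_CLUBS_PER_PLAYER = 2
ESTIMATED_BIOS = 28_000


def unique_players(clubs: float, players_per_club: int, clubs_per_player: float) -> float:
    """Distinct players, correcting for players counted at several clubs."""
    return clubs * players_per_club / clubs_per_player


def implied_clubs(bios: int, players_per_club: int, clubs_per_player: float) -> float:
    """Number of clubs needed for the quoted total to hold."""
    return bios * clubs_per_player / players_per_club


clubs = implied_clubs(ESTIMATED_BIOS, PLAYERS_PER_CLUB, AVG_CLUBS_PER_PLAYER)
players = unique_players(clubs, PLAYERS_PER_CLUB, AVG_CLUBS_PER_PLAYER)
print(f"implied number of clubs: {clubs:.0f}")
print(f"distinct players: {players:.0f}")
```

Changing the per-club figure or the average number of clubs per player shows how sensitive the total is to those two guesses, which is much of the point being argued in the thread.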
Re: [WikiEN-l] Rating the English wikipedia
On 15/02/2011, geni wrote:
> On 15 February 2011 16:19, Ian Woollard wrote:
>> Yeah, really. That page claims we only have 3% of notable Poles. Are you
>> really, seriously, telling me we only have 3% of ALL notable
>> biographies???
>> Because that's what that page is assuming to calculate that 40 million.
>
> It's possible. Our coverage of say British MPs starts to fall apart
> pre-20th century.

But should each MP necessarily have his own biography?

>> It's not impossible to calculate, you look at the counts from
>> encyclopedias of famous people. And they very typically list historical
>> people as well as living people.
>
> But they all hit dead tree limitations.

Then they're not capable of being reliably sourced.

> Sure you can choose a very
> narrow focus book like the alphabet of the saints. So it seems pretty
> likely that up until 1992 Southampton FC had a bit over 700 players
> about which it would be possible to write something about. But such
> books don't really exist for far areas.

Then there are no sources, and no biography.

> Still assuming players play for an average of 2 clubs (remember
> players didn't used to move around as much) you are looking at about
> 28 000 english male football bios up until 1992.

Only if they're notable, and reliably sourced. I don't think they're notable enough to have their own article simply for having played.

> But how many captains of the royal navy are notable? How many knights?
> Mayors?

Indeed.

> So while yes it may be possible for some individual areas as to how
> many bios there could be more generally I don't think can be done.

So you're saying that you don't know; and it's not a lot of use, is it?

> --
> geni

--
-Ian Woollard
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 04:00, Ian Woollard wrote:
> I then checked the British biography 'Who's who'. They have about
> 30,000 entries, but that's only about 1 person in 2000 in Great
> Britain, so even less.

This is actually quite an interesting angle to come at the problem from.

Who's Who has 34,210 people in it (the selection process is "notable" by their standards, "related to the UK", though this is sometimes stretched, and currently living). Their "legacy archive", of people who were at some point included since publication began c. 1900, is larger; it runs to 89,763 names - thus a total of ~124,000 people, of whom 28% are currently alive.

But that's, of course, an undercount of all people "notable and related to the UK".

* Firstly, Who's Who has gaps; it has an idiosyncratic and, historically, quite old-fashioned selection process. My current work is on the sort of person that stuffy establishment reference works thrived on, but I find perhaps 20% of them aren't covered.

* Secondly, the gaps involve systemic biases; to consider one we can easily check for, only 13% of the "current" biographies are women, and a tiny 4% of the "old" biographies are.

* Thirdly - perhaps the biggest element - notability didn't begin with the people still breathing in 1900. The Who's Who figures don't reflect the long tail of historical biographies from the past; a conservative estimate might be to double or triple the figures.

After making appropriate adjustments for these, we find that the data suggests there might be 400,000 potentially suitable biographies out there within the broad geographical remit of Who's Who; expanding that to the world as a whole would begin to push the high seven figures.

Or, to look at it another way... we currently have around half a million BLPs from around the world. *Without* correcting for the long tail of dead people, then our known coverage of BLPs would suggest there should be around 1,800,000 total "possible" biographies. If we *do* make a corresponding adjustment, then the expected total comes in at three to four million biographies. And, of course, we have known gaps in our BLP coverage, suggesting the total number would come out higher...

We currently have around 900,000 biographies. So even by a *highly conservative* estimate, taking for the sake of argument that we have 100% coverage of living biographies and that the number of people notable before the late nineteenth century was trivial, there'd still be, at the very least, a million notable past biographies still waiting to be written...

--
- Andrew Gray
andrew.g...@dunelm.org.uk
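The scaling argument in the message above can be reproduced with a short Python sketch. The four input figures are the ones quoted in the message; the variable and function names are made up for illustration, and the method (scaling the count of living biographies by the living share observed in Who's Who) is one reading of the argument, not a definitive reconstruction. The unrounded results land slightly above the rounded figures in the message (about 1.81 million against "around 1,800,000", and about 0.91 million against "a million").

```python
# Rough reproduction of the Who's Who scaling argument.
# Input figures are taken from the message; everything else is derived.

WHOS_WHO_LIVING = 34_210         # current Who's Who entries
WHOS_WHO_LEGACY = 89_763         # "legacy archive" of past entries
WIKIPEDIA_BLPS = 500_000         # "around half a million BLPs"
WIKIPEDIA_BIOGRAPHIES = 900_000  # "around 900,000 biographies"


def living_share(living: int, legacy: int) -> float:
    """Fraction of all listed people who are currently alive."""
    return living / (living + legacy)


def expected_total(blps: int, share_alive: float) -> float:
    """Total biographies implied if living people are `share_alive` of the whole."""
    return blps / share_alive


share = living_share(WHOS_WHO_LIVING, WHOS_WHO_LEGACY)
total = expected_total(WIKIPEDIA_BLPS, share)

# Conservative case from the message: assume living coverage is complete,
# so every missing biography is of someone who has died.
existing_dead = WIKIPEDIA_BIOGRAPHIES - WIKIPEDIA_BLPS
missing_dead = (total - WIKIPEDIA_BLPS) - existing_dead

print(f"share alive: {share:.1%}")
print(f"expected total biographies: {total:,.0f}")
print(f"past biographies still unwritten: {missing_dead:,.0f}")
```

Doubling or tripling the historical portion, as the message suggests for the long tail before 1900, can be tried by scaling `total - WIKIPEDIA_BLPS` before subtracting.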
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 18:17, Ian Woollard wrote:
> On 15/02/2011, geni wrote:
>> On 15 February 2011 16:19, Ian Woollard wrote:
>>> Yeah, really. That page claims we only have 3% of notable Poles. Are you
>>> really, seriously, telling me we only have 3% of ALL notable
>>> biographies???
>>> Because that's what that page is assuming to calculate that 40 million.
>>
>> It's possible. Our coverage of say British MPs starts to fall apart
>> pre-20th century.
>
> But should each MP necessarily have his own biography?
>
>>> It's not impossible to calculate, you look at the counts from
>>> encyclopedias of famous people. And they very typically list historical
>>> people as well as living people.
>>
>> But they all hit dead tree limitations.
>
> Then they're not capable of being reliably sourced.

Of course they are. It's just that the sources are things other than encyclopedias of famous people.

> Only if they're notable, and reliably sourced. I don't think they're
> notable enough to have their own article simply for having played.

In practice, yes they are. Local newspapers tend to use their local sports teams as filler.

> So you're saying that you don't know; and it's not a lot of use is it?

No, I'm saying it wasn't possible to know. You were the one who claimed it was.

--
geni
Re: [WikiEN-l] Rating the English wikipedia
On 15/02/2011 18:17, Ian Woollard wrote:
> On 15/02/2011, geni wrote:
>> On 15 February 2011 16:19, Ian Woollard wrote:
>>> Yeah, really. That page claims we only have 3% of notable Poles. Are you
>>> really, seriously, telling me we only have 3% of ALL notable
>>> biographies???
>>> Because that's what that page is assuming to calculate that 40 million.
>> It's possible. Our coverage of say British MPs starts to fall apart
>> pre-20th century.
> But should each MP necessarily have his own biography?
>

Arguably the answer is "yes", back to the 16th century at least. There has actually been quite a lot of havoc onsite over stub MP biographies during the past year, but it transpires that there are pretty good sources back to 1660, and usually adequate sources in the century leading up to that (if you work at it). The ODNB took a decision not to include all MPs (it says somewhere, in terms that suggest that it was a decision that did at least require a moment's thought).

Some parliaments of Henry VIII are apparently lacking lists of MPs, but after then it seems like a good use of WP to collate this information.

Charles
Re: [WikiEN-l] Rating the English wikipedia
On 15 February 2011 20:18, Charles Matthews wrote:
> Arguably the answer is "yes", back to the 16th century at least. There
> has actually been quite a lot of havoc onsite over stub MP biographies
> during the past year, but it transpires that there are pretty good
> sources back to 1660, and usually adequate sources in the century
> leading up to that (if you work at it). The ODNB took a decision not to
> include all MPs (it says somewhere, in terms that suggest that it was a
> decision that did at least require a moment's thought).

There is a project (even longer-running and slower-burning than the ODNB) to construct a reference work covering all MPs, at least as much as they're known, along with various other bits and pieces:

http://www.histparl.ac.uk/about.html

In the past sixty years, they've managed to cover a little over half the timeframe in twenty-eight (!) volumes. I have never seen their work, I admit, but I'd be intrigued to...

--
- Andrew Gray
andrew.g...@dunelm.org.uk
Re: [WikiEN-l] Rating the English wikipedia
On Tue, Feb 15, 2011 at 8:56 PM, Andrew Gray wrote:
>
> There is a project (even longer-running and slower-burning than the
> ODNB) to construct a reference work covering all MPs, at least as much
> as they're known, along with various other bits and pieces:
>
> http://www.histparl.ac.uk/about.html
>

Or try http://en.wikipedia.org/wiki/History_of_Parliament for some explanation of its history.

> In the past sixty years, they've managed to cover a little over half
> the timeframe in twenty-eight (!) volumes. I have never seen their
> work, I admit, but I'd be intrigued to...
>

I have the CD-ROM containing the volumes published up to 1998, and the 12 volumes published since then are on a shelf just above the computer. They are very interesting studies, delving very deep into manuscript sources and drawing on letters between various senior politicians preserved in the archives. They concentrate only on the subjects' Parliamentary and political activities, so for example the only mention of the diary of Samuel Pepys (MP for Castle Rising 1673-79, Harwich 1679 and 1685-88) is that Pepys stopped writing it before he became an MP.

--
Sam Blacketer
Re: [WikiEN-l] Rating the English wikipedia
On Mon, Feb 14, 2011 at 9:54 PM, David Gerard wrote:
> There's a *heck* of a lot still to be written.

On that topic, I came across this interesting essay:

http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth

It tries to project to the year 2025!

Carcharoth
Re: [WikiEN-l] Rating the English wikipedia
On 16/02/2011, Carcharoth wrote:
> I came across this interesting essay:
>
> http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
>
> It tries to project to the year 2025!

And fails spectacularly.

The extended growth model seems pretty inaccurate, and very over-optimistic:

http://en.wikipedia.org/wiki/File:Enwikipediagrowthcomparison.PNG

That graph hasn't been updated recently, but other graphs show that the Gompertz model is still tracking about as well as any simple model could do:

http://en.wikipedia.org/wiki/File:EnwikipediagrowthGom.PNG

although even that is looking perhaps very slightly pessimistic; it's too early to be absolutely sure.

But I think we can certainly say, with some justification, that the extended growth model is significantly off the mark.

> Carcharoth

--
-Ian Woollard
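For readers unfamiliar with the Gompertz model mentioned above, here is a minimal Python sketch of the standard Gompertz curve, ceiling * exp(-b * exp(-c * t)). The ceiling of 4.4 million is borrowed from the estimate quoted at the start of this thread; whether that is the fitted ceiling behind the linked graph is an assumption. The parameters `B` and `C` are purely hypothetical values chosen to produce a plausible S-shaped curve, not the fitted values used for the graph.

```python
import math


def gompertz(t: float, ceiling: float, b: float, c: float) -> float:
    """Standard Gompertz curve: ceiling * exp(-b * exp(-c * t)).

    The curve rises slowly, accelerates, then flattens as it
    approaches `ceiling`, which it never exceeds.
    """
    return ceiling * math.exp(-b * math.exp(-c * t))


CEILING = 4_400_000  # article count the curve levels off at (assumed, see above)
B = 5.0              # hypothetical displacement parameter
C = 0.3              # hypothetical growth rate per year

# Tabulate the curve at five-year intervals from an arbitrary start year.
for years_elapsed in range(0, 31, 5):
    articles = gompertz(years_elapsed, CEILING, B, C)
    print(f"year {years_elapsed:2d}: {articles:12,.0f} articles")
```

The key property for this discussion is the built-in ceiling: unlike an exponential or an open-ended growth projection, a Gompertz fit implies a finite eventual article count, so a model of this shape predicts growth slowing rather than continuing unchecked.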