Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 07/27/11 2:42 AM, Charles Matthews wrote:
> On 27/07/2011 08:49, Ray Saintonge wrote:
>> On 07/26/11 3:13 AM, Charles Matthews wrote:
>>> On 20/07/2011 10:17, Ray Saintonge wrote:
>>>> I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
>>> Yes, that is one area where the material seems available to do much more.
>>>> An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
>>> On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
>>>
>>> All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.
>> Perhaps this requires a clearer description of what is essential to a good stub.
> I think a discussion of the nature of "good stubs", in relation though to what we know (or rather guess) about the "long tail" of reference material that is "out there" in some form, sounds like an interesting one to have, and not one I recall having before. Basically there are things that (a) people could want to look up, (b) for which "footnote"-style answers exist and are verifiable, and (c) could appear at that sort of length in WP, where they would be an asset rather than an embarrassment. And we still don't know that much about the whole population of such things.

In the shorter obituary notices of Gentleman's Magazine the information often follows a predictable pattern. To the extent that it is within predefined parameters it could fit well in a "List of ..." article. If a particular entry goes beyond that there is a strong argument that it warrants a stub article of its own.

The notion that a second source be provided is often unsound. While there is always the possibility of hoax entries in these old magazines, such entries would still be a tiny segment of the overall content. The majority of contributors, then as now, contribute in good faith. A stub from one of these broadly based national publications will often only be mirrored in a local history that had a very small circulation. Those who complain about these stubs are often unwilling to track down even relatively common references.

>> The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 19th century material that I have. "Notes and Queries" is a fascinating publication where the readership answered questions posed by others. Providing other sources for this could be extremely difficult, and none of it comes close to being subject to BLP requirements.

Ec
___
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 27/07/2011 08:49, Ray Saintonge wrote:
> On 07/26/11 3:13 AM, Charles Matthews wrote:
>> On 20/07/2011 10:17, Ray Saintonge wrote:
>>> I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
>> Yes, that is one area where the material seems available to do much more.
>>> An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
>> On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
>>
>> All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.
> Perhaps this requires a clearer description of what is essential to a good stub.

I think a discussion of the nature of "good stubs", in relation though to what we know (or rather guess) about the "long tail" of reference material that is "out there" in some form, sounds like an interesting one to have, and not one I recall having before. Basically there are things that (a) people could want to look up, (b) for which "footnote"-style answers exist and are verifiable, and (c) could appear at that sort of length in WP, where they would be an asset rather than an embarrassment. And we still don't know that much about the whole population of such things.

> The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 19th century material that I have. "Notes and Queries" is a fascinating publication where the readership answered questions posed by others. Providing other sources for this could be extremely difficult, and none of it comes close to being subject to BLP requirements.

Yes, a kind of reference desk for those of largely antiquarian interests in the 19th century (and onwards). The GNG has plenty wrong with it in some topic areas, which is why specialised notability guides are written. I don't think it has yet come up in the form "for historical/antiquarian purposes, what is the minimum adequate kind of answer to a query?". One day I suppose we'll have an overview of "topic policy" based on a census of actual "topics". I think we'll have to get through our second decade before worrying about that, though.

Charles
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 07/26/11 3:13 AM, Charles Matthews wrote:
> On 20/07/2011 10:17, Ray Saintonge wrote:
>> I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
> Yes, that is one area where the material seems available to do much more.
>> An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
> On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.
>
> All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.

Perhaps this requires a clearer description of what is essential to a good stub.

The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 19th century material that I have. "Notes and Queries" is a fascinating publication where the readership answered questions posed by others. Providing other sources for this could be extremely difficult, and none of it comes close to being subject to BLP requirements.

People who rate quality as more important than quantity fail to see the negative aspects of their condition. A simple "caveat lector" can be more reliable than any guarantee of accuracy.

Ec
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 07/20/11 4:23 AM, Carcharoth wrote:
> On Wed, Jul 20, 2011 at 10:17 AM, Ray Saintonge wrote:
>> I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.
> I agree, but some level of selectivity is needed. I now try and maintain a list of articles I failed to find when looking for information, and also of articles that are on other language Wikipedias but not the English one. I'll post some of those at the end.

"Level of selectivity" too easily becomes an excuse for exclusion. Some of us feel that comprehensiveness is closer to the core values of Wikipedia.

>> For most of its 177 years of publication "The Gentleman's Magazine" provided a steady diet of obituaries. If it averaged 1000 pages a year that's well over 170,000 pages of material.
> A good start would be a listing along with how long the obituaries are. You might find some are very short. The obvious thing to focus on is ones where other sources exist, and keep the others as a project list for now.

Some are indeed too short to warrant individual articles. Perhaps the entire content of an issue's obituary (the publication uses the singular to refer to the entire collection of death notices in an issue) needs to be added to Wikisource. I am looking at the October 1801 issue where there are many such stubs, as with an entry for August 16: "A poor old man, named Threadaway belonging to the workhouse at Newington, Surrey, employed in brewing beer for the use of the house, by some accident fell into the boiling liquor, and was scalded to death." This one is not likely to ever be expanded, but others easily have more useful information.

>> What do we do with such things as the drawings of the proposed new gaol at Bury-St. Edmonds in the August 1801 issue of "The Gentleman's Magazine"? (Does it even still exist?)
> You would first look for it in other sources, and then add it to the history section or article for Bury-St. Edmonds. Not all material will lend itself to a new article, and corroboration with other sources is important.

Corroboration from other sources should not always be such a necessity. When we are dealing with 200-year-old information that corroboration is not such an easy task. Even when it exists it is not easily accessible, or will take a great deal of effort to track down. Sometimes you just need to trust your single source on the basis of your experience with the reliability of the source. Corroboration can wait for some other day, though our one source still needs to be fully identified.

>> Then there's the endless stream of books that were reviewed in a wide range of 19th century periodicals. The reviews themselves are as worth reading as the books, because they often contrasted a number of publications around a chosen theme.
> Eh. I'm less enthusiastic about book reviews. I'd transcribe them into Wikisource and link them from the books they review (if the books have articles, and if not, then move on).

I would be less interested in the reviews than the books themselves. It is the books themselves that need articles.

>> An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.
> Sometimes other sites are better suited to some material. I would start with Wikisource for some of the material you have mentioned.
>
> Anyway, a few examples of missing articles:
>
> Gunnarea capensis (marine polychaete worm)
> Laboratoire Souterrain à Bas Bruit (LSBB, French research )
> Giovanni da Vigo (1450-1525, Italian surgeon)
>
> The latter two have articles on the French (fr) and Italian (it) Wikipedia, so could be dealt with by translation efforts, but nothing on the first example. Some of the more obscure branches of the tree of life are replete with redlinks.

Absolutely! We can always easily find missing articles on an individual basis. It's the scope that's overwhelming.

Ec
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 20/07/2011 10:17, Ray Saintonge wrote:
> I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.

Yes, that is one area where the material seems available to do much more.

> An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.

On the other hand, the number of active Wikipedians who know where their next 1000 articles are coming from is quite small, IMX. The emphasis on enWP is hardly on being prolific: quality is more highly rated than quantity. That may not be wrong, of course, but to some extent these things are a matter of personal taste, and should remain so. We could do with better support of the "good stub" concept, I think: probably an example of "tacit knowledge" about the site, in that editors who have been around for a while know what that means, while the manual pages have a different slant.

All discussions of the "notability" concept we use seem to end up with the generally broken nature of the thing. It is just that there is no snappy replacement. WP:GNG is a bit objectionable in the insistence on "secondary sources"; it is not completely silly but is not that helpful either when you start pushing the limits.

Charles
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On Wed, Jul 20, 2011 at 10:17 AM, Ray Saintonge wrote:
> I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high.

I agree, but some level of selectivity is needed. I now try and maintain a list of articles I failed to find when looking for information, and also of articles that are on other language Wikipedias but not the English one. I'll post some of those at the end.

> For most of its 177 years of publication "The Gentleman's Magazine" provided a steady diet of obituaries. If it averaged 1000 pages a year that's well over 170,000 pages of material.

A good start would be a listing along with how long the obituaries are. You might find some are very short. The obvious thing to focus on is ones where other sources exist, and keep the others as a project list for now.

> What do we do with such things as the drawings of the proposed new gaol at Bury-St. Edmonds in the August 1801 issue of "The Gentleman's Magazine"? (Does it even still exist?)

You would first look for it in other sources, and then add it to the history section or article for Bury-St. Edmonds. Not all material will lend itself to a new article, and corroboration with other sources is important.

> Then there's the endless stream of books that were reviewed in a wide range of 19th century periodicals. The reviews themselves are as worth reading as the books, because they often contrasted a number of publications around a chosen theme.

Eh. I'm less enthusiastic about book reviews. I'd transcribe them into Wikisource and link them from the books they review (if the books have articles, and if not, then move on).

> An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.

Sometimes other sites are better suited to some material. I would start with Wikisource for some of the material you have mentioned.

Anyway, a few examples of missing articles:

Gunnarea capensis (marine polychaete worm)
Laboratoire Souterrain à Bas Bruit (LSBB, French research )
Giovanni da Vigo (1450-1525, Italian surgeon)

The latter two have articles on the French (fr) and Italian (it) Wikipedia, so could be dealt with by translation efforts, but nothing on the first example. Some of the more obscure branches of the tree of life are replete with redlinks.

Carcharoth
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 02/17/11 2:54 AM, WereSpielChequers wrote:
> Even if the online resources didn't improve, and we could really do with a big improvement in parts of the developing world, as long as the Internet continues to be updated we can expect a steady flow of new articles. Sports, Politics, popular culture and science are all going to generate new articles for the foreseeable future. We currently have half a million biographies of living people; assuming we keep our current notability standards and coverage levels, then to keep that number stable we can expect at least ten thousand more each year. So even without filling in the historical gaps there will be a steady increase in the total number of biographies on the pedia. Large gaps in our coverage of people who retired pre-Internet are slowly being filled in from the obituary pages, and that could continue for decades. Every year there will be new films, books, natural disasters and sports events. So if we still have an editor community to write them, we can expect a steady flow of new articles.

I missed reading this thread when it was active, but my own estimate of what still needs to be done in historical biographies alone is quite high. For most of its 177 years of publication "The Gentleman's Magazine" provided a steady diet of obituaries. If it averaged 1000 pages a year that's well over 170,000 pages of material.

I now also have the first 60 years of "Notes and Queries"; it was the kind of publication that a 19th century Wikipedian would have loved to work on. It includes all sorts of fascinating oddball material. "Who's Who" was followed by "Who Was Who" for deceased persons, but there were also more narrowly focused versions for different places, and different subject areas. Out of curiosity I looked up one surname in the Spanish language "Enciclopedia universal ilustrada". Of the 30 persons with that surname enwp only had articles on 2, eswp only 1.

What do we do with such things as the drawings of the proposed new gaol at Bury-St. Edmonds in the August 1801 issue of "The Gentleman's Magazine"? (Does it even still exist?) Then there's the endless stream of books that were reviewed in a wide range of 19th century periodicals. The reviews themselves are as worth reading as the books, because they often contrasted a number of publications around a chosen theme.

An estimate of 20,000,000 English Wikipedia articles seems increasingly conservative. The amount of work to be done is enormous even without having to fight with the notability police.

Ec
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 17 February 2011 10:54, WereSpielChequers wrote:
> I think we need a model of article growth that blends two elements: multiple bell curves showing the process of initially populating the pedia with various subjects, and an annual input of new articles on newly notable subjects.

Sigmoid with a linear limit, i.e. more or less what we see?

- d.
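
A rough sketch of the growth model discussed above, assuming each subject area's article creation rate follows a Gaussian bell curve and that newly notable subjects arrive at a constant yearly rate. The names and numbers here (`bell_rate`, `cumulative_articles`, `WAVES`, `ANNUAL_NEW`) are hypothetical and chosen only to show the shape; nothing is fitted to real Wikipedia data.

```python
import math


def bell_rate(t, peak, width, total):
    """Articles per year from one subject area: a Gaussian burst that
    adds roughly `total` articles overall, centred on year `peak`."""
    norm = total / (width * math.sqrt(2 * math.pi))
    return norm * math.exp(-0.5 * ((t - peak) / width) ** 2)


def cumulative_articles(years, waves, annual_new):
    """Running article count: the sum of all bell-curve rates plus a
    constant annual input of newly notable subjects."""
    running = 0.0
    series = []
    for t in years:
        running += sum(bell_rate(t, *wave) for wave in waves) + annual_new
        series.append(running)
    return series


# Hypothetical parameters: (peak year, width in years, total articles).
WAVES = [(2006, 2.0, 2_000_000), (2012, 4.0, 1_500_000)]
ANNUAL_NEW = 50_000  # assumed constant yearly input
YEARS = list(range(2001, 2026))

if __name__ == "__main__":
    for year, count in zip(YEARS, cumulative_articles(YEARS, WAVES, ANNUAL_NEW)):
        print(year, round(count))
```

Summing each bell-shaped rate over time gives a sigmoid per subject area, and once those waves are exhausted only the constant input remains, so the curve settles into a straight line: the "sigmoid with a linear limit" shape.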
Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
Even if the online resources didn't improve, and we could really do with a big improvement in parts of the developing world, as long as the Internet continues to be updated we can expect a steady flow of new articles. Sports, Politics, popular culture and science are all going to generate new articles for the foreseeable future. We currently have half a million biographies of living people; assuming we keep our current notability standards and coverage levels, then to keep that number stable we can expect at least ten thousand more each year. So even without filling in the historical gaps there will be a steady increase in the total number of biographies on the pedia. Large gaps in our coverage of people who retired pre-Internet are slowly being filled in from the obituary pages, and that could continue for decades. Every year there will be new films, books, natural disasters and sports events. So if we still have an editor community to write them, we can expect a steady flow of new articles.

I think we need a model of article growth that blends two elements: multiple bell curves showing the process of initially populating the pedia with various subjects, and an annual input of new articles on newly notable subjects.

I expect that on many subjects of interest to our first wave of editors - computing, milhist, contemporary western popular culture and the geography of the English speaking parts of the developed world - we have already gone quite a way over the top of the bell. But there are other bell curves that we are at much earlier stages of. Judging from the newpages I've seen in the last few months, populated places in the Indian subcontinent are very much on the fast rising side of the bell curve. The bell curves of species, astronomical objects, genes and chemicals are all in their early stages. In future as new editors come on board or existing editors acquire new enthusiasms we can expect that as yet unwritten areas of the pedia will go through their own bell curve expansions.

We still have a huge influx of new editors, though very few stick around. I suspect the ultimate size of the pedia depends at least as much on the way we treat new editors as it does on the availability of easily accessible sources.

WereSpielChequers

On 17 February 2011 09:38, Charles Matthews wrote:
> On 16/02/2011 23:56, Carcharoth wrote:
>> On Mon, Feb 14, 2011 at 9:54 PM, David Gerard wrote:
>>
>>> There's a *heck* of a lot still to be written.
>> On that topic, I came across this interesting essay:
>>
>> http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
>>
>> It tries to project to the year 2025!
> I'd be interested in any discussion at all on the amount of useful material out there (on the Web) and how it is changing. It is a fact that there are more and more reliable sources posted that can be used to create articles. This is a factor that affects directly what actually gets written, as opposed to what potentially might be a topic to write about.
>
> I think we just don't know how much will be around in 2025 that could support our work, either in the form of public domain reference material, or respectable scholarly webpages to which we can link. Extrapolations leaving out this factor aren't worth as much as they might be.
>
> Charles
[WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia
On 16/02/2011 23:56, Carcharoth wrote:
> On Mon, Feb 14, 2011 at 9:54 PM, David Gerard wrote:
>
>> There's a *heck* of a lot still to be written.
> On that topic, I came across this interesting essay:
>
> http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
>
> It tries to project to the year 2025!

I'd be interested in any discussion at all on the amount of useful material out there (on the Web) and how it is changing. It is a fact that there are more and more reliable sources posted that can be used to create articles. This is a factor that affects directly what actually gets written, as opposed to what potentially might be a topic to write about.

I think we just don't know how much will be around in 2025 that could support our work, either in the form of public domain reference material, or respectable scholarly webpages to which we can link. Extrapolations leaving out this factor aren't worth as much as they might be.

Charles