Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-31 Thread Ray Saintonge
On 07/27/11 2:42 AM, Charles Matthews wrote:
> On 27/07/2011 08:49, Ray Saintonge wrote:
>> On 07/26/11 3:13 AM, Charles Matthews wrote:
>>> On 20/07/2011 10:17, Ray Saintonge wrote:
>>>> I missed reading this thread when it was active, but my own estimate of
>>>> what still needs to be done in historical biographies alone is quite
>>>> high.
>>> Yes, that is one area where the material seems available to do much
>>> more.
>>>
>>>> An estimate of 20,000,000 English
>>>> Wikipedia articles seems increasingly conservative.  The amount of work
>>>> to be done is enormous even without having to fight with the notability
>>>> police.
>>> On the other hand, the number of active Wikipedians who know where their
>>> next 1000 articles are coming from is quite small, IMX. The emphasis on
>>> enWP is hardly on being prolific: quality is more highly rated than
>>> quantity. That may not be wrong, of course, but to some extent these
>>> things are a matter of personal taste, and should remain so. We could do
>>> with better support of the "good stub" concept, I think: probably an
>>> example of "tacit knowledge" about the site, in that editors who have
>>> been around for a while know what that means, while the manual pages
>>> have a different slant.
>>>
>>> All discussions of the "notability" concept we use seem to end up with
>>> the generally broken nature of the thing. It is just that there is no
>>> snappy replacement. WP:GNG is a bit objectionable in the insistence on
>>> "secondary sources"; it is not completely silly but is not that helpful
>>> either when you start pushing the limits.
>> Perhaps this requires a clearer description of what is essential to a
>> good stub.
> I think a discussion of the nature of "good stubs", in relation though
> to what we know (or rather guess) about the "long tail" of reference
> material that is "out there" in some form, sounds like an interesting
> one to have, and not one I recall having before. Basically there are
> things that (a) people could want to look up, (b) for which
> "footnote"-style answers exist and are verifiable, and (c) could appear
> at that sort of length in WP, where they would be an asset rather than
> an embarrassment. And we still don't know that much about the whole
> population of such things.

In the shorter obituary notices of the Gentleman's Magazine the information 
often follows a predictable pattern.  To the extent that it is within 
predefined parameters it could fit well in a "List of ..." article.  If 
a particular entry goes beyond that there is a strong argument that it 
warrants a stub article of its own.  The notion that a second source must 
be provided is often unsound. While there is always the possibility of hoax 
entries in these old magazines, such entries would still be a tiny 
segment of the overall content. The majority of contributors, then as 
now, contribute in good faith. A stub from one of these broadly based 
national publications will often only be mirrored in a local history 
that had a very small circulation.  Those who complain about these 
stubs are often unwilling to track down even relatively common references.

>> The WP:GNG is opaque and bureaucratic. It is not suitable to much of
>> the 19th century material that I have.  "Notes and Queries" is a
>> fascinating publication where the readership answered questions posed
>> by others. Providing other sources for this could be extremely
>> difficult, and none of it comes close to being subject to BLP
>> requirements.
>
Ec


___
WikiEN-l mailing list
WikiEN-l@lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l


Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-27 Thread Charles Matthews
On 27/07/2011 08:49, Ray Saintonge wrote:
> On 07/26/11 3:13 AM, Charles Matthews wrote:
>> On 20/07/2011 10:17, Ray Saintonge wrote:
>>> I missed reading this thread when it was active, but my own estimate of
>>> what still needs to be done in historical biographies alone is quite
>>> high.
>> Yes, that is one area where the material seems available to do much 
>> more.
>>
>>> An estimate of 20,000,000 English
>>> Wikipedia articles seems increasingly conservative.  The amount of work
>>> to be done is enormous even without having to fight with the notability
>>> police.
>> On the other hand, the number of active Wikipedians who know where their
>> next 1000 articles are coming from is quite small, IMX. The emphasis on
>> enWP is hardly on being prolific: quality is more highly rated than
>> quantity. That may not be wrong, of course, but to some extent these
>> things are a matter of personal taste, and should remain so. We could do
>> with better support of the "good stub" concept, I think: probably an
>> example of "tacit knowledge" about the site, in that editors who have
>> been around for a while know what that means, while the manual pages
>> have a different slant.
>>
>> All discussions of the "notability" concept we use seem to end up with
>> the generally broken nature of the thing. It is just that there is no
>> snappy replacement. WP:GNG is a bit objectionable in the insistence on
>> "secondary sources"; it is not completely silly but is not that helpful
>> either when you start pushing the limits.
>>
>>
> Perhaps this requires a clearer description of what is essential to a 
> good stub.

I think a discussion of the nature of "good stubs", in relation though 
to what we know (or rather guess) about the "long tail" of reference 
material that is "out there" in some form, sounds like an interesting 
one to have, and not one I recall having before. Basically there are 
things that (a) people could want to look up, (b) for which 
"footnote"-style answers exist and are verifiable, and (c) could appear 
at that sort of length in WP, where they would be an asset rather than 
an embarrassment. And we still don't know that much about the whole 
population of such things.
>
> The WP:GNG is opaque and bureaucratic. It is not suitable to much of 
> the 19th century material that I have.  "Notes and Queries" is a 
> fascinating publication where the readership answered questions posed 
> by others. Providing other sources for this could be extremely 
> difficult, and none of it comes close to being subject to BLP 
> requirements.
>
Yes, a kind of reference desk for those of largely antiquarian interests 
in the 19th century (and onwards). The GNG has plenty wrong with it in 
some topic areas, which is why specialised notability guides are 
written. I don't think it has yet come up in the form "for 
historical/antiquarian purposes, what is the minimum adequate kind of 
answer to a query?".

One day I suppose we'll have an overview of "topic policy" based on a 
census of actual "topics". I think we'll have to get through our second 
decade before worrying about that, though.

Charles





Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-27 Thread Ray Saintonge
On 07/26/11 3:13 AM, Charles Matthews wrote:
> On 20/07/2011 10:17, Ray Saintonge wrote:
>> I missed reading this thread when it was active, but my own estimate of
>> what still needs to be done in historical biographies alone is quite
>> high.
> Yes, that is one area where the material seems available to do much more.
>
>> An estimate of 20,000,000 English
>> Wikipedia articles seems increasingly conservative.  The amount of work
>> to be done is enormous even without having to fight with the notability
>> police.
> On the other hand, the number of active Wikipedians who know where their
> next 1000 articles are coming from is quite small, IMX. The emphasis on
> enWP is hardly on being prolific: quality is more highly rated than
> quantity. That may not be wrong, of course, but to some extent these
> things are a matter of personal taste, and should remain so. We could do
> with better support of the "good stub" concept, I think: probably an
> example of "tacit knowledge" about the site, in that editors who have
> been around for a while know what that means, while the manual pages
> have a different slant.
>
> All discussions of the "notability" concept we use seem to end up with
> the generally broken nature of the thing. It is just that there is no
> snappy replacement. WP:GNG is a bit objectionable in the insistence on
> "secondary sources"; it is not completely silly but is not that helpful
> either when you start pushing the limits.
>
>
Perhaps this requires a clearer description of what is essential to a 
good stub.

The WP:GNG is opaque and bureaucratic. It is not suitable to much of the 
19th century material that I have.  "Notes and Queries" is a fascinating 
publication where the readership answered questions posed by others. 
Providing other sources for this could be extremely difficult, and none 
of it comes close to being subject to BLP requirements.

People who rate quality as more important than quantity fail to see the 
negative aspects of their condition. A simple "caveat lector" can be 
more reliable than any guarantee of accuracy.

Ec




Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-27 Thread Ray Saintonge
On 07/20/11 4:23 AM, Carcharoth wrote:
> On Wed, Jul 20, 2011 at 10:17 AM, Ray Saintonge  wrote:
>> I missed reading this thread when it was active, but my own estimate of
>> what still needs to be done in historical biographies alone is quite
>> high.
> I agree, but some level of selectivity is needed. I now try and
> maintain a list of articles I failed to find when looking for
> information, and also of articles that are on other language
> Wikipedias but not the English one. I'll post some of those at the
> end.

"Level of selectivity" too easily becomes an excuse for exclusion. Some 
of us feel that comprehensiveness is closer to the core values of 
Wikipedia.
>> For most of its 177 years of publication "The Gentleman's
>> Magazine" provided a steady diet of obituaries. If it averaged 1000
>> pages a year that's well over 170,000 pages of material.
> A good start would be a listing along with how long the obituaries
> are. You might find some are very short. The obvious thing to focus on
> is ones where other sources exist, and keep the others as a project
> list for now.

Some are indeed too short to warrant individual articles.  Perhaps the 
entire content of an issue's obituary (the publication uses the singular 
to refer to the entire collection of death notices in an issue) needs 
to be added to Wikisource.  I am looking at the October 1801 issue where 
there are many such stubs, as with an entry for August 16: "A poor old 
man, named Threadaway belonging to the workhouse at Newington, Surrey, 
employed in brewing beer for the use of the house, by some accident fell 
into the boiling liquor, and was scalded to death."  This one is not 
likely to ever be expanded, but others easily have more useful information.
>> What do we do with such things
>> as the drawings of the proposed new gaol at Bury-St. Edmonds in the
>> August 1801 issue of "The Gentleman's Magazine"? (Does it even still
>> exist?)
> You would first look for it in other sources, and then add it to the
> history section or article for Bury-St. Edmonds. Not all material will
> lend itself to a new article, and corroboration with other sources is
> important.

Corroboration from other sources should not always be such a necessity. 
When we are dealing with 200-year-old information, corroboration is 
not such an easy task.  Even when it exists it is not easily accessible, 
or will take a great deal of effort to track down.  Sometimes you just 
need to trust your single source on the basis of your experience with 
its reliability. Corroboration can wait for some other 
day, though our one source still needs to be fully identified.

>> Then there's the endless stream of books that were reviewed in
>> a wide range of 19th century periodicals.  The reviews themselves are as
>> worth reading as the books, because they often contrasted a number of
>> publications around a chosen theme.
> Eh. I'm less enthusiastic about book reviews. I'd transcribe them into
> Wikisource and link them from the books they review (if the books have
> articles, and if not, then move on).

I would be less interested in the reviews than the books themselves.  It 
is the books themselves that need articles.

>> An estimate of 20,000,000 English
>> Wikipedia articles seems increasingly conservative.  The amount of work
>> to be done is enormous even without having to fight with the notability
>> police.
> Sometimes other sites are better suited to some material. I would
> start with Wikisource for some of the material you have mentioned.
>
> Anyway, a few examples of missing articles:
>
> Gunnarea capensis (marine polychaete worm)
> Laboratoire Souterrain à Bas Bruit (LSBB, French research )
> Giovanni da Vigo (1450-1525, Italian surgeon)
>
> The latter two have articles on the French (fr) and Italian (it)
> Wikipedia, so could be dealt with by translation efforts, but nothing
> on the first example. Some of the more obscure branches of the tree of
> life are replete with redlinks.
>
Absolutely! We can always easily find missing articles on an individual 
basis. It's the scope that's overwhelming.

Ec



Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-26 Thread Charles Matthews
On 20/07/2011 10:17, Ray Saintonge wrote:
> I missed reading this thread when it was active, but my own estimate of
> what still needs to be done in historical biographies alone is quite
> high.

Yes, that is one area where the material seems available to do much more.

> An estimate of 20,000,000 English
> Wikipedia articles seems increasingly conservative.  The amount of work
> to be done is enormous even without having to fight with the notability
> police.
On the other hand, the number of active Wikipedians who know where their 
next 1000 articles are coming from is quite small, IMX. The emphasis on 
enWP is hardly on being prolific: quality is more highly rated than 
quantity. That may not be wrong, of course, but to some extent these 
things are a matter of personal taste, and should remain so. We could do 
with better support of the "good stub" concept, I think: probably an 
example of "tacit knowledge" about the site, in that editors who have 
been around for a while know what that means, while the manual pages 
have a different slant.

All discussions of the "notability" concept we use seem to end up with 
the generally broken nature of the thing. It is just that there is no 
snappy replacement. WP:GNG is a bit objectionable in the insistence on 
"secondary sources"; it is not completely silly but is not that helpful 
either when you start pushing the limits.

Charles




Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-20 Thread Carcharoth
On Wed, Jul 20, 2011 at 10:17 AM, Ray Saintonge  wrote:

> I missed reading this thread when it was active, but my own estimate of
> what still needs to be done in historical biographies alone is quite
> high.

I agree, but some level of selectivity is needed. I now try and
maintain a list of articles I failed to find when looking for
information, and also of articles that are on other language
Wikipedias but not the English one. I'll post some of those at the
end.

> For most of its 177 years of publication "The Gentleman's
> Magazine" provided a steady diet of obituaries. If it averaged 1000
> pages a year that's well over 170,000 pages of material.

A good start would be a listing along with how long the obituaries
are. You might find some are very short. The obvious thing to focus on
is ones where other sources exist, and keep the others as a project
list for now.



> What do we do with such things
> as the drawings of the proposed new gaol at Bury-St. Edmonds in the
> August 1801 issue of "The Gentleman's Magazine"? (Does it even still
> exist?)

You would first look for it in other sources, and then add it to the
history section or article for Bury-St. Edmonds. Not all material will
lend itself to a new article, and corroboration with other sources is
important.

> Then there's the endless stream of books that were reviewed in
> a wide range of 19th century periodicals.  The reviews themselves are as
> worth reading as the books, because they often contrasted a number of
> publications around a chosen theme.

Eh. I'm less enthusiastic about book reviews. I'd transcribe them into
Wikisource and link them from the books they review (if the books have
articles, and if not, then move on).

> An estimate of 20,000,000 English
> Wikipedia articles seems increasingly conservative.  The amount of work
> to be done is enormous even without having to fight with the notability
> police.

Sometimes other sites are better suited to some material. I would
start with Wikisource for some of the material you have mentioned.

Anyway, a few examples of missing articles:

Gunnarea capensis (marine polychaete worm)
Laboratoire Souterrain à Bas Bruit (LSBB, French research )
Giovanni da Vigo (1450-1525, Italian surgeon)

The latter two have articles on the French (fr) and Italian (it)
Wikipedia, so could be dealt with by translation efforts, but nothing
on the first example. Some of the more obscure branches of the tree of
life are replete with redlinks.

Carcharoth



Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-07-20 Thread Ray Saintonge
On 02/17/11 2:54 AM, WereSpielChequers wrote:
> Even if the online resources didn't improve, and we could really do
> with a big improvement in parts of the developing world, as long as
> the Internet continues to be updated we can expect a steady flow of
> new articles. Sports, Politics, popular culture and science are all
> going to generate new articles for the foreseeable future.  We
> currently have half a million biographies of living people, assuming
> we keep our current notability standards and coverage levels, then to
> keep that number stable  we can expect at least ten thousand more each
> year. So even without filling in the historical gaps there will be a
> steady increase in the total number of biographies on the pedia.
> Large gaps in our coverage of people who retired pre-Internet are
> slowly being filled in from the obituary pages, and that could
> continue for decades. Every year there will be new films, books,
> natural disasters and sports events.  So if we still have an editor
> community to write them, we can expect a steady flow of new articles.
>
I missed reading this thread when it was active, but my own estimate of 
what still needs to be done in historical biographies alone is quite 
high.  For most of its 177 years of publication "The Gentleman's 
Magazine" provided a steady diet of obituaries. If it averaged 1000 
pages a year that's well over 170,000 pages of material. I now also have 
the first 60 years of "Notes and Queries"; it was the kind of 
publication that a 19th century Wikipedian would have loved to work on. 
It includes all sorts of fascinating oddball material.  "Who's Who" was 
followed by "Who Was Who" for deceased persons, but there were also more 
narrowly focused versions for different places, and different subject 
areas. Out of curiosity I looked up one surname in the Spanish language 
"Enciclopedia universal ilustrada". Of the 30 persons with that surname 
enwp only had articles on 2, eswp only 1. What do we do with such things 
as the drawings of the proposed new gaol at Bury-St. Edmonds in the 
August 1801 issue of "The Gentleman's Magazine"? (Does it even still 
exist?)  Then there's the endless stream of books that were reviewed in 
a wide range of 19th century periodicals.  The reviews themselves are as 
worth reading as the books, because they often contrasted a number of 
publications around a chosen theme.  An estimate of 20,000,000 English 
Wikipedia articles seems increasingly conservative.  The amount of work 
to be done is enormous even without having to fight with the notability 
police.

Ec



Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-02-17 Thread David Gerard
On 17 February 2011 10:54, WereSpielChequers wrote:

> I think we need a model of article growth that blends two elements,
> multiple bell curves showing the process of  initially populating the
> pedia with various subjects, and an annual input of new articles on
> newly notable subjects.


Sigmoid with a linear limit, i.e. more or less what we see?


- d.



Re: [WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-02-17 Thread WereSpielChequers
Even if the online resources didn't improve, and we could really do
with a big improvement in parts of the developing world, as long as
the Internet continues to be updated we can expect a steady flow of
new articles. Sports, politics, popular culture and science are all
going to generate new articles for the foreseeable future.  We
currently have half a million biographies of living people; assuming
we keep our current notability standards and coverage levels, then to
keep that number stable we can expect at least ten thousand more each
year. So even without filling in the historical gaps there will be a
steady increase in the total number of biographies on the pedia.
Large gaps in our coverage of people who retired pre-Internet are
slowly being filled in from the obituary pages, and that could
continue for decades. Every year there will be new films, books,
natural disasters and sports events.  So if we still have an editor
community to write them, we can expect a steady flow of new articles.

I think we need a model of article growth that blends two elements,
multiple bell curves showing the process of  initially populating the
pedia with various subjects, and an annual input of new articles on
newly notable subjects. I expect that on many subjects of interest to
our first wave of editors - computing, milhist, contemporary western
popular culture and the geography of the English speaking parts of the
developed world - we have already gone quite a way over the top of the
bell. But there are other bell curves that we are at much earlier
stages of. Judging from the newpages I've seen in the last few months,
populated places in the Indian subcontinent are very much on the fast
rising side of the bell curve. The bell curves of species,
astronomical objects, chemicals and genes are all in their
early stages. In future, as new editors come on board or existing
editors acquire new enthusiasms, we can expect that yet unwritten areas
of the pedia will go through their own bell curve expansions.
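
As a rough illustration only, the two-element model described above could be
sketched like this. Each subject area's bell curve of article creation is
represented by its running total, a logistic (S-shaped) curve, and the
annual input of newly notable subjects is a straight line added on top.
Every name and number here (AREAS, ANNUAL_NEW, the ceilings, midpoints and
rates) is a made-up assumption for the sake of the example, not a
measurement of anything.

```python
import math


def logistic(t, ceiling, midpoint, rate):
    """Cumulative articles in one subject area at time t (years).

    The yearly rate of creation implied by this curve is bell-shaped,
    peaking at `midpoint`; the total levels off at `ceiling`.
    """
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def total_articles(t, areas, annual_new):
    """Sum of the per-area logistic curves plus a linear yearly input."""
    return sum(logistic(t, *area) for area in areas) + annual_new * t


# Hypothetical subject areas: (ceiling, midpoint in years, growth rate).
AREAS = [
    (2_000_000, 5.0, 0.9),   # an area the first wave of editors covered early
    (1_500_000, 12.0, 0.6),  # an area still on the rising side of its bell
]
ANNUAL_NEW = 100_000  # hypothetical yearly input of newly notable subjects


if __name__ == "__main__":
    for year in (0, 5, 10, 20, 40):
        print(year, round(total_articles(year, AREAS, ANNUAL_NEW)))
```

Once every area has passed the top of its bell, the total under these
assumptions grows by roughly ANNUAL_NEW per year, which is one reading of a
sigmoid with a linear limit.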

We still have a huge influx of new editors, though very few stick
around. I suspect the ultimate size of the pedia depends at least as
much on the way we treat new editors as it does on the availability of
easily accessible sources.

WereSpielChequers

On 17 February 2011 09:38, Charles Matthews wrote:
> On 16/02/2011 23:56, Carcharoth wrote:
>> On Mon, Feb 14, 2011 at 9:54 PM, David Gerard  wrote:
>>
>>> There's a *heck* of a lot still to be written.
>> On that topic, I came across this interesting essay:
>>
>> http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
>>
>> It tries to project to the year 2025!
> I'd be interested in any discussion at all on the amount of useful
> material  out there (on the Web) and how it is changing. It is a fact
> that there are more and more reliable sources posted that can be used to
> create articles. This is a factor that affects directly what actually
> gets written, as opposed to what potentially might be a topic to write
> about.
>
> I think we just don't know how much will be around in 2025 that could
> support our work, either in the form of public domain reference
> material, or respectable scholarly webpages to which we can link.
> Extrapolations leaving out this factor aren't worth as much as they
> might be.
>
> Charles


[WikiEN-l] Scale of online resources, was Re: Rating the English wikipedia

2011-02-17 Thread Charles Matthews
On 16/02/2011 23:56, Carcharoth wrote:
> On Mon, Feb 14, 2011 at 9:54 PM, David Gerard  wrote:
>
>> There's a *heck* of a lot still to be written.
> On that topic, I came across this interesting essay:
>
> http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia_extended_growth
>
> It tries to project to the year 2025!
I'd be interested in any discussion at all on the amount of useful 
material  out there (on the Web) and how it is changing. It is a fact 
that there are more and more reliable sources posted that can be used to 
create articles. This is a factor that affects directly what actually 
gets written, as opposed to what potentially might be a topic to write 
about.

I think we just don't know how much will be around in 2025 that could 
support our work, either in the form of public domain reference 
material, or respectable scholarly webpages to which we can link. 
Extrapolations leaving out this factor aren't worth as much as they 
might be.

Charles

