Re: [Wiki-research-l] War of 1812 and all that

2012-10-28 Thread Piotr Konieczny
I believe we have a number of studies showing that the majority of 
content was written by a small minority of the most active editors. This 
does not invalidate the comment about automated editing; the bottom line is 
that most of the work on Wikipedia, i.e. both content and the non-content 
support infrastructure, was and is being done by a small group of very 
dedicated people.


--
Piotr Konieczny

"To be defeated and not submit, is victory; to be victorious and rest on one's 
laurels, is defeat." --Józef Pilsudski

On 10/28/2012 5:57 PM, Kerry Raymond wrote:

My comments on the top editors came from what I read here

http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits

Editors who use automated tools to do various little fixes can generate large 
edit counts. Of course it does not follow that all large-edit-count editors are 
doing this.

Sent from my iPad

On 29/10/2012, at 8:47 AM, "Yaroslav M. Blanter"  wrote:


On Mon, 29 Oct 2012 08:13:48 +1100, Kerry Raymond wrote:


As far as I can see most of the top 1 editors appear to be making
a lot of their contributions in terms of administration and quality
control (eg fighting vandalism) rather than in content. I think the
"long tail" of (good faith) editors are mostly contributing content on
a range of topics that I believe will continue to grow. I believe that
once a WYSIWYG editor for WP becomes available we will see a growth in
the long tail of editors and the topics they write on because I think
wiki markup is a barrier for many people currently under-represented
in the demographics of WP editors.

I actually have quite the opposite impression. I think most of the top
contributors are actually creating content. I myself am somewhere in the
top 3000, and 90% of my edits are in the article space. I would be
interested to see a study on this if it exists.


I agree WP has moved into a new phase different from its earliest
years and probably its policies and processes might need to change to
reflect that. For example, it's fine to "be bold" with a stub, but woe
betide the newbie editor that decides to be bold with a well-developed
article whose current words may have been carefully crafted to capture
the right nuances to keep all the warring factions happy. Personally I
believe mature articles need more of a curated approach to incorporate
new material contributed by anyone but where the edits are done by
more experienced editors of that topic. Not that they should be
"gatekeepers" but that the material be added in the right place and in
a way that reflects prior agreements in relation to reflecting
differing viewpoints. I think the WP policy on mature articles should
be "be careful not to break what's already there".


With this I agree.

Cheers
Yaroslav

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l



Re: [Wiki-research-l] War of 1812 and all that

2012-10-28 Thread Kerry Raymond
My comments on the top editors came from what I read here

http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits

Editors who use automated tools to do various little fixes can generate large 
edit counts. Of course it does not follow that all large-edit-count editors are 
doing this.

Sent from my iPad

On 29/10/2012, at 8:47 AM, "Yaroslav M. Blanter"  wrote:

> On Mon, 29 Oct 2012 08:13:48 +1100, Kerry Raymond wrote:
> 
>> 
>> As far as I can see most of the top 1 editors appear to be making
>> a lot of their contributions in terms of administration and quality
>> control (eg fighting vandalism) rather than in content. I think the
>> "long tail" of (good faith) editors are mostly contributing content on
>> a range of topics that I believe will continue to grow. I believe that
>> once a WYSIWYG editor for WP becomes available we will see a growth in
>> the long tail of editors and the topics they write on because I think
>> wiki markup is a barrier for many people currently under-represented
>> in the demographics of WP editors.
> 
> I actually have quite the opposite impression. I think most of the top 
> contributors are actually creating content. I myself am somewhere in the 
> top 3000, and 90% of my edits are in the article space. I would be 
> interested to see a study on this if it exists.
> 
>> 
>> I agree WP has moved into a new phase different from its earliest
>> years and probably its policies and processes might need to change to
>> reflect that. For example, it's fine to "be bold" with a stub, but woe
>> betide the newbie editor that decides to be bold with a well-developed
>> article whose current words may have been carefully crafted to capture
>> the right nuances to keep all the warring factions happy. Personally I
>> believe mature articles need more of a curated approach to incorporate
>> new material contributed by anyone but where the edits are done by
>> more experienced editors of that topic. Not that they should be
>> "gatekeepers" but that the material be added in the right place and in
>> a way that reflects prior agreements in relation to reflecting
>> differing viewpoints. I think the WP policy on mature articles should
>> be "be careful not to break what's already there".
>> 
> 
> With this I agree.
> 
> Cheers
> Yaroslav
> 



Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Yaroslav M. Blanter

On Sun, 28 Oct 2012 21:58:10 +1100, Laura Hale wrote:

On Sun, Oct 28, 2012 at 9:25 PM, Yaroslav M. Blanter  wrote:


I believe there are two different issues. The first is what is the
maximum possible number of articles (this is what I asked). For all
practical purposes (manpower we have, time until Wikipedia will
collapse and cease to exist, etc) we will only be able to write a tiny
part of them. This is why media are discussing questions like
whether English Wikipedia will ever reach 5M articles. I think this
is a much more complex issue which has to do with the editor
retention dynamics and general lifetime of internet companies.


There is a lot of content missing.  The maximum could actually be
quite great.  There is a fair amount of material just not adequately
created to begin with.  It isn't just new notable topics in terms of
politicians, sport competitors, sports team seasons, hurricanes,
elections, etc. that can grow.   There is a huge fountain of
articles not created about these in pre-existing literature.  Beyond
that, valid spin-off articles do not yet exist for many topics.
(Within my own framework, there are few articles on women's sports in
a country, and specific women's sports in a country.) [[Sport in
Kiribati]] does not exist, nor does [[Women's sport in Kiribati]].
And this goes down... With the way English Wikipedia is structured,
you could have an endless variety of these as topics get more and
more filled in.


Absolutely. As I mentioned, just today I created an article which 
contains about 50 redlinks, and these redlinked articles are clearly 
notable. The problem is that I currently seem to be the only editor on 
English Wikipedia qualified to write these articles, and I am busier 
with other things which are currently not available either. I will 
probably not be able to finish even what I am doing now before Wikipedia 
ceases to exist or I die, whichever comes earlier.


Cheers
Yaroslav



Re: [Wiki-research-l] War of 1812 and all that

2012-10-28 Thread Yaroslav M. Blanter

On Mon, 29 Oct 2012 08:13:48 +1100, Kerry Raymond wrote:



As far as I can see most of the top 1 editors appear to be making
a lot of their contributions in terms of administration and quality
control (eg fighting vandalism) rather than in content. I think the
"long tail" of (good faith) editors are mostly contributing content on
a range of topics that I believe will continue to grow. I believe that
once a WYSIWYG editor for WP becomes available we will see a growth in
the long tail of editors and the topics they write on because I think
wiki markup is a barrier for many people currently under-represented
in the demographics of WP editors.


I actually have quite the opposite impression. I think most of the top 
contributors are actually creating content. I myself am somewhere in the 
top 3000, and 90% of my edits are in the article space. I would be 
interested to see a study on this if it exists.




I agree WP has moved into a new phase different from its earliest
years and probably its policies and processes might need to change to
reflect that. For example, it's fine to "be bold" with a stub, but woe
betide the newbie editor that decides to be bold with a well-developed
article whose current words may have been carefully crafted to capture
the right nuances to keep all the warring factions happy. Personally I
believe mature articles need more of a curated approach to incorporate
new material contributed by anyone but where the edits are done by
more experienced editors of that topic. Not that they should be
"gatekeepers" but that the material be added in the right place and in
a way that reflects prior agreements in relation to reflecting
differing viewpoints. I think the WP policy on mature articles should
be "be careful not to break what's already there".



With this I agree.

Cheers
Yaroslav



Re: [Wiki-research-l] War of 1812 and all that

2012-10-28 Thread Kerry Raymond
Hi, Richard! 

The reason I find the War of 1812 amusing as an example is simply because to me 
(an Australian) it's a completely unimportant subject. I am neither British nor 
American and it all occurred on the other side of the world to me; why should I 
know or care? Yet the American War of Independence (same parties a few years 
earlier) is important to Australian history because it caused Britain to decide 
to establish an  Australian colony.

Importance is very much subjective. It might well be that wars in which America 
participated are well-covered in Wikipedia but surely there were lots of other 
wars that aren't well-covered but are important to their region's history? And 
no doubt many readers of Wikipedia have no interest in wars at all and believe 
Seinfeld episodes and Britney Spears are important topics. Who is to be the 
arbiter of what is "important"? It seems to me that so long as someone finds a 
topic interesting and has a few sources to draw on, they might as well write a 
Wikipedia article about it.  If one person thinks the topic is interesting 
enough to invest the effort, odds on someone else will find it of interest. I 
write primarily local history material on WP and am often surprised at how 
often others join in with contributions to articles I have started. The reality 
is that stubs do get expanded and redlinks do lead to the creation of new 
articles even on topics that I would freely acknowledge are not the world's 
most important topics but nonetheless clearly of interest to some folk. And 
where there are writers for a topic, I believe there must also be readers. 

So that is why I disagree with your comment about WP being for the benefit of a 
few thousand editors and indifferent to what the public wants/needs. I'm not 
one of the top 1 editors. I'm just a reader of Wikipedia who one day 
started editing bits and pieces about the suburb I live in and my involvement 
grew very slowly from there. Isn't that the story for most WP editors? Editors 
are the "public"; they are not selected or certified in any way. WP makes it 
possible for anyone to make small contributions, which is far easier for the 
public to do than the previous model of needing to publish an entire book on 
the subject, which obviously requires far greater expertise and is thus far 
less representative of what the public wants/needs.

As far as I can see most of the top 1 editors appear to be making a lot of 
their contributions in terms of administration and quality control (eg 
fighting vandalism) rather than in content. I think the "long tail" of (good 
faith) editors are mostly contributing content on a range of topics that I 
believe will continue to grow. I believe that once a WYSIWYG editor for WP 
becomes available we will see a growth in the long tail of editors and the 
topics they write on because I think wiki markup is a barrier for many people 
currently under-represented in the demographics of WP editors.

I agree WP has moved into a new phase different from its earliest years and 
probably its policies and processes might need to change to reflect that. For 
example, it's fine to "be bold" with a stub, but woe betide the newbie editor 
that decides to be bold with a well-developed article whose current words may 
have been carefully crafted to capture the right nuances to keep all the 
warring factions happy. Personally I believe mature articles need more of a 
curated approach to incorporate new material contributed by anyone but where 
the edits are done by more experienced editors of that topic. Not that they 
should be "gatekeepers" but that the material be added in the right place and 
in a way that reflects prior agreements in relation to reflecting differing 
viewpoints. I think the WP policy on mature articles should be "be careful not 
to break what's already there".

Sent from my iPad

On 29/10/2012, at 12:19 AM, Richard Jensen  wrote:

> I was the one who raised the 1812 example in the context of Wikipedia's 
> coverage of military history; see Richard Jensen, "Military History on the 
> Electronic Frontier: Wikipedia Fights the War of 1812," ''The Journal of 
> Military History'' 76#4  (October 2012): 523-556;  the page proofs (with some 
> typos) are online at
> http://www.americanhistoryprojects.com/downloads/JMH1812.PDF
> 
> My argument is that Wikipedia is written by and for the benefit of a few 
> thousand editors -- what the readers or the general public wants or thinks or 
> uses is largely irrelevant.
> 
> The growth then depends on the need to recruit new editors --
> using the details from the 1812 article I suggest that fewer and fewer new 
> editors are actually  interested. (I also looked at other major articles on 
> WWI, WWII, the American Civil War & others and found the same pattern.)
> 
> Look at it demographically: apart from teenage boys coming of age, the 
> population of computer-literate people who are ignorant of Wikipedia is very 
> small i

Re: [Wiki-research-l] War of 1812 and all that

2012-10-28 Thread Steven Walling
On Sun, Oct 28, 2012 at 6:19 AM, Richard Jensen  wrote:

> Look at it demographically: apart from teenage boys coming of age, the
> population of computer-literate people who are ignorant of Wikipedia is
> very small indeed in 2012.  That was not true in 2005 when lots of editors
> joined up and did a lot of work on important articles.


You seem to be disregarding the entirety of the developing world and
non-English speakers in that statement.

-- 
Steven Walling
https://wikimediafoundation.org/


Re: [Wiki-research-l] 1-year dump of English Wikipedia article ratings

2012-10-28 Thread Dario Taraborelli
no, that's based on textual feedback data from a small random sample of 
articles  [1] from the AFTv5 tests, not the current ratings (AFTv4)

[1] http://meta.wikimedia.org/wiki/Research:AFT


On Oct 27, 2012, at 2:13 PM, Taha Yasseri  wrote:

> Thanks Dario.
> I should also add your own CSCW'13  paper. Right?
> 
> On Sat, Oct 27, 2012 at 10:51 PM, Dario Taraborelli 
>  wrote:
> …and on a final note, this is an awesome work in progress that attempts to 
> classify Wikipedia articles based on a broad range of quality metrics 
> (including AFT ratings).
> 
> https://github.com/slaporte/qualityvis
> 
> 
> On Oct 27, 2012, at 1:42 PM, Dario Taraborelli  
> wrote:
> 
>> I forgot to mention Ashton Anderson's dataviz work based on AFTv4 data
>> 
>> https://graphics.stanford.edu/wikis/cs448b-11-fall/FP-AndersonAshton
>> 
>> On Oct 27, 2012, at 1:36 PM, Dario Taraborelli  
>> wrote:
>> 
>>> Taha,
>>> 
>>> other than the internal reports during the product dev phase [1] and some 
>>> occasional uses of this data in the literature, there hasn't been much work 
>>> on AFT ratings. To my knowledge, the best use of this data outside of WMF 
>>> is in Adam Hyland's work (he presented a study at Wikimania [2] and I think 
>>> he's working on a follow-up paper).
>>> 
>>> Dario
>>> 
>>> [1] http://www.mediawiki.org/wiki/Article_feedback/Research
>>> [2] http://en.wikipedia.org/wiki/User:Protonk/Article_Feedback
>>> 
>>> 
>>> On Oct 27, 2012, at 6:57 AM, Taha Yasseri  wrote:
>>> 
 Hi Dario,
 Thank you. That's indeed a very interesting data set.
 
 Is anyone aware of any study or analysis of this or similar data on 
 "article ratings"?
 Even a raw data analysis would be very helpful to set up a systematic 
 study. Unfortunately, I'm not up to date on the state of the art. 
 
 cheers,
 .Taha
 
 On Mon, Oct 22, 2012 at 10:51 PM, Dario Taraborelli 
  wrote:
 We've released a full, anonymized dump of article ratings (aka AFTv4) 
 collected over 1 year since the deployment of the tool on the entire 
 English Wikipedia (July 22, 2011 - July 22, 2012).
 
 http://thedatahub.org/en/dataset/wikipedia-article-ratings
 
 The dataset (which includes 11m unique article ratings along 4 dimensions) 
 is licensed under CC0 and supersedes the partial dumps originally hosted 
 on the dumps server. Real-time AFTv4 data remains available as usual via 
 the toolserver. Feel free to get in touch if you have any questions about 
 this data.
 
 Dario
 
 
 
 -- 
 .t
 
>>> 
>> 
> 
> 
> 
> 
> 
> 
> -- 
> .t
> 



[Wiki-research-l] War of 1812 and all that

2012-10-28 Thread Richard Jensen
I was the one who raised the 1812 example in the context of 
Wikipedia's coverage of military history; see Richard Jensen, 
"Military History on the Electronic Frontier: Wikipedia Fights the 
War of 1812," ''The Journal of Military History'' 76#4  (October 
2012): 523-556;  the page proofs (with some typos) are online at

http://www.americanhistoryprojects.com/downloads/JMH1812.PDF

My argument is that Wikipedia is written by and for the benefit of a 
few thousand editors -- what the readers or the general public wants 
or thinks or uses is largely irrelevant.


The growth then depends on the need to recruit new editors --
using the details from the 1812 article I suggest that fewer and 
fewer new editors are actually  interested. (I also looked at other 
major articles on WWI, WWII, the American Civil War & others and 
found the same pattern.)


Look at it demographically: apart from teenage boys coming of age, 
the population of computer-literate people who are ignorant of 
Wikipedia is very small indeed in 2012.  That was not true in 2005 
when lots of editors joined up and did a lot of work on important articles.


So I think that military history at Wikipedia is pretty well 
saturated. That does not mean there are not more possible topics (we 
have about 130,000 articles (including stubs) now and major libraries 
will own maybe 100,000+ full length books on military topics).  I 
suggest that new editors need to have an attractive new niche that is 
not now well covered.  I suggest that they will have a very hard time 
finding such a niche that allows for the excitement of new writing 
about important topics (such as took place back in 
2005-2007).  Personally I greatly enjoyed writing about George 
Washington and Ulysses Grant and Napoleon--that's why I'm here.  I 
would have trouble explaining to someone why they should write up 
general #1001, #1002, #1103 ... let alone colonel #10,001, 10,002, 10,003 


Richard Jensen
User:Rjensen  email rjen...@uic.edu





Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread emijrp
I'm with Kerry.

By the way, 120 million notable articles are possible, but the estimate is
far from complete, so the real figure is surely greater. I love these
discussions.

2012/10/28 Kerry Raymond 

> Re: the article. It seems to be one of a number of opinion pieces that
> uses the War of 1812 as its primary example. It must be some new scientific
> method: proof by War of 1812 :-)
>
> But more seriously, I think the potential for new articles in Wikipedia is
> limited only by the definition of notability, for which the primary
> requirement is some good quality sources. So the more that is written, the
> more there is to write about in Wikipedia. Even if we restricted ourselves
> to new articles on topics notable prior to 2013 (say), we would still have
> enormous growth potential.
>
> Generally Wikipedia has better coverage of contemporary topics than
> historical because the WWW provides easy access to more sources for topics
> of contemporary notability than for historic notability. But if every
> single episode of Seinfeld is notable (as it must be as each has a WP
> article!), then surely every book/song/poem/artwork that has ever been
> reviewed is notable too. And based on the apparent notability of current
> sports people and the results of what seems like every football season,
> tennis tournament, athletics meet, etc, then surely history has plenty of
> equally notable articles on similar topics. Jousting tournaments in 1517 in
> Avignon, etc. What about race horses? A lot has been written on their
> pedigree, form and prospects for centuries. Lots of growth potential there
> too.
>
> History has a wealth of new articles for Wikipedia of at least the same
> notability as current subjects. Whether anyone wants to write them or
> anyone wants to read them, only time will tell. Notability doesn't
> necessarily make something interesting to a modern reader. But there is a
> massive "long tail" of historically notable topics that could be written
> about.
>
>
> Sent from my iPad
>
> On 28/10/2012, at 8:55 PM, "Yaroslav M. Blanter"  wrote:
>
> > We have a new article in The Atlantic,
> >
> > http://www.theatlantic.com/technology/archive/2012/10/surmounting-the-insurmountable-wikipedia-is-nearing-completion-in-a-sense/264111/
> >
> > (which btw I found following Dario's twitter, @ReaderMeter, which I
> > recommend)
> >
> > and this is still the same story of whether we achieved the limit of
> > what can be written etc. Without going into details of this animated
> > debate (I have something to say, for instance, I just created two
> > articles which have about a hundred red links, and the material to fill
> > in these red links is available, but this will lead us away from the
> > topic), I am curious if anybody ever tried to estimate what is the
> > possible number of notable topics for articles. On the short time scale,
> > it should grow linearly with time, since we have new sports events,
> > elections, TV shows, movies, books etc, and many persons who had not
> > previously been notable become notable. Thus, this number must be
> >
> > N = a + b (t-2012),
> >
> > where a is the number of topics notable now, t is the time in years, and
> > b is the number of new topics which become notable every year.
> >
> > Was there any research on what order of magnitude a and b have? I guess
> > b must be on the order of tens of thousands, since we are talking about
> > people. What is a? Is it dominated by the number of species of insects, or
> > cosmic bodies, or what?
> >
> > I tried to ask this question several years ago in Russian Wikipedia, but
> > there was no conclusive answer.
> >
> > Cheers
> > Yaroslav
> >
>
>



-- 
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT | StatMediaWiki | WikiEvidens | WikiPapers | WikiTeam
Personal website: https://sites.google.com/site/emijrp/


Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Yaroslav M. Blanter

On Sun, 28 Oct 2012 11:20:59 +0100, Pierre-Carl Langlais wrote:

Considering a, you have this fine study by Emijrp:
http://en.wikipedia.org/wiki/User:Emijrp/All_human_knowledge

Apparently a would be roughly around 120 000 000.

As media coverage and scientific research become more efficient every
year, I suspect that b is constantly growing, and follows a geometric
progression. Yet I don't know how to work that out concretely:
perhaps something like 100 000 * 1.05^n.

100 000 being a low estimate given the current growth of
knowledge (new biological species, new people, new political issues,
new scientific concepts and discoveries…).

PCL



Thanks, sounds reasonable.

Cheers
Yaroslav
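
As a rough illustration of the two growth models discussed in this
thread, here is a minimal Python sketch. It assumes a = 120 000 000
(Emijrp's estimate), a first-year b of 100 000 and 5% annual growth in b,
all taken from the figures quoted above. The function names, and the
choice to start the geometric series at n = 0 so that it reduces to the
linear model when the growth factor is 1, are assumptions made for this
example only; nobody in the thread specified them.

```python
def linear_topics(year, a=120_000_000, b=100_000, base_year=2012):
    """Linear model N = a + b (t - 2012): b new notable topics every year."""
    return a + b * (year - base_year)


def geometric_topics(year, a=120_000_000, b0=100_000, growth=1.05,
                     base_year=2012):
    """Variant where the yearly increment itself grows geometrically.

    Assumption: the n-th year after base_year (counting from n = 0)
    adds b0 * growth**n topics, so growth=1.0 gives the linear model.
    """
    n_years = year - base_year
    return a + sum(b0 * growth ** n for n in range(n_years))


if __name__ == "__main__":
    # Compare both models over a few illustrative horizons.
    for year in (2012, 2022, 2062):
        print(year, linear_topics(year), round(geometric_topics(year)))
```

Over ten years both models add roughly a million topics to a, so on that
time scale the total is dominated by a rather than by b; the choice
between linear and geometric growth only starts to matter over several
decades.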



Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Fae
Doesn't Russell's paradox apply here? Surely an appropriate lemma
would be that no finite number of notable subjects can ever be given.

Fae



Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Laura Hale
On Sun, Oct 28, 2012 at 9:25 PM, Yaroslav M. Blanter wrote:

>
> I believe there are two different issues. The first is what is the maximum
> possible number of articles (this is what I asked). For all practical
> purposes (manpower we have, time until Wikipedia will collapse and cease to
> exist, etc) we will only be able to write a tiny part of them. This is why
> media are discussing questions like whether English Wikipedia will ever
> reach 5M articles. I think this is a much more complex issue which has to
> do with the editor retention dynamics and general lifetime of internet
> companies.
>

There is a lot of content missing.  The maximum could actually be quite
great.  There is a fair amount of material just not adequately created to
begin with.  It isn't just new notable topics in terms of politicians,
sport competitors, sports team seasons, hurricanes, elections, etc. that
can grow.   There is a huge fountain of articles not created about these
in pre-existing literature.  Beyond that, valid spin-off articles do not
yet exist for many topics.  (Within my own framework, there are few
articles on women's sports in a country, and specific women's sports in a
country.) [[Sport in Kiribati]] does not exist, nor does [[Women's sport in
Kiribati]].  And this goes down... With the way English Wikipedia is
structured, you could have an endless variety of these as topics get more
and more filled in.

-- 
twitter: purplepopple
blog: ozziesport.com


Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Kerry Raymond
Re: the article. It seems to be one of a number of opinion pieces that uses the 
War of 1812 as its primary example. It must be some new scientific method: 
proof by War of 1812 :-)

But more seriously, I think the potential for new articles in Wikipedia is 
limited only by the definition of notability, for which the primary requirement 
is some good quality sources. So the more that is written, the more there is to 
write about in Wikipedia. Even if we restricted ourselves to new articles on 
topics notable prior to 2013 (say), we would still have enormous growth 
potential.

Generally Wikipedia has better coverage of contemporary topics than historical 
because the WWW provides easy access to more sources for topics of contemporary 
notability than for historic notability. But if every single episode of 
Seinfeld is notable (as it must be as each has a WP article!), then surely 
every book/song/poem/artwork that has ever been reviewed is notable too. And 
based on the apparent notability of current sports people and the results of 
what seems like every football season, tennis tournament, athletics meet, etc, 
then surely history has plenty of equally notable articles on similar topics. 
Jousting tournaments in 1517 in Avignon, etc. What about race horses? A lot has 
been written on their pedigree, form and prospects for centuries. Lots of 
growth potential there too.

History has a wealth of new articles for Wikipedia of at least the same 
notability as current subjects. Whether anyone wants to write them or anyone 
wants to read them, only time will tell. Notability doesn't necessarily make 
something interesting to a modern reader. But there is a massive "long tail" of 
historically notable topics that could be written about.


Sent from my iPad

On 28/10/2012, at 8:55 PM, "Yaroslav M. Blanter"  wrote:

> We have a new article in The Atlantic,
> 
> http://www.theatlantic.com/technology/archive/2012/10/surmounting-the-insurmountable-wikipedia-is-nearing-completion-in-a-sense/264111/
> 
> (which btw I found following Dario's twitter, @ReaderMeter, which I recommend)
> 
> and this is still the same story of whether we achieved the limit of what can 
> be written etc. Without going into details of this animated debate (I have 
> something to say, for instance, I just created two articles which have about a 
> hundred red links, and the material to fill in these red links is available, 
> but this will lead us away from the topic), I am curious if anybody ever 
> tried to estimate what is the possible number of notable topics for articles. 
> On the short time scale, it should grow linearly with time, since we have new 
> sports events, elections, TV shows, movies, books etc, and many persons who 
> had not previously been notable become notable. Thus, this number must be
> 
> N = a + b (t-2012),
> 
> where a is the number of topics notable now, t is the time in years, and b is 
> the number of new topics which become notable every year.
> 
> Was there any research on what order of magnitude a and b have? I guess b 
> must be on the order of tens of thousands, since we are talking about 
> people. What is a? Is it dominated by the number of species of insects, or 
> cosmic bodies, or what?
> 
> I tried to ask this question several years ago in Russian Wikipedia, but 
> there was no conclusive answer.
> 
> Cheers
> Yaroslav
> 

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Dariusz Jemielniak
>
> I believe there are two different issues. The first is what is the maximum
> possible number of articles (this is what I asked). For all practical
> purposes (manpower we have, time until Wikipedia will collapse and cease to
> exist, etc.) we will only be able to write a tiny part of them. This is why
> the media are discussing questions like whether English Wikipedia will ever
> reach 5M articles. I think this is a much more complex issue which has to
> do with the editor retention dynamics and general lifetime of internet
> companies.
>

Yup, my bad, misread - sorry :)

dj


Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Yaroslav M. Blanter

On Sun, 28 Oct 2012 11:08:29 +0100, Dariusz Jemielniak wrote:

> hi,
>
> hmm...
>
> > Was there any research on what order of magnitude a and b have? I
> > guess b must be on the order of tens of thousands, since we are
> > talking about people. What is a? Is it dominated by the number of
> > species of insects, or cosmic bodies, or what?
>
> but what would be the unit of measurement? Also, per analogiam: new
> cameras' resolution is improving from year to year. When exactly
> should it stop? There's no easy answer, because it all depends on how
> much you think you should be able to magnify a picture without
> pixelization.
>
> I'd say that for all practical purposes Wikipedia will be saturated
> when the vast majority of searches are covered, and users find an
> abundance of information for whatever topic they research, and this
> information is given to them at the exact level of sophistication
> they're ready to comprehend. We're waaay far from this ideal.
>
> best,
>
> dariusz

Hi Dariusz,

I do not understand your question. In my formula, a is measured in 
articles, and b is measured in articles per year, as detailed.


I believe there are two different issues. The first is what the 
maximum possible number of articles is (this is what I asked). For all 
practical purposes (manpower we have, time until Wikipedia will collapse 
and cease to exist, etc.) we will only be able to write a tiny part of them. 
This is why the media are discussing questions like whether English 
Wikipedia will ever reach 5M articles. I think this is a much more 
complex issue which has to do with editor retention dynamics and 
the general lifetime of internet companies.


Cheers
Yaroslav



Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Pierre-Carl Langlais


Concerning a, there is this fine study by Emijrp:
http://en.wikipedia.org/wiki/User:Emijrp/All_human_knowledge

Apparently a would be roughly 120 000 000.

As media coverage and scientific research become more efficient every
year, I suspect that b is constantly growing and follows a geometric
progression. I don't know how to work that out concretely, though:
perhaps something like 100 000 * 1.05^n.


100 000 being a low estimate given the current growth of
knowledge (new biological species, new people, new political issues,
new scientific concepts and discoveries…).
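
The geometric guess above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: n is taken to count years since 2012 (so year 0 contributes 100 000 topics), a = 120 000 000 is the Emijrp figure quoted above, the 5% yearly growth is the guess from this message, and the function names are made up for the sketch.

```python
# Rough sketch of a geometric model for b. All constants are guesses
# taken from the message above, not measured values.
A_2012 = 120_000_000   # a: topics notable now (Emijrp estimate quoted above)
B_0 = 100_000          # assumed number of new notable topics in year n = 0
RATIO = 1.05           # assumed yearly growth factor of b


def new_topics_in_year(n):
    """b_n = B_0 * RATIO**n: topics that become notable in year n."""
    return B_0 * RATIO ** n


def total_topics(n):
    """Notable topics after n full years: a plus the sum of b_0 .. b_(n-1)."""
    return A_2012 + sum(new_topics_in_year(k) for k in range(n))


def total_topics_closed_form(n):
    """Same total, using the closed form of the geometric series."""
    return A_2012 + B_0 * (RATIO ** n - 1) / (RATIO - 1)


if __name__ == "__main__":
    for years in (0, 10, 50):
        print(years, round(total_topics(years)))
```

With these numbers a dominates for decades, so the total is far more sensitive to the estimate of a than to the exact growth rate of b.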


PCL








Re: [Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Dariusz Jemielniak
hi,

hmm...

> Was there any research on what order of magnitude a and b have? I guess b
> must be on the order of tens of thousands, since we are talking about
> people. What is a? Is it dominated by the number of species of insects, or
> cosmic bodies, or what?


but what would be the unit of measurement? Also, per analogiam: new
cameras' resolution is improving from year to year. When exactly should it
stop? There's no easy answer, because it all depends on how much you think
you should be able to magnify a picture without pixelization.

I'd say that for all practical purposes Wikipedia will be saturated when
the vast majority of searches are covered, and users find an abundance of
information for whatever topic they research, and this information is given
to them at the exact level of sophistication they're ready to comprehend.
We're waaay far from this ideal.

best,

dariusz


[Wiki-research-l] Total number of notable subjects?

2012-10-28 Thread Yaroslav M. Blanter

We have a new article in The Atlantic,

http://www.theatlantic.com/technology/archive/2012/10/surmounting-the-insurmountable-wikipedia-is-nearing-completion-in-a-sense/264111/

(which btw I found following Dario's twitter, @ReaderMeter, which I 
recommend)


and this is still the same story of whether we have reached the limit of 
what can be written, etc. Without going into the details of this animated 
debate (I have something to say; for instance, I just created two articles 
which have about a hundred red links, and the material to fill in these 
red links is available, but this would lead us away from the topic), I am 
curious whether anybody has ever tried to estimate the possible number 
of notable topics for articles. On a short time scale, it should grow 
linearly with time, since we have new sports events, elections, TV 
shows, movies, books, etc., and many people who were not previously 
notable become notable. Thus, this number must be


N = a + b (t-2012),

where a is the number of topics notable now, t is the time in years, 
and b is the number of new topics which become notable every year.
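
As a minimal sketch, the formula above can be written out in Python. The values used for a and b below are placeholders chosen only to make the arithmetic visible; they are not estimates, since their order of magnitude is exactly the open question. The function name is likewise just for illustration.

```python
def notable_topics(t, a, b, t0=2012):
    """Linear model N = a + b (t - t0).

    a  -- number of topics notable at time t0
    b  -- number of topics that become notable per year
    t  -- time in years (calendar year)
    """
    return a + b * (t - t0)


if __name__ == "__main__":
    # Placeholder values, NOT estimates.
    a_placeholder = 10_000_000
    b_placeholder = 50_000
    for year in (2012, 2013, 2022):
        print(year, notable_topics(year, a_placeholder, b_placeholder))
```

In this model a fixes the starting level in 2012 and b is the slope, so after ten years the total has grown by 10 b regardless of a.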


Has there been any research on what order of magnitude a and b have? I guess 
b must be on the order of tens of thousands, since we are talking 
about people. What is a? Is it dominated by the number of species of 
insects, or cosmic bodies, or what?


I tried to ask this question several years ago on the Russian Wikipedia, 
but there was no conclusive answer.


Cheers
Yaroslav
