Re: [Wiki-research-l] Regular contributor
--- On Mon, 17/11/08, Platonides [EMAIL PROTECTED] wrote:
From: Platonides [EMAIL PROTECTED]
Subject: Re: [Wiki-research-l] Regular contributor
To: wiki-research-l@lists.wikimedia.org
Date: Monday, 17 November 2008, 9:42

> Felipe Ortega wrote:
>> I also have my doubts about the filtering conditions. For instance, in eswiki, 'BOTpolicia' is not registered as a bot, and it is responsible for more than 90,000 edits so far. On the other hand, a famous user in eswiki (retired for the moment, id=13770 to be precise)
>
> He has returned, ~500 edits this week ;)

Wow, this is getting interesting :D

>> Filtering by number of edits/hour or similar may require a lot of time/resources, especially in larger Wikipedias (sorry, but for my thesis I'm mainly focused on the top-ten Wikipedias :) ).
>
> The problem is that here you need the edits *per user*, not per page. I understand from the WikiXRay page that you're recreating the MediaWiki tables.

Yep, but only as an initial stage. Then I create some new intermediate tables to speed up the data mining.

> It would just be a matter of querying each user's contributions and checking the time differences. With indexes in place, it would be fast enough. Where it may get terribly slow is if you apply it to all users, as that would make the algorithm quadratic.

I agree, but then we would still need some basic criteria to decide which users to probe to identify hidden bots. I suppose a good starting point would be looking for BOT patterns in the name? Hmm, or perhaps going directly by the number of revisions. I will try to have a closer look at this after the thesis (I need to plan my next entertainments :) ).

Cheers,
F.

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
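The per-user check discussed in this message (take a candidate's contributions and look at the time differences between consecutive edits) could be sketched roughly as follows. This is only an illustration, not WikiXRay code: the function name `flag_bot_like`, the `(user, unix_timestamp)` input format, and both thresholds are assumptions invented for the example. The revision-count pre-filter mirrors the suggestion of deciding which users to probe by their number of revisions.

```python
from collections import defaultdict
from statistics import median


def flag_bot_like(edits, min_edits=1000, max_median_gap=5.0):
    """Return users whose edits are both numerous and very closely spaced.

    edits: iterable of (user, unix_timestamp) pairs (hypothetical format).
    min_edits and max_median_gap (seconds) are illustrative thresholds,
    not values taken from the thread.
    """
    stamps_by_user = defaultdict(list)
    for user, timestamp in edits:
        stamps_by_user[user].append(timestamp)

    flagged = set()
    for user, stamps in stamps_by_user.items():
        # Cheap pre-filter: only probe users with many revisions.
        # At least two edits are needed to have any gap at all.
        if len(stamps) < max(min_edits, 2):
            continue
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        if median(gaps) <= max_median_gap:
            flagged.add(user)
    return flagged
```

Grouping the edits once and sorting each user's timestamps keeps the work close to linear in the number of revisions, so the quadratic blow-up mentioned above is avoided in this sketch. A prolific human editor would still need a manual look before being labelled a bot.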
Re: [Wiki-research-l] Regular contributor
Interesting. So, in summary:

- Most edits are done by a small core.
- But most of the text is created by the long tail.
- However, most of the text that people actually read was created by the small core.

Is that a good summary of what we know about this question?

Alain

> -----Original Message-----
> From: [EMAIL PROTECTED] On Behalf Of Reid Priedhorsky
> Sent: November 16, 2008 9:50 PM
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Regular contributor
>
> Platonides wrote:
>> Desilets, Alain wrote:
>>> Regarding this, I have heard different stories about contributors. I seem to recall one study that concluded that, while 85% of the **edits** are done by a small core of contributors, if you take a random page and select a sentence from it, this sentence is more likely to be the result of edits by contributors from the long tail than by core contributors. I forget the reference for that study though. Does someone on this list have solid information about this? I think it's a fairly crucial piece of information that we should have a clear handle on as a research community.
>>>
>>> Alain
>>
>> It was research by Aaron Swartz:
>> http://www.aaronsw.com/weblog/whowriteswikipedia
>
> I led a study last year that found that the long tail was even longer than it usually is (i.e., the elite contributors contribute even more than they would be expected to). Specifically, the 0.1% of editors who edited the most times contributed about half the value of Wikipedia, when value is measured by words times views.
>
> http://www-users.cs.umn.edu/~reid/papers/group282-priedhorsky.pdf
>
> End of shameless plug. ;)
>
> Reid
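To make the "words times views" measure quoted above concrete, here is a small sketch of how the share of value contributed by the most active editors could be computed. It is not the method of the cited study: the data layout (an edit count plus a list of `(words, views)` pairs per editor) and the function name `top_editor_value_share` are assumptions made for the example.

```python
import math


def top_editor_value_share(editors, fraction):
    """Share of total value contributed by the most active editors.

    editors: dict mapping name -> (edit_count, [(words, views), ...]),
             a hypothetical layout chosen for this illustration.
    fraction: top fraction of editors, ranked by edit count (0.001 = 0.1%).
    Value is measured as words times views, summed per editor.
    """
    value = {
        name: sum(words * views for words, views in contributions)
        for name, (_, contributions) in editors.items()
    }
    total = sum(value.values())
    if total == 0:
        return 0.0
    # Rank by number of edits, not by value, as in the quoted claim.
    ranked = sorted(editors, key=lambda name: editors[name][0], reverse=True)
    top_k = max(1, math.ceil(fraction * len(ranked)))
    return sum(value[name] for name in ranked[:top_k]) / total
```

With real data, `fraction=0.001` would correspond to the 0.1% of editors mentioned in the message. Whether the result comes out near one half depends entirely on the dataset.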
Re: [Wiki-research-l] Regular contributor
Sure. We have started a big migration of our website, so the old links do not work yet. You can grab it from here:

http://gsyc.es/~jfelipe/tmp/Ineq_Wikipedia.pdf

Best,
F.

--- On Mon, 17/11/08, Desilets, Alain [EMAIL PROTECTED] wrote:
From: Desilets, Alain [EMAIL PROTECTED]
Subject: RE: [Wiki-research-l] Regular contributor
To: [EMAIL PROTECTED], Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org
Date: Monday, 17 November 2008, 2:36

> Thx. Do you have the URL, or title? I can't find it on the web.
>
>> -----Original Message-----
>> From: Felipe Ortega [EMAIL PROTECTED]
>> Sent: November 15, 2008 12:43 PM
>> To: Research into Wikimedia content and communities; Desilets, Alain
>> Subject: RE: [Wiki-research-l] Regular contributor
Re: [Wiki-research-l] Regular contributor
--- On Mon, 17/11/08, Desilets, Alain [EMAIL PROTECTED] wrote:
From: Desilets, Alain [EMAIL PROTECTED]
Subject: Re: [Wiki-research-l] Regular contributor
To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org
Date: Monday, 17 November 2008, 3:00

> Interesting. So, in summary:
>
> - Most edits are done by a small core.
> - But most of the text is created by the long tail.
> - However, most of the text that people actually read was created by the small core.
>
> Is that a good summary of what we know about this question?

I think so :).

F.
Re: [Wiki-research-l] Regular contributor
Desilets, Alain wrote:
> Interesting. So, in summary:
>
> - Most edits done by a small core
> - But, most of the text created by the long tail
> - However, most of the text that people actually read, was created by the small core
>
> Is that a good summary of what we know about this question?

Oh... that's pretty, I want to show that around! Care to, err, blog it?

-- daniel
Re: [Wiki-research-l] Regular contributor
From the way that some of you have been carrying the discussion, it seems as if some here feel comfortable deriving generalizable claims that could ring true across the Wikiverse, as if the very substance of certain Wikipedia articles wouldn't have an inherent and significant bearing on the demographic composition and communicative dynamics of online collaboration. I would urge others, as some have lightly alluded to already, to stay conscious of idiosyncrasies that may exist across a multiplicity of Wikipedias.

Since I am somewhat out of the loop, I would appreciate it if someone were able to corroborate this idea of cultures of knowledge production that vary from realm to realm in Wikipedia. There are also the real-world cultural variables which can be reproduced inside Wikipedia. Take for example the study which found that French Wikipedians were much less comfortable deleting others' contributions.

http://jcmc.indiana.edu/vol12/issue1/pfeil.html

Sincerely,
Said Hamideh
Re: [Wiki-research-l] Regular contributor
--- On Mon, 17/11/08, Desilets, Alain [EMAIL PROTECTED] wrote:

> One thing that struck me this AM is that, while most of Wikipedia MAY have been written by a small core, it is doubtful that you would have been able to recruit that small core without a massively collaborative platform. In other words, the magic of Wikipedia is that it is able to engage millions of people in creating it, some of whom will become part of that core.

You're right, Alain. It is the same effect that we identified long ago in other massively collaborative projects (though not at the same level of success, I suspect), such as open source development projects. This is just a replication of the same onion model identified by Crowston and Howison in:

http://freesoftware.mit.edu/papers/crowstonhowison.pdf

We have detected still other interesting similarities. Again, I have to refer to my forthcoming thesis for that (I still need 3 more weeks and a couple of revisions from my advisor :S).

Best,
F.

> Alain
>
> PS: I don't have a blog yet. Shame on me ;-).
Re: [Wiki-research-l] Regular contributor
I understand the difficulty of dealing with anonymous edits, because many of them might be edits from registered users who simply did not bother to log on for that one edit. However, I think it is worth looking at how the conclusions might be affected under different scenarios for labelling those anonymous users. For example, one might assume that the bulk of anonymous edits are made by infrequent contributors who are part of the long tail, as opposed to the members of the core. Does that change anything about the conclusion that most of the value is produced by a small core?

If the answer is that even this does not change the conclusions, then the case is closed. But if it turns out that the conclusion is sensitive to how you label anonymous edits, then it seems to me that the next research that needs to be carried out is to try to characterise the degree to which anons are, or are not, registered users who are part of the core.

Alain
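The scenario analysis proposed in this message can be illustrated with a very small calculation: recompute the core's share while varying the assumed fraction of anonymous edits that really come from core members. The sketch below uses edit counts as a stand-in for "value"; the function name `core_share` and the `anon_core_fraction` parameter are assumptions introduced for the example, not quantities from any study mentioned in the thread.

```python
def core_share(registered_edits, core_users, anon_edits, anon_core_fraction):
    """Share of all edits attributed to the core under one labelling scenario.

    registered_edits: dict mapping user -> number of logged-in edits.
    core_users: set of users considered part of the core.
    anon_edits: total number of anonymous edits.
    anon_core_fraction: assumed fraction (0..1) of anonymous edits that were
        really made by core members who were not logged in.
    """
    if not 0.0 <= anon_core_fraction <= 1.0:
        raise ValueError("anon_core_fraction must be between 0 and 1")
    core_registered = sum(
        count for user, count in registered_edits.items() if user in core_users
    )
    total = sum(registered_edits.values()) + anon_edits
    if total == 0:
        return 0.0
    return (core_registered + anon_core_fraction * anon_edits) / total
```

Evaluating the function at `anon_core_fraction=0.0` (all anonymous edits belong to the long tail) and at `1.0` (all belong to the core) gives the two extremes. If the conclusion holds at both ends, it is not sensitive to the labelling.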
Re: [Wiki-research-l] Regular contributor
Desilets, Alain wrote:
> I understand the difficulty of dealing with anonymous edits, because many of them might be edits from registered users who simply did not bother to log on for that one edit. However, I think it is worth looking at how the conclusions might be affected under different scenarios for labelling those anonymous users. For example, one might assume that the bulk of anonymous edits are made by infrequent contributors who are part of the long tail, as opposed to the members of the core. Does that change anything about the conclusion that most of the value is produced by a small core?
>
> If the answer is that even this does not change the conclusions, then the case is closed. But if it turns out that the conclusion is sensitive to how you label anonymous edits, then it seems to me that the next research that needs to be carried out is to try to characterise the degree to which anons are, or are not, registered users who are part of the core.
>
> Alain

Anonymous editors are not part of the core. People in the small core do have accounts. They may have started as IPs, but there are too many advantages to registering for regular users. Yes, it may be an edit by a long-term user whose session timed out, but he will log in for the next one. Also, he may be in the core on a different wiki (and editing anonymously on a foreign one)*. Long-term Wikipedians editing anonymously are either long-term on another wiki or banned users coming back with a different hat. Other reasons could be edits on insecure computers or people afraid of being recognised.

*The addition of SUL on Wikimedia wikis will mitigate this.

Disclaimer: These are my personal observations, so don't take them as a formal study. :)
Re: [Wiki-research-l] Regular contributor
Ziko van Dijk wrote:
> My own concern with my definition is that I should raise the minimum number of edits for a regular contributor. Also, the period of observation should be longer. But that would make the observation more work; counting ten edits is faster than using the user edit counter. Maybe a developer could create a tool that simplifies the work, with a human being needed only to tell who is a content contributor and not a foreign helper.

Well, the user table holds the number of edits per user and the registration time, which would really help filter it.

(Note that some people's registration is much earlier than the real start of their editing, especially with SUL automatic account creations. Plus, if the first edit just creates a user page and there are no edits for 5 months, it may not really count. OTOH, an edit in talk or project space should be as relevant as one in main. So perhaps exclude edits in User: and User talk:?)
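As a rough sketch of the filter suggested here, the following example queries a toy `user` table by edit count and registration time. The table is a simplified in-memory stand-in built with `sqlite3`, not a real MediaWiki database. The column names `user_editcount` and `user_registration` follow the MediaWiki schema, while the function name, the thresholds and the sample rows are assumptions made for illustration.

```python
import sqlite3


def regular_contributors(conn, min_edits, registered_before):
    """Names of users with enough edits who registered early enough.

    registered_before uses the MediaWiki timestamp style YYYYMMDDHHMMSS,
    which sorts correctly as plain text.
    """
    rows = conn.execute(
        'SELECT user_name FROM "user" '
        "WHERE user_editcount >= ? AND user_registration <= ? "
        "ORDER BY user_name",
        (min_edits, registered_before),
    )
    return [name for (name,) in rows]


def demo_connection():
    """Build a tiny in-memory table with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "user" ('
        "user_id INTEGER PRIMARY KEY, user_name TEXT, "
        "user_editcount INTEGER, user_registration TEXT)"
    )
    conn.executemany(
        'INSERT INTO "user" VALUES (?, ?, ?, ?)',
        [
            (1, "Alice", 250, "20070115000000"),
            (2, "Bob", 3, "20060301000000"),
            (3, "Carol", 40, "20081101000000"),
        ],
    )
    return conn
```

As the message notes, registration time can be much earlier than the first real edit, so a filter like this is only a first pass. Excluding edits in the User: and User talk: namespaces would need the revision and page tables as well.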
Re: [Wiki-research-l] Regular contributor
--- On Fri, 14/11/08, Erik Zachte [EMAIL PROTECTED] wrote:
From: Erik Zachte [EMAIL PROTECTED]
Subject: RE: [Wiki-research-l] Regular contributor
To: 'Research into Wikimedia content and communities' wiki-research-l@lists.wikimedia.org, [EMAIL PROTECTED]
Date: Friday, 14 November 2008, 2:29

> Hi Felipe, I can't follow your reasoning how bots are insignificant. Just as Ziko pointed out, the matrix of bot contributions (and our general experience) tells otherwise. On larger Wikipedias bots account for 5-30% of edits; on smaller wikis, anything up to 50-70%, or even more in rare cases.

Hmm, then we have something really strange going on here. I thought I had a graph of the evolution of the bots' share of edits with respect to the total number of edits by month, but I think I have to generate it again. However, my impression from looking at temporal tables and results was that it was not that high. Actually, I'm not the only one who has stated that. Nikki Kittur, in another good paper:

http://www.parc.com/research/publications/files/5904.pdf

pointed out the same, though for enwiki (and we haven't got figures to compare that). All in all, I think this does not affect our results or model since, as a bare minimum, I always add a

    where rev_user not in (select ug_user from user_groups where ug_group='bot')

clause to my base queries. I will try to post a graph soon to have quantitative arguments, rather than mere impressions. Perhaps I'm missing something, but if so, I could not say, right now, what.

> Think of the bots that add interwiki links as a primary example of activities that account for a massive amount of edits.

That's precisely why I was quite surprised/concerned about my findings. They are counterintuitive.

> These may be insignificant on popular articles with 1000's of edits, but most articles have very few edits ('the long tail', one might call it), and there it adds up.

Yep, dead right. Right now, though, I'm not concentrating on per-article statistics but on per-user ones.

Best,
F.
> Cheers, Erik
>
>> From: [EMAIL PROTECTED] On Behalf Of Ziko van Dijk
>> Sent: Thursday, November 13, 2008 23:37
>> To: [EMAIL PROTECTED]; Research into Wikimedia content and communities
>> Subject: Re: [Wiki-research-l] Regular contributor
>>
>> Hello Felipe,
>>
>> Maybe we speak about different things now. At http://stats.wikimedia.org/EN/BotActivityMatrix.htm
>>
>> Wiki  Bot share  Statistics page
>> de    8%         http://stats.wikimedia.org/EN/TablesWikipediaDE.htm
>> ja    6%         http://stats.wikimedia.org/EN/TablesWikipediaJA.htm
>> fr    22%        http://stats.wikimedia.org/EN/TablesWikipediaFR.htm
>> it    25%        http://stats.wikimedia.org/EN/TablesWikipediaIT.htm
>> pl    26%        http://stats.wikimedia.org/EN/TablesWikipediaPL.htm
>> es    15%        http://stats.wikimedia.org/EN/TablesWikipediaES.htm
>> nl    29%        http://stats.wikimedia.org/EN/TablesWikipediaNL.htm
>> pt    30%        http://stats.wikimedia.org/EN/TablesWikipediaPT.htm
>> ru    26%        http://stats.wikimedia.org/EN/TablesWikipediaRU.htm
>> zh    15%        http://stats.wikimedia.org/EN/TablesWikipediaZH.htm
>> sv    23%        http://stats.wikimedia.org/EN/TablesWikipediaSV.htm
>> fi    22%        http://stats.wikimedia.org/EN/TablesWikipediaFI.htm
>>
>> The bot share of all edits is not that insignificant.
>>
>> Ziko
>>
>>> 2008/11/13 Felipe Ortega [EMAIL PROTECTED]
>>>
>>> Hi, Erik, and all. IMHO, it would be a good idea... but definitely not an urgent one. In our analyses on the top-ten Wikipedias, we found that bot contributions introduced very little noise in the data (to be statistically precise, it was not significant at all). You also have the additional problem that some bots are not identified in the user_groups table. My practical impression is that when you deal with overall figures, bots are irrelevant. However, if you want to focus on special metrics like concentration indexes, then their contribution DOES MATTER, since a very active bot in one month may ruin your measurements.
>>>
>>> Regards, Felipe.
>>>
>>>> --- On Wed, 22/10/08, Erik Zachte [EMAIL PROTECTED] wrote:
>>>> From: Erik Zachte [EMAIL PROTECTED]
>>>> Subject: [Wiki-research-l] Regular contributor
>>>> To: wiki-research-l@lists.wikimedia.org
>>>> Date: Wednesday, 22 October 2008, 9:55
>>>>
>>>>> Statistics, with Wikipedians, active and very active users; like often, Zachte's Statistics are great, but easily misleading.
>>>>
>>>> Also keep in mind that most figures in wikistats still include bot edits. IMO it becomes more and more urgent to present separate counts for humans and bots. For instance in eo: 54% of total edits for all time were bot edits, but most of these will be from recent years, so the percentage will be even higher for recent years.
>>>>
>>>> http://stats.wikimedia.org/EN/BotActivityMatrix.htm
>>>>
>>>> Erik Zachte
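The bot filter quoted in this message (`where rev_user not in (select ug_user from user_groups where ug_group='bot')`) can be tried out on a toy database as below. This is a minimal sketch, assuming heavily simplified `revision` and `user_groups` tables created in memory with `sqlite3`. The sample rows and the helper names `edit_counts` and `demo_connection` are invented for the example.

```python
import sqlite3

# The WHERE clause mentioned in the thread, reused verbatim apart from case.
BOT_FILTER = (
    "rev_user NOT IN "
    "(SELECT ug_user FROM user_groups WHERE ug_group = 'bot')"
)


def edit_counts(conn):
    """Return (all edits, edits by users without the 'bot' flag)."""
    total = conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
    non_bot = conn.execute(
        "SELECT COUNT(*) FROM revision WHERE " + BOT_FILTER
    ).fetchone()[0]
    return total, non_bot


def demo_connection():
    """Tiny in-memory stand-in for the two tables, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER)")
    conn.execute("CREATE TABLE user_groups (ug_user INTEGER, ug_group TEXT)")
    conn.executemany(
        "INSERT INTO revision (rev_user) VALUES (?)",
        [(1,), (1,), (2,), (3,), (3,), (3,)],
    )
    conn.executemany(
        "INSERT INTO user_groups VALUES (?, ?)",
        [(3, "bot"), (1, "sysop")],
    )
    return conn
```

The difference between the two counts is the flagged-bot share. As pointed out elsewhere in the thread, bots that were never given the flag are not caught by this clause, so the non-bot count is an upper bound on human edits.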
Re: [Wiki-research-l] Regular contributor
--- On Fri, 14/11/08, Erik Zachte [EMAIL PROTECTED] wrote:
From: Erik Zachte [EMAIL PROTECTED]
Subject: RE: [Wiki-research-l] Regular contributor
To: 'Research into Wikimedia content and communities' wiki-research-l@lists.wikimedia.org, [EMAIL PROTECTED]
Date: Friday, 14 November 2008, 2:40

> Many bots that are active on many wikis are not registered as such on smaller wikis. Therefore I treat any user name that is registered as a bot on 10+ wikis as a bot on all wikis.

Seems very reasonable :).

> It is of course again a correction which is not 100% accurate, but close, I might hope.

Paraphrasing one of my research colleagues: something is better than nothing at all :).

> Single User Login can help in this respect some day.

Wow, man. That would let my model jump to light speed. If only I were capable of tracing users across different languages...

> In theory we could spot some bots by their behavior, say a user that edits 24 hours per day, or manages 5 updates per second for a long time, or added thousands of articles in a short period. But I'm not sure it would be worth the effort, and it would be low priority in any case.

I also have my doubts about the filtering conditions. For instance, in eswiki, 'BOTpolicia' is not registered as a bot, and it is responsible for more than 90,000 edits so far. On the other hand, a famous user in eswiki (retired for the moment, id=13770 to be precise) is responsible for 100,000 edits, and was erroneously identified as a bot many times :). We have similar cases in other languages.

Filtering by number of edits/hour or similar may require a lot of time/resources, especially in larger Wikipedias (sorry, but for my thesis I'm mainly focused on the top-ten Wikipedias :) ). Honestly, I don't have a good answer for this right now.

Best,
F.
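The cross-wiki rule described in this message (a user name registered as a bot on 10 or more wikis is treated as a bot everywhere) could be sketched as follows. The input format, a mapping from wiki code to the set of bot-flagged names on that wiki, and the function name `global_bot_names` are assumptions for illustration. This is not wikistats code.

```python
from collections import Counter


def global_bot_names(bot_names_by_wiki, min_wikis=10):
    """Names flagged as bots on at least min_wikis wikis.

    bot_names_by_wiki: dict mapping wiki code -> iterable of user names
        that carry the bot flag on that wiki (hypothetical input format).
    min_wikis: threshold from the thread (10+ wikis); adjustable.
    """
    wiki_count = Counter()
    for names in bot_names_by_wiki.values():
        # set() guards against a name listed twice for the same wiki.
        wiki_count.update(set(names))
    return {name for name, count in wiki_count.items() if count >= min_wikis}
```

Names returned by this function would then be excluded on every wiki, including small ones where the account never received the flag. As the message says, the correction is approximate: a name match across wikis is only as reliable as the account naming is consistent.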
> Erik
Re: [Wiki-research-l] Regular contributor
--- On Fri, 14/11/08, Desilets, Alain [EMAIL PROTECTED] wrote:
From: Desilets, Alain [EMAIL PROTECTED]
Subject: RE: [Wiki-research-l] Regular contributor
To: [EMAIL PROTECTED], Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org
Date: Friday, 14 November 2008, 2:32

> Regarding this, I have heard different stories about contributors. I seem to recall one study that concluded that, while 85% of the **edits** are done by a small core of contributors, if you take a random page and select a sentence from it, this sentence is more likely to be the result of edits by contributors from the long tail than by core contributors. I forget the reference for that study though. Does someone on this list have solid information about this? I think it's a fairly crucial piece of information that we should have a clear handle on as a research community.

Hi, Alain. Yes, the study is by Aaron Swartz. It was a base premise in our last paper at HICSS 08, comparing his statement to Jimmy Wales's theory about the core of very active users. Actually, both are right (more or less :) ). If you look at it from the per-user perspective, the core can be identified very precisely. But your question is focused on per-article statistics. There, the result is logical to expect, since the distribution of distinct authors per article follows a stepped power law, and you have a lot of articles in the larger editions. If you pick an article at random, chances are that you will pick one with few editors.

Best,
Felipe.

> Alain
>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] On Behalf Of Felipe Ortega
>> Sent: November 13, 2008 5:33 PM
>> To: Research into Wikimedia content and communities
>> Subject: Re: [Wiki-research-l] Regular contributor
>>
>> You have a very similar effect in larger Wikipedias. In those, there is no single, very active contributor acting as the bus factor, but a core of very active users accounting for about 85% of the total number of edits per month. It seems that in these languages, though, there is a generational relay in which new active users jump into the core to replace those who eventually give up, for any reason. So the concentration becomes stable after a couple of years (approx.) and the encyclopedia is able to continue growing.
>>
>> Best,
>> F.
>>
>>> --- On Thu, 23/10/08, Gerard Meijssen [EMAIL PROTECTED] wrote:
>>> From: Gerard Meijssen [EMAIL PROTECTED]
>>> Subject: Re: [Wiki-research-l] Regular contributor
>>> To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org
>>> Date: Thursday, 23 October 2008, 10:27
>>>
>>> Hoi, I missed that this was the research mailing list... my fault. Consequently my answer was not appropriate. With this in mind, it is interesting to learn what the spread is, particularly in the smaller projects. In my opinion there must be a certain number of productive people in order to get to a community that does not have one person who is the bus factor. Having someone who drives the bus is really important. I wonder how you can point this person out. I think that someone who is just editing is important, but it is not all that builds a community.
>>>
>>> Thanks, GerardM
>>>
>>> On the Volapuk wikipedia Smeira was really important. When he left, I understand that activity collapsed.
>>>
>>>> 2008/10/22 phoebe ayers [EMAIL PROTECTED]
>>>>
>>>>> 2008/10/21 Gerard Meijssen [EMAIL PROTECTED]
>>>>>
>>>>> Hoi, When you divide people up in groups, when you single out the ones most valuable, you in effect divide the community. Whatever you base your metrics on, there will be sound arguments to deny the point of view. When it is about the number of edits, it is clear to the pure encyclopedistas that most of the policy wonks have not supported what is the real aim of the project. When you label groups of people, you divide them, and it is exactly the egalitarian aspect that makes the community thrive.
>>>>
>>>> But this isn't about labeling people for the rest of time and saying that this is how they are defined *on Wikipedia* -- it's about saying how do you study people who regularly contribute to Wikipedia, and as a part of that how do you define the group that you are studying, which is an important question for any research study. Given that it's impossible to study every contributor to the project in every study, and since many researchers are interested in why people who spend a lot of time or effort working on Wikipedia do so (and what exactly it is they do), this is a very relevant question for this list.
>>>>
>>>> --phoebe
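The concentration figure discussed in this message (a core of very active users accounting for about 85% of the edits in a month) can be made concrete with a short sketch that finds the smallest group of top editors covering a given share of edits. The per-month `dict` of edit counts and the function name `smallest_core` are assumptions for the example. The 0.85 default simply echoes the figure quoted in the thread.

```python
def smallest_core(edit_counts, share=0.85):
    """Smallest set of top editors covering at least `share` of all edits.

    edit_counts: dict mapping user -> number of edits in the period
        (for instance one month), a hypothetical input format.
    Returns the users in descending order of edit count.
    """
    total = sum(edit_counts.values())
    if total == 0:
        return []
    core, covered = [], 0
    ranked = sorted(edit_counts.items(), key=lambda item: item[1], reverse=True)
    for user, count in ranked:
        core.append(user)
        covered += count
        if covered >= share * total:
            break
    return core
```

Running this month by month and comparing the resulting sets would show both how large the core is and how its membership turns over, which is the generational relay described above.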
Re: [Wiki-research-l] Regular contributor
Hello all,

I have written a blog post on preferential attachment. It could interest you:
http://www.samarkande.com/blog/2008/10/09/wikipedia-et-lattachement-preferentiel/

The post is in French, sorry, but you will find in it links to English pages like this one:
http://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2008-08-11/Growth_study

And here is another link concerning participation (by the famous Jakob Nielsen):
http://www.useit.com/alertbox/participation_inequality.html

Cheers,
Emilie Ogez
Marketing Communication Manager
http://www.xwiki.com

2008/11/14 Desilets, Alain [EMAIL PROTECTED]

> Regarding this, I have heard different stories about contributors. I seem to recall one study that concluded that, while 85% of the **edits** are done by a small core of contributors, if you take a random page and select a sentence from it, this sentence is more likely to be the result of edits by contributors from the long tail than by core contributors. I forget the reference for that study though. Does someone on this list have solid information about this? I think it's a fairly crucial piece of information that we should have a clear handle on as a research community.
>
> Alain
Re: [Wiki-research-l] Regular contributor
Hi, Erik, and all. IMHO, it would be a good idea, but not exactly an urgent one. In our analyses of the top-ten Wikipedias, we found that bot contributions introduced very little noise in the data (statistically speaking, it was not significant at all). You also have the additional problem that some bots are not identified in the users_group table. My practical impression is that when you deal with overall figures, bots are irrelevant. However, if you want to focus on special metrics like concentration indexes, then their contribution DOES MATTER, since a very active bot in one month may ruin your measurements. Regards, Felipe.

--- On Wed, 22/10/08, Erik Zachte [EMAIL PROTECTED] wrote: From: Erik Zachte [EMAIL PROTECTED] Subject: [Wiki-research-l] Regular contributor To: wiki-research-l@lists.wikimedia.org Date: Wednesday, 22 October 2008 9:55

Statistics, with Wikipedians, active and very active users; like often, Zachte's Statistics are great, but easily misleading.

Also keep in mind that most figures in wikistats still include bot edits. IMO it is becoming more and more urgent to present separate counts for humans and bots. For instance in eo: 54% of total edits for all time were bot edits, but most of these will be from recent years, so the percentage will be even higher for recent years. http://stats.wikimedia.org/EN/BotActivityMatrix.htm Erik Zachte

___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
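To illustrate Felipe's point that a single very active bot can distort a concentration index even when it barely matters for overall totals, here is a minimal sketch using the Gini coefficient as the index. The choice of Gini, the function name `gini`, and all edit counts are assumptions made for illustration; none of them come from the thread or from WikiXRay.

```python
def gini(counts):
    """Gini coefficient of non-negative edit counts (0 = equal, near 1 = concentrated)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Formula on ascending data: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical edits per account in one month (invented numbers).
human_edits = [1, 1, 2, 3, 5, 8, 13, 40, 60, 120]
bot_edits = [5000]  # one very active bot that is not flagged as such

gini_humans = gini(human_edits)
gini_with_bot = gini(human_edits + bot_edits)
print(f"humans only: {gini_humans:.3f}, with bot: {gini_with_bot:.3f}")
```

With these invented numbers the index rises noticeably once the bot is included, which is the kind of effect that filtering bots before computing monthly concentration figures is meant to avoid.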
Re: [Wiki-research-l] Regular contributor
Hello Felipe,

Maybe we are talking about different things now. At http://stats.wikimedia.org/EN/BotActivityMatrix.htm:

Wikipedia   Bot share   Statistics page
de          8%          http://stats.wikimedia.org/EN/TablesWikipediaDE.htm
ja          6%          http://stats.wikimedia.org/EN/TablesWikipediaJA.htm
fr          22%         http://stats.wikimedia.org/EN/TablesWikipediaFR.htm
it          25%         http://stats.wikimedia.org/EN/TablesWikipediaIT.htm
pl          26%         http://stats.wikimedia.org/EN/TablesWikipediaPL.htm
es          15%         http://stats.wikimedia.org/EN/TablesWikipediaES.htm
nl          29%         http://stats.wikimedia.org/EN/TablesWikipediaNL.htm
pt          30%         http://stats.wikimedia.org/EN/TablesWikipediaPT.htm
ru          26%         http://stats.wikimedia.org/EN/TablesWikipediaRU.htm
zh          15%         http://stats.wikimedia.org/EN/TablesWikipediaZH.htm
sv          23%         http://stats.wikimedia.org/EN/TablesWikipediaSV.htm
fi          22%         http://stats.wikimedia.org/EN/TablesWikipediaFI.htm

The bot share of all edits is not that insignificant.

Ziko

-- Ziko van Dijk NL-Silvolde
Re: [Wiki-research-l] Regular contributor
Hi Felipe,

I can't follow your reasoning on how bots are insignificant. Just as Ziko pointed out, the matrix of bot contributions (and our general experience) tells otherwise. On larger Wikipedias bots account for 5-30% of edits; on smaller wikis, anything up to 50-70%, or even more in rare cases. Think of the bots that add interwiki links as a primary example of activities that account for a massive number of edits. These may be insignificant on popular articles with thousands of edits, but most articles have very few edits (the long tail, one might call it), and there it adds up.

Cheers, Erik
Re: [Wiki-research-l] Regular contributor
Felipe,

About your second argument, that not all bots are registered as such (or are no longer registered; it may change): yes, that is a problem. I can only hope that really active bots are caught and registered on large wikis. Many bots that are active on many wikis are not registered as such on smaller wikis. Therefore I treat any user name that is registered as a bot on 10+ wikis as a bot on all wikis. It is of course again a correction which is not 100% accurate, but close, I hope. Single User Logon can help in this respect some day.

In theory we could spot some bots by their behavior: say, a user that edits 24 hours per day, or manages 5 updates per second for a long time, or added thousands of articles in a short period. But I'm not sure it would be worth the effort, and it would be low priority in any case.

Erik
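The two ideas in Erik's message (treating a name flagged as a bot on 10+ wikis as a bot everywhere, and spotting unflagged bots by their behavior) can be sketched roughly as follows. This is only an illustration: the function names, the input shapes, and the per-minute threshold (300 edits per minute, mirroring "5 updates per second") are assumptions, not how wikistats actually implements it.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def cross_wiki_bots(bot_flags, min_wikis=10):
    """bot_flags: iterable of (wiki, user_name) pairs where the account has a bot flag.
    Returns names flagged on at least min_wikis distinct wikis."""
    wikis_by_user = defaultdict(set)
    for wiki, user in bot_flags:
        wikis_by_user[user].add(wiki)
    return {user for user, wikis in wikis_by_user.items() if len(wikis) >= min_wikis}


def looks_automated(timestamps, min_hours_per_day=24, max_edits_per_minute=300):
    """Heuristic only: True if some calendar day has edits in min_hours_per_day
    distinct hours, or some minute holds at least max_edits_per_minute edits."""
    hours_by_day = defaultdict(set)
    edits_per_minute = defaultdict(int)
    for ts in timestamps:
        hours_by_day[ts.date()].add(ts.hour)
        edits_per_minute[ts.replace(second=0, microsecond=0)] += 1
    round_the_clock = any(len(h) >= min_hours_per_day for h in hours_by_day.values())
    burst = any(n >= max_edits_per_minute for n in edits_per_minute.values())
    return round_the_clock or burst


# Hypothetical data: one name flagged on 12 wikis, another on only 2.
flags = [(f"wiki{i}", "InterwikiBot") for i in range(12)]
flags += [("eswiki", "LocalBot"), ("eowiki", "LocalBot")]
global_bots = cross_wiki_bots(flags)

day = datetime(2008, 11, 1)
nonstop = [day + timedelta(hours=h) for h in range(24)]
evening = [day + timedelta(hours=h, minutes=m) for h in (19, 20, 21) for m in (0, 15, 30)]
print(global_bots, looks_automated(nonstop), looks_automated(evening))
```

Both checks are deliberately coarse. A behavioral flag like this would only nominate candidates for manual review, since a busy human editor can also touch many hours of a day.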
Re: [Wiki-research-l] Regular contributor
Dear Erik, On Wed, 22 Oct 2008, Erik Zachte wrote: [...] For instance in eo: 54% of total edits for all time were bot edits, but most of these will be from recent years, so the percentage will be even higher for recent years. http://stats.wikimedia.org/EN/BotActivityMatrix.htm Interesting! I wonder why there is a discrepancy between the summary for the total number. Sigma total edits are 119M but Sigma manual edits are higher: 193M. As far as I skimmed the figures are ok for the individual languages. best regards Finn ___ Finn Aarup Nielsen, DTU Informatics, Denmark Lundbeck Foundation Center for Integrated Molecular Brain Imaging http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/ ___ ___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] Regular contributor
Finn, thanks for your attentiveness. Figure 'Sigma total edits' (top left cell) was copied from an earlier calculation, unlike the other totals, which were calculated while building this table. But unlike this table the other table did not calculate monthly totals for months where a major language (in casu English) was not yet processed. See http://stats.wikimedia.org/EN/TablesWikipediaZZ.htm and you get my point. So to be precise: 'Sigma total edits' is actually 'Sigma total edits for all languages for which counts are available'. Fixed report is online. Someday we will have figures for the English Wikipedia, fingers crossed :) Cheers, Erik
Re: [Wiki-research-l] Regular contributor
Hoi, When you divide people up in groups, when you single out the ones most valuable, you in effect divide the community. Whatever you base your metrics on, there will be sound arguments to deny the point of view. When it is about the number of edits, it is clear to the pure encyclopedistas that most of the policy wonks have not supported what is the real aim of the project. When you label groups of people, you divide them, and it is exactly the egalitarian aspect that makes the community thrive. It is when people set themselves apart that friction makes an appearance. A good example is the speed used for mindless speedy deletions, as was documented in an episode of Not the Wikipedia Weekly. So it is not that I am not interested; it is because I find it a fundamentally bad idea that I am snarky about it. Thanks, GerardM http://en.wiktionary.org/wiki/snarky

2008/10/22 Liam Wyatt [EMAIL PROTECTED]: More to the point: what is the point of your aggressive reply? If you're not interested in this thread then you are not obliged to be snarky about it. -Liam

On 22/10/2008, at 4:10, Gerard Meijssen [EMAIL PROTECTED] wrote: Hoi, What is the point of the question? Are regular contributors entitled to wear a halo, or will they get wings to go with the halo? Thanks, GerardM

On Tue, Oct 21, 2008 at 5:52 PM, Ziko van Dijk [EMAIL PROTECTED] wrote: Hello, From time to time I ask myself (and others) what is a regular contributor to a Wikipedia language edition. According to "Tell us about your Wikipedia" the definitions are quite different. At eo.WP I once checked for a week (this August) who was making edits, and I counted someone as a regular contributor if he or she

* made at least one edit in that week
* obviously speaks Esperanto (is not a foreign helper like someone who does interwiki linking)
* made his first edit at least six months ago
* made at least ten edits in total

My result was: 71, compared to 141 active users and 50 very active users (Wikimedia Statistics, May 2008). What do you think about this definition? Kind regards Ziko van Dijk

-- Ziko van Dijk NL-Silvolde
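Ziko's four criteria can be written down as a simple filter, which makes it easy to see how the count reacts when a threshold changes. The sketch below is only an illustration: the `Editor` record, its field names, and the sample editors are invented, and the language criterion is modeled as a flag that a human observer would have to set by hand.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Editor:
    name: str
    first_edit: date
    total_edits: int
    edits_in_week: int
    writes_in_language: bool  # manual judgment: not a "foreign helper"


def is_regular(editor, week_start, min_age_days=183, min_total_edits=10):
    """Apply the four criteria; 183 days approximates 'six months'."""
    return (
        editor.edits_in_week >= 1
        and editor.writes_in_language
        and editor.first_edit <= week_start - timedelta(days=min_age_days)
        and editor.total_edits >= min_total_edits
    )


week_start = date(2008, 8, 11)  # hypothetical observation week
editors = [
    Editor("A", date(2007, 1, 5), 500, 12, True),
    Editor("B", date(2008, 4, 1), 40, 3, True),      # first edit about 4 months ago
    Editor("C", date(2006, 3, 9), 900, 25, False),   # interwiki-only helper
    Editor("D", date(2007, 6, 2), 4, 1, True),       # fewer than ten edits
    Editor("E", date(2006, 9, 30), 300, 0, True),    # inactive in the week
]

six_months = sum(is_regular(e, week_start) for e in editors)
three_months = sum(is_regular(e, week_start, min_age_days=91) for e in editors)
print(six_months, three_months)
```

Keeping the thresholds as parameters means the same filter can be rerun with different cut-offs on the same observation week.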
Re: [Wiki-research-l] Regular contributor
Putting the philosophical questions aside, "analytical" categories (rather than social categories) should be linked to your research questions. Analytical categories are thus not universal; they are tied back to your research questions. I guess it is better to say "I developed a way to define a 'regular contributor' in eo.WP" rather than "I calculated a...", because it is not a pure math calculation but a definition of your own making (with the credit AND responsibility that follow). Below is a point-by-point critique with suggestions.

* made at least one edit in that week
-- It seems arbitrary to come up with a number within a certain time frame. Again, if you can come up with a distribution of edits over contributors, either from a previous study or from your own, showing that the contributors who match your profile have made 75% of the new edits in the past month (the time frame issue about the frequency of edits still needs to be sorted out), it will be much more convincing.

* obviously speaks Esperanto (is no "foreign helper" like someone who does Interwiki linking)
-- If your research question is about actual content contributors in the strict sense, then you might "exclude" those foreign helpers. However, you have to take that as a limitation, because you might lose those who provide foreign links that then have real impact on the content. In my limited experience of Chinese Wikipedia, this happens quite often in entries and issues that involve an East Asian or Sino-US context.

* made his first edit at least six months ago
-- Again, it seems arbitrary. If you can come up with a distribution of users' contributions over time (i.e. frequency), you might be able to develop a matrix that includes a certain number of people that you call "regular contributors". You have to acknowledge that you exclude the newbies with this criterion, because you, again, cite previous research or use common sense suggesting that most of the newbies do not become "regular contributors". Still, if you do so, you have to follow up in your research to check that those newbies who do become "regular contributors" will not have a significant impact on your results and analysis.

* made at least ten edits at all
-- Again, it seems arbitrary. Find the overall profile. Define your questions. Determine the selection threshold and be ready to defend your picks with previous research or common sense.

Liao, Han-Teng, DPhil student at the OII
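One way to act on the suggestion of deriving a threshold from the actual distribution is to tabulate, for several candidate cut-offs, how many contributors pass and what share of all edits they account for. The sketch below assumes a plain list of edit counts per contributor for one period; the function names, the candidate thresholds, and the sample counts are invented for illustration, and the 75% target simply echoes the figure mentioned in the critique.

```python
def coverage_by_threshold(edit_counts, thresholds):
    """For each minimum-edit threshold, return (threshold, contributors, share of edits)."""
    total = sum(edit_counts)
    rows = []
    for t in thresholds:
        selected = [c for c in edit_counts if c >= t]
        share = sum(selected) / total if total else 0.0
        rows.append((t, len(selected), share))
    return rows


def highest_threshold_covering(edit_counts, thresholds, target_share=0.75):
    """Strictest threshold whose contributors still account for target_share of edits."""
    qualifying = [
        t for t, _, share in coverage_by_threshold(edit_counts, thresholds)
        if share >= target_share
    ]
    return max(qualifying) if qualifying else None


# Hypothetical edits per contributor in one month (invented numbers).
monthly_edits = [1, 1, 1, 2, 2, 3, 4, 6, 10, 15, 25, 50, 80, 100]
candidates = [1, 5, 10, 25, 50, 100]

for threshold, people, share in coverage_by_threshold(monthly_edits, candidates):
    print(f"min edits {threshold:>3}: {people:>2} contributors, {share:.0%} of edits")

chosen = highest_threshold_covering(monthly_edits, candidates)
print("chosen threshold:", chosen)
```

A table like this does not remove the judgment call, but it lets the chosen cut-off be defended with the share of activity it captures rather than with a round number.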
Re: [Wiki-research-l] Regular contributor
Dear Han-Teng,

Thank you for the substantial answer, which helps me to go on. My problem is that my technical skills are limited, and I am also looking for methods that can easily be applied by all Wikipedia researchers (and to all WPs). There is no problem in telling how many regular contributors vls.WP has, because they are only three guys who know each other well.

I counted with the help of Recent Changes, and looked more closely at those Wikipedians who made at least one edit in one specific week. Otherwise I would not have known where to look. Maybe I should look at a period longer than a week (like three months, and then drop the first-edit-six-months-ago criterion), but that would mean a lot more work, at least in the bigger Wikipedias.

I chose a minimum of 10 edits because Wikimedia Statistics does so for "Wikipedians". It seems enough to see whether a person (usually an IP) shows interest only in one specific article he wants to set right, and is not interested in editing after that. By the way, if I shortened the six months (first edit) to three, the number of regular contributors would rise from 71 to 80. That may be suitable as well.

I consider only speakers of the language concerned because only they can contribute meaningful text (it does not matter whether they contribute a lot of content, only that they can). The foreign helpers are very important, but secondary. They would not exist if speakers of the language had not created content etc. One cannot do interwiki linking and anti-vandalism if there is no WP or no article.

Ziko
Re: [Wiki-research-l] Regular contributor
On Tuesday 21 October 2008, Ziko van Dijk wrote: Hello, From time to time I ask myself (and others) what is a regular contributor to a Wikipedia language edition.

How categories are constituted is central to the findings one claims. (As Han-Teng said, these are analytical categories and we are researchers on a research list, meaning we're not making judgements of worth, but trying to understand a phenomenon.) If one looks at the whole line of research on elite v. bourgeoisie, it turns out that researchers' findings differ based on how they define contribution (small tweaks, winnowing, talk page usage, integration/flow edits) and the classes of users (elite and bourgeoisie) -- this latter point about classes of users can be seen in (Ortega and Gonzalez-Barahona 2007, Ortega and Gonzalez-Barahona 2008). But, as a (not-very-active) Wikipedian, I'm grateful for all such contributions. In my usage of active users [1] and admins [2], I rely upon the natives' categorization ;).

[1]: http://en.wikipedia.org/w/index.php?title=Wikipedia:About&oldid=216496280
[2]: http://en.wikipedia.org/w/index.php?title=Wikipedia:List_of_administrators&oldid=170097284

Ziko's definition sounds appropriate to me, and I think it's a good question, as this community at some point might want to move towards consistent definitions for such things.
Re: [Wiki-research-l] Regular contributor
2008/10/21 Gerard Meijssen [EMAIL PROTECTED]: Hoi, When you divide people up in groups, when you single out the ones most valuable, you in effect divide the community. Whatever you base your metrics on, there will be sound arguments to deny the point of view. When it is about the number of edits, it is clear to the pure encyclopedistas that most of the policy wonks have not supported what is the real aim of the project. When you label groups of people, you divide them and it is exactly the egalitarian aspect that makes the community thrive.

But this isn't about labeling people for the rest of time and saying that this is how they are defined *on Wikipedia* -- it's about saying how do you study people who regularly contribute to Wikipedia, and as a part of that how do you define the group that you are studying, which is an important question for any research study. Given that it's impossible to study every contributor to the project in every study, and since many researchers are interested in why people who spend a lot of time or effort working on Wikipedia do so (and what exactly it is they do), this is a very relevant question for this list.

--phoebe

___ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] Regular contributor
Hello, I have distinguished four ways of counting Wikipedians: - Wikimedia Statistics, with Wikipedians, active and very active users; like often, Zachte's Statistics are great, but easily misleading. -Looking at user pages with babel lists; but not all active people have babel lists (or user pages or are registered), and some people's only edit at all is creating a user page with a babel list. Often there are many babel lists indicating level zero, sometimes even more than native speakers. - Asking Wikipedians about what they know or what they estimate. For that, a definition is important, of course, especially for the bigger WPs. The small ones have few fluctuation. - Counting them according to the edits people make. I have tried to outline a workable definition, as I explained. My observations at Recent Changes show that in many tiny WPs (I call them Micro-WPs) most of the activity is vandalism, countervandalism and bot activity, mostly interwiki linking. The interwiki linking relates usually to geographical stubs. This is true also for nearly all human Foreign helpers: They took a picture of their home town, put it in Commons, and integrate it into articles of all language editions of that town (and the like). So, without the bot generated pseudo content there would hardly be any activity at all. In my definition it is not important whether a foreign helper is a native speaker, he can also contribute with a lower level. If necessary, I look at the kind of edits. In nearly all cases it was very obvious whether the edit was made knowing the language or not. (Certainly if considered only editors with at least 10 edits.) For example, I am not a native speaker of Dutch, and do not often contribute to nl.WP, but according to my edits and my definition I am a regular contributor of nl.WP, not a Foreign helper. Take vo.WP for example. According to WM Statistics, it has ca. 16 very active users a month. 
According to the babel lists, two persons indicate level 2, and three level 1. 58 indicate zero. Recent changes show that content contributions come only from the five people knowing Volapük.

My own concern with my definition is that I should raise the minimum number of edits of a regular contributor. Also the period of observation should be longer. But that would make the observation more work; counting ten edits is faster than using the user edit counter. Maybe a developer could create a tool that simplifies the work, with a human being needed only for telling who is a content contributor and not a Foreign helper.

Ziko

P.S.: I must say that I find some reactions on this mailing list a little bit strange. I am simply asking what you think about my definition of a regular contributor, trying to get a better picture of Wikipedia language editions in comparison. I am willing to explain what I mean by this or that expression, and I am open to all kinds of suggestions to improve the definition. (Yes, a definition is finally subjective and depends on the researcher's interests.) Although I have become familiar with a number of language editions, I believe that the members of this mailing list know a lot about the issue and have ideas; and I received some good ideas for which I am grateful. But I do not see where I am dividing the community or imagining it as too simple. Of course I present things first in a short version; that does not mean that I have not thought them through before asking others. (Maybe I understood some remarks wrongly, and vice versa.)

2008/10/22 Han-Teng Liao (OII) [EMAIL PROTECTED]:

Dear Ziko, No worries about limitations. The rule is usually simple. Acknowledge them or overcome them, but do not hide them. Still, if your goal is a method to be applied by all Wikipedia researchers, I am not sure you can do without strong empirical data. A universal method requires strong evidence, a robust mechanism, or a compelling story.
May I suggest that, if you know the vls.WP version so well, you start a model from that and collect the necessary data for that particular version. Do not assume you will find no problems in the process. Since your methods seem to be very quantitative, you can try to start small from that.

The time-edit distribution (71-80) explanation seems plausible, and that is exactly what I suggested earlier about determining the threshold from the actual distribution. You might not have the whole distribution at this moment, but it sounds much better if you at least provide a concrete example to explain why you pick that number. Still, your definition will be much more definitive if you have solid overall data, a previous study, etc. The more supporting material you have, the stronger the threshold number that you pick. (You can then change "may be" into "more likely".)

Again, as for the foreign helpers, I do think it depends on the context and the questions you are asking. Try to think how do you apply that model into
Re: [Wiki-research-l] Regular contributor
Hoi,
What is the point of the question? Are regular contributors entitled to wear a halo, or will they get wings to go with the halo?
Thanks,
GerardM

On Tue, Oct 21, 2008 at 5:52 PM, Ziko van Dijk [EMAIL PROTECTED] wrote:

Hello,
From time to time I ask myself (and others) what a regular contributor to a Wikipedia language edition is. According to "Tell us about your Wikipedia", the definitions are quite different. At eo.WP I once checked for a week (this August) who was making edits, and I counted someone as a regular contributor if he

* made at least one edit in that week
* obviously speaks Esperanto (is no foreign helper like someone who does Interwiki linking)
* made his first edit at least six months ago
* made at least ten edits in total

My result was: 71, compared to 141 active users and 50 very active users (Wikimedia Statistics, May 2008). What do you think about this definition?

Kind regards
Ziko van Dijk

--
Ziko van Dijk
NL-Silvolde
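As an illustration only, the four criteria in Ziko's message can be sketched in Python. The `Editor` record, its field names, and the 182-day approximation of "six months" are assumptions made for this sketch and are not taken from the thread or from MediaWiki. Whether someone speaks the language is a human judgement, so it is passed in as a flag rather than computed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Editor:
    # Hypothetical record; field names are illustrative, not a MediaWiki schema.
    name: str
    edit_times: list        # datetimes of every edit by this editor
    speaks_language: bool   # decided by a human looking at the edits


def is_regular(editor, week_start, min_edits=10, min_age_days=182):
    """Return True if the editor meets all four criteria for the given week."""
    if not editor.edit_times:
        return False
    week_end = week_start + timedelta(days=7)
    # Criterion 1: at least one edit in the observed week.
    edited_this_week = any(week_start <= t < week_end for t in editor.edit_times)
    # Criterion 3: first edit at least ~six months before the observed week
    # (assumption: six months is approximated as 182 days).
    cutoff = week_start - timedelta(days=min_age_days)
    first_edit_old_enough = min(editor.edit_times) <= cutoff
    # Criterion 4: at least ten edits in total.
    enough_edits = len(editor.edit_times) >= min_edits
    # Criterion 2 (speaks the language) comes from the human-supplied flag.
    return (edited_this_week and editor.speaks_language
            and first_edit_old_enough and enough_edits)


week = datetime(2008, 8, 4)
old_edits = [datetime(2007, 1, 1) + timedelta(days=i) for i in range(9)]
veteran = Editor("veteran", old_edits + [datetime(2008, 8, 5)], True)
helper = Editor("helper", old_edits + [datetime(2008, 8, 5)], False)
newcomer = Editor("newcomer",
                  [datetime(2008, 8, 5) + timedelta(hours=i) for i in range(12)],
                  True)

regulars = [e.name for e in (veteran, helper, newcomer) if is_regular(e, week)]
print(regulars)
```

Keeping the thresholds as parameters reflects the point made elsewhere in the thread that the minimum edit count and observation period are choices a study should state explicitly.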
Re: [Wiki-research-l] Regular contributor
More to the point: What is the point of your aggressive reply? If you're not interested in this thread then you are not obliged to be snarky about it.
-Liam

On 22/10/2008, at 4:10, Gerard Meijssen [EMAIL PROTECTED] wrote:

Hoi,
What is the point of the question? Are regular contributors entitled to wear a halo, or will they get wings to go with the halo?
Thanks,
GerardM

On Tue, Oct 21, 2008 at 5:52 PM, Ziko van Dijk [EMAIL PROTECTED] wrote:

Hello,
From time to time I ask myself (and others) what a regular contributor to a Wikipedia language edition is. According to "Tell us about your Wikipedia", the definitions are quite different. At eo.WP I once checked for a week (this August) who was making edits, and I counted someone as a regular contributor if he

* made at least one edit in that week
* obviously speaks Esperanto (is no foreign helper like someone who does Interwiki linking)
* made his first edit at least six months ago
* made at least ten edits in total

My result was: 71, compared to 141 active users and 50 very active users (Wikimedia Statistics, May 2008). What do you think about this definition?

Kind regards
Ziko van Dijk

--
Ziko van Dijk
NL-Silvolde
Re: [Wiki-research-l] Regular contributor
Liam Wyatt wrote:

More to the point: What is the point of your aggressive reply? If you're not interested in this thread then you are not obliged to be snarky about it.
-Liam

I don't think Gerard is trying to be aggressive. The point is, everyone has a different understanding of "regular". It is inherently subjective, and there is no point in trying to agree on a definition. It makes more sense just to say explicitly, on a case-by-case basis, e.g. "This study will focus on contributors who made more than 50 edits in the last year" [or whatever].

Matt Flaschen