Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Haifeng,

While some suggest the dumps or noticeboards, my immediate thought was a database query, e.g., through Quarry. It just happens that Jonathan T. Morgan has created a query there: https://quarry.wmflabs.org/query/310

    SELECT user_id, user_name, user_registration, user_editcount
    FROM enwiki_p.user
    WHERE user_registration > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
      AND user_editcount > 10
      AND user_id NOT IN (SELECT ug_user FROM enwiki_p.user_groups WHERE ug_group = 'bot')
      AND user_name NOT IN (SELECT REPLACE(log_title, "_", " ")
                            FROM enwiki_p.logging
                            WHERE log_type = "block"
                              AND log_action = "block"
                              AND log_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));

You may fork from that query. As another example, R. Stuart Geiger (Staeiou) has a fork at https://quarry.wmflabs.org/query/34256 that queries over a month.

Finn Årup Nielsen
http://people.compute.dtu.dk/faan/

On 12/03/2019 19:18, Haifeng Zhang wrote:
> Hi folks,
>
> My work needs to randomly sample new editors in each month, e.g., 100 editors per month.
>
> Do any of you have good suggestions for how to do this efficiently?
>
> I could think of using the dump files, but I wonder: are there other options?
>
> Thanks,
> Haifeng Zhang

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
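The Quarry query above lists accounts registered in the last day with more than 10 edits, excluding bot accounts and accounts blocked in the last two days. For a month-by-month sample, one option (a sketch, not part of either Quarry query) is to swap the rolling `NOW()` window for fixed calendar-month bounds in MediaWiki's 14-digit `YYYYMMDDHHMMSS` timestamp format and, in MySQL/MariaDB, append `ORDER BY RAND() LIMIT 100`. The helpers `month_bounds` and `registration_window_clause` are hypothetical names; the bounds are UTC, which is how MediaWiki stores timestamps.

```python
from datetime import datetime
from typing import Tuple

# MediaWiki stores timestamps such as user_registration as 14-digit UTC strings.
MW_FORMAT = "%Y%m%d%H%M%S"


def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return the half-open range [start, end) of a calendar month as MediaWiki timestamps."""
    start = datetime(year, month, 1)
    # December rolls over to January of the following year.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return start.strftime(MW_FORMAT), end.strftime(MW_FORMAT)


def registration_window_clause(year: int, month: int) -> str:
    """Hypothetical helper: a WHERE fragment that could replace the rolling NOW()-based condition."""
    start, end = month_bounds(year, month)
    return f"user_registration >= '{start}' AND user_registration < '{end}'"


if __name__ == "__main__":
    print(registration_window_clause(2019, 2))
    # prints: user_registration >= '20190201000000' AND user_registration < '20190301000000'
```

Plain string comparison is enough here because fixed-width timestamps sort lexicographically in chronological order. Whether `ORDER BY RAND()` is fast enough over a month of registrations is worth checking in Quarry before relying on it.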
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Let's do it.

On Tue, Mar 12, 2019 at 3:04 PM Pine W wrote:
> Leila, can we discuss this off list?
>
> Thanks,
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Leila, can we discuss this off list?

Thanks,
Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Tue, Mar 12, 2019 at 9:29 PM Leila Zia wrote:
> On Tue, Mar 12, 2019 at 1:56 PM Pine W wrote:
> > I'm trying to protect the community from problematic
> > interventions, while also welcoming research that is accepted by the
> > community.
>
> I understand and I'm looking forward to having conversations with you
> all about how to achieve that.
>
> Best,
> Leila
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
There are thousands and thousands of editors with multiple accounts. Those who have bothered to add the category are listed at https://en.wikipedia.org/wiki/Category:Wikipedians_with_alternative_accounts

Many editors who engage in outreach are advised to create new accounts for themselves regularly, simply because the experience of creating a new account changes over time, and helping users streamline it (especially in situations such as editathons) requires thorough knowledge of account creation and of the things that can make it go wrong. This is pretty much a prerequisite for the old accountcreator userright https://en.wikipedia.org/wiki/Wikipedia:Account_creator (which I've had on several occasions) and the new eventcoordinator userright https://en.wikipedia.org/wiki/Wikipedia:Event_coordinator (which is too new for me to have had yet).

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:40, Isaac Johnson wrote:
>
> Yes, thanks for the clarification Stuart. I don't know of any statistics to
> suggest how widespread this is, but it might be worth checking, especially
> if you are focusing on editors with higher edit counts (who I suspect are
> more likely to have multiple accounts for licit reasons).
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Yes, thanks for the clarification Stuart. I don't know of any statistics to suggest how widespread this is, but it might be worth checking, especially if you are focusing on editors with higher edit counts (who I suspect are more likely to have multiple accounts for licit reasons).

On Tue, Mar 12, 2019 at 4:34 PM Stuart A. Yeates wrote:
> Note that this code deals with accounts, not editors, and editors are
> what Haifeng asked for.
>
> There are many reasons, both licit and illicit, for editors to have
> more than one account. I know I have more than ten for
> policy-compliant reasons.
>
> cheers
> stuart
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Note that this code deals with accounts, not editors, and editors are what Haifeng asked for.

There are many reasons, both licit and illicit, for editors to have more than one account. I know I have more than ten for policy-compliant reasons.

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:21, Isaac Johnson wrote:
>
> Hey Haifeng,
> If you decide to process the dumps, you should be able to easily repurpose
> some quick code that I wrote for a similar project:
> https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
On Tue, Mar 12, 2019 at 1:56 PM Pine W wrote:
>
> Hi Leila, I believe that I asked for more information regarding Haifeng's
> work.

You stated

"However, if you're planning to send surveys or messages to them, sending them barnstars, or otherwise manipulating their on-wiki experience, that would be problematic."

and I'm suggesting that you enter from a question angle, please.

> There has been discussion on English Wikipedia regarding volunteers
> being unhappy with the interventions or proposed interventions of
> researchers. I think that asking about the nature of Haifeng's research is
> legitimate, and I tried to provide some examples of possible types of
> research.

Please check your email. There was no question in the part related to this discussion. Also, even if a question had been posed, I highly recommend that you enter these conversations from a different angle. There are many reasons someone may need sampled data of newcomers. A few examples: they may want to test whether arrivals (registrations) to a specific Wikipedia language edition follow a Poisson process; they may want to learn about the distribution of topics that editors in a given language edit in the first 24 hours after they open an account; they may want to build a model that predicts whether an editor will make their n-th edit, given that they started at time x; they may want to see whether external events are strongly correlated with account registration and Wikipedia activity; they may want to see whether the change to HTTPS had an impact on registrations; etc. There are literally millions of questions people may ask about Wikipedia (given that the data is available to them). Answering some of them may require interaction with Wikipedia editors; answering others may not. So the safest way to start a fruitful conversation is to ask: can you tell us more about what you're trying to do?

> I'm trying to protect the community from problematic
> interventions, while also welcoming research that is accepted by the
> community.

I understand and I'm looking forward to having conversations with you all about how to achieve that.

Best,
Leila
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Hey Haifeng,

If you decide to process the dumps, you should be able to easily repurpose some quick code that I wrote for a similar project: https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover

Notably, I'd suggest using the stub history dumps, as they are much smaller because they do not include the actual content (revision text). For instance, for the March 1st English Wikipedia dump (https://dumps.wikimedia.org/enwiki/20190301/), this file would be enwiki-20190301-stub-meta-history.xml.gz, which is 57.9 GB.

Best,
Isaac

On Tue, Mar 12, 2019 at 3:56 PM Pine W wrote:
> Hi Haifeng, thanks for the information. I think that your idea of looking
> in the dumps makes sense. Am I understanding correctly that you would like
> advice regarding how to do that in the most efficient way?
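If you go the dump route, one minimal sketch of the idea (not Isaac's code, which may work differently) is to stream the stub history with `xml.etree.ElementTree.iterparse` and keep the earliest revision timestamp per registered contributor id. Treating the first edit as the moment an account becomes a "new editor" is an assumption of this sketch. Element names follow the MediaWiki XML export format (`page`, `revision`, `timestamp`, `contributor`, `id`); anonymous edits carry `<ip>` instead of `<id>` and are skipped. `first_edit_times` and the tiny `SAMPLE` document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that MediaWiki export XML puts on every tag."""
    return tag.rsplit("}", 1)[-1]


def first_edit_times(stream: IO[bytes]) -> Dict[str, str]:
    """Map each registered contributor id to the earliest revision timestamp in the stream."""
    first: Dict[str, str] = {}
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the <mediawiki> root element
    for event, elem in context:
        if event != "end":
            continue
        name = _local(elem.tag)
        if name == "revision":
            timestamp = None
            user_id = None
            for child in elem:
                child_name = _local(child.tag)
                if child_name == "timestamp":
                    timestamp = child.text
                elif child_name == "contributor":
                    # Registered users have <username> and <id>; IP edits have <ip> only.
                    for sub in child:
                        if _local(sub.tag) == "id":
                            user_id = sub.text
            if user_id and timestamp:
                # Same-shaped ISO 8601 timestamps compare chronologically as strings.
                if user_id not in first or timestamp < first[user_id]:
                    first[user_id] = timestamp
            elem.clear()
        elif name == "page":
            root.clear()  # discard finished pages so memory stays bounded
    return first


# A made-up, minimal document in the export format, for illustration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <id>11</id>
      <timestamp>2019-02-05T12:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
    </revision>
    <revision>
      <id>12</id>
      <timestamp>2019-01-03T08:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
    </revision>
    <revision>
      <id>13</id>
      <timestamp>2019-01-04T09:30:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    print(first_edit_times(io.BytesIO(SAMPLE)))  # prints: {'7': '2019-01-03T08:00:00Z'}
```

Because `iterparse` accepts any binary file object, the real file could be passed as `gzip.open("enwiki-20190301-stub-meta-history.xml.gz", "rb")`; expect a single pass over that much compressed XML to take a long time.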
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Hi Haifeng, thanks for the information. I think that your idea of looking in the dumps makes sense. Am I understanding correctly that you would like advice regarding how to do that in the most efficient way?

Hi Leila, I believe that I asked for more information regarding Haifeng's work. There has been discussion on English Wikipedia regarding volunteers being unhappy with the interventions or proposed interventions of researchers. I think that asking about the nature of Haifeng's research is legitimate, and I tried to provide some examples of possible types of research. I'm trying to protect the community from problematic interventions, while also welcoming research that is accepted by the community.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Tue, Mar 12, 2019 at 8:00 PM Haifeng Zhang wrote:
> Pine and Stuart,
>
> I meant extracting a random sample of new editors (month by month) from
> Wikipedia edit history.
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Pine and Stuart,

I meant extracting a random sample of new editors (month by month) from Wikipedia edit history.

It is not about a survey of new editors, but thanks anyway for your suggestions.

Thanks,
Haifeng Zhang

Postdoctoral Research Fellow
Human-Computer Interaction Institute
Carnegie Mellon University

From: Wiki-research-l on behalf of Stuart A. Yeates
Sent: Tuesday, March 12, 2019 3:46:19 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

There are a number of new-editor-heavy noticeboards. I would suggest posting an invite to your survey (or whatever) there. If you ask for editors' usernames, you can filter out those who don't meet your definition of 'new'.

I'm thinking of places like:
https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
https://en.wikipedia.org/wiki/Wikipedia:Help_desk

cheers
stuart

--
...let us be heard from red core to black sky
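Extracting a month-by-month random sample of new editors from edit history first requires deciding who counts as "new" in a given month. A minimal Python sketch, assuming revision records have already been pulled out of a history dump into (user_name, timestamp) pairs with ISO 8601 timestamps such as "2019-03-12T18:19:00Z", that bot and anonymous (IP) edits were filtered out beforehand, and that a "new editor" for month M is anyone whose first recorded edit falls in M. The function names and that definition are hypothetical choices for illustration, not something the thread settles.

```python
import random
from collections import defaultdict


def first_edit_months(revisions):
    """Map each user to the month ("YYYY-MM") of their earliest edit.

    `revisions` is an iterable of (user_name, timestamp) pairs with
    timestamps in a fixed ISO 8601 format, e.g. "2019-03-12T18:19:00Z".
    """
    first_edit = {}
    for user, ts in revisions:
        # Fixed-format ISO 8601 strings sort chronologically as plain text,
        # so a string comparison finds the earliest edit.
        if user not in first_edit or ts < first_edit[user]:
            first_edit[user] = ts
    return {user: ts[:7] for user, ts in first_edit.items()}


def sample_new_editors(revisions, k, seed=0):
    """Return {month: sorted sample of up to k users whose first edit was in that month}."""
    cohorts = defaultdict(list)
    for user, month in first_edit_months(revisions).items():
        cohorts[month].append(user)

    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    samples = {}
    for month in sorted(cohorts):
        # Sort each cohort so the draw does not depend on dump order.
        members = sorted(cohorts[month])
        samples[month] = sorted(rng.sample(members, min(k, len(members))))
    return samples
```

For example, with `k=100` this yields up to 100 editors per first-edit month; months with fewer new editors return their whole cohort. Because each cohort is sorted before sampling, the same seed gives the same sample regardless of the order in which revisions appear in the dump.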
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
There are a number of new-editor-heavy noticeboards. I would suggest posting an invite to your survey (or whatever) there. If you ask for editors' usernames, you can filter out those who don't meet your definition of 'new'.

I'm thinking of places like:
https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
https://en.wikipedia.org/wiki/Wikipedia:Help_desk

cheers
stuart

--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 08:37, Leila Zia wrote:
>
> Hi Pine,
>
> Haifeng has a simple question about how to sample editors other than
> via dumps. It would be great if someone who knows the answer could help
> them move forward.
>
> If you are interested in learning more about their research, rather than
> answering their question, my recommendation would be to start the
> conversation with a question like "Can you tell us more about your
> research?" I find the current way of communicating very speculative,
> and that is not good for building a vibrant research community that can
> help us address some of our big questions.
>
> Best,
> Leila
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Hi Pine,

Haifeng has a simple question about how to sample editors other than via dumps. It would be great if someone who knows the answer could help them move forward.

If you are interested in learning more about their research, rather than answering their question, my recommendation would be to start the conversation with a question like "Can you tell us more about your research?" I find the current way of communicating very speculative, and that is not good for building a vibrant research community that can help us address some of our big questions.

Best,
Leila

On Tue, Mar 12, 2019 at 12:08 PM Pine W wrote:
> Hi, can you expand on what you mean by "sample"? If you're referring to
> analyzing users' edit histories, then that should be fine. However, if
> you're planning to send surveys or messages to them, sending them
> barnstars, or otherwise manipulating their on-wiki experience, that would
> be problematic.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
Re: [Wiki-research-l] Sampling new editors in English Wikipedia
Hi, can you expand on what you mean by "sample"? If you're referring to analyzing users' edit histories, then that should be fine. However, if you're planning to send surveys or messages to them, sending them barnstars, or otherwise manipulating their on-wiki experience, that would be problematic.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang wrote:
> Hi folks,
>
> My work requires randomly sampling new editors each month, e.g., 100
> editors per month.
>
> Do any of you have good suggestions for how to do this efficiently?
>
> I have thought of using the dump files, but I wonder whether there are
> other options.
>
> Thanks,
>
> Haifeng Zhang
[Wiki-research-l] Sampling new editors in English Wikipedia
Hi folks,

My work requires randomly sampling new editors each month, e.g., 100 editors per month.

Do any of you have good suggestions for how to do this efficiently?

I have thought of using the dump files, but I wonder whether there are other options.

Thanks,

Haifeng Zhang
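When the input is a large dump that is read once from start to finish, a standard way to draw a fixed-size uniform sample (e.g., 100 editors per month) in a single pass is reservoir sampling (Algorithm R), applied separately to each month. A minimal Python sketch, assuming the dump or a database export has already been reduced to a stream of (user_name, month) pairs with exactly one record per new editor; the function name and record format are hypothetical.

```python
import random
from collections import defaultdict


def reservoir_sample_by_month(records, k, seed=0):
    """One-pass uniform sample of up to k users per month (Algorithm R).

    `records` yields (user_name, month) pairs, one per new editor, where
    month is a label such as "2019-03". Only k names per month are held
    in memory, so the stream itself can be much larger than RAM.
    """
    rng = random.Random(seed)
    reservoirs = defaultdict(list)  # month -> current sample
    seen = defaultdict(int)         # month -> number of users seen so far
    for user, month in records:
        seen[month] += 1
        n = seen[month]
        if n <= k:
            # The first k users of each month fill the reservoir directly.
            reservoirs[month].append(user)
        else:
            # Keep the n-th user with probability k/n by replacing a
            # uniformly chosen slot; otherwise discard it.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoirs[month][j] = user
    return dict(reservoirs)
```

Memory grows with k times the number of months rather than with the size of the dump. For a fixed seed the result depends on the order of the input records, so sorting the export first is one way to make a sample exactly reproducible.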