Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread fn

Haifeng,


While some suggest the dumps or notice boards, my immediate thought was 
a database query, e.g., through Quarry. As it happens, Jonathan T. 
Morgan has created such a query there:


https://quarry.wmflabs.org/query/310

SELECT user_id, user_name, user_registration, user_editcount
FROM enwiki_p.user
WHERE user_registration > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
  AND user_editcount > 10
  AND user_id NOT IN (
    SELECT ug_user FROM enwiki_p.user_groups WHERE ug_group = 'bot')
  AND user_name NOT IN (
    SELECT REPLACE(log_title, "_", " ")
    FROM enwiki_p.logging
    WHERE log_type = "block" AND log_action = "block"
      AND log_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));



You may fork that query. As another example, R. Stuart Geiger (Staeiou)'s 
fork at https://quarry.wmflabs.org/query/34256 queries for a month.
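
To make the "fork and adapt" idea concrete, here is a minimal Python sketch that builds a per-month variant of the query above. It is not taken from either Quarry query: the table and column names come from the query shown above, while the function name, the month bounds, the `user_editcount > 0` filter (one possible reading of "editor") and the `ORDER BY RAND() LIMIT n` sampling step are my own assumptions. Note that it samples accounts, which is not necessarily the same as people.

```python
from datetime import date


def monthly_sample_query(year: int, month: int, sample_size: int = 100) -> str:
    """Build a MariaDB query sampling accounts registered in one calendar month.

    Hypothetical helper: only the table and column names are taken from the
    Quarry query quoted above; everything else is an illustrative assumption.
    """
    start = date(year, month, 1)
    # First day of the following month (rolls over to January after December).
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    # user_registration is compared as a YYYYMMDDHHMMSS string, as in the
    # DATE_FORMAT(...) comparison of the original query.
    lower = start.strftime("%Y%m%d") + "000000"
    upper = end.strftime("%Y%m%d") + "000000"
    return (
        "SELECT user_id, user_name, user_registration, user_editcount\n"
        "FROM enwiki_p.user\n"
        f"WHERE user_registration >= '{lower}'\n"
        f"  AND user_registration < '{upper}'\n"
        "  AND user_editcount > 0\n"  # assumption: at least one edit
        "ORDER BY RAND()\n"           # random order, then keep the first rows
        f"LIMIT {int(sample_size)};"
    )


# Print the query for February 2019, ready to paste into a Quarry fork.
print(monthly_sample_query(2019, 2))
```

The bot and block filters from the original query could be appended to the WHERE clause in the same way if they are wanted.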




Finn Årup Nielsen
http://people.compute.dtu.dk/faan/


On 12/03/2019 19:18, Haifeng Zhang wrote:

Hi folks,

My work needs to randomly sample new editors in each month, e.g., 100 editors 
per month.

Do any of you have good suggestions for how to do this efficiently?

I can think of using the dump files, but I wonder whether there are other options.


Thanks,

Haifeng Zhang
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l





Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Leila Zia
Let's do it.


On Tue, Mar 12, 2019 at 3:04 PM Pine W  wrote:
>
> Leila, can we discuss this off list?
>
> Thanks,
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Pine W
Leila, can we discuss this off list?

Thanks,

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Tue, Mar 12, 2019 at 9:29 PM Leila Zia  wrote:

> On Tue, Mar 12, 2019 at 1:56 PM Pine W  wrote:
> >
> > Hi Leila, I believe that I asked for more information regarding Haifeng's
> > work.
>
> You stated
>
> "However, if you're planning to send surveys or messages to them,
> sending them barnstars, or otherwise manipulating their on-wiki
> experience, that would be problematic."
>
> and I'm suggesting that you enter from a question angle, please.
>

> > There has been discussion on English Wikipedia regarding volunteers
> > being unhappy with the interventions or proposed interventions of
> > researchers. I think that asking about the nature of Haifeng's research
> > is legitimate, and I tried to provide some examples of possible types of
> > research.
>
> Please check your email. There was no question there in the part
> related to this discussion. Also, even if there was a question posed,
> I highly recommend you enter from a different angle to these
> conversations. There are many reasons someone may need the sampled
> data of newcomers. A few examples: they may want to test the
> assumption whether the arrivals (registrations) to a specific
> Wikipedia language follow a Poisson process or not, they may want to
> learn about the distribution of topics editors in a given language
> edit in the first 24 hours after they open the account, they may want
> to build a prediction model to predict whether the editor will make
> the n-th edit or not given that they have started at time x, they may
> want to see whether external events have strong correlations with
> account registration and Wikipedia activity, they may want to see if
> the change to HTTPS had an impact on registrations, etc. There are
> literally millions of questions people may ask (given that the data is
> available to them) with respect to Wikipedia. The answer to some of
> them may require interaction with Wikipedia editors, the answer to
> some may not. So the safest bet to start having a fruitful
> conversation is to ask: can you tell us more about what you're trying
> to do?
>
> > I'm trying to protect the community from problematic
> > interventions, while also welcoming research that is accepted by the
> > community.
>
> I understand and I'm looking forward to having conversations with you
> all about how to achieve that.
>
> Best,
> Leila
>


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Stuart A. Yeates
There are thousands and thousands of editors with multiple accounts.
Those who have bothered to add a category are listed at
https://en.wikipedia.org/wiki/Category:Wikipedians_with_alternative_accounts

Many editors who engage in outreach are advised to create new accounts
for themselves regularly, simply because the experience of new account
creation changes over time and helping users streamline that
(especially in situations such as editathons) requires thorough
knowledge of account creation and the things that can make it go
wrong. That knowledge is pretty much a prerequisite for the old accountcreator
userright https://en.wikipedia.org/wiki/Wikipedia:Account_creator
(which I've had on several occasions) and the new eventcoordinator
userright https://en.wikipedia.org/wiki/Wikipedia:Event_coordinator
(which is too new for me to have had yet).

cheers
stuart
--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:40, Isaac Johnson  wrote:
>
> Yes, thanks for the clarification Stuart. I don't know of any statistics to
> suggest how widespread this is, but it might be worth checking, especially
> if you are focusing on editors with higher edit counts (who I suspect are
> more likely to have multiple accounts for licit reasons).

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Isaac Johnson
Yes, thanks for the clarification, Stuart. I don't know of any statistics to
suggest how widespread this is, but it might be worth checking, especially
if you are focusing on editors with higher edit counts (who I suspect are
more likely to have multiple accounts for licit reasons).

On Tue, Mar 12, 2019 at 4:34 PM Stuart A. Yeates  wrote:

> Note that this code deals with accounts, whereas Haifeng asked for
> editors.
>
> There are many reasons, both licit and illicit, for editors to have
> more than one account. I know I have more than ten for
> policy-compliant reasons.
>
> cheers
> stuart
>
>
> --
> ...let us be heard from red core to black sky

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Stuart A. Yeates
Note that this code deals with accounts, whereas Haifeng asked for
editors.

There are many reasons, both licit and illicit, for editors to have
more than one account. I know I have more than ten for
policy-compliant reasons.

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 10:21, Isaac Johnson  wrote:
>
> Hey Haifeng,
> If you decide to process the dumps, you should be able to easily repurpose
> some quick code that I wrote for a similar project:
> https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover
>
> Notably, I'd suggest using the stub history dumps as they are much smaller
> because they do not include the actual content. For instance, for March 1st
> and English Wikipedia (https://dumps.wikimedia.org/enwiki/20190301/), this
> file would be enwiki-20190301-stub-meta-history.xml.gz and is 57.9 GB.
>
> Best,
> Isaac

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Leila Zia
On Tue, Mar 12, 2019 at 1:56 PM Pine W  wrote:
>
> Hi Leila, I believe that I asked for more information regarding Haifeng's
> work.

You stated

"However, if you're planning to send surveys or messages to them,
sending them barnstars, or otherwise manipulating their on-wiki
experience, that would be problematic."

and I'm suggesting that you enter from a question angle, please.

> There has been discussion on English Wikipedia regarding volunteers
> being unhappy with the interventions or proposed interventions of
> researchers. I think that asking about the nature of Haifeng's research is
> legitimate, and I tried to provide some examples of possible types of
> research.

Please check your email. There was no question there in the part
related to this discussion. Also, even if there was a question posed,
I highly recommend you enter from a different angle to these
conversations. There are many reasons someone may need the sampled
data of newcomers. A few examples: they may want to test the
assumption whether the arrivals (registrations) to a specific
Wikipedia language follow a Poisson process or not, they may want to
learn about the distribution of topics editors in a given language
edit in the first 24 hours after they open the account, they may want
to build a prediction model to predict whether the editor will make
the n-th edit or not given that they have started at time x, they may
want to see whether external events have strong correlations with
account registration and Wikipedia activity, they may want to see if
the change to HTTPS had an impact on registrations, etc. There are
literally millions of questions people may ask (given that the data is
available to them) with respect to Wikipedia. The answer to some of
them may require interaction with Wikipedia editors, the answer to
some may not. So the safest bet to start having a fruitful
conversation is to ask: can you tell us more about what you're trying
to do?

> I'm trying to protect the community from problematic
> interventions, while also welcoming research that is accepted by the
> community.

I understand and I'm looking forward to having conversations with you
all about how to achieve that.

Best,
Leila



Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Isaac Johnson
Hey Haifeng,
If you decide to process the dumps, you should be able to easily repurpose
some quick code that I wrote for a similar project:
https://github.com/geohci/miscellaneous-wikimedia/tree/master/editor-turnover

Notably, I'd suggest using the stub history dumps as they are much smaller
because they do not include the actual content. For instance, for March 1st
and English Wikipedia (https://dumps.wikimedia.org/enwiki/20190301/), this
file would be enwiki-20190301-stub-meta-history.xml.gz and is 57.9 GB.
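
For readers who want a feel for the dump route without opening the repository, here is a rough Python sketch. It is not the code from the link above; it only assumes the usual export layout in which each `<revision>` carries a `<timestamp>` and a `<contributor>` with either a `<username>` or an `<ip>`. All function names are hypothetical. It treats an account's earliest revision in the dump as its "first edit", which is not the registration date and ignores deleted revisions, and it keeps one dictionary entry per username in memory.

```python
import gzip
import random
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Drop the XML namespace, which varies with the export schema version."""
    return tag.rsplit("}", 1)[-1]


def first_edit_by_user(stream):
    """Map each registered username to the timestamp of its earliest revision."""
    first = {}
    for _, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "page":
            elem.clear()  # release a finished page to limit memory use
            continue
        if name != "revision":
            continue
        timestamp = username = None
        for child in elem:
            if local(child.tag) == "timestamp":
                timestamp = child.text
            elif local(child.tag) == "contributor":
                for sub in child:
                    # Anonymous edits carry <ip> instead and are skipped.
                    if local(sub.tag) == "username":
                        username = sub.text
        # ISO 8601 UTC timestamps sort correctly as plain strings.
        if username and timestamp and timestamp < first.get(username, "9999"):
            first[username] = timestamp
    return first


def sample_new_editors(first, per_month=100, seed=0):
    """Group usernames by month of first edit and sample up to per_month each."""
    by_month = defaultdict(list)
    for user, ts in first.items():
        by_month[ts[:7]].append(user)  # key is "YYYY-MM"
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {
        month: rng.sample(sorted(users), min(per_month, len(users)))
        for month, users in sorted(by_month.items())
    }


def sample_from_dump(path, per_month=100, seed=0):
    """Stream a gzipped stub history dump and return the monthly samples."""
    with gzip.open(path, "rb") as f:
        return sample_new_editors(first_edit_by_user(f), per_month, seed)
```

Calling `sample_from_dump("enwiki-20190301-stub-meta-history.xml.gz")` would then return a dict keyed by month. Expect a long single pass over the file.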

Best,
Isaac

On Tue, Mar 12, 2019 at 3:56 PM Pine W  wrote:

> Hi Haifeng, thanks for the information. I think that your idea of looking
> in the dumps makes sense. Am I understanding correctly that you would like
> advice regarding how to do that in the most efficient way?
>
> Hi Leila, I believe that I asked for more information regarding Haifeng's
> work. There has been discussion on English Wikipedia regarding volunteers
> being unhappy with the interventions or proposed interventions of
> researchers. I think that asking about the nature of Haifeng's research is
> legitimate, and I tried to provide some examples of possible types of
> research. I'm trying to protect the community from problematic
> interventions, while also welcoming research that is accepted by the
> community.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )

Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Pine W
Hi Haifeng, thanks for the information. I think that your idea of looking
in the dumps makes sense. Am I understanding correctly that you would like
advice regarding how to do that in the most efficient way?

Hi Leila, I believe that I asked for more information regarding Haifeng's
work. There has been discussion on English Wikipedia regarding volunteers
being unhappy with the interventions or proposed interventions of
researchers. I think that asking about the nature of Haifeng's research is
legitimate, and I tried to provide some examples of possible types of
research. I'm trying to protect the community from problematic
interventions, while also welcoming research that is accepted by the
community.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Tue, Mar 12, 2019 at 8:00 PM Haifeng Zhang 
wrote:

> Pine and Stuart,
>
> I meant extracting a random sample of new editors (month by month) from
> Wikipedia edit history.
>
> It is not about survey of new editors, but still thanks for your
> suggestions.
>
>
> Thanks,
> Haifeng Zhang
>
> Postdoctoral Research Fellow
> Human-Computer Interaction Institute
> Carnegie Mellon University
> 
> From: Wiki-research-l  on
> behalf of Stuart A. Yeates 
> Sent: Tuesday, March 12, 2019 3:46:19 PM
> To: Research into Wikimedia content and communities
> Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
>
> There are a number of new-editor-heavy noticeboards. I would suggest
> posting an invite there to your survey (or whatever). If you ask for
> editors' usernames, you can filter out those who don't meet your
> definition of 'new'.
>
> I'm thinking of places like:
> https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
> https://en.wikipedia.org/wiki/Wikipedia:Help_desk
>
> cheers
> stuart
>
>
> --
> ...let us be heard from red core to black sky
>
> On Wed, 13 Mar 2019 at 08:37, Leila Zia  wrote:
> >
> > Hi Pine,
> >
> > Haifeng has a simple question about how to sample editors other than
> > via dumps. It would be great if someone who knows the answer could
> > help them move forward.
> >
> > If you are interested in learning more about their research, instead
> > of answering their question, my recommendation would be to start the
> > conversation with a question like "Can you tell us more about your
> > research?". I find the current way of communicating very speculative,
> > and that is not good for building a vibrant research community that
> > can help us address some of our big questions.
> >
> > Best,
> > Leila
> >
> > On Tue, Mar 12, 2019 at 12:08 PM Pine W  wrote:
> > >
> > > Hi, can you expand on what you mean by "sample"? If you're referring to
> > > analyzing users' edit histories, then that should be fine. However, if
> > > you're planning to send them surveys or messages, send them barnstars,
> > > or otherwise manipulate their on-wiki experience, that would be
> > > problematic.
> > >
> > > Pine
> > > ( https://meta.wikimedia.org/wiki/User:Pine )
> > >
> > >
> > > On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang wrote:
> > >
> > > > Hi folks,
> > > >
> > > > My work needs to randomly sample new editors in each month, e.g., 100
> > > > editors per month.
> > > >
> > > > Do any of you have good suggestions for how to do this efficiently?
> > > >
> > > > I can think of using the dump files, but I wonder whether there are other
> > > > options.
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Haifeng Zhang


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Haifeng Zhang
Pine and Stuart,

I meant extracting a random sample of new editors (month by month) from 
Wikipedia edit history.

It is not about surveying new editors, but thanks for your suggestions all the same.
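
As a rough illustration of the month-by-month sampling described above, here is a minimal Python sketch. It is not something posted in this thread. It assumes you have already extracted (user name, registration timestamp) pairs from a dump or a database query, with timestamps in the MediaWiki 'YYYYMMDDHHMMSS' form; the function name sample_new_editors and its parameters are hypothetical. It uses reservoir sampling, so the input can be streamed once without holding every editor in memory.

```python
import random
from collections import defaultdict


def sample_new_editors(records, per_month=100, seed=0):
    """Reservoir-sample up to `per_month` editors per registration month.

    `records` is any iterable of (user_name, registration) pairs, where
    `registration` is assumed to be a 'YYYYMMDDHHMMSS' string.
    Returns a dict mapping 'YYYYMM' to a list of sampled user names.
    """
    rng = random.Random(seed)     # fixed seed so a sample can be reproduced
    samples = defaultdict(list)   # month -> current reservoir
    seen = defaultdict(int)       # month -> editors seen so far in that month
    for user_name, registration in records:
        if not registration:
            # Skip accounts with no recorded registration time.
            continue
        month = registration[:6]
        seen[month] += 1
        if len(samples[month]) < per_month:
            # Reservoir not yet full: keep every editor.
            samples[month].append(user_name)
        else:
            # Keep the new editor with probability per_month / seen[month],
            # replacing a uniformly chosen earlier pick.
            j = rng.randrange(seen[month])
            if j < per_month:
                samples[month][j] = user_name
    return dict(samples)
```

Calling sample_new_editors(records, per_month=100) on such pairs gives up to 100 user names per registration month. How "new editor" is defined (for example, requiring at least one edit) is left to the step that produces the records.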


Thanks,
Haifeng Zhang

Postdoctoral Research Fellow
Human-Computer Interaction Institute
Carnegie Mellon University

From: Wiki-research-l  on behalf 
of Stuart A. Yeates 
Sent: Tuesday, March 12, 2019 3:46:19 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

There are a number of new-editor-heavy noticeboards. I would suggest
posting an invite to your survey (or whatever) there. If you ask for
editors' usernames, you can filter out those who don't meet your
definition of 'new'.

I'm thinking of places like:
https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
https://en.wikipedia.org/wiki/Wikipedia:Help_desk

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 08:37, Leila Zia  wrote:
>
> Hi Pine,
>
> Haifeng has a simple question about how to sample editors other than
> via dumps. It would be great if someone who knows the answer could
> help them move forward.
>
> If you are interested in learning more about their research, instead
> of answering their question, my recommendation would be to start the
> conversation with a question like "Can you tell us more about your
> research?". I find the current way of communicating very speculative,
> and that is not good for building a vibrant research community that
> can help us address some of our big questions.
>
> Best,
> Leila
>
> On Tue, Mar 12, 2019 at 12:08 PM Pine W  wrote:
> >
> > Hi, can you expand on what you mean by "sample"? If you're referring to
> > analyzing users' edit histories, then that should be fine. However, if
> > you're planning to send them surveys or messages, send them barnstars,
> > or otherwise manipulate their on-wiki experience, that would be
> > problematic.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> >
> >
> > On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang 
> > wrote:
> >
> > > Hi folks,
> > >
> > > My work needs to randomly sample new editors in each month, e.g., 100
> > > editors per month.
> > >
> > > Do any of you have good suggestions for how to do this efficiently?
> > >
> > > I can think of using the dump files, but I wonder whether there are other options.
> > >
> > >
> > > Thanks,
> > >
> > > Haifeng Zhang


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Stuart A. Yeates
There are a number of new-editor-heavy noticeboards. I would suggest
posting an invite to your survey (or whatever) there. If you ask for
editors' usernames, you can filter out those who don't meet your
definition of 'new'.

I'm thinking of places like:
https://en.wikipedia.org/wiki/Wikipedia:Teahouse and
https://en.wikipedia.org/wiki/Wikipedia:Help_desk

cheers
stuart


--
...let us be heard from red core to black sky

On Wed, 13 Mar 2019 at 08:37, Leila Zia  wrote:
>
> Hi Pine,
>
> Haifeng has a simple question about how to sample editors other than
> via dumps. It would be great if someone who knows the answer could
> help them move forward.
>
> If you are interested in learning more about their research, instead
> of answering their question, my recommendation would be to start the
> conversation with a question like "Can you tell us more about your
> research?". I find the current way of communicating very speculative,
> and that is not good for building a vibrant research community that
> can help us address some of our big questions.
>
> Best,
> Leila
>
> On Tue, Mar 12, 2019 at 12:08 PM Pine W  wrote:
> >
> > Hi, can you expand on what you mean by "sample"? If you're referring to
> > analyzing users' edit histories, then that should be fine. However, if
> > you're planning to send them surveys or messages, send them barnstars,
> > or otherwise manipulate their on-wiki experience, that would be
> > problematic.
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> >
> >
> > On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang 
> > wrote:
> >
> > > Hi folks,
> > >
> > > My work needs to randomly sample new editors in each month, e.g., 100
> > > editors per month.
> > >
> > > Do any of you have good suggestions for how to do this efficiently?
> > >
> > > I can think of using the dump files, but I wonder whether there are other options.
> > >
> > >
> > > Thanks,
> > >
> > > Haifeng Zhang


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Leila Zia
Hi Pine,

Haifeng has a simple question about how to sample editors other than
via dumps. It would be great if someone who knows the answer could
help them move forward.

If you are interested in learning more about their research, instead
of answering their question, my recommendation would be to start the
conversation with a question like "Can you tell us more about your
research?". I find the current way of communicating very speculative,
and that is not good for building a vibrant research community that
can help us address some of our big questions.

Best,
Leila

On Tue, Mar 12, 2019 at 12:08 PM Pine W  wrote:
>
> Hi, can you expand on what you mean by "sample"? If you're referring to
> analyzing users' edit histories, then that should be fine. However, if
> you're planning to send them surveys or messages, send them barnstars,
> or otherwise manipulate their on-wiki experience, that would be
> problematic.
>
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>
> On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang 
> wrote:
>
> > Hi folks,
> >
> > My work needs to randomly sample new editors in each month, e.g., 100
> > editors per month.
> >
> > Do any of you have good suggestions for how to do this efficiently?
> >
> > I can think of using the dump files, but I wonder whether there are other options.
> >
> >
> > Thanks,
> >
> > Haifeng Zhang


Re: [Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Pine W
Hi, can you expand on what you mean by "sample"? If you're referring to
analyzing users' edit histories, then that should be fine. However, if
you're planning to send them surveys or messages, send them barnstars,
or otherwise manipulate their on-wiki experience, that would be
problematic.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Tue, Mar 12, 2019 at 6:19 PM Haifeng Zhang 
wrote:

> Hi folks,
>
> My work needs to randomly sample new editors in each month, e.g., 100
> editors per month.
>
> Do any of you have good suggestions for how to do this efficiently?
>
> I can think of using the dump files, but I wonder whether there are other options.
>
>
> Thanks,
>
> Haifeng Zhang


[Wiki-research-l] Sampling new editors in English Wikipedia

2019-03-12 Thread Haifeng Zhang
Hi folks,

My work needs to randomly sample new editors in each month, e.g., 100 editors 
per month.

Do any of you have good suggestions for how to do this efficiently?

I can think of using the dump files, but I wonder whether there are other options.


Thanks,

Haifeng Zhang