Thanks a lot for the help, Finn. Now my query can draw a sample of newly registered editors.
Best,
Haifeng Zhang
________________________________
From: Wiki-research-l <wiki-research-l-boun...@lists.wikimedia.org> on behalf of f...@imm.dtu.dk <f...@imm.dtu.dk>
Sent: Wednesday, March 13, 2019 12:01:59 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

Haifeng,

On 13/03/2019 15:56, Haifeng Zhang wrote:
> Thanks for pointing me to Quarry, Finn.
>
> I tried a couple of queries, but I am not sure why they all took forever to
> return results.

I am not familiar with Quarry. It might have a timeout. The user table
associated with the English Wikipedia is quite large, so any operation on
it may take a long time. You might be able to get "timein" with a
simplified SQL query. For instance, the query below takes 52.35 seconds:

USE enwiki_p;

SELECT user_id, user_name, user_registration, user_editcount
FROM user
LIMIT 1000 OFFSET 32000000

> Is it possible to download the relevant MediaWiki database tables (e.g.,
> user, user_groups, logging) and run SQL on my local machine?

There are SQL files available here:

https://dumps.wikimedia.org/enwiki/20190301/

but I do not think the user table is there - at least I cannot identify
it. Perhaps other people would know.

You might be able to try Toolforge:

https://tools.wmflabs.org/

You should be able to access the tables via mysql at the prompt: log in to
dev.tools.wmflabs.org, then do "sql enwiki".

Read more about Toolforge here: https://wikitech.wikimedia.org/wiki/Help:Toolforge

/Finn

> Thanks,
>
> Haifeng Zhang
> ________________________________
> From: Wiki-research-l <wiki-research-l-boun...@lists.wikimedia.org> on behalf
> of f...@imm.dtu.dk <f...@imm.dtu.dk>
> Sent: Tuesday, March 12, 2019 7:25:53 PM
> To: wiki-research-l@lists.wikimedia.org
> Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
>
> Haifeng,
>
> While some suggest the dumps or notice boards, my immediate thought was
> a database query, e.g., through Quarry. It just happens that Jonathan T.
> Morgan has created a query there:
>
> https://quarry.wmflabs.org/query/310
>
> SELECT user_id, user_name, user_registration, user_editcount
> FROM enwiki_p.user
> WHERE user_registration > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
>   AND user_editcount > 10
>   AND user_id NOT IN (SELECT ug_user FROM enwiki_p.user_groups
>     WHERE ug_group = 'bot')
>   AND user_name NOT IN (SELECT REPLACE(log_title, "_", " ") FROM enwiki_p.logging
>     WHERE log_type = "block" AND log_action = "block"
>       AND log_timestamp > DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));
>
> You may fork from that query. There is R. Stuart Geiger's (Staeiou's)
> fork here, https://quarry.wmflabs.org/query/34256, querying per month -
> as another example.
>
>
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
>
> On 12/03/2019 19:18, Haifeng Zhang wrote:
>> Hi folks,
>>
>> My work needs to randomly sample new editors each month, e.g., 100
>> editors per month.
>>
>> Do any of you have good suggestions for how to do this efficiently?
>>
>> I could think of using the dump files, but wonder whether there are other options?
>>
>> Thanks,
>>
>> Haifeng Zhang
>> _______________________________________________
>> Wiki-research-l mailing list
>> Wiki-research-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
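[Editor's note: to adapt the queries in this thread to the per-month sampling asked about in the original question, the filtering can stay in SQL while the random draw happens client-side. The Python sketch below is not from the thread: mw_timestamp mirrors the DATE_FORMAT(..., '%Y%m%d%H%i%s') expressions in the queries, and sample_per_month groups fetched (user_id, user_registration) rows by registration month and draws up to k per month. Both helper names are hypothetical.]

```python
import random
from collections import defaultdict
from datetime import datetime

def mw_timestamp(dt):
    """Format a datetime as MediaWiki's 14-digit timestamp (YYYYMMDDHHMMSS),
    the format produced by DATE_FORMAT(..., '%Y%m%d%H%i%s') in the queries."""
    return dt.strftime("%Y%m%d%H%M%S")

def sample_per_month(rows, k, seed=None):
    """Randomly sample up to k user_ids per registration month.

    rows: iterable of (user_id, user_registration) pairs, where
    user_registration is a 14-digit timestamp string (or None, as for
    some very old accounts). Returns {'YYYYMM': [user_id, ...]}.
    """
    rng = random.Random(seed)
    by_month = defaultdict(list)
    for user_id, reg in rows:
        if reg:  # skip accounts with no recorded registration time
            by_month[reg[:6]].append(user_id)
    return {month: rng.sample(ids, min(k, len(ids)))
            for month, ids in by_month.items()}

# Example with made-up rows:
rows = [(1, "20190101120000"), (2, "20190115090000"),
        (3, "20190203000000"), (4, None)]
samples = sample_per_month(rows, k=1, seed=0)
cutoff = mw_timestamp(datetime(2019, 3, 13, 12, 1, 59))
```

Drawing the sample locally sidesteps Quarry timeouts: the SQL only has to return candidate rows, and the (cheap) randomization runs on your machine.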