Thanks a lot for the help, Finn. My query can now draw a sample of newly 
registered editors.


Best,

Haifeng Zhang
________________________________
From: Wiki-research-l <wiki-research-l-boun...@lists.wikimedia.org> on behalf 
of f...@imm.dtu.dk <f...@imm.dtu.dk>
Sent: Wednesday, March 13, 2019 12:01:59 PM
To: wiki-research-l@lists.wikimedia.org
Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia

Haifeng,


On 13/03/2019 15:56, Haifeng Zhang wrote:
> Thanks for pointing me to Quarry, Finn.
>
> I tried a couple of queries, but I am not sure why they all took forever to return results.

I am not familiar with Quarry. It might have a timeout. The user table
associated with the English Wikipedia is quite large, so any operation
on it may take a long time.

You might be able to get results within the time limit with a simplified
SQL query. For instance, the query below takes 52.35 seconds:

USE enwiki_p;

SELECT user_id, user_name, user_registration, user_editcount
        FROM user
        LIMIT 1000
        OFFSET 32000000;
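
If the end goal is the monthly random sample from the original question, a
WHERE clause on user_registration combined with ORDER BY RAND() might also
stay within the limit. The query below is only an untested sketch; the
January 2019 window and the RAND()-based sampling are illustrations, not
something I have run against the replica:

USE enwiki_p;

-- Untested sketch: editors who registered in January 2019
-- (user_registration uses the 'YYYYMMDDHHMMSS' string format),
-- shuffled with RAND() and trimmed to 100 rows.
SELECT user_id, user_name, user_registration, user_editcount
        FROM user
        WHERE user_registration BETWEEN '20190101000000' AND '20190131235959'
        ORDER BY RAND()
        LIMIT 100;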



> Is it possible to download the relevant MediaWiki database tables (e.g., user, 
> user_groups, logging) and run SQL on my local machine?

There are SQL dump files available at
https://dumps.wikimedia.org/enwiki/20190301/ but I do not think the user
table is among them; at least I cannot identify it. Perhaps other people
would know.

You might also try Toolforge, https://tools.wmflabs.org/ There you
should be able to access the replica tables via mysql at the prompt.

Log in to dev.tools.wmflabs.org, then run "sql enwiki".
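
A rough session might look like the sketch below. The host name and the
"sql enwiki" command come from the lines above; the ssh step and the example
query are my assumptions about how the prompt behaves:

ssh <your-shell-username>@dev.tools.wmflabs.org   # hypothetical account name
sql enwiki          # should open a mysql prompt on the enwiki replica
mysql> SELECT user_name, user_registration FROM user LIMIT 5;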

Read more about Toolforge here:
https://wikitech.wikimedia.org/wiki/Help:Toolforge


/Finn

>
> Thanks,
>
> Haifeng Zhang
> ________________________________
> From: Wiki-research-l <wiki-research-l-boun...@lists.wikimedia.org> on behalf 
> of f...@imm.dtu.dk <f...@imm.dtu.dk>
> Sent: Tuesday, March 12, 2019 7:25:53 PM
> To: wiki-research-l@lists.wikimedia.org
> Subject: Re: [Wiki-research-l] Sampling new editors in English Wikipedia
>
> Haifeng ,
>
>
> While some suggest the dumps or noticeboards, my immediate thought was
> a database query, e.g., through Quarry. It just happens that Jonathan T.
> Morgan has created a query there:
>
> https://quarry.wmflabs.org/query/310
>
> SELECT user_id, user_name, user_registration, user_editcount
>          FROM enwiki_p.user
>          WHERE user_registration >
>                DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 1 DAY), '%Y%m%d%H%i%s')
>          AND user_editcount > 10
>          AND user_id NOT IN (SELECT ug_user FROM enwiki_p.user_groups
>                  WHERE ug_group = 'bot')
>          AND user_name NOT IN (SELECT REPLACE(log_title, "_", " ")
>                  FROM enwiki_p.logging
>                  WHERE log_type = "block" AND log_action = "block"
>                  AND log_timestamp >
>                      DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 2 DAY), '%Y%m%d%H%i%s'));
>
>
> You may fork from that query. As another example, there is R. Stuart
> Geiger (Staeiou)'s fork here, https://quarry.wmflabs.org/query/34256,
> which queries over a month.
>
>
>
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
>
> On 12/03/2019 19:18, Haifeng Zhang wrote:
>> Hi folks,
>>
>> My work needs to randomly sample new editors each month, e.g., 100 
>> editors per month.
>>
>> Do any of you have good suggestions for how to do this efficiently?
>>
>> I could think of using the dump files, but wonder whether there are other options.
>>
>>
>> Thanks,
>>
>> Haifeng Zhang
>

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l