On 02/07/12 18:45, Gregor Martynus wrote:
> Hey there,
> 
> I really appreciate the help of you and the others in the mailing list,
> thanks to your help I was able to extract the data my friend needs for
> her wikipedia study. 
> 
> While it worked as a proof of concept, that the data she needs is
> available, it took to much time to process all needed data and she needs
> it another 3 to 4 times.
> 
> I contact you because you offered me to send an SQL dump back then, does
> that mean you've access to the Wikipedia database? 
> 
> I'm not an SQL expert, but I think based on the MediaWiki database
> schema, I could build the SQL queries for the 3 things she needs
> 
> 1. user_accounts(name, created_at, blocked_at) created manually in 2012
> 2. pages (namespace_id, title, is_redirect)
> 3. revisions (page_id, user_id, created_at) created by users from 1.
>     bytes_diff would be nice, if available
> 
> The goal is to find users that manually created an account, have not
> been blocked and that made their first contribution in June 2012.
> 
> Any chance you could run the queries and send me the results as sql
> dumps? Or if not, do you know somebody I could ask?
> 
> Thanks again for your help!
> 
> -- 
> Gregor Martynus

Given that your requisites here are more specific than those of the bug,
I have made the following query for you, taking advantage that the
recentchanges talbe keeps the data from last month:
 select  rc_timestamp, rc_user_text, user_registration, rc_namespace,
rc_title, rc_old_len, rc_new_len, rc_new, rc_minor, rc_type,
page_is_redirect from recentchanges JOIN user on (user_id=rc_user) LEFT
JOIN page ON (page_id=rc_cur_id)  where rc_timestamp > '201206000000'
and user_registration LIKE '2012%' order by rc_timestamp  asc


Some notes:
- This will contain from June 2 to July 2.
- No bytes_diff, but you have old and new byte len :)
- It may have some non-edit log entries.


File lives at http://toolserver.org/~platonides/sandbox/martynus.txt


I also made a list of the accounts created in 2012 which were made manually:
 SELECT   user_id,   user_name,   log_timestamp AS signed_up_at FROM
user LEFT JOIN logging  ON user_id = log_user WHERE log_type =
'newusers'    AND log_action = 'create'   AND log_timestamp >
'20120000000000

File lives at http://toolserver.org/~platonides/sandbox/martynus2.txt


And of block actions done to users registered in 2012:
 SELECT log_action, log_timestamp, log_title, user_registration FROM
logging  JOIN user on (log_title=user_name) WHERE log_type = 'block'AND
log_timestamp > '20120000000000' and user_registration > '20120000000000'

File lives at http://toolserver.org/~platonides/sandbox/martynus3.txt

I guess that will work for your needs.

Regards


PS: Rows per file:
   73426 martynus.txt
   54429 martynus2.txt
    2893 martynus3.txt


_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l

Reply via email to