On 02/07/12 18:45, Gregor Martynus wrote: > Hey there, > > I really appreciate the help of you and the others in the mailing list, > thanks to your help I was able to extract the data my friend needs for > her wikipedia study. > > While it worked as a proof of concept, that the data she needs is > available, it took to much time to process all needed data and she needs > it another 3 to 4 times. > > I contact you because you offered me to send an SQL dump back then, does > that mean you've access to the Wikipedia database? > > I'm not an SQL expert, but I think based on the MediaWiki database > schema, I could build the SQL queries for the 3 things she needs > > 1. user_accounts(name, created_at, blocked_at) created manually in 2012 > 2. pages (namespace_id, title, is_redirect) > 3. revisions (page_id, user_id, created_at) created by users from 1. > bytes_diff would be nice, if available > > The goal is to find users that manually created an account, have not > been blocked and that made their first contribution in June 2012. > > Any chance you could run the queries and send me the results as sql > dumps? Or if not, do you know somebody I could ask? > > Thanks again for your help! > > -- > Gregor Martynus
Given that your requisites here are more specific than those of the bug, I have made the following query for you, taking advantage that the recentchanges talbe keeps the data from last month: select rc_timestamp, rc_user_text, user_registration, rc_namespace, rc_title, rc_old_len, rc_new_len, rc_new, rc_minor, rc_type, page_is_redirect from recentchanges JOIN user on (user_id=rc_user) LEFT JOIN page ON (page_id=rc_cur_id) where rc_timestamp > '201206000000' and user_registration LIKE '2012%' order by rc_timestamp asc Some notes: - This will contain from June 2 to July 2. - No bytes_diff, but you have old and new byte len :) - It may have some non-edit log entries. File lives at http://toolserver.org/~platonides/sandbox/martynus.txt I also made a list of the accounts created in 2012 which were made manually: SELECT user_id, user_name, log_timestamp AS signed_up_at FROM user LEFT JOIN logging ON user_id = log_user WHERE log_type = 'newusers' AND log_action = 'create' AND log_timestamp > '20120000000000 File lives at http://toolserver.org/~platonides/sandbox/martynus2.txt And of block actions done to users registered in 2012: SELECT log_action, log_timestamp, log_title, user_registration FROM logging JOIN user on (log_title=user_name) WHERE log_type = 'block'AND log_timestamp > '20120000000000' and user_registration > '20120000000000' File lives at http://toolserver.org/~platonides/sandbox/martynus3.txt I guess that will work for your needs. Regards PS: Rows per file: 73426 martynus.txt 54429 martynus2.txt 2893 martynus3.txt _______________________________________________ Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l