Ladsgroup added a comment. |
In T193641#4321435, @Addshore wrote:@Ladsgroup what were the queries you used to generate the data in the above comment?
To get answer for the first question:
WITH wikipedias AS ( SELECT DISTINCT dbname FROM wmf_raw.mediawiki_project_namespace_map WHERE snapshot = '2018-05' AND hostname LIKE '%wikipedia.org' ),
wikidata_editors AS ( SELECT DISTINCT event_user_text FROM wmf.mediawiki_history WHERE snapshot = '2018-05' AND event_entity = 'revision' AND event_type = 'create' AND wiki_db = 'wikidatawiki' AND NOT event_user_is_anonymous AND NOT ARRAY_CONTAINS(event_user_groups, 'bot') AND event_timestamp RLIKE '^2017-0[45].*' ) SELECT mwh.wiki_db, COUNT(DISTINCT mwh.event_user_text) as wikidata_coeditors FROM wmf.mediawiki_history mwh JOIN wikipedias w ON (mwh.wiki_db = w.dbname) JOIN wikidata_editors wde ON (mwh.event_user_text = wde.event_user_text) WHERE snapshot = '2018-05' AND event_entity = 'revision' AND event_type = 'create' AND NOT mwh.event_user_is_anonymous AND NOT ARRAY_CONTAINS(mwh.event_user_groups, 'bot') AND mwh.event_timestamp RLIKE '^2017-0[45].*' GROUP BY mwh.wiki_db ORDER BY wikidata_coeditors DESC LIMIT 1000;
To get answer to the second question
WITH wikipedias AS ( SELECT DISTINCT dbname FROM wmf_raw.mediawiki_project_namespace_map WHERE snapshot = '2018-05' AND hostname LIKE '%wikipedia.org' ), wikidata_editors AS ( SELECT DISTINCT event_user_text FROM wmf.mediawiki_history WHERE snapshot = '2018-05' AND event_entity = 'revision' AND event_type = 'create' AND wiki_db = 'wikidatawiki' AND NOT event_user_is_anonymous AND NOT ARRAY_CONTAINS(event_user_groups, 'bot') AND event_timestamp RLIKE '^2018-0[45].*' ) SELECT mwh.wiki_db, COUNT(DISTINCT mwh.event_user_text) as wikidata_coeditors FROM wmf.mediawiki_history mwh JOIN wikipedias w ON (mwh.wiki_db = w.dbname) JOIN wikidata_editors wde ON (mwh.event_user_text = wde.event_user_text) WHERE snapshot = '2018-05' AND event_entity = 'revision' AND event_type = 'create' AND NOT mwh.event_user_is_anonymous AND NOT ARRAY_CONTAINS(mwh.event_user_groups, 'bot') AND mwh.event_timestamp RLIKE '^2018-0[45].*' GROUP BY mwh.wiki_db ORDER BY wikidata_coeditors DESC LIMIT 1000;
And to answer the third question:
WITH wikipedias AS ( SELECT DISTINCT dbname FROM wmf_raw.mediawiki_project_namespace_map WHERE snapshot = '2018-05' AND hostname LIKE '%wikipedia.org' ), wikidata_editors AS ( SELECT DISTINCT event_user_text FROM wmf.mediawiki_history WHERE snapshot = '2018-05' AND event_entity = 'revision' AND event_type = 'create' AND wiki_db = 'wikidatawiki' AND NOT event_user_is_anonymous AND NOT ARRAY_CONTAINS(event_user_groups, 'bot') AND event_timestamp RLIKE '^201[78].*' ) SELECT mwh.wiki_db, COUNT(DISTINCT mwh.event_user_text) as wikidata_coeditors FROM wmf.mediawiki_history mwh JOIN wikipedias w ON (mwh.wiki_db = w.dbname) JOIN wikidata_editors wde ON (mwh.event_user_text = wde.event_user_text) WHERE snapshot = '2018-05' AND event_entity = 'revision' AND event_type = 'create' AND NOT mwh.event_user_is_anonymous AND NOT ARRAY_CONTAINS(mwh.event_user_groups, 'bot') AND mwh.event_timestamp RLIKE '^201[78].*' GROUP BY mwh.wiki_db ORDER BY wikidata_coeditors DESC LIMIT 1000;
T193641#4304748 has a couple of issues:
- You looking at data from 2017 (not sure if that was intended or not), the example query does indeed have 2017 in it.
That was for comparison.
- I now realize that this is probably fine in the queries you ran in the above comment
- The snapshots don't match, one is using 2018-05 and the other 2018-2, these should probably match, #analytics and @JAllemandou would be able to clear up this. I'm also not sure where we can get the ID of the latest snapshot either.
I fixed it in the next run but at the end it doesn't matter much, there hasn't been lots of wikipedias created in Q2 of 2018 and they are really small to contribute much to the whole stats.
TASK DETAIL
EMAIL PREFERENCES
To: Ladsgroup
Cc: JAllemandou, Jonas, RazShuty, Ladsgroup, Addshore, Aklapper, Lydia_Pintscher, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331
Cc: JAllemandou, Jonas, RazShuty, Ladsgroup, Addshore, Aklapper, Lydia_Pintscher, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs