Ladsgroup added a comment.

@Ladsgroup what were the queries you used to generate the data in the above comment?

To get answer for the first question:

WITH
wikipedias AS (
  SELECT
    DISTINCT dbname
  FROM wmf_raw.mediawiki_project_namespace_map
  WHERE snapshot = '2018-05'
    AND hostname LIKE '%wikipedia.org'
),

wikidata_editors AS (
  SELECT
    DISTINCT event_user_text
  FROM wmf.mediawiki_history
  WHERE snapshot = '2018-05'
    AND event_entity = 'revision'
    AND event_type = 'create'
    AND wiki_db = 'wikidatawiki'
    AND NOT event_user_is_anonymous
    AND NOT ARRAY_CONTAINS(event_user_groups, 'bot')
    AND event_timestamp RLIKE '^2017-0[45].*'
)

SELECT
  mwh.wiki_db,
  COUNT(DISTINCT mwh.event_user_text) as wikidata_coeditors
FROM wmf.mediawiki_history mwh
  JOIN wikipedias w ON (mwh.wiki_db = w.dbname)
  JOIN wikidata_editors wde ON (mwh.event_user_text = wde.event_user_text)
WHERE snapshot = '2018-05'
  AND event_entity = 'revision'
  AND event_type = 'create'
  AND NOT mwh.event_user_is_anonymous
  AND NOT ARRAY_CONTAINS(mwh.event_user_groups, 'bot')
  AND mwh.event_timestamp RLIKE '^2017-0[45].*'
GROUP BY
  mwh.wiki_db
ORDER BY wikidata_coeditors DESC
LIMIT 1000;

To get answer to the second question

WITH
wikipedias AS (
  SELECT
    DISTINCT dbname
  FROM wmf_raw.mediawiki_project_namespace_map
  WHERE snapshot = '2018-05'
    AND hostname LIKE '%wikipedia.org'
),

wikidata_editors AS (
  SELECT
    DISTINCT event_user_text
  FROM wmf.mediawiki_history
  WHERE snapshot = '2018-05'
    AND event_entity = 'revision'
    AND event_type = 'create'
    AND wiki_db = 'wikidatawiki'
    AND NOT event_user_is_anonymous
    AND NOT ARRAY_CONTAINS(event_user_groups, 'bot')
    AND event_timestamp RLIKE '^2018-0[45].*'
)

SELECT
  mwh.wiki_db,
  COUNT(DISTINCT mwh.event_user_text) as wikidata_coeditors
FROM wmf.mediawiki_history mwh
  JOIN wikipedias w ON (mwh.wiki_db = w.dbname)
  JOIN wikidata_editors wde ON (mwh.event_user_text = wde.event_user_text)
WHERE snapshot = '2018-05'
  AND event_entity = 'revision'
  AND event_type = 'create'
  AND NOT mwh.event_user_is_anonymous
  AND NOT ARRAY_CONTAINS(mwh.event_user_groups, 'bot')
  AND mwh.event_timestamp RLIKE '^2018-0[45].*'
GROUP BY
  mwh.wiki_db
ORDER BY wikidata_coeditors DESC
LIMIT 1000;

And to answer the third question:

WITH
wikipedias AS (
  SELECT
    DISTINCT dbname
  FROM wmf_raw.mediawiki_project_namespace_map
  WHERE snapshot = '2018-05'
    AND hostname LIKE '%wikipedia.org'
),

wikidata_editors AS (
  SELECT
    DISTINCT event_user_text
  FROM wmf.mediawiki_history
  WHERE snapshot = '2018-05'
    AND event_entity = 'revision'
    AND event_type = 'create'
    AND wiki_db = 'wikidatawiki'
    AND NOT event_user_is_anonymous
    AND NOT ARRAY_CONTAINS(event_user_groups, 'bot')
    AND event_timestamp RLIKE '^201[78].*'
)

SELECT
  mwh.wiki_db,
  COUNT(DISTINCT mwh.event_user_text) as wikidata_coeditors
FROM wmf.mediawiki_history mwh
  JOIN wikipedias w ON (mwh.wiki_db = w.dbname)
  JOIN wikidata_editors wde ON (mwh.event_user_text = wde.event_user_text)
WHERE snapshot = '2018-05'
  AND event_entity = 'revision'
  AND event_type = 'create'
  AND NOT mwh.event_user_is_anonymous
  AND NOT ARRAY_CONTAINS(mwh.event_user_groups, 'bot')
  AND mwh.event_timestamp RLIKE '^201[78].*'
GROUP BY
  mwh.wiki_db
ORDER BY wikidata_coeditors DESC
LIMIT 1000;

T193641#4304748 has a couple of issues:

  • You looking at data from 2017 (not sure if that was intended or not), the example query does indeed have 2017 in it.

That was for comparison.

    • I now realize that this is probably fine in the queries you ran in the above comment
  • The snapshots don't match, one is using 2018-05 and the other 2018-2, these should probably match, #analytics and @JAllemandou would be able to clear up this. I'm also not sure where we can get the ID of the latest snapshot either.

I fixed it in the next run but at the end it doesn't matter much, there hasn't been lots of wikipedias created in Q2 of 2018 and they are really small to contribute much to the whole stats.


TASK DETAIL
https://phabricator.wikimedia.org/T193641

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup
Cc: JAllemandou, Jonas, RazShuty, Ladsgroup, Addshore, Aklapper, Lydia_Pintscher, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to