GoranSMilovanovic added a subscriber: Milimetric.
GoranSMilovanovic added a comment.
@Lea_WMDE Ok, here is a direct test (PySpark code against the wmf.pageview_hourly <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_hourly> table):

  pw = sqlContext.sql('SELECT namespace_id, access_method, agent_type, SUM(view_count) AS pageviews \
                      FROM wmf.pageview_hourly \
                      WHERE year = ' + str(d.year) + ' AND month = ' + str(d.month) + ' AND day = ' + str(d.day) + \
                      ' AND project = "wikidata" \
                      AND (namespace_id = 0 OR namespace_id = 120 OR namespace_id = 146 OR namespace_id = 640) \
                      GROUP BY namespace_id, access_method, agent_type ORDER BY namespace_id, access_method, agent_type')

where `d` is June 13, 2019:

  In [31]: d
  Out[31]: datetime.datetime(2019, 6, 13, 15, 29, 14, 874165)

The query returns the following rows in the `pw` DataFrame:

  [Row(namespace_id=0, access_method='desktop', agent_type='spider', pageviews=3713136),
   Row(namespace_id=0, access_method='desktop', agent_type='user', pageviews=413537),
   Row(namespace_id=0, access_method='mobile web', agent_type='spider', pageviews=408138),
   Row(namespace_id=0, access_method='mobile web', agent_type='user', pageviews=115864),
   Row(namespace_id=120, access_method='desktop', agent_type='spider', pageviews=7084),
   Row(namespace_id=120, access_method='desktop', agent_type='user', pageviews=11586),
   Row(namespace_id=120, access_method='mobile web', agent_type='spider', pageviews=1418),
   Row(namespace_id=120, access_method='mobile web', agent_type='user', pageviews=3193),
   Row(namespace_id=146, access_method='desktop', agent_type='spider', pageviews=938),
   Row(namespace_id=146, access_method='desktop', agent_type='user', pageviews=179),
   Row(namespace_id=146, access_method='mobile web', agent_type='spider', pageviews=167),
   Row(namespace_id=146, access_method='mobile web', agent_type='user', pageviews=8),
   Row(namespace_id=640, access_method='desktop', agent_type='spider', pageviews=1086),
   Row(namespace_id=640, access_method='desktop', agent_type='user', pageviews=133),
   Row(namespace_id=640, access_method='mobile web', agent_type='spider', pageviews=3)]

which matches exactly what we get for June 13, 2019 from our new Dashboard <http://wmdeanalytics.wmflabs.org/WD_pageviewsPerNamespace/>.

Moreover, let's have a look at the total number of pageviews for `user` (i.e. `spider` traffic is excluded, as in Wikistats2) for June 13, 2019:

  pw = sqlContext.sql('SELECT SUM(view_count) AS pageviews \
                      FROM wmf.pageview_hourly \
                      WHERE year = ' + str(d.year) + ' AND month = ' + str(d.month) + ' AND day = ' + str(d.day) + \
                      ' AND project = "wikidata" \
                      AND agent_type = "user"')

results in

  Row(pageviews=1420740)

which is far below the number reported on Wikistats2 for June 13, 2019, which is `5,764,558`.

@Milimetric I am looking at the pageviews data from Wikidata for June 13, 2019, at:

  https://stats.wikimedia.org/v2/#/wikidata.org/reading/total-page-views/normal|bar|1-month|~total|daily

and I cannot reproduce it. Could you let me know what the possible source of the difference could be? Thank you.

TASK DETAIL
  https://phabricator.wikimedia.org/T208567

To: GoranSMilovanovic
Cc: Milimetric, GoranSMilovanovic, Aklapper, WMDE-leszek, Lea_WMDE, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331
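
For anyone who wants to rerun the daily total for another date or agent type, here is a minimal sketch of the second query wrapped in a function. The name `build_daily_pageviews_query` and its parameters are hypothetical (they do not appear in the comment above), and the input check is only a rough guard, not a full defence against malformed input. The table and column names are the ones used in the queries above.

```python
from datetime import datetime


# Hypothetical helper, not part of the original comment: builds the same
# daily-total query against wmf.pageview_hourly for an arbitrary date.
def build_daily_pageviews_query(day, project="wikidata", agent_type="user"):
    # The values are interpolated directly into the SQL text, so accept
    # only simple identifiers (letters, digits, '_', '.', '-').
    for value in (project, agent_type):
        stripped = value.replace("_", "").replace(".", "").replace("-", "")
        if not stripped.isalnum():
            raise ValueError("unexpected characters in %r" % value)
    return (
        'SELECT SUM(view_count) AS pageviews '
        'FROM wmf.pageview_hourly '
        'WHERE year = {y} AND month = {m} AND day = {d} '
        'AND project = "{p}" AND agent_type = "{a}"'
    ).format(y=day.year, m=day.month, d=day.day, p=project, a=agent_type)


# Same date as in the test above.
d = datetime(2019, 6, 13, 15, 29, 14, 874165)
query = build_daily_pageviews_query(d)
print(query)
```

In a PySpark session the string would then be passed on as `sqlContext.sql(query)`, exactly as in the queries above; the sketch itself only builds the text and does not need Spark to run.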
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs