Hi wikitech-l,

Following the discussions on analytics-l [1][2] and in Phabricator [3], the Analytics team has added a small amendment [4] to Wikimedia's user-agent policy [5], with the intention of improving the quality of WMF's pageview statistics.
The amendment asks maintainers of Wikimedia bots/frameworks to add the word *bot* (case insensitive) to their user-agents. With that, the analytics jobs that process request data into pageview statistics will be better able to identify traffic generated by bots, and thus to isolate traffic that originates from humans (the corresponding code is already in production [6]). The convention is optional, because modifying the user-agent can be a breaking change.

Targets of this convention: bots/frameworks that can generate Wikimedia pageviews [7] on Wikimedia sites and/or the API and are not for in-situ human consumption.

Not targets: bots/frameworks used to assist in-situ human consumption, and bots/frameworks that are otherwise well known and recognizable, like WordPress, Scrapy, etc.

Note that many editing bots also generate pageviews. For example, when a bot copies content from one page to another, it requests the source page, and the corresponding pageview is generated.

Cheers!

[1] https://lists.wikimedia.org/pipermail/analytics/2016-January/004858.html
[2] https://lists.wikimedia.org/pipermail/analytics/2016-February/004882.html
[3] https://phabricator.wikimedia.org/T108599
[4] https://meta.wikimedia.org/w/index.php?title=User-Agent_policy&type=revision&diff=15343269&oldid=14833024
[5] https://meta.wikimedia.org/wiki/User-Agent_policy
[6] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L58
[7] https://meta.wikimedia.org/wiki/Research:Page_view

--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
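
P.S. For maintainers who want to adopt the convention, here is a minimal Python sketch of what it could look like on the client side. The bot name, URL, and contact address are hypothetical placeholders, and the `looks_like_bot` check is only an illustrative approximation of "contains the word bot, case insensitive". The pattern actually used in production is the one in [6] and may differ.

```python
import re
import urllib.request

# Hypothetical bot name and contact details, for illustration only.
USER_AGENT = (
    "ExampleWikiBot/1.0 "
    "(https://example.org/examplewikibot; examplewikibot@example.org)"
)

# Illustrative approximation of the convention: the word "bot" anywhere in
# the user-agent, case insensitive. Not the production pattern from [6].
BOT_PATTERN = re.compile(r"bot", re.IGNORECASE)


def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user-agent follows the *bot* convention."""
    return BOT_PATTERN.search(user_agent) is not None


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the bot user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` would send the request with that user-agent. Since changing a user-agent can be a breaking change, it is probably worth checking whether anything downstream depends on the old string first.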