Hi wikitech-l,

Following the discussions on analytics-l [1][2] and in Phabricator [3], the
Analytics team has added a small amendment [4] to Wikimedia's user-agent
policy [5], with the intention of improving the quality of WMF's pageview
statistics.

The amendment asks maintainers of Wikimedia bots/frameworks to add the word
*bot* (case insensitive) to their user-agents. With that, the analytics
jobs that process request data into pageview statistics can better identify
traffic generated by bots, and thus better isolate traffic originating from
humans (the corresponding code is already in production [6]). The
convention is optional because modifying a user-agent can be a breaking
change.
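
To illustrate the idea, here is a minimal Python sketch of a
case-insensitive check for the word *bot* in a user-agent. It is not the
production code in [6], which may use a broader pattern; the function name
and the sample user-agent strings are hypothetical.

```python
import re

# Assumption: match only the substring "bot", in any casing.
# The production pattern referenced in [6] may differ.
BOT_PATTERN = re.compile(r"bot", re.IGNORECASE)


def declares_bot(user_agent: str) -> bool:
    """Return True if the user-agent contains 'bot' in any casing."""
    return BOT_PATTERN.search(user_agent) is not None


# Hypothetical user-agent strings, for illustration only.
print(declares_bot("ExampleWikiBot/1.0 (https://example.org/bot; bot@example.org)"))
print(declares_bot("Mozilla/5.0 (X11; Linux x86_64) Firefox/44.0"))
```

A user-agent that follows the convention is flagged regardless of how
*bot* is capitalized; one that does not mention it is left to other
heuristics.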

The convention targets bots/frameworks that can generate Wikimedia
pageviews [7] through requests to Wikimedia sites and/or the API, and that
are not intended for in-situ human consumption. It does not target
bots/frameworks that assist in-situ human consumption, nor those that are
otherwise well known and recognizable, like WordPress, Scrapy, etc. Note
that many editing bots also generate pageviews: for example, when a bot
copies content from one page to another, it requests the source page and
the corresponding pageview is generated.
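
On the bot side, following the convention could look like the Python
sketch below, which builds (but does not send) a request for a source page
with a user-agent containing *bot*. The bot name, version, contact details
and helper function are hypothetical, and the example assumes the standard
library's urllib rather than any particular bot framework.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical bot name, version and contact details.
USER_AGENT = (
    "ExampleCopyBot/0.1 "
    "(https://example.org/examplecopybot; examplecopybot@example.org)"
)


def build_page_request(title: str) -> Request:
    """Build a request for a wiki page that declares the client as a bot."""
    # Wiki page URLs use underscores in place of spaces.
    path = quote(title.replace(" ", "_"))
    url = "https://en.wikipedia.org/wiki/" + path
    return Request(url, headers={"User-Agent": USER_AGENT})


request = build_page_request("Main Page")
print(request.full_url)
print(request.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would fetch the page, and
the resulting hit could then be attributed to a bot rather than to a human
reader.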

Cheers!

[1] https://lists.wikimedia.org/pipermail/analytics/2016-January/004858.html
[2] https://lists.wikimedia.org/pipermail/analytics/2016-February/004882.html
[3] https://phabricator.wikimedia.org/T108599
[4] https://meta.wikimedia.org/w/index.php?title=User-Agent_policy&type=revision&diff=15343269&oldid=14833024
[5] https://meta.wikimedia.org/wiki/User-Agent_policy
[6] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L58
[7] https://meta.wikimedia.org/wiki/Research:Page_view

-- 
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
