Re: [Wikitech-l] Pageview API

2015-11-18 Thread Dan Andreescu
Quick general reminder.  Please tag tasks with "Analytics-Backlog" instead
of "Analytics" for now.  We need to clean that up, but we just haven't
gotten around to it.

On Wed, Nov 18, 2015 at 9:05 AM, Dan Andreescu wrote:

>> Nice work on the API!
>>
>> I wrote a basic consumer of this API at
>> http://codepen.io/Krinkle/full/wKOMMN#wikimdia-pageviews
>
> Cool!  Check out dv.wikipedia.org though, some of the RTL is messing with
> your (N views) parens.
>
>> The only hurdle I found is that the 'articles' property is itself
>> nested/double encoded JSON, instead of a plain object. This was somewhat
>> unexpected and makes the API harder to use.
>
> Right, for sure.  The data had to be stuffed that way to save space in
> Cassandra.  So we could parse it and reshape the response in RESTBase, and
> that seems like a good idea and probably wouldn't hurt performance too
> much.  Do you think it's worth the breaking change to the format?  I'll
> post on the bug that MZ filed.
>
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Pageview API

2015-11-18 Thread Dan Andreescu
> Nice work on the API!
>
> I wrote a basic consumer of this API at
> http://codepen.io/Krinkle/full/wKOMMN#wikimdia-pageviews

Cool!  Check out dv.wikipedia.org though, some of the RTL is messing with
your (N views) parens.

> The only hurdle I found is that the 'articles' property is itself
> nested/double encoded JSON, instead of a plain object. This was somewhat
> unexpected and makes the API harder to use.

Right, for sure.  The data had to be stuffed that way to save space in
Cassandra.  So we could parse it and reshape the response in RESTBase, and
that seems like a good idea and probably wouldn't hurt performance too
much.  Do you think it's worth the breaking change to the format?  I'll
post on the bug that MZ filed.
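
Until the format changes, a client has to decode the response twice. The
following is only a minimal sketch of that workaround, not the API's
documented behaviour: the sample payload is made up, and every field name
other than 'articles' (including the "items" wrapper) is an assumption for
illustration.

```python
import json

# Hypothetical sample shaped like the response discussed above: the value of
# 'articles' is a JSON string rather than a list of objects. All field names
# except 'articles' are assumptions made for this sketch.
SAMPLE_RESPONSE = json.dumps({
    "items": [{
        "project": "es.wikipedia",
        "articles": json.dumps([
            {"article": "Example_A", "views": 120, "rank": 1},
            {"article": "Example_B", "views": 45, "rank": 2},
        ]),
    }]
})


def decode_articles(raw_body):
    """Parse the body, then parse 'articles' again if it is still a string."""
    payload = json.loads(raw_body)
    for item in payload.get("items", []):
        articles = item.get("articles")
        if isinstance(articles, str):
            # Second decode: unwrap the double-encoded JSON.
            item["articles"] = json.loads(articles)
    return payload


decoded = decode_articles(SAMPLE_RESPONSE)
```

The isinstance check means the same client code should keep working if the
server later starts returning 'articles' as a plain list.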

Re: [Wikitech-l] Pageview API

2015-11-17 Thread MZMcBride
Dan Andreescu wrote:
>The API can tell you how many times a wiki article or project is viewed
>over a certain period.  You can break that down by views from web crawlers
>or humans, and by desktop, mobile site, or mobile app.  And you can find
>the 1000 most viewed articles
><https://wikimedia.org/api/rest_v1/metrics/pageviews/top/es.wikipedia/all-access/2015/11/11>
>on any project, on any given day or month that we have data for.  We
>currently have data back through October and we will be able to go back to
>May 2015 when the loading jobs are all done.

This looks very promising. Congratulations to all on getting this launched!

I hit a bug involving gzip compression and HTTP headers that was quickly
fixed (). Now that I'm
lightly poking at the API's data, the "Paul_Elio" article on the English
Wikipedia allegedly received 4,832,338 views on 2015-11-16, according to
. This anomaly and a few others
make me a bit wary about the accuracy of this data. That said, a lot of
the results look about right and are easily explained ("Charlie_Sheen" and
"November_2015_Paris_attacks" are easy examples).

MZMcBride




Re: [Wikitech-l] Pageview API

2015-11-17 Thread Krinkle
Nice work on the API!

I wrote a basic consumer of this API at
http://codepen.io/Krinkle/full/wKOMMN#wikimdia-pageviews

The only hurdle I found is that the 'articles' property is itself
nested/double encoded JSON, instead of a plain object. This was somewhat
unexpected and makes the API harder to use.

Code at http://codepen.io/Krinkle/pen/wKOMMN?editors=001.

(Supported in Chrome, Firefox, Safari and IE10+)

On Wed, Nov 18, 2015 at 3:09 AM, MZMcBride wrote:

> Dan Andreescu wrote:
> >The API can tell you how many times a wiki article or project is viewed
> >over a certain period.  You can break that down by views from web crawlers
> >or humans, and by desktop, mobile site, or mobile app.  And you can find
> >the 1000 most viewed articles
> ><https://wikimedia.org/api/rest_v1/metrics/pageviews/top/es.wikipedia/all-access/2015/11/11>
> >on any project, on any given day or month that we have data for.  We
> >currently have data back through October and we will be able to go back to
> >May 2015 when the loading jobs are all done.
>
> This looks very promising. Congratulations to all on getting this launched!
>
> I hit a bug involving gzip compression and HTTP headers that was quickly
> fixed (). Now that I'm
> lightly poking at the API's data, the "Paul_Elio" article on the English
> Wikipedia allegedly received 4,832,338 views on 2015-11-16, according to
> . This anomaly and a few others
> make me a bit wary about the accuracy of this data. That said, a lot of
> the results look about right and are easily explained ("Charlie_Sheen" and
> "November_2015_Paris_attacks" are easy examples).
>
> MZMcBride

Re: [Wikitech-l] Pageview API

2015-11-17 Thread MZMcBride
Krinkle wrote:
>The only hurdle I found is that the 'articles' property is itself
>nested/double encoded JSON, instead of a plain object. This was somewhat
>unexpected and makes the API harder to use.

I filed  about this.

MZMcBride wrote:
>Now that I'm lightly poking at the API's data, the "Paul_Elio" article on
>the English Wikipedia allegedly received 4,832,338 views on 2015-11-16,
>according to .

I decided that the numbers for this page are strange enough to warrant
an investigatory task: .

MZMcBride




[Wikitech-l] Pageview API

2015-11-16 Thread Dan Andreescu
Dear Data Enthusiasts,


In collaboration with the Services team, the analytics team wishes to
announce a public Pageview API.
For an example of what kind of UIs someone could build with it, check out
this excellent demo (code).


The API can tell you how many times a wiki article or project is viewed
over a certain period.  You can break that down by views from web crawlers
or humans, and by desktop, mobile site, or mobile app.  And you can find
the 1000 most viewed articles
<https://wikimedia.org/api/rest_v1/metrics/pageviews/top/es.wikipedia/all-access/2015/11/11>
on any project, on any given day or month that we have data for.  We
currently have data back through October and we will be able to go back to
May 2015 when the loading jobs are all done.  For more information, take a
look at the user docs.
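
As a rough illustration of the "most viewed articles" query, the sketch
below builds a daily request URL. The path layout is inferred from the one
example URL quoted in this thread, so treat it as an assumption; the helper
name top_url is hypothetical, and the monthly form is not covered here.

```python
from datetime import date

# Base path copied from the example URL quoted in this thread.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/top"


def top_url(project, access, day):
    """Build a daily 'top articles' URL: BASE/project/access/YYYY/MM/DD.

    The segment order is an assumption inferred from a single example URL.
    """
    return f"{BASE}/{project}/{access}/{day:%Y/%m/%d}"


example = top_url("es.wikipedia", "all-access", date(2015, 11, 11))
```

Here example reproduces the es.wikipedia URL for 2015-11-11 that appears in
the thread; other projects or access values would need to be checked against
the user docs.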


After many requests from the community, we were really happy to finally
make this our top priority and get it done.  Huge thanks to Gabriel, Marko,
Petr, and Eric from Services, Alexandros and all of Ops really, Henrik for
maintaining stats.grok, and, of course, the many community members who have
been so patient with us all this time.


The Research team’s Article Recommender tool 
already uses the API to rank pages and determine relative importance.  Wiki
Education Foundation’s dashboard is going
to be using it to count how many times an article has been viewed since a
student edited it.  And there are other grand plans for this data like
“article finder”, which will find low-rated articles with a lot of
pageviews; this can be used by editors looking for high-impact work.  Join
the fun, we’re happy to help get you started and listen to your ideas.
Also, if you find bugs or want to suggest improvements, please create a
task in Phabricator and tag it with #Analytics-Backlog.


So what’s next?  We can think of too many directions to go into, for
pageview data and Wikimedia project data, in general.  We need to work with
you to make a great plan for the next few quarters.  Please chime in here
with your needs.


Team Analytics