Re: [Wiki-research-l] Pageview API

2015-11-18 Thread Maximilian Klein
I'm not even annoyed anymore that it's 10 years late! Seriously, I really
appreciate that this came out with added features, since we all would have
been alright with just a stable stats.grok.se replacement.
On 18 Nov 2015 12:18 p.m., "Heather Ford" wrote:

> This is awesome. Thank you!
___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l


[Wiki-research-l] Breaking into new Data-Spaces (workshop - CFP)

2015-11-18 Thread Aaron Halfaker
*TL;DR*
We want to get researchers in a room to experiment with infrastructure for
making open data science easier. We're focusing on three infrastructural
strategies: (1) improving metadata and indexing open online community
datasets, (2) an online querying service that makes processing, joining,
and extracting subsets of data easier, and (3) defining a protocol for
reporting research methods that will make studies easier to
replicate/extend.

*Title:* Breaking into new Data-Spaces: Infrastructure for Open Community
Science
*Date:* February 27, 2016
*Application deadline:* December 31, 2015
*Conference website:* http://cscw.acm.org/2016/program/workshops.php#WP-10
*Apply/info:*
https://meta.wikimedia.org/wiki/Research:Breaking_into_new_Data-Spaces
*Participants announced:* January 15, 2016

We encourage you to apply to a CSCW 2016 workshop focused on advancing your
ability to do work with datasets from online communities. We will experiment
with documentation protocols and technologies that are designed to make the
process of “breaking into” a new dataset more tractable for researchers
studying open online communities.

*Who can participate*
Anyone who builds, manages, studies or is interested in studying open
online communities can apply. Fill out our application form and tell us a
bit about your relevant interests and experience.

*Organizers*
Aaron Halfaker, Jonathan Morgan, Yuvaraj Pandian - Wikimedia Foundation
Elizabeth Thiry - Boundless
Kristen Schuster, A.J. Million, Sean Goggins - University of Missouri
William Rand - University of Maryland
David Laniado - Eurecat

*Abstract*
Despite being easily accessible, open online community (OOC) data can be
difficult to use effectively. In order to access and analyze large amounts
of data, researchers must first become familiar with the meaning of data
values. Then they must find a way to obtain and process the datasets to
extract their desired vectors of behavior and content. This process is
fraught with problems that are solved (with great difficulty) over and
over again by each research team/lab that breaks into datasets for a new
OOC.

In this workshop, we'll experiment with documentation protocols and
technologies that are designed to make the process of “breaking into” a new
dataset more tractable for researchers studying open online communities.
This workshop’s purpose is to bring together researchers to test these
systems and discover problems and missed opportunities to support
iteration. Participants will also be given the opportunity to use
state-of-the-art documentation and technologies to break into a new
collection of datasets. This workshop is the direct result of a call to
action to build infrastructure for data sharing between researchers from
past CSCW workshops and related conferences.

For more information and to apply see:
https://meta.wikimedia.org/wiki/Research:Breaking_into_new_Data-Spaces


Re: [Wiki-research-l] November 2015 Research Showcase

2015-11-18 Thread Leila Zia
A friendly reminder that this is happening in 5 min. :-)

On Mon, Nov 16, 2015 at 3:37 PM, Sarah Rodlund wrote:

> Hi Everyone,
>
> The next Research Showcase will be live-streamed this Wednesday, November
> 18, 2015 at 11:30 (PST).
>
> YouTube stream: http://www.youtube.com/watch?v=kXCI6whgdUA
>
> As usual, you can join the conversation on IRC at #wikimedia-research.
> And, you can watch our past research showcases here.
>
> We look forward to seeing you!
>
> Kind regards,
>
> Sarah R. Rodlund
> Project Coordinator-Engineering, Wikimedia Foundation
> srodl...@wikimedia.org
>
> This month:
>
> *Impact, Characteristics, and Detection of Wikipedia Hoaxes*
>
> By Srijan Kumar
>
> False information on Wikipedia raises concerns about its credibility. One
> way in which false information may be presented on Wikipedia is in the form
> of hoax articles, i.e. articles containing fabricated facts about
> nonexistent entities or events. In this talk, we study false information on
> Wikipedia by focusing on the hoax articles that have been created
> throughout its history. First, we assess the real-world impact of hoax
> articles by measuring how long they survive before being debunked, how many
> pageviews they receive, and how heavily they are referred to by documents
> on the Web. We find that, while most hoaxes are detected quickly and have
> little impact on Wikipedia, a small number of hoaxes survive long and are
> well cited across the Web. Second, we characterize the nature of successful
> hoaxes by comparing them to legitimate articles and to failed hoaxes that
> were discovered shortly after being created. We find characteristic
> differences in terms of article structure and content, embeddedness into
> the rest of Wikipedia, and features of the editor who created the hoax.
> Third, we successfully apply our findings to address a series of
> classification tasks, most notably to determine whether a given article is
> a hoax. And finally, we describe and evaluate a task involving humans
> distinguishing hoaxes from non-hoaxes. We find that humans are not
> particularly good at the task and that our automated classifier outperforms
> them by a big margin.


Re: [Wiki-research-l] To: Research into Wikimedia content and communities

2015-11-18 Thread Aaron Halfaker
Hi James,

Somehow, this email came without context.  Is it in response to the
pageview API posting?  Regrettably, that email is stuck in the mailman
moderator queue and hasn't come through yet.  That might explain why
you've ended up in a weird thread all by yourself :)

See the archive version of this thread here:
https://lists.wikimedia.org/pipermail/wiki-research-l/2015-November/004861.html

-Aaron

On Wed, Nov 18, 2015 at 12:33 PM, James Heilman wrote:

> Agree. A great step forwards for all of us who do outreach. Many thanks to
> everyone who made this happen :-)
>
> --
> James Heilman
> MD, CCFP-EM, Wikipedian
>
> The Wikipedia Open Textbook of Medicine
> www.opentextbookofmedicine.com
>
> As of July 2015 I am a board member of the Wikimedia Foundation
> My emails, however, do not represent the official position of the WMF


[Wiki-research-l] To: Research into Wikimedia content and communities

2015-11-18 Thread James Heilman
Agree. A great step forwards for all of us who do outreach. Many thanks to
everyone who made this happen :-)

-- 
James Heilman
MD, CCFP-EM, Wikipedian

The Wikipedia Open Textbook of Medicine
www.opentextbookofmedicine.com

As of July 2015 I am a board member of the Wikimedia Foundation
My emails, however, do not represent the official position of the WMF


Re: [Wiki-research-l] Pageview API

2015-11-18 Thread Heather Ford
This is awesome. Thank you!

Dr Heather Ford
University Academic Fellow
School of Media and Communications, The University of Leeds
w: hblog.org / EthnographyMatters.net / t: @hfordsa




[Wiki-research-l] Pageview API

2015-11-18 Thread Dario Taraborelli
-- Forwarded message --
From: Dan Andreescu <dandree...@wikimedia.org>
To: Research into Wikimedia content and communities 
<wiki-research-l@lists.wikimedia.org>
Cc: 
Date: Wed, 18 Nov 2015 08:43:10 -0500
Subject: Pageview API

Dear Data Enthusiasts,

In collaboration with the Services team, the analytics team wishes to announce 
a public Pageview API.  For an example of what kind of UIs someone could build 
with it, check out this excellent demo (code).

The API can tell you how many times a wiki article or project is viewed over a 
certain period.  You can break that down by views from web crawlers or humans, 
and by desktop, mobile site, or mobile app.  And you can find the 1000 most 
viewed articles on any project, on any given day or month that we have data 
for.  We currently have data back through October and we will be able to go 
back to May 2015 when the loading jobs are all done.  For more information, 
take a look at the user docs.
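
To make the queries described above a bit more concrete, here is a minimal 
sketch that only builds request URLs and does not contact the API.  The base 
URL, the path layout, and the allowed access/agent values are not given in 
this message; they are assumptions taken from the public REST documentation, 
so check them against the user docs before relying on them.  The helper names 
per_article_url and top_articles_url are made up for this example.

```python
from urllib.parse import quote

# Assumption: base URL of the Wikimedia REST pageview endpoints (not stated in this thread).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"

# Assumption: documented values for the desktop / mobile site / mobile app
# and crawler / human breakdowns mentioned in the announcement.
ACCESS = {"all-access", "desktop", "mobile-web", "mobile-app"}
AGENTS = {"all-agents", "user", "spider"}


def per_article_url(project, article, start, end,
                    access="all-access", agent="all-agents"):
    """Build a URL for daily view counts of one article.

    start and end are date strings in YYYYMMDD form (assumed format).
    """
    if access not in ACCESS:
        raise ValueError(f"unknown access method: {access}")
    if agent not in AGENTS:
        raise ValueError(f"unknown agent type: {agent}")
    # Titles use underscores instead of spaces and must be percent-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{BASE}/per-article/{project}/{access}/{agent}/"
            f"{title}/daily/{start}/{end}")


def top_articles_url(project, year, month, day=None, access="all-access"):
    """Build a URL for the most viewed articles on a given day or month."""
    if access not in ACCESS:
        raise ValueError(f"unknown access method: {access}")
    # Assumption: a whole month is requested with the literal "all-days".
    day_part = f"{day:02d}" if day is not None else "all-days"
    return f"{BASE}/top/{project}/{access}/{year}/{month:02d}/{day_part}"


if __name__ == "__main__":
    print(per_article_url("en.wikipedia", "Albert Einstein",
                          "20151101", "20151115", agent="user"))
    print(top_articles_url("en.wikipedia", 2015, 10))
```

Fetching either URL with any HTTP client should return JSON; the response 
shape is not covered here because it is not described in this message.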

After many requests from the community, we were really happy to finally make 
this our top priority and get it done.  Huge thanks to Gabriel, Marko, Petr, 
and Eric from Services, Alexandros and all of Ops really, Henrik for 
maintaining stats.grok, and, of course, the many community members who have 
been so patient with us all this time.

The Research team’s Article Recommender tool already uses the API to rank 
pages and determine relative importance.  Wiki Education Foundation’s 
dashboard is going to be using it to count how many times an article has been 
viewed since a student edited it.  And there are other grand plans for this 
data like “article finder”, which will find low-rated articles with a lot of 
pageviews; this can be used by editors looking for high-impact work.  Join the 
fun, we’re happy to help get you started and listen to your ideas.  Also, if 
you find bugs or want to suggest improvements, please create a task in 
Phabricator and tag it with #Analytics-Backlog.

So what’s next?  We can think of too many directions to go into, for pageview 
data and Wikimedia project data, in general.  We need to work with you to make 
a great plan for the next few quarters.  Please chime in here with your needs.

Team Analytics

(p.s. this was also posted on analytics-l, wikitech-l, and engineering-l, but I 
suck and forgot to cc the research list.  My apologies.)




Dario Taraborelli, Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter