Hey folks,

Since OpenSym was in San Francisco this year, we welcomed researchers
working in the Wikimedia space to join us for breakfast at the Wikimedia
Foundation on the morning after the conference.  During this event, we
shook hands and made a few quick presentations about ongoing projects and
stuff that's right around the corner.

I took some notes on what was presented and I figured that many on this
list might appreciate the notes as well.

*Wikimedia research* (Collaborate with us!
<https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations>):

   - Research and data
   <https://www.mediawiki.org/wiki/Wikimedia_Research/Research_and_Data> --
   Data science and experimental systems development.
   - Design Research
   <https://www.mediawiki.org/wiki/Wikimedia_Research/Design_Research> --
   Generative and evaluative research support for product development.

(With much more overlap than is implied by the distinction)

*Communication channels:*

   - IRC: #wikimedia-research on freenode.net (webchat
   <http://webchat.freenode.net/?channels=wikimedia-research>)
      - This is "the office" for us.  It's an excellent channel for asking
      a quick question or discussing an idea.
   - Mailing list: wiki-research-l@lists.wikimedia.org (signup
   <https://lists.wikimedia.org/mailman/listinfo/wiki-research-l>)
   - WikiResearch Showcase
   <https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase> -- Monthly
   event covering WMF research results and invited speakers researching WMF
   projects.

*Projects presented:*

   - Revision scoring as a service
   <https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service>
   -- State-of-the-art AI (vandalism & article quality prediction) as a web
   service.  See ORES
   <https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service>
   for current system capabilities and Wiki labels
   <https://meta.wikimedia.org/wiki/Wiki_labels>, our crowdsourced data
   gathering system.
   - Scholarly article citations
   <https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia>
   -- An open-licensed dataset of scholarly identifiers in Wikipedia which
   notes when, historically, an identifier was first added.
   - Activity sessions
   <https://meta.wikimedia.org/wiki/Research:Activity_session> -- (coming
   soon) A dataset of sessionized editing activity.  Useful for measuring
   labor hours or studying work patterns.
   - Measuring value-added
   <https://meta.wikimedia.org/wiki/Research:Measuring_value-added> --
   (coming soon) A dataset of robust measurements of editor productivity and
   value-added.  See also Content persistence
   <https://meta.wikimedia.org/wiki/Research:Content_persistence>.
   - Clickstream dataset
   <http://ewulczyn.github.io/Wikipedia_Clickstream_Getting_Started/> -- An
   open-licensed dataset containing page view pair counts (as inferred by the
   "Referer" header).
   - Increasing article coverage
   <https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage>
   -- This research aims to identify important content available in one
   language edition but missing from another and recommend the work to editors
   who would be most interested in translating.
   - Improving link coverage
   <https://meta.wikimedia.org/wiki/Research:Improving_link_coverage> -- An
   approach for automatically finding useful hyperlinks to add to a website by
   analyzing server access logs.

I'm sure I missed some stuff.  I invite my colleagues to supplement my
notes in their replies.  Thanks to all who joined us!

-Aaron