Hi all,

The next Research Showcase will be live-streamed Wednesday, March 16 at
6:30AM PT / 13:30 UTC. Find your local time here:
https://zonestamp.toolforge.org/1647437436.

The theme is: Patterns and dynamics of article quality.

YouTube stream: https://www.youtube.com/watch?v=o5e6S7ac4q4

You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase.

The Showcase will feature the following talks:
Quality monitoring in Wikipedia - A computational perspective
By *Animesh Mukherjee <https://cse.iitkgp.ac.in/~animeshm/> (Indian
Institute of Technology, Kharagpur)*

In this talk, I shall summarize the highlights of our five years of
research on Wikipedia. In particular, I shall take a deep dive into two of
our recent works. The first attempts to identify early indications of which
editors will soon go "missing" (so-called missing editors) [1]. The second
investigates how the quality of a Wikipedia article changes over time and
whether computational models can be built to understand the
characteristics of future quality transitions [2]. For each, I will present
a set of key results and the main insights we obtained from them.

[1] When expertise gone missing: Uncovering the loss of prolific
contributors in Wikipedia
<https://link.springer.com/chapter/10.1007/978-3-030-91669-5_23>, ICADL
2021 (pdf <https://arxiv.org/pdf/2109.09979>)
[2] Quality Change: norm or exception? Measurement, Analysis and Detection
of Quality Change in Wikipedia <https://arxiv.org/abs/2111.01496>, CSCW
2022 (pdf <https://arxiv.org/pdf/2111.01496>)


Automatically Labeling Low Quality Content on Wikipedia by Leveraging
Editing Behaviors
By *Sumit Asthana <http://sumitasthana.xyz/> (University of Michigan, Ann
Arbor)*

Wikipedia articles aim to be definitive sources of encyclopedic content.
Yet only 0.6% of Wikipedia articles are rated high quality on its quality
scale, because the number of editors is small relative to the enormous
number of articles. Supervised Machine Learning (ML) approaches to quality
improvement, which can automatically identify and fix content issues, rely
on manual labels of the quality of individual Wikipedia sentences. However,
current labeling approaches are tedious and produce noisy labels. In this
talk, I will discuss an automated labeling approach that identifies the
semantic category (e.g., adding citations, clarifications) of historic
Wikipedia edits and uses the sentences as they stood before each edit as
examples that require that semantic improvement. Sentences from the
highest-rated articles serve as examples that no longer need semantic
improvements. I will compare the performance of models trained with this
labeling approach against models trained with existing labeling
approaches, and discuss the implications of such a large-scale,
semi-supervised labeling approach for capturing the editing practices of
Wikipedia editors and helping them improve articles faster.

Related paper: Automatically Labeling Low Quality Content on Wikipedia By
Leveraging Patterns in Editing Behaviors
<https://dl.acm.org/doi/10.1145/3479503>, CSCW 2021 (pdf
<https://arxiv.org/pdf/2108.02252>)

--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org

Reply via email to