Hi all,

Just a friendly reminder that we'll be starting in approximately 30 minutes.

https://www.youtube.com/watch?v=ntgRsMaDlsw

On Mon, Oct 16, 2023 at 3:29 PM Kinneret Gordon <kgor...@wikimedia.org> wrote:

> Hi all,
>
> The next Research Showcase, focused on *Data Privacy*, will be
> live-streamed on Wednesday, October 18, at 9:30 AM PDT / 16:30 UTC. Find
> your local time here <https://zonestamp.toolforge.org/1697646641>.
>
> YouTube stream: https://www.youtube.com/watch?v=ntgRsMaDlsw. As usual,
> you can join the conversation in the YouTube chat as soon as the showcase
> goes live.
>
> This month's presentations:
>
> Wikipedia Reader Navigation: When Synthetic Data Is Enough
> By *Akhil Arora, EPFL*
>
> Every day millions of people read Wikipedia. When navigating
> the vast space of available topics using hyperlinks, readers describe
> trajectories on the article network. Understanding these navigation
> patterns is crucial to better serve readers’ needs and address structural
> biases and knowledge gaps. However, systematic studies of navigation on
> Wikipedia are hindered by a lack of publicly available data due to the
> commitment to protect readers' privacy by not storing or sharing
> potentially sensitive data. In this paper, we ask: How well can Wikipedia
> readers' navigation be approximated by using publicly available resources,
> most notably the Wikipedia clickstream data
> <https://wikinav.toolforge.org/>? We systematically quantify the
> differences between real navigation sequences and synthetic sequences
> generated from the clickstream data, in 6 analyses across 8 Wikipedia
> language versions. Overall, we find that the differences between real and
> synthetic sequences are statistically significant, but with small effect
> sizes, often well below 10%. This constitutes quantitative evidence for the
> utility of the Wikipedia clickstream data as a public resource: clickstream
> data can closely capture reader navigation on Wikipedia and provides a
> sufficient approximation for most practical downstream applications relying
> on reader data. More broadly, this study provides an example for how
> clickstream-like data can generally enable research on user navigation on
> online platforms while protecting users’ privacy.
>
> How to tell the world about data you cannot show them: Differential
> privacy at the Wikimedia Foundation
> By *Hal Triedman, Wikimedia Foundation*
>
> The Wikimedia Foundation (WMF), by virtue of its centrality on the internet,
> collects lots of data about platform activities. Some of that data is made
> public (e.g. global daily pageviews); other data types are not shared (or
> are pseudonymized prior to sharing), largely due to privacy concerns.
> Differential privacy is a statistical definition of privacy that has gained
> prominence in academia, but is still an emerging technology in industry. In
> this talk, I share the story of how we put differential privacy into
> production at the WMF, through looking at the case study of geolocated
> daily pageview counts.
>
> You can also watch our past research showcases here:
> https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
>
> Best,
> Kinneret
>
> --
> Kinneret Gordon
> Lead Research Community Officer
> Wikimedia Foundation <https://wikimediafoundation.org/>

_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/U7BVSTYXRAGASQ7Z43DXJDZ2E6UIEMFG/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org