[Wiki-research-l] Re: New private, granular pageview dataset

2023-06-21 Thread Nathan TeBlunthuis

Congrats on this release! Looking forward to using it in some projects.

--

Nate

Hal Triedman writes:

> Hello world!
>
> My name is Hal Triedman, and I’m a senior privacy engineer at WMF. I work
> to make data that WMF releases about reading, editing, and other on-wiki
> behavior safer, more granular, and more accessible to the world using
> differential privacy.
>
> Today I’m reaching out to share that WMF has released almost 8 years (from
> 1 July 2015 to present) of privatized pageview data,
> partitioned by country, project, and page. This data is significantly more
> granular than other datasets we release, and should help researchers to
> disambiguate both long- and short-term trends within languages on a
> country-by-country basis, addressing several long-standing requests
> from Wikimedia communities.
>
> Due to various technical factors, there are three distinct datasets:
>
> - 1 July 2015 – 8 Feb 2017 / README
>   (publishing threshold [1]: 3,500 pageviews)
> - 9 Feb 2017 – 5 Feb 2023 / README
>   (publishing threshold: 450 pageviews)
> - 6 Feb 2023 – present / README
>   (publishing threshold: 90 pageviews)
>
> API access to this data should be coming in the next few months. In the
> interim, I’ve built an example Python notebook
> illustrating how one might access the data in its current CSV format, as
> well as several different kinds of simple analyses that can be done with it.
>
> I also want to invite the research community to join me for a brief demo of
> this project at the July Research Showcase. In the
> meantime, please feel free to reach out with any questions on the project talk
> page.
>
> For more information about WMF’s work on differential privacy more
> generally, see the differential privacy homepage on meta. And in the future,
> look for more announcements of privatized datasets on editor behavior,
> on-wiki search, CentralNotice impressions and clicks, and more.
>
> Best,
>
> Hal
>
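
The announcement above mentions reading the data in its current csv format. As a rough sketch only, the snippet below totals pageviews per country and project using the Python standard library. The column names (`country`, `project`, `page`, `views`) and the sample rows are assumptions made up for illustration, not the published schema; the dataset READMEs and the example notebook describe the real layout.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The column names and values are assumptions for
# illustration only and do not come from the released dataset.
SAMPLE_CSV = """country,project,page,views
FR,fr.wikipedia,Paris,5120
FR,fr.wikipedia,Lyon,610
DE,de.wikipedia,Berlin,4800
DE,en.wikipedia,Berlin,950
"""


def views_by_country_project(csv_file):
    """Sum the (assumed) 'views' column per (country, project) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[(row["country"], row["project"])] += int(row["views"])
    return dict(totals)


totals = views_by_country_project(io.StringIO(SAMPLE_CSV))
for (country, project), views in sorted(totals.items()):
    print(country, project, views)
```

To run this against a downloaded file, an open file handle (for example `open(path, newline="", encoding="utf-8")`) could be passed in place of the `io.StringIO` object, with the column names adjusted to match the actual header.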

[Wiki-research-l] Re: Wikimedia Research Showcase June 21 at 16:30 UTC

2023-06-21 Thread Pablo Aragón

Hi all,

A friendly reminder that this is starting in about 30 minutes. We hope you
can join us!

Best,

On Thu, Jun 15, 2023 at 10:53 AM Pablo Aragón wrote:

> Hi again,
>
> There was an error in the previous message: the title of the second
> presentation is *“How do you represent my gender? Challenges and
> opportunities from the Wikidata Gender Diversity project”*.
>
> Hope you can join us!
>
> Warm regards,
>
> On Thu, Jun 15, 2023 at 9:16 AM Pablo Aragón wrote:
>
>> Hi all,
>>
>> The next Research Showcase, with the theme of *Wikimedia and LGBTQIA+*,
>> will be live-streamed Wednesday, June 21 at 16:30 UTC. Find your local time
>> here.
>>
>> YouTube stream: https://www.youtube.com/watch?v=AOD2ZdxRNfo
>>
>> You can join the conversation on IRC at #wikimedia-research or on the
>> YouTube chat.
>>
>> This month's presentations:
>>
>>- *Multilingual Contextual Affective Analysis of LGBT People
>>   Portrayals in Wikipedia*
>>   - *Speaker*: Chan Park, Carnegie Mellon University
>>   - *Abstract*: In this talk, I present our research on analyzing
>>   the portrayal of LGBT individuals in their biographies on Wikipedia,
>>   with a particular focus on subtle word connotations and
>>   cross-cultural comparisons. We aim to address two primary research
>>   questions: 1) How can we effectively measure the nuanced
>>   connotations of words in multilingual texts, which reflect
>>   sentiments, power dynamics, and agency? 2) How can we analyze the
>>   portrayal of a specific group, such as the LGBT community, and
>>   compare these portrayals across different languages? To answer these
>>   questions, we collect the Multilingual Contextualized Connotation
>>   Frames dataset, comprising 2,700 examples in English, Spanish, and
>>   Russian. We also develop a new multilingual model based on
>>   pre-trained multilingual language models. Additionally, we devise a
>>   matching algorithm to construct a comparison corpus for the target
>>   corpus, isolating the attribute of interest. Finally, we showcase
>>   how our developed models and constructed corpora enable us to
>>   conduct cross-cultural analysis of LGBT People Portrayals on
>>   Wikipedia. Our results reveal systematic differences in how the LGBT
>>   community is portrayed across languages, surfacing cultural
>>   differences in narratives and signs of social biases.
>>   - *Paper:* Park, C. Y., Yan, X., Field, A., & Tsvetkov, Y. (2021,
>>   May). Multilingual contextual affective analysis of LGBT people
>>   portrayals in Wikipedia. In Proceedings of the International AAAI
>>   Conference on Web and Social Media (Vol. 15, pp. 479-490).
>>
>>- *Visual gender biases in Wikipedia: A systematic evaluation across
>>   the ten most spoken languages*
>>   - *Speaker*: Daniele Metilli, University College London
>>   - *Abstract*: Wikidata Gender Diversity (WiGeDi) is a one-year
>>   project funded through the Wikimedia Research Fund. The project is
>>   studying gender diversity in Wikidata, focusing on marginalized
>>   gender identities such as those of trans and non-binary people, and
>>   adopting a queer and intersectional feminist perspective. The
>>   project is organised in three strands: model, data, and community.
>>   First, we are looking at how the current Wikidata ontology model
>>   represents gender, and the extent to which this representation is
>>   inclusive of marginalized gender identities. We are analysing the
>>   data stored in the knowledge base to gather insights and identify
>>   possible gaps and biases. Finally, we are looking at how the
>>   community has handled the move towards the inclusion of a wider
>>   spectrum of gender identities by studying a corpus of user
>>   discussions through computational linguistics methods. This
>>   presentation will report on the current status of the Wikidata
>>   Gender Diversity project and the envisioned outcomes. We will
>>   discuss the main challenges that we are facing and the opportunities
>>   that our project will potentially enable, on Wikidata and beyond.
>>   - *Paper:* Metilli D. & Paolini C. (in press). ‘Non-binary gender
>>   representation in Wikidata’. In: Provo A., Burlingame K. & Watson B.M.
>>   Ethics in Linked Data. Litwin Books.
>>
>> You can watch our past Research Showcases here: 
>> https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
>>
>>
>> Hope you can join us!
>>
>> Warm regards,
>>
>> --
>>
>> *Pablo Aragón (he/him)*
>> Research Scientist
>> Wikimedia Foundation
>> https://research.wikimedia.org
>>
>

[Wiki-research-l] [CfP] SEMANTiCS 2023 EU - Call for Workshop Papers

2023-06-21 Thread Anisa Rula & Jennifer D'Souza

* We apologize if you receive multiple copies of this CFP *
* For the online version of this call, please visit: 
https://2023-eu.semantics.cc/page/workshops *


SEMANTiCS 2023 (20th-22nd September - Leipzig, Germany) is hosting three 
workshops that are accepting submissions:


Onto4FAIR: 3rd Workshop on Ontologies for FAIR and FAIR Ontologies
Organizers: Cassia Trojahn (Institut de Recherche en Informatique de 
Toulouse, France), Luiz Olavo Bonino da Silva Santos (University of 
Twente, Leiden University Medical Centre, the Netherlands), Giancarlo 
Guizzardi (University of Twente, the Netherlands), Clement Jonquet 
(French National Research Institute for Agriculture, Food and 
Environment, Mathematics, Informatics and Statistics for Environment and 
Agronomy research unit, Montpellier, France)

https://onto4fair.github.io/2023-semantics.html

Sem4Tra: 5th International Workshop On A Semantic Data Space For Transport
Organizers: David Chaves Fraga (Senior Researcher, UPM & KULeuven), 
Mersedeh Sadeghi (Senior Researcher, University of Cologne), Shahrom 
Sohi (Researcher, WU), Julián Rojas (Postdoc Researcher, imec - IDLab 
UGent), Pieter Colpaert (Senior Researcher, imec - IDLab UGent)

https://sem4tra2023.linkeddata.es/

NLP4KGC: 2nd Workshop on Natural Language Processing for Knowledge Graph 
Construction
Organizers: Edlira Vakaj (Birmingham City University, Birmingham, UK), 
Sanju Tiwari (Universidad Autónoma de Tamaulipas, Tamaulipas, Mexico), 
Rizou Stamatia (Singular Logic, Athens, Greece), Nandana 
Mihindukulasooriya (IBM Research, Dublin, Ireland), Fernando 
Ortiz-Rodríguez (Universidad Autónoma de Tamaulipas, Tamaulipas, 
Mexico), Ryan Mcgranaghan (NASA Jet Propulsion Laboratory, California, 
United States)

https://sites.google.com/view/2nd-nlp4kgc/home

Looking forward to your submissions!

With kind regards,

Workshop & Tutorial Chairs
___
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org