Hi all,

As part of our efforts to better serve the Wikimedia research community, we
are happy to share that we are collaborating with the Security team at WMF
to help prioritize the release of data that can be useful for your
research. The Security team is working to privatize more datasets and
release them publicly, avoiding the need for non-disclosure agreements. You
can learn more here: https://meta.wikimedia.org/wiki/Differential_privacy

Over the next 12 months, the Security team plans to release 5 datasets:

   - country-language-pageview ongoing (end of 2022)
   - country-language-pageview historical (March 2023)
   - geo-aggregated grants data back to 2009 (Feb 2023)
   - geoeditors monthly (June 2023)
   - dataset informed by research community priorities identified in this
     survey (second half of 2023)

The released datasets need to meet certain privacy requirements:

   - They cannot include any natural language (e.g. specific search queries
     or deletion logs), so as to avoid the release of personally identifiable
     information;
   - They need to be sufficiently large (at least thousands of entries,
     preferably more), so as to reduce noise;
   - The data cannot be so sensitive that an individual user would be harmed
     by its disclosure (e.g. IP addresses, content containing personally
     identifying information).


We invite you to complete a brief survey
<https://docs.google.com/forms/d/e/1FAIpQLSe_LAt6V2Q1GUf3Z8lnt8uAOZnHTO5rNgFfufx_gDKk1znrlw/viewform?usp=sf_link>
to help us identify and prioritize the types of datasets that you would
find useful for your work. Results of this survey will inform the fifth
dataset, scheduled to be released in late 2023. This survey is conducted
via a third-party service, which may subject it to additional terms. For
more information on privacy and data handling, see the survey privacy
statement:
https://foundation.wikimedia.org/wiki/Legal:Data_Release_Priorities_Survey_Privacy_Statement

The survey will remain open until November 3, 2022. After that time,
members of the Research and Security teams will review the responses and
report on the suggestions received and how the work will proceed.
If you prefer not to respond via the Google form, you can email your
feedback to us or set up a time to discuss. You can also leave questions
and comments on the Talk page:
https://meta.wikimedia.org/wiki/Differential_privacy

Thanks for your help!


Emily Lescak, WMF Research team

Hal Triedman, WMF Security team


-- 
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l@lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-le...@lists.wikimedia.org
