I applaud this idea. Preferably we would pick a language family with a large community of practice: 'minority' in the sense of coverage and support by modern tools and scaffolding, not in the sense of limited use.
We used to have a roughly weighted list of major world languages by (spoken, written; primary, secondary) and how well covered they were by wp (articles, contributors). Is there something like that still? //S 🌍🌏🌎

On Wed., Aug. 5, 2020, 3:19 p.m. C. Scott Ananian, <canan...@wikimedia.org> wrote:

> Sorry I'm coming to this discussion a bit late, but I'd like to underline a slightly different aspect of the concern that Phoebe raised:
>
> > It concerns me that, at least in the high-level project proposals I've seen (I haven't been tracking this closely, and haven't read the academic papers) I have not yet seen discussions of ethical data, or how we might think about identifying bias, or even how to recruit contributors and the impact on existing contributors.
>
> Using the terminology of Ibram X. Kendi (and others), I'd put this as: "it's not enough to not be racist, you must actively be *anti-racist*."
>
> Abstract Wikipedia is a "color blind" project. Indeed, it is often described as advancing WMF goals by improving the amount of content available for minority languages.
>
> However, it is built on a huge edifice of ML and AI technology which advantages majority languages and the already-powerful.
>
> As Phoebe mentioned, the subtle biases of ML translation toward majority views (selecting the "proper" gender pronoun for someone described as a "doctor" or "professor", say) are well known, and certainly deserve to be foregrounded from the start, as Danny has pledged to do in his response to Phoebe.
>
> But the infrastructure of this project is built this way from the ground up. Language models for European languages are orders of magnitude better than language models for minority languages (if the latter exist at all).
> The same is true for ontologies and every other constructed abstraction, down to choices of what topics are significant enough to include in an abstract article---but that ground has been ably covered by Kaldari and others. So let me concentrate solely on language models in the remainder (with some parenthetical asides, for which I hope you'll forgive me).
>
> I would like to challenge Abstract Wikipedia not only to be "not racist" or "color blind", but to be actively *antiracist*. That is, instead of passively accepting the status quo with respect to language models (etc.), to commit to actively supporting a language model in *at least one* minority language, treating it as a first-class citizen or (better) the *main* output of the project. That means not just looking for "a good enough language model that happens not to be a European language" but *actively developing the language model* so that the Abstract Wikipedia project *from inception* has a positive effect on *at least one* community speaking an underrepresented language with a small Wikipedia. (Again, WLOG this could apply to general AI/ML support for many, many minority groups, but I'm sticking with "at least one" and "language model" in order to make this as concrete and actionable as possible.) This of course also means committing to hire a speaker of that non-European language as part of the core team (not just an "and translations" afterthought), committing to foregrounding that language in demonstrations, and doing outreach and community building to the language group in question. (All the mockups I've seen have been in German and English, and have been pitched to an English-speaking audience.)
>
> I don't think it is wise in 2020 to pretend that "colorblind" business as usual will advance the goals of our organization.
> We need to actively work to ensure this project has effects that *work against* the significant pre-existing biases toward highly-educated speakers of European languages. It is not enough to say that "someday" this "may" have an effect on minority language groups if "somebody" ever gets around to doing it. We must make those investments proactively and with clear intention in order to effect the change we wish to see in the world.
> --
> C. Scott Ananian
> _______________________________________________
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
> https://meta.wikimedia.org/wiki/Wikimedia-l
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>