Nikki added a comment.
In T341409#9018965 <https://phabricator.wikimedia.org/T341409#9018965>, @thiemowmde wrote:

> I might get this wrong. But as I understand the proposal it would make the currently established processes of how languages on wikidata.org are managed, requested, and confirmed (briefly described in T312845 <https://phabricator.wikimedia.org/T312845>) obsolete.

The documentation would need to be updated, but it wouldn't become completely obsolete. `LanguageNameUtils::ALL` contains all of the language codes that MediaWiki knows about (roughly those it uses itself plus those CLDR has locale data for), but that is still only a fraction of all valid ISO 639/BCP 47 language codes (languageinfo <https://www.wikidata.org/w/api.php?action=query&meta=languageinfo> has 978, ISO 639-3 has 7916), so people would still need a way to request missing codes. Whether requests for codes that are still missing should be accepted by Wikidata first or go straight to the CLDR extension depends on whether the people maintaining the CLDR extension are OK with people making requests for missing languages there.

> the basic idea is that there is an "official" working group that intentionally reviews and accepts new languages one by one only when they are actually needed.

That is still how it's intended to work, and it still doesn't work well. The people who are being asked to review language codes one by one do not want to, and people who request codes still have to wait months, if not years. Both @jhsoby and @amire80 have asked why we can't just enable all ISO 639-3 codes instead of enabling them one by one (or something to that effect), and that's what editors have asked for too (T289776 <https://phabricator.wikimedia.org/T289776>).
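For context, the languageinfo count mentioned above comes from the `meta=languageinfo` API module, whose response keys `query.languageinfo` by language code. A minimal sketch of extracting the codes, assuming a tiny made-up response fragment (the live query URL is real; only fetching it is omitted so the sketch stays self-contained):

```python
from urllib.parse import urlencode

# The real API query discussed above; fetching it over the network is
# deliberately left out of this sketch.
API_URL = "https://www.wikidata.org/w/api.php?" + urlencode({
    "action": "query",
    "meta": "languageinfo",
    "format": "json",
    "formatversion": "2",
})

def known_language_codes(api_response: dict) -> set:
    """Extract the language codes MediaWiki reports from a
    meta=languageinfo response (query.languageinfo is keyed by code)."""
    return set(api_response["query"]["languageinfo"])

# Made-up three-entry fragment; the live API reports about 978 codes.
sample = {"query": {"languageinfo": {
    "en": {"code": "en"},
    "de": {"code": "de"},
    "mis": {"code": "mis"},
}}}
print(len(known_language_codes(sample)))
```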
In T341409#9148879 <https://phabricator.wikimedia.org/T341409#9148879>, @Lucas_Werkmeister_WMDE wrote:

>> - This would make another 230+ languages available, reducing the number of languages we have to dump under `mis` (related: T289776 <https://phabricator.wikimedia.org/T289776>)
>
> And if T168799: Integrate IANA language registry with language-data and MediaWiki (let MediaWiki "knows" all languages with ISO 639-1/2/3 codes) <https://phabricator.wikimedia.org/T168799> happens, that would take us the rest of the way to T289776: Enable all ISO 639-3 codes on Wikidata <https://phabricator.wikimedia.org/T289776>, right?

I don't know. Does T289776 <https://phabricator.wikimedia.org/T289776> include labels or not? I limited this request to monolingual text and lexemes because almost every valid language code would be useful in Wikidata for those (lexemes: any known word in the language; monolingual text: native label <https://www.wikidata.org/wiki/Property:P1705> on the language itself, usage example <https://www.wikidata.org/wiki/Property:P5831> on lexemes, etc.). People are going to add that data whether the right code is available or not, so if MediaWiki already knows a language code exists, I think it makes sense to allow it.

> From a technical side, I don’t see major issues with this proposal. But we might want to consolidate language name sources; currently, we have some `wikibase-lexeme-language-name-*` messages in WikibaseLexeme (but not used by Wikibase), and also some language names in the cldr extension (`LocalNames/` directory). Maybe we can make Wikibase fall back to the language code and also track the missing language name, so we can have a Grafana board for the most frequently used language codes without names. But I think that doesn’t need to block this task.

MediaWiki normally shows the language code if it can't find a name, so I don't think Wikibase would need to do anything special there, would it?
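The fallback-plus-tracking idea discussed above could be sketched like this (a hypothetical helper, not actual Wikibase or MediaWiki code; the `Counter` stands in for the metric that would feed the suggested Grafana board, and the name table is made up):

```python
from collections import Counter

# Hypothetical name table; real names would come from CLDR data or
# MediaWiki messages.
NAMES = {"en": "English", "de": "Deutsch"}

# Counts how often a code with no known name is requested -- the kind of
# data the Grafana board suggested above could be built on.
missing_name_requests = Counter()

def language_name(code: str, names: dict = NAMES) -> str:
    """Return the display name for a language code, falling back to the
    code itself when no name is known (as MediaWiki already does)."""
    name = names.get(code)
    if name is None:
        missing_name_requests[code] += 1
        return code
    return name

print(language_name("de"))   # known name: "Deutsch"
print(language_name("xyz"))  # no name known, falls back to "xyz"
```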
If I'm not mistaken, it should already be possible to determine which ones are missing using wbcontentlanguages <https://www.wikidata.org/w/api.php?action=query&format=json&meta=wbcontentlanguages&formatversion=2&wbclcontext=monolingualtext&wbclprop=name> (although I recently added all the missing names, so you'd need to test it locally). I would be happy to see the names consolidated; they're inconsistent at the moment (T322139 <https://phabricator.wikimedia.org/T322139>). It's difficult to translate the names in the CLDR extension, though. Perhaps it could be made translatable on translatewiki.net (like I suggested in this year's community wishlist <https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2023/Translation/Translatable_language_names>).

> The additional cldr language codes are only added when asking for language names in a specific language, and the returned language codes vary slightly depending on which language you ask for:
> [...]
> (`de` and `bar` have additionally `en-uk`, with `bar` presumably inheriting it from `de` via language fallback; `pt`’s extra language code is `az-arab`.) I assume we always want to request the same language here, rather than make this depend on the user / request language; should it be the wiki content language (`en` on Wikidata), a hard-coded one (e.g. `en` or `qqq`), or something else?

Hm, that doesn't sound good. Is that actually a bug in the CLDR extension? I would expect the set of language codes to be the same regardless of the language being used, and that not being the case sounds like it would cause problems eventually. Perhaps it should have tests to make sure none of the files have extra codes that don't exist for English, or perhaps it should ignore any codes that aren't defined for all languages? Making the extension translatable would help here too, I imagine.
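The test idea at the end could look something like this. A sketch only: it compares plain name maps rather than modelling the cldr extension's actual `LocalNames/` file layout, and the sample data (including the German names) is illustrative, loosely mirroring the `en-uk` example above:

```python
def extra_codes(english_names: dict, localized_names: dict) -> list:
    """Return codes a localized name map defines that English does not --
    candidates for the kind of inconsistency described above."""
    return sorted(set(localized_names) - set(english_names))

# Illustrative name maps (not real LocalNames data):
names_en = {"de": "German", "en": "English"}
names_de = {"de": "Deutsch", "en": "Englisch", "en-uk": "Britisches Englisch"}

print(extra_codes(names_en, names_de))  # ['en-uk']
```

A consistency test would then assert that `extra_codes` returns an empty list for every localized map against the English one.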
TASK DETAIL https://phabricator.wikimedia.org/T341409