Nikki added a comment.

  In T341409#9018965 <https://phabricator.wikimedia.org/T341409#9018965>, 
@thiemowmde wrote:
  
  > I might get this wrong. But as I understand the proposal it would make the 
currently established processes of how languages on wikidata.org are managed, 
requested, and confirmed (briefly described in T312845 
<https://phabricator.wikimedia.org/T312845>) obsolete.
  
  The documentation would need to be updated, but it wouldn't make the process 
completely obsolete. `LanguageNameUtils::ALL` contains all the language codes 
that MediaWiki
knows about (≈ those it uses itself plus those which CLDR has locale data for), 
but that's still only a fraction of all valid ISO 639/BCP 47 language codes 
(languageinfo 
<https://www.wikidata.org/w/api.php?action=query&meta=languageinfo> has 978, 
ISO 639-3 has 7916), so people would still need a way to request missing codes.
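
  To illustrate the gap, here is a rough sketch of that comparison (the 
response shape matches `meta=languageinfo`, but the sample data and the code 
list below are tiny hypothetical stand-ins, not real output):

```python
# Sketch: compare the codes MediaWiki reports via meta=languageinfo with an
# external list of ISO 639-3 codes. The sample data is a tiny hypothetical
# stand-in for the real ~978-entry response.
def known_codes(languageinfo_response):
    """Extract the set of language codes from a meta=languageinfo response."""
    return set(languageinfo_response["query"]["languageinfo"])

sample = {"query": {"languageinfo": {"en": {}, "de": {}, "mis": {}}}}
iso_639_3_subset = {"en", "de", "abc", "xyz"}  # stand-in for all 7916 codes

still_missing = iso_639_3_subset - known_codes(sample)
print(sorted(still_missing))  # codes people would still need to request
```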
  
  Whether requests for things that are still missing should be accepted by 
Wikidata first or go straight to the CLDR extension depends on whether the 
people maintaining the CLDR extension are OK with people making requests for 
missing languages there.
  
  > the basic idea is that there is an "official" working group that 
intentionally reviews and accepts new languages one by one only when they are 
actually needed.
  
  That is still how it's intended to work and it still doesn't work well. The 
people who are being asked to review language codes one by one do not want to. 
People who request codes still have to wait months, if not years. Both @jhsoby 
and @amire80 have asked why we can't just enable all ISO 639-3 codes instead of 
enabling them one by one (or something to that effect), and that's what editors 
have asked for too (T289776 <https://phabricator.wikimedia.org/T289776>).
  
  In T341409#9148879 <https://phabricator.wikimedia.org/T341409#9148879>, 
@Lucas_Werkmeister_WMDE wrote:
  
  >> - This would make another 230+ languages available, reducing the number of 
languages we have to dump under `mis` (related: T289776 
<https://phabricator.wikimedia.org/T289776>)
  >
  > And if T168799: Integrate IANA language registry with language-data and 
MediaWiki (let MediaWiki "knows" all languages with ISO 639-1/2/3 codes) 
<https://phabricator.wikimedia.org/T168799> happens, that would take us the 
rest of the way to T289776: Enable all ISO 639-3 codes on Wikidata 
<https://phabricator.wikimedia.org/T289776>, right?
  
  I don't know. Does T289776 <https://phabricator.wikimedia.org/T289776> 
include labels or not?
  
  I limited this request to monolingual text and lexemes because almost every 
valid language code would be useful in Wikidata for those (lexemes: any known 
word in the language, monolingual text: native label 
<https://www.wikidata.org/wiki/Property:P1705> on the language itself, usage 
example <https://www.wikidata.org/wiki/Property:P5831> on lexemes, etc). People 
are going to add that data whether the right code is available or not, so if 
MediaWiki already knows a language code exists, I think it makes sense to allow 
it.
  
  > From a technical side, I don’t see major issues with this proposal. But we 
might want to consolidate language name sources; currently, we have some 
`wikibase-lexeme-language-name-*` messages in WikibaseLexeme (but not used by 
Wikibase), and also some languages names in the cldr extension (`LocalNames/` 
directory). Maybe we can make Wikibase fall back to the language code and also 
track the missing language name, so we can have a Grafana board for the most 
frequently used language codes without names. But I think that doesn’t need to 
block this task.
  
  MediaWiki normally shows the language code if it can't find a name, so I 
don't think Wikibase would need to do anything special there, would it?
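
  For illustration, a minimal fallback sketch (hypothetical code, not 
Wikibase's actual implementation) that also counts misses, which would cover 
the tracking idea:

```python
from collections import Counter

# Hypothetical sketch (not Wikibase's actual code): return a language's
# display name, fall back to the bare code, and count misses so the most
# frequently used unnamed codes could be charted later.
missing_names = Counter()

def language_name(code, names):
    if code in names:
        return names[code]
    missing_names[code] += 1  # track the miss for a hypothetical dashboard
    return code  # MediaWiki-style fallback: show the code itself

names = {"en": "English", "de": "German"}
print(language_name("en", names))   # → English
print(language_name("xyz", names))  # → xyz (fallback, miss recorded)
```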
  
  If I'm not mistaken, it should already be possible to determine which ones 
are missing using wbcontentlanguages 
<https://www.wikidata.org/w/api.php?action=query&format=json&meta=wbcontentlanguages&formatversion=2&wbclcontext=monolingualtext&wbclprop=name>
 (although I recently added all the missing names so you'd need to test it 
locally).
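
  Something like this could flag them (a hypothetical sketch; it assumes a 
missing name surfaces as the code itself in the wbcontentlanguages output, and 
the sample data is made up):

```python
# Sketch: spot codes with no proper display name in a wbcontentlanguages
# response, assuming a missing name falls back to the code itself.
def codes_without_names(response):
    langs = response["query"]["wbcontentlanguages"]
    return sorted(code for code, info in langs.items()
                  if info.get("name", code) == code)

sample = {"query": {"wbcontentlanguages": {
    "en": {"code": "en", "name": "English"},
    "xyz": {"code": "xyz", "name": "xyz"},  # name fell back to the code
}}}
print(codes_without_names(sample))  # → ['xyz']
```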
  
  I would be happy to see the names consolidated; they're inconsistent at the 
moment (T322139 <https://phabricator.wikimedia.org/T322139>). The names in the 
CLDR extension are difficult to translate, though perhaps they could be made 
translatable on translatewiki.net (like I suggested in this year's community 
wishlist 
<https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2023/Translation/Translatable_language_names>).
  
  > The additional cldr language codes are only added when asking for language 
names in a specific language, and the returned language codes vary slightly 
depending on which language you ask for:
  > [...]
  > (`de` and `bar` have additionally `en-uk`, with `bar` presumably inheriting 
it from `de` via language fallback; `pt`’s extra language code is `az-arab`.) I 
assume we always want to request the same language here, rather than make this 
depend on the user / request language; should it be the wiki content language 
(`en` on Wikidata), a hard-coded one (e.g. `en` or `qqq`), or something else?
  
  Hm, that doesn't sound good. Is that actually a bug in the CLDR extension? I 
would expect the set of language codes to be the same regardless of the 
language being used, and the fact that it varies sounds like it would cause 
problems eventually. Perhaps the extension should have tests to make sure none 
of the files have extra codes that don't exist for English, or perhaps it 
should ignore any codes that aren't defined for all languages? Making the 
extension translatable would help here too, I imagine.
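
  The subset check could be as simple as this sketch (hypothetical in-memory 
name tables standing in for the extension's `LocalNames/` files):

```python
# Sketch of the suggested consistency test: each per-language name table
# should only contain codes that also exist for English. Data is hypothetical.
def extra_codes(names_by_lang, reference="en"):
    ref = set(names_by_lang[reference])
    return {lang: sorted(set(table) - ref)
            for lang, table in names_by_lang.items()
            if set(table) - ref}

names_by_lang = {
    "en": {"de": "German", "fr": "French"},
    "de": {"de": "Deutsch", "fr": "Französisch", "en-uk": "Englisch (UK)"},
}
print(extra_codes(names_by_lang))  # → {'de': ['en-uk']}
```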

TASK DETAIL
  https://phabricator.wikimedia.org/T341409

To: Nikki
Cc: ItamarWMDE, Bugreporter, thiemowmde, Lucas_Werkmeister_WMDE, jhsoby, 
Amire80, Lydia_Pintscher, Manuel, mrephabricator, Nikki, Danny_Benjafield_WMDE, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, Mahir256, QZanden, srishakatux, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331
_______________________________________________
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org