[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
cscott added a comment. I've got a patch to fix the BCP 47 mappings in core: https://gerrit.wikimedia.org/r/442200 I'm hoping that if/when that's merged, we can remove some of the redundancy in Wikibase and have Wikidata just use the core code to do the remappings. TASK DETAIL https://phabricator.wikimedia.org/T125073 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: cscott Cc: cscott, hoo, XXN, Liuxinyu970226, Nikki, Fomafix, adrianheine, Aklapper, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
cscott added a comment. roa-tara would more specifically be nap-x-tara, since it is a dialect of Neapolitan.
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
adrianheine added a comment. @XXN First, there is no urgency whatsoever. I don't currently plan on doing this story; it's just for future reference. Second, even if I did make this change, existing data on Wikidata.org would continue to work; we would just prevent saving of these language codes.
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
XXN added a comment. @Nikki Are there cases where `mo` terms exist in Latin script but there are no `ro` terms? I think not, so such `mo` terms can safely be removed. In any case, anything merged from `mo` into `ro` needs verification by native Romanian speakers. Are we forced to change these language codes right now? I ask because the current proposal for deletion of the Moldovan Wikipedia can help us a lot in deciding what to do. The normal and expected result of that proposal is the deletion of all `mo` Wikimedia projects, after which the Wikidata language code `mo` and all its values can be deleted at once. There is a //little// problem with moving `mo` terms to `ro-cyrl`: the Romanian Cyrillic alphabet was used before 1862 and is *not the same* as the Moldovan Cyrillic alphabet used between 1924 and 1989. This move is not recommended. In my opinion, the best approach is to wait for a decision on the proposal for deletion of the Moldovan projects. In any case, until then there are several Wikimedia projects with the `mo` subdomain language code, so the sitelink codes and the label/description/alias codes need to stay synchronized.
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
Nikki added a comment. @XXN: I had seen that, although as I described above, the current situation in Wikidata is different, since we have a mixture of Latin script and Cyrillic script terms for `mo` (where the Latin ones largely come from a bot copying the `ro` label and the Cyrillic ones largely come from mowiki page names). What do you think of what I proposed? (move Cyrillic terms to `ro-cyrl`, merge Latin terms with `ro`)
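The split proposed here (Cyrillic `mo` terms to one code, Latin `mo` terms merged into `ro`) could be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the script test relies on Unicode character names rather than any Wikibase code, and whether `ro-cyrl` is the right target for the Cyrillic terms is itself disputed elsewhere in this thread.

```python
import unicodedata
from typing import Optional


def dominant_script(text: str) -> str:
    """Return 'Cyrl', 'Latn' or 'other' based on the letters in text."""
    counts = {"Cyrl": 0, "Latn": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, spaces and punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["Cyrl"] += 1
        elif name.startswith("LATIN"):
            counts["Latn"] += 1
    if counts["Cyrl"] == 0 and counts["Latn"] == 0:
        return "other"
    return "Cyrl" if counts["Cyrl"] >= counts["Latn"] else "Latn"


def target_code_for_mo_term(text: str) -> Optional[str]:
    """Suggest a target code for a `mo` term (hypothetical helper)."""
    script = dominant_script(text)
    if script == "Cyrl":
        return "ro-cyrl"  # assumption: the target proposed in the comment above
    if script == "Latn":
        return "ro"  # merge candidate; still needs review by a native speaker
    return None  # no Latin or Cyrillic letters: leave for manual review
```

A term for which the helper returns `None`, or a Latin term that differs from the existing `ro` label, would still need a human decision.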
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
XXN added a subscriber: XXN. XXN added a comment. Regarding //mo/ro-md/ro-cyrl// see https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Deletion_of_Moldovan_Wikipedia_2
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
Nikki added a comment. Yeah, `cbk-x-zam` and `roa-x-tara` should be valid (I tested them on http://r12a.github.io/apps/subtags/ and it agrees).

For Serbian, even if the comments in one of the source files say it's supposed to be the Ekavian variety, I would expect users to go by what the user interface says (which doesn't seem to mention Ekavian anywhere). It'd be helpful if we could find a Serbian speaker who would know whether it really is only used for Ekavian...

I was mostly talking about terms because they're more common than monolingual text statements. :) I can't think of anything where I would expect them to be treated differently though, other than the special codes (`mul`, `zxx`, etc.) which don't make much sense for terms.

I remembered some more invalid codes: `de-formal`, `nl-informal` and `simple`. They're UI languages, but occasionally people use them for content. If they stop being allowed for content, we should replace them with `de`, `nl` and `en` respectively. If they continue being allowed for content, `simple` would become `en-simple`, but there are no subtags for formal/informal, so I guess they would have to be something like `de-x-formal` and `nl-x-informal`.

By the way, language names are not always localised (e.g. in English `nl` shows up as "Dutch" but `nl-informal` shows up as "Nederlands (informeel)"). Is that a bug, or do they need translating somewhere (and if so, where)?
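The two options for `de-formal`, `nl-informal` and `simple` could be written down as lookup tables. A minimal sketch, assuming a hypothetical `replace_code` helper; the tables only restate the suggestions in the comment above and are not taken from Wikibase or MediaWiki core.

```python
# Option 1: the codes stop being allowed for content, so the variant is dropped.
DROP_VARIANT = {
    "de-formal": "de",
    "nl-informal": "nl",
    "simple": "en",
}

# Option 2: the codes stay allowed for content and get a tag-like form.
KEEP_VARIANT = {
    "de-formal": "de-x-formal",      # no subtag exists for "formal"
    "nl-informal": "nl-x-informal",  # no subtag exists for "informal"
    "simple": "en-simple",
}


def replace_code(code: str, keep_variant: bool = False) -> str:
    """Return the suggested replacement, or the code unchanged if none applies."""
    table = KEEP_VARIANT if keep_variant else DROP_VARIANT
    return table.get(code.lower(), code)
```

For example, `replace_code("simple")` gives `en`, while `replace_code("simple", keep_variant=True)` gives `en-simple`; codes not in the tables pass through unchanged.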
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
adrianheine added a comment. Thanks for your feedback, @Nikki. I added `nrm` to the list. As for `cbk-zam` and `roa-tara`, they could be `cbk-x-zam` and `roa-x-tara` to be valid, right? I fixed `sr` in the description. The comments in `languages/Names.php` say it's »Serbian Cyrillic ekavian« and »Serbian Latin ekavian«. I also updated `ro-mo`. In general, this is not only about terms but also about monolingual text values, and the best way to handle these codes might differ between the two cases.
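The rewrite suggested here moves everything after the language subtag into a private-use sequence introduced by `x`. A rough sketch, assuming a hypothetical `to_private_use` helper; it only checks the BCP 47 rule that each private-use subtag is 1 to 8 alphanumeric characters and does not validate the language subtag against the subtag registry.

```python
import re

# BCP 47 private-use subtags are 1 to 8 ASCII letters or digits each.
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def to_private_use(code: str) -> str:
    """Rewrite 'lang-suffix' as 'lang-x-suffix' (hypothetical helper)."""
    language, sep, suffix = code.partition("-")
    if not sep or not suffix:
        return code  # bare language code, nothing to move
    if suffix.lower().startswith("x-"):
        return code  # already in private-use form
    for part in suffix.split("-"):
        if not _PRIVATE_SUBTAG.fullmatch(part):
            raise ValueError(f"{part!r} cannot be a private-use subtag")
    return f"{language}-x-{suffix}"
```

With this, `cbk-zam` becomes `cbk-x-zam` and `roa-tara` becomes `roa-x-tara`, matching the forms proposed in the comment.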
[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes
Nikki added a subscriber: Nikki. Nikki added a comment. There's also:
- `nrm` - currently described as Norman, but that code is assigned to Narum. It's not clear whether Norman has its own code. The closest is `nrf` (Jèrriais, Guernésiais), which covers two of the dialects. It was created in http://www-01.sil.org/iso639-3/chg_detail.asp?id=2014-024 where someone requested `jrs` for Jèrriais, but ISO 639 decided against assigning a code specifically for Jèrriais because they consider it and Guernésiais to be dialects of the same language. Instead they created `nrf`. That implies to me that `nrf` is supposed to mean Norman even if that's not one of the names they list for the language.
- `cbk-zam` - Chavacano de Zamboanga, a variety of Chavacano that doesn't have its own code or language subtag
- `roa-tara` - Tarantino, which also doesn't have its own code or language subtag

The code for Serbian is `sr` (or `srp` for the 3-letter version, but we currently use 2-letter codes when available). `src` is Logudorese Sardinian. :) The labels for `sr-el` and `sr-ec` are simply "Serbian (Latin script)" and "Serbian (Cyrillic script)". I would have expected `sr-latn` and `sr-cyrl`, because it doesn't say it has to be the Ekavian variant and there are no options for other variants.

The country code for Moldova is `MD` (`MO` is Macau). The situation for `mo` is kinda weird. The (now closed) Moldovan Wikipedia is entirely in Cyrillic, and apparently the pages were copies of articles from the Romanian Wikipedia converted to Cyrillic, so any of the labels which came from there are `ro-cyrl`. Then there are a couple of thousand Latin labels for `mo`, most of which are identical to the current Romanian label. All the ones I've looked at so far which aren't the same are cases where a bot copied `ro` to `mo` ages ago and `ro` was later updated. I wonder if it would actually be better to create `ro-cyrl` for the Cyrillic ones and merge the remaining `mo` things into `ro`?
(in most cases we don't have separate variants for different countries, and in the few cases we do, they're really hard to maintain, so I would be in favour of avoiding `ro-md` unless it's really needed)