[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2018-07-03 Thread cscott
cscott added a comment.
I've got a patch to fix the BCP 47 mappings in core: https://gerrit.wikimedia.org/r/442200

I'm hoping that if/when that's merged, we can remove some of the redundancy in wikibase and have wikidata just use the core code to do the remappings.


TASK DETAIL
  https://phabricator.wikimedia.org/T125073

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: cscott
Cc: cscott, hoo, XXN, Liuxinyu970226, Nikki, Fomafix, adrianheine, Aklapper, 
Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, aude, 
Mbch331



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2018-06-29 Thread cscott
cscott added a comment.
roa-tara would more specifically be nap-x-tara, since it is a dialect of Neapolitan.


[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-02-17 Thread adrianheine
adrianheine added a comment.

@XXN First, there is no urgency whatsoever. I don't currently plan on working 
on this story; it's just for future reference. Second, even if I did make this 
change, existing data on Wikidata.org would continue to work; we would just 
prevent these language codes from being saved.




[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-02-17 Thread XXN
XXN added a comment.

@Nikki Are there cases where `mo` terms exist in Latin script but there are no 
`ro` terms? I think not, so such `mo` terms can safely be removed.
In any case, anything merged from `mo` to `ro` needs verification by native 
Romanian speakers.

Are we forced to change these language codes right now? I ask because the 
current proposal for deletion of the Moldovan Wikipedia can help us a lot in 
deciding what to do. The normal and expected result of that proposal is the 
deletion of all `mo` Wikimedia projects, after which the Wikidata language code 
`mo` and all its values can be deleted at once.

There is a //little// problem with moving `mo` terms to `ro-cyrl`: the Romanian 
Cyrillic alphabet was used before 1862 and it is *not the same* as the Moldovan 
Cyrillic alphabet used between 1924 and 1989. This move is not recommended.

In my view, the best approach is to wait for a decision on the proposal for 
deletion of the Moldovan projects. Until then, several Wikimedia projects still 
use the `mo` subdomain language code, so the sitelink codes need to stay 
synchronized with the label/description/alias codes.




[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-02-16 Thread Nikki
Nikki added a comment.

@XXN: I had seen that, although as I described above, the current situation in 
Wikidata is different, since we have a mixture of Latin script and Cyrillic 
script terms for `mo` (where the Latin ones largely come from a bot copying the 
`ro` label and the Cyrillic ones largely come from mowiki page names). What do 
you think of what I proposed? (move Cyrillic terms to `ro-cyrl`, merge Latin 
terms with `ro`)
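
A minimal sketch of that split, assuming the script of a term can be decided by counting its letters. The function names are hypothetical and nothing here reflects actual Wikibase code; terms with no Latin or Cyrillic letters are left on `mo` for manual review.

```python
import unicodedata


def dominant_script(text: str) -> str:
    """Return 'Cyrl', 'Latn', or 'other' based on the letters in text."""
    counts = {"Cyrl": 0, "Latn": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # ignore digits, spaces, punctuation, combining marks
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["Cyrl"] += 1
        elif name.startswith("LATIN"):
            counts["Latn"] += 1
    if counts["Cyrl"] == 0 and counts["Latn"] == 0:
        return "other"
    return "Cyrl" if counts["Cyrl"] >= counts["Latn"] else "Latn"


def target_code_for_mo_term(text: str) -> str:
    """Map a `mo` term to the code proposed above (hypothetical helper)."""
    script = dominant_script(text)
    if script == "Cyrl":
        return "ro-cyrl"
    if script == "Latn":
        return "ro"
    return "mo"  # undecidable: keep as-is for manual review
```

Mixed-script terms go to whichever script has more letters, which is a simplification; a real migration would probably want to flag those instead.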




[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-02-16 Thread XXN
XXN added a subscriber: XXN.
XXN added a comment.

Regarding //mo/ro-md/ro-cyrl// see 
https://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Deletion_of_Moldovan_Wikipedia_2




[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-02-05 Thread Nikki
Nikki added a comment.

Yeah, `cbk-x-zam` and `roa-x-tara` should be valid (I tested them on 
http://r12a.github.io/apps/subtags/ and it agrees).
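
As a rough sketch of the rewrite being discussed: the helper below (a hypothetical name) moves everything after the primary language subtag into a private-use sequence. It only checks syntax, relying on the BCP 47 rule that private-use subtags are 1 to 8 ASCII alphanumerics; it does not consult the subtag registry, so it cannot tell whether the primary language subtag is the right one.

```python
import re

# Simplification: only 2- or 3-letter primary language subtags are handled.
PRIMARY_LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
# BCP 47 private-use subtags: 1 to 8 ASCII letters or digits.
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def to_private_use(legacy_code: str) -> str:
    """Rewrite 'lang-suffix' as 'lang-x-suffix' (hypothetical helper)."""
    language, _, suffix = legacy_code.partition("-")
    if not PRIMARY_LANGUAGE.fullmatch(language):
        raise ValueError(f"unexpected primary language subtag: {language!r}")
    parts = suffix.split("-") if suffix else []
    if not parts or not all(PRIVATE_SUBTAG.fullmatch(p) for p in parts):
        raise ValueError(f"cannot use {suffix!r} as private-use subtags")
    return "-".join([language.lower(), "x", *(p.lower() for p in parts)])
```

For example, `to_private_use("cbk-zam")` gives `cbk-x-zam` and `to_private_use("roa-tara")` gives `roa-x-tara`.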

For Serbian, even if the comments in one of the source files say it's supposed 
to be the Ekavian variety, I would expect users to go by what the user 
interface says (which doesn't seem to mention Ekavian anywhere). It'd be 
helpful if we could find a Serbian speaker who would know whether it really is 
only used for Ekavian...

I was mostly talking about terms because they're more common than monolingual 
text statements. :) I can't think of anything where I would expect them to be 
treated differently though, other than the special codes (`mul`, `zxx`, etc) 
which don't make much sense for terms.

I remembered some more invalid codes: `de-formal`, `nl-informal` and `simple`. 
They're UI languages but occasionally people use them for content. If they stop 
being allowed for content, we should replace them with `de`, `nl` and `en` 
respectively. If they continue being allowed for content, `simple` would become 
`en-simple`, but there are no subtags for formal/informal, so I guess they 
would have to be something like `de-x-formal` and `nl-x-informal`.
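
The two outcomes can be written down as lookup tables. This is only a sketch: the table names and the `remap` function are hypothetical, and the private-use forms are the guesses from this comment, not settled codes.

```python
# Option A: the codes stop being allowed for content and fold into the base language.
FOLD_INTO_BASE = {
    "de-formal": "de",
    "nl-informal": "nl",
    "simple": "en",
}

# Option B: the codes stay allowed for content but get well-formed replacements.
KEEP_FOR_CONTENT = {
    "de-formal": "de-x-formal",
    "nl-informal": "nl-x-informal",
    "simple": "en-simple",
}


def remap(code: str, keep_for_content: bool) -> str:
    """Return the replacement for a UI-only code; other codes pass through."""
    table = KEEP_FOR_CONTENT if keep_for_content else FOLD_INTO_BASE
    return table.get(code, code)
```

Codes that are not in either table are returned unchanged, so the function can be applied to every stored value without special-casing.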

By the way, language names are not always localised (e.g. in English `nl` shows 
up as "Dutch" but `nl-informal` shows up as "Nederlands (informeel)"). Is that 
a bug, or do they need translating somewhere? (And if so, where?)




[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-01-30 Thread adrianheine
adrianheine added a comment.

Thanks for your feedback, @Nikki. I added `nrm` to the list. As for `cbk-zam` 
and `roa-tara`, they could be `cbk-x-zam` and `roa-x-tara` to be valid, right? 
I fixed `sr` in the description. The comments in `languages/Names.php` say it's 
»Serbian Cyrillic ekavian« and »Serbian Latin ekavian«. I also updated `ro-mo`.

In general, this is not only about terms but also about monolingual text 
values, and the best way to handle these codes might be different in both cases.




[Wikidata-bugs] [Maniphest] [Commented On] T125073: [Story] Replace bad, but currently necessary language codes

2016-01-28 Thread Nikki
Nikki added a subscriber: Nikki.
Nikki added a comment.

There's also:

- `nrm` - currently described as Norman, but that code is assigned to Narum. 
It's not clear whether Norman has its own code. The closest is `nrf` (Jèrriais, 
Guernésiais), which covers two of the dialects. It was created in 
http://www-01.sil.org/iso639-3/chg_detail.asp?id=2014-024 where someone 
requested `jrs` for Jèrriais but ISO 639 decided against assigning a code 
specifically for Jèrriais because they consider it and Guernésiais to be 
dialects of the same language. Instead they created `nrf`. That implies to me 
that `nrf` is supposed to mean Norman even if that's not one of the names they 
list for the language.

- `cbk-zam` - Chavacano de Zamboanga, a variety of Chavacano that doesn't have 
its own code or language subtag
- `roa-tara` - Tarantino, which also doesn't have its own code or language 
subtag

The code for Serbian is `sr` (or `srp` for the 3-letter version, but we 
currently use 2-letter codes when available). `src` is Logudorese Sardinian. :) 
The labels for `sr-el` and `sr-ec` are simply "Serbian (Latin script)" and 
"Serbian (Cyrillic script)", I would have expected `sr-latn` and `sr-cyrl` 
because it doesn't say it has to be the Ekavian variant and there are no 
options for other variants.

The country code for Moldova is `MD` (`MO` is Macau).
The situation for `mo` is kinda weird. The (now closed) Moldovan Wikipedia is 
entirely in Cyrillic, and apparently the pages were copies of articles from the 
Romanian Wikipedia converted to Cyrillic, so any of the labels which came from 
there are `ro-cyrl`. Then there are a couple of thousand Latin labels for `mo`, 
most of which are identical to the current Romanian label. All the ones I've 
looked at so far which aren't the same are cases where a bot copied `ro` to `mo` 
ages ago and `ro` was later updated. I wonder if it would actually be better to 
create `ro-cyrl` for the Cyrillic ones and merge the remaining `mo` things into 
`ro`? (in most cases we don't have separate variants for different countries, 
and in the few cases we do, they're really hard to maintain, so I would be in 
favour of avoiding `ro-md` unless it's really needed)
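
To make the Latin-script case concrete, here is a small sketch of how a `mo` label could be compared with the `ro` label of the same item before merging. The function name and the three category names are made up for illustration; how each category should be handled is still an open question.

```python
from typing import Optional


def compare_latin_mo_label(mo_label: str, ro_label: Optional[str]) -> str:
    """Classify a Latin-script `mo` label against the item's `ro` label."""
    if ro_label is None:
        return "mo-only"    # no Romanian label exists yet
    if mo_label.strip() == ro_label.strip():
        return "identical"  # the `mo` label adds nothing
    return "differs"        # possibly a stale bot copy; needs a human look
```

Tallying the three categories over all `mo` labels would show how many are plain duplicates of `ro` and how many need checking by hand.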

