Let me top-post a question to the Wikidata dev team:
Where can we find documentation on what the Wikidata internal language
codes actually mean? In particular, how do you map the language selector
to the internal codes? I noticed some puzzling details:
* Wikidata uses "be-x-old" as a code, but MediaWiki messages for this
language seem to use "be-tarask" as a language code. So there must be a
mapping somewhere. Where?
* MediaWiki's http://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCodes
provides some mappings. For example, it maps "zh-yue" to "yue". Yet,
Wikidata uses both of these codes. What does this mean?
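To make the question concrete, this is the kind of normalisation step I imagine the export script needs before writing RDF literals. The two entries are just the examples above, not a verified mapping, and the names are my own:

```python
# Sketch only: the mapping entries below are the two examples from this
# mail, not a complete or verified Wikidata-to-BCP47 table.
WIKIDATA_TO_BCP47 = {
    "be-x-old": "be-tarask",  # MediaWiki message files use be-tarask
    "zh-yue": "yue",          # per $wgDummyLanguageCodes
}

def to_bcp47(code):
    """Return the (assumed) BCP47 code for a Wikidata-internal code,
    falling back to the code itself when no mapping is known."""
    return WIKIDATA_TO_BCP47.get(code, code)
```

If such a table existed somewhere in MediaWiki, we could of course just import it instead of maintaining our own copy.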
Answers to Nemo's points inline:
On 04/08/13 06:15, Federico Leva (Nemo) wrote:
> Markus Krötzsch, 03/08/2013 15:48:
>> (3) Limited language support. The script uses Wikidata's internal
>> language codes for string literals in RDF. In some cases, this might
>> not be correct. It would be great if somebody could create a mapping
>> from Wikidata language codes to BCP47 language codes (let me know if
>> you think you can do this, and I'll tell you where to put it).
>
> These are only a handful, aren't they?
There are about 369 language codes right now. You can see the complete
list in langCodes at the bottom of the file
https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py
Most might be correct already, but it is hard to say. Also, is it okay
to create new (sub)language codes for our own purposes? Something like
simple English will hardly have an official code, but it would be bad to
export it as "en".
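For illustration: BCP47 does allow private-use subtags (introduced by "x-"), so something like the following would at least be syntactically valid. The tag "en-x-simple" is my own invention here, not a registered code:

```python
# Possible fallback for Wikidata codes with no official BCP47
# equivalent: a private-use subtag. "en-x-simple" is an invented
# example, not a registered or agreed-upon tag.
PRIVATE_USE_FALLBACK = {
    "simple": "en-x-simple",  # Simple English Wikipedia
}

def export_code(code):
    """Return a private-use tag if one is defined, else the code as-is."""
    return PRIVATE_USE_FALLBACK.get(code, code)
```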
>> (4) Limited site language support. To specify the language of linked
>> wiki sites, the script extracts a language code from the URL of the
>> site. Again, this might not be correct in all cases, and it would be
>> great if somebody had a proper mapping from Wikipedias/Wikivoyages to
>> language codes.
>
> Apart from the above, doesn't wgLanguageCode in
> https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php
> have what you need?
Interesting. However, the list there does not contain all 300 sites that
we currently find in Wikidata dumps (and it contains some sites that we
do not find there, such as dkwiki, which seems to be outdated). The full
list of
sites we support is also found in the file I mentioned above, just after
the language list (variable siteLanguageCodes).
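To illustrate why a proper site mapping would help, here is a rough sketch of the kind of guessing described in (4). The function name and the suffix handling are my own simplification, not the actual code; a real mapping needs per-site exceptions:

```python
# Naive guess of a language code from a site id such as "enwiki" or
# "frwikivoyage". This is exactly the kind of heuristic that breaks on
# special cases, which is why an explicit site mapping is preferable.
def guess_site_language(site_id):
    # Check the longer suffix first so "frwikivoyage" is not cut at "wiki".
    for suffix in ("wikivoyage", "wiki"):
        if site_id.endswith(suffix):
            # Site ids use "_" where language codes use "-".
            return site_id[: -len(suffix)].replace("_", "-")
    return None  # unknown site family
```

For example, this turns "be_x_oldwiki" into "be-x-old", which immediately runs into the mapping question from the top of this mail.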
Markus
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l