"Scott MacLeod" <worlduniversityandsch...@gmail.com> writes:

> Hi Joe, Magnus, Andrew, GerardM, Jane, Daniel and Wikidatans, 
> Since "Language fallback is not a luxury like it is for
> British English, it is essential for all the smaller languages.
> It is what prevents it from being editable / usable" (per GerardM),
> and in terms of Reasonator, statements, and careful design (DanielK),
> what are current Wikidata processes to plan eventually for all
> 7,106 living languages (plus even dead and invented languages)
> in the world per "Ethnologue: Languages of the World, Seventeenth edition"
> (http://www.ethnologue.com/statistics/size), as people add them, and use,
> for example, the ISO coding system (or similar) for this, to anticipate
> not yet added languages, and especially for 'smaller' languages
> that GerardM mentions?

Just FYI, ISO 639 and Ethnologue are grossly incomplete in their
coverage of the world's languages. One must assume that some 10 to 100 times
more natural languages are currently in use than are listed.

Some individual additions have been made through BCP 47 and the IANA
registry, such as "en-GB-scouse", representing the Scouse dialect of British
English, or "sl-rozaj-lipaw", the Lipovaz dialect of Resian, which is itself
a variant of Slovenian spoken in Italy. In other areas, proper differentiation
is still lacking. For example, in the Swiss Alps almost every village in
every valley has its own language variety, and these are often hardly
mutually comprehensible, yet all of them together have only one language
code, "gsw", which also covers a large part of south-western Germany and
south-eastern France and their local language varieties. A look at a map
shows a great many cities, towns, villages and valleys; even if only a tenth
of them had a language of their own, "gsw" would actually represent more than
1000 distinct languages. Considering both spelling AND pronunciation, they
deserve to be differentiated.
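To make the tag structure concrete, here is a minimal sketch of how a BCP 47 tag with variant subtags such as "sl-rozaj-lipaw" can yield a fallback chain by truncating subtags from the end, in the style of the "Lookup" scheme of RFC 4647. This is an illustration only, not how Wikidata or MediaWiki resolve language fallback; the function name lookup_fallback_chain is hypothetical.

```python
def lookup_fallback_chain(tag: str) -> list[str]:
    """Return progressively shorter BCP 47 tags, RFC 4647 'Lookup' style.

    Hypothetical helper for illustration; it does not validate the tag
    against the IANA subtag registry.
    """
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags.pop()  # drop the most specific remaining subtag
        # RFC 4647 also drops a dangling single-character subtag
        # (an extension or private-use marker such as "x").
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return chain


for tag in ["sl-rozaj-lipaw", "en-GB-scouse", "gsw"]:
    print(tag, "->", lookup_fallback_chain(tag))
# sl-rozaj-lipaw -> ['sl-rozaj-lipaw', 'sl-rozaj', 'sl']
# en-GB-scouse -> ['en-GB-scouse', 'en-GB', 'en']
# gsw -> ['gsw']
```

The "gsw" case shows the problem described above: with a single code and no registered finer subtags, there is nothing to differentiate between, and nothing to fall back from.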

This is not meant to discourage you, or to say that it is not manageable.
You only need to be aware that taking care of the few languages currently
listed in Ethnologue will not suffice, and that coding them must be expected
to be a bit more complex than it appears at first sight.

> In terms of British English (en-gb) and English (en) distinction,
> why not just code English in Wikidata as "ISO 639-3eng" per 
> http://www.ethnologue.com/language/eng
> as part of a careful design for all languages, and then build
> out for smaller languages? (CC wiki WUaS is planning wiki schools
> in all 7,106 languages, plus dead and invented languages).

While the current 7,106 is far too low, it does include some "macrolanguages"
(i.e. language groups), many extinct languages and some invented ones.

> It seems that using or keying in on the ISO system, or a similar
> one, would allow for remarkable extensibility and careful design
> of Wikidata, as well as fallback for other languages such as Hindi,
> Odia or Malayalam. 

Yes indeed. However, blindly following a body like SIL (editor of ISO 639-3
and Ethnologue and, by the way, a fundamentalist Christian missionary
organization), with its rather slow process of adding languages (taking
years), might limit our capacity and speed. I suggest that we evaluate our
own needs first, then determine how best to meet them, and then cooperate
with others.

Purodha

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
