Nikerabbit added a comment.
Sorry for going off topic, feel free to skip:
In T86528#2504269, @Amire80 wrote: Of course it's not only for names of languages. Names of languages is just a first step. The same format can be used for any word, including names of units.
Actually, no. It's not currently usable for language names outside your existing efforts. It is barely sufficient for {{SITENAME}} inflection, as is highlighted by the fact that we allow site admins to easily override the inflections when they are wrong.
This will not cover most of the world's languages at adequate quality anytime soon. Researchers spend years building morphological engines that are still far from perfect. Regular expressions are not a tool for building such complex systems in a maintainable way (though finite-state methods in general are).
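To make the maintainability point concrete, here is a minimal sketch of regex-based inflection for the Finnish genitive. The rule tables and the `inflect` helper are hypothetical and written only for illustration; they are not MediaWiki's actual grammar rules or data format.

```python
import re

# Hypothetical rule table: (pattern, replacement); the first matching rule wins.
# Illustration only, not MediaWiki's real grammar rules.
GENITIVE_RULES = [
    (re.compile(r"([aeiouyäö])$"), r"\1n"),    # vowel-final word: append -n
    (re.compile(r"([^aeiouyäö])$"), r"\1in"),  # consonant-final word: append -in
]

# A patch that tries to handle t -> d consonant gradation before the generic rules.
PATCHED_RULES = [
    (re.compile(r"t([aouy])$"), r"d\1n"),
] + GENITIVE_RULES


def inflect(word, rules):
    """Apply the first rule whose pattern matches; return the word unchanged otherwise."""
    for pattern, replacement in rules:
        result, count = pattern.subn(replacement, word, count=1)
        if count:
            return result
    return word


# The generic rules handle simple cases.
print(inflect("talo", GENITIVE_RULES))      # correct genitive: talon
print(inflect("Internet", GENITIVE_RULES))  # correct genitive: Internetin

# Consonant gradation is missed: the correct forms are "kadun" and "kaupan".
print(inflect("katu", GENITIVE_RULES))
print(inflect("kauppa", GENITIVE_RULES))

# The patch fixes "katu" but breaks "auto", whose correct genitive is "auton".
print(inflect("katu", PATCHED_RULES))
print(inflect("auto", PATCHED_RULES))
```

Whether gradation applies depends on the individual word, so every patch of this kind needs its own exception list, and the rule table grows without ever becoming reliable.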
It is good that we are moving our existing grammatical rules out of PHP code, but I think we are currently at a sweet spot between the complexity of the system and the benefit it provides. Extending its usage further will make it increasingly difficult to use (languages are not equal here) until we start interfacing with purpose-built morphological tools that hide the complexity in a more maintainable way.
CLDR is a good source of localisation data and many projects will benefit when CLDR data is used, and more importantly, improved.
Cc: Nikerabbit, Scott_WUaS, Amire80, Nemo_bis, hoo, mxn, Snipre, Ricordisamoa, Lydia_Pintscher, thiemowmde, Tobi_WMDE_SW, JeroenDeDauw, JanZerebecki, adrianheine, aude, Snaterlicious, Aklapper, Stryn, daniel, Smalyshev, D3r1ck01, Izno, Luke081515, Wikidata-bugs, fbstj, Mbch331