Nikerabbit added a comment.

Sorry for going off topic, feel free to skip:

Of course it's not only for names of languages. Names of languages are just a first step. The same format can be used for any word, including names of units.

Actually, no. It's not currently usable for language names outside your existing efforts. It is barely sufficient for {{SITENAME}} inflection, as is highlighted by the fact that we allow site admins to easily override the inflections when they are wrong.

This will not cover most of the world's languages at any adequate quality anytime soon. Researchers spend years building morphological engines that are still far from perfect. Regular expressions, unlike finite-state methods in general, are not a tool for building such complex systems in a maintainable way.

It is good that we are moving our existing grammatical rules out of PHP code, but I think we are currently at a sweet spot between the complexity of the system and the benefit it provides. Extending its usage further will make it increasingly difficult to use (languages are not equal here) until we start interfacing with purpose-built morphological tools that hide the complexity in a more maintainable way.


CLDR is a good source of localisation data, and many projects will benefit when CLDR data is used and, more importantly, improved.


TASK DETAIL
https://phabricator.wikimedia.org/T86528

To: Nikerabbit
Cc: Nikerabbit, Scott_WUaS, Amire80, Nemo_bis, hoo, mxn, Snipre, Ricordisamoa, Lydia_Pintscher, thiemowmde, Tobi_WMDE_SW, JeroenDeDauw, JanZerebecki, adrianheine, aude, Snaterlicious, Aklapper, Stryn, daniel, Smalyshev, D3r1ck01, Izno, Luke081515, Wikidata-bugs, fbstj, Mbch331