[Wikidata-bugs] [Maniphest] T236593: Cannot enter multiple forms for the same language variant

C933103 Mon, 27 Jun 2022 23:28:51 -0700

C933103 added a comment.


  In T236593#5610378 <https://phabricator.wikimedia.org/T236593#5610378>, @daniel wrote:
  
  > I recall that we had long discussions about this when initially deciding on
  > the data model. In technical terms, the question was whether we would allow
  > only a single literal value for a spelling variant, or a list or set of
  > words. Allowing a list or set would enable the kind of flexibility @jhsoby
  > is asking for. But the downside is that it introduces ambiguity when listing
  > forms (you would always have to list all of them, in undefined order), and
  > when generating text (which one should you use?).
  >
  > If I recall correctly, we decided that we want to give the consumer of the
  > data maximum control over which variant they prefer, by forcing the producer
  > to provide different variant codes for all different spellings. We had
  > discussions about how to encode this in the variant (language) codes, and
  > how to represent it in the UI, but decided to leave that for later.
  >
  > So, the solution that we envisioned when originally discussing this about
  > four years ago was: you make up a code for each of the spellings, in a way
  > that allows the consumer to choose which variant they prefer. Whether that
  > is done by encoding a region, a rhyme, a tradition, a school, or whatever
  > else will depend on the language. If it's a stylistic choice, name the style.
  >
  > The same approach can be used for historical spellings. Codes could look
  > something like de-x-hist-nd-15jh (this code is totally made up and probably
  > linguistic nonsense).
  
  The underlying assumption behind this decision is that different spellings
  must each be associated with a particular variant, or that some spellings
  are preferred over others, or that a given spelling is more commonly used
  for some spoken variant, sociolect, etc. than the other spellings are.
  
  None of these assumptions is correct when it comes to non-Chinese languages
  that use Chinese characters, or even to some Chinese languages that need to
  be written with Chinese characters.
  
  Examples from Vietnamese chu nom have already been presented above. Other
  examples include Japanese ateji, where kanji are used for native Japanese
  words (except in cases where there is a fully established transliteration),
  and its historical Korean equivalent, as well as languages like Cantonese,
  where non-Mandarin words need to be expressed in Chinese characters.

TASK DETAIL
  https://phabricator.wikimedia.org/T236593

To: C933103
Cc: C933103, AGutman-WMF, mxn, So9q, Ijon, daniel, Asaf, Mahir256, Danmichaelo, 
Fnielsen, Lucas_Werkmeister_WMDE, Denny, Lydia_Pintscher, jeblad, jhsoby, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Bodhisattwa, Scott_WUaS, Wikidata-bugs, aude, Mbch331
