LucasWerkmeister added a comment.

  In T236593#8093121 <https://phabricator.wikimedia.org/T236593#8093121>, 
@C933103 wrote:
  
  > As an English example, some religious people might refuse to write the name 
"God" out directly, as this would constitute idolatry. For this we can tag it 
as en-x-Qnnnn, where Qnnnn refers to the religious group, but there is more 
than one alternative way to write "God": "G-d", "G*d", "G_d", "G-o-d", and so 
on. Whether a hyphen or an underscore is used makes no contextual difference, 
and the choice of symbol replacing the original letter doesn't affect 
pronunciation or religious connection. Hence all of these alternatives should 
be tagged en-x-Qnnnn, and with the patch it would be possible to have 
"en-x-Qnnnn-1" being "G-d" while "en-x-Qnnnn-2" is "G*d". I can't see how 
more specific labels can be useful in differentiating "G-d" and "G*d".
  
  I don’t follow this example. If you think all of these potential forms are 
significant, and all of them should be tracked in Wikidata, then why do you 
want to combine them all under a single item ID where nobody can tell them 
apart? To me it makes more sense (assuming this data is notable at all) to have 
separate items like “bowdlerized using hyphens”, “bowdlerized using asterisks”, 
etc., which can be subclasses of a more general “avoiding idolatry” item, have 
other statements indicating which character is being used, and so on. 
(“Bowdlerized” definitely isn’t the right word here, but I don’t know what the 
right word is, sorry.)
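
  A rough sketch of that modelling, in Python rather than actual Wikidata 
statements. Everything named here is an assumption for illustration: the item 
IDs (Q0001 to Q0003) are placeholders and not real Wikidata items, and the 
`Item` class, its fields, and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    """Stand-in for a Wikidata item; IDs and fields are hypothetical."""
    item_id: str
    label: str
    subclass_of: Optional[str] = None       # ID of the more general item
    replacement_char: Optional[str] = None  # character used in place of letters


# Placeholder items: one general item plus one specific item per spelling style.
ITEMS = {
    "Q0001": Item("Q0001", "avoiding idolatry"),
    "Q0002": Item("Q0002", "written using hyphens", "Q0001", "-"),
    "Q0003": Item("Q0003", "written using asterisks", "Q0001", "*"),
}

# Each spelling variant gets its own private-use tag, so they can be told apart.
REPRESENTATIONS = {
    "en-x-Q0002": "G-d",
    "en-x-Q0003": "G*d",
}


def item_for_tag(tag: str) -> Item:
    """Look up the item named after the private-use marker "-x-"."""
    return ITEMS[tag.split("-x-", 1)[1]]


def variants_under(parent_id: str) -> list:
    """Return (tag, text) pairs whose item is a direct subclass of parent_id."""
    return sorted(
        (tag, text)
        for tag, text in REPRESENTATIONS.items()
        if item_for_tag(tag).subclass_of == parent_id
    )
```

  With this shape, asking for everything under the general item still finds 
both spellings, while each one keeps a distinct, meaningful item ID.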
  
  In T236593#8097326 <https://phabricator.wikimedia.org/T236593#8097326>, 
@AGutman-WMF wrote:
  
  > @LucasWerkmeister I agree with you that if two variants have two different 
pronunciations, they should probably be split into two different lexemes (in 
general, I think we should avoid having multiple forms with the same 
grammatical features within one lexeme). There is some leeway, however, in this 
rule, since different dialects may have slightly different pronunciations which 
we still want to group into a single lexeme/form. For instance American English 
"color" and British English "colour" are in fact pronounced slightly 
differently, but it would be overkill to split them, since the difference in 
pronunciation is systematic between the dialects.
  
  That’s fair, and I actually almost wrote “if //the same// speaker would 
pronounce them…” in my comment :) I’m not sure how exactly to phrase the rule, 
but mainly I’m glad to have found some rule at all (which I’m not sure I really 
understood, at least consciously, back in 2019 when I was apparently sitting 
next to @jhsoby).

TASK DETAIL
  https://phabricator.wikimedia.org/T236593
