mrephabricator added a comment.

  This should not be done. ک in Urdu is ڪ in Sindhi, but Sindhi also has ک and 
uses it for a different sound. Sindhi is exceptional in this regard, so it 
would not be surprising for a "mul" label to be read as using ک for the sound 
it more commonly represents. A label in Sindhi could then be identical to an 
Urdu one while representing a word that is meant to be pronounced differently 
from the Urdu one. This likely extends to most scripts.
  
  "W" and "V" represent the same sound to many users of the Latin script. For 
example, consider this item: 
https://www.wikidata.org/wiki/Q113450202
  I have labeled this in English as "Waddi Punjabi Lughat" as this is how many 
South Asian English speakers and users of Latin script would be inclined to 
spell it. However, "Vaddi Punjabi Lughat" is the label I have used for 
Canadian, American, and British English, because to speakers of these dialects 
the sound they associate with "V" is a closer match to the correct 
pronunciation. If I were to duplicate the "W" label across dialects, that 
would indicate that "W" is understood as a typical spelling in all of them, 
meaning that it would be reasonable for an American to pronounce "Waddi" like 
"water" even if this is not the "original" pronunciation. Duplicating a label 
therefore carries useful information that would not be clear otherwise.
  
  I think it is quite likely that people will use homoglyphs as substitutes to 
get around this, whether deliberately or unintentionally. For example, ڻ and ٹ 
are different letters associated with different sounds, but they look 
identical in initial and medial positions. So if we have ڻڻڻ and ٹٹٹ, you 
would have a hard time telling what the first two letters are. There are lots 
of things we can fudge like this in various scripts and have it go unnoticed. 
Hawaii is spelled Hawaiʻi in Hawaiian, its native language, which uses the 
Latin script. If we write this as Hawai'i, using an apostrophe rather than the 
ʻokina character used for Polynesian languages in Latin script, we have now 
"duplicated" the string without using the same characters. Many would do this 
entirely unintentionally, not knowing that the ʻokina is a different character, 
and if someone then tried to correct the character in the termbox, it would 
give an error.
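
  To make the Hawaiʻi case concrete, here is a minimal Python sketch showing 
that the two spellings are different strings at the code point level. The 
helper name differing_code_points is made up for illustration, and this says 
nothing about how Wikibase itself compares labels; it only shows that an exact 
string comparison, or one after NFKC normalization, would treat the two as 
distinct.

```python
import unicodedata


def differing_code_points(a: str, b: str):
    """Return (index, code point in a, code point in b) where the strings differ.

    Hypothetical helper for illustration; compares position by position.
    """
    return [
        (i, f"U+{ord(x):04X}", f"U+{ord(y):04X}")
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


with_okina = "Hawai\u02bbi"       # U+02BB, the character used for the ʻokina
with_apostrophe = "Hawai\u0027i"  # U+0027, the ASCII apostrophe

# The two labels look alike but are not equal as strings.
print(with_okina == with_apostrophe)

# Show exactly where they differ.
print(differing_code_points(with_okina, with_apostrophe))

# NFKC normalization does not fold one character into the other.
print(
    unicodedata.normalize("NFKC", with_okina)
    == unicodedata.normalize("NFKC", with_apostrophe)
)
```

  The same helper can be pointed at ڻڻڻ and ٹٹٹ to confirm that they are built 
from different letters even where the rendered shapes are hard to tell apart.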

TASK DETAIL
  https://phabricator.wikimedia.org/T306918

