TJones added a comment.
Note that you don’t need to change your interface to Bengali to see these effects, and the fact that it is the Bengali keyword for “category” doesn’t seem to matter either. You can search for single characters and get the described behavior.
(Be sure to clear the search box between examples; otherwise you get the old results, as @Mahir256 noted above.)
For Bengali, Devanagari, and Gurmukhi characters, the precomposed versions hang, and the decomposed versions get suggestions:
Bengali | য় | U+09DF | precomposed | hangs |
Bengali | য় | U+09AF U+09BC | decomposed | works |
Devanagari | ग़ | U+095A | precomposed | hangs |
Devanagari | ग़ | U+0917 U+093C | decomposed | works |
Gurmukhi | ਗ਼ | U+0A5A | precomposed | hangs |
Gurmukhi | ਗ਼ | U+0A17 U+0A3C | decomposed | works |
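Since the two forms of each character render identically, it is easy to paste the wrong one into the search box. A minimal sketch for building the test strings directly from the code point notation used in these tables; the helper names `from_codepoints` and `describe` are made up for illustration:

```python
import unicodedata


def from_codepoints(notation):
    """Turn a string like 'U+09AF U+09BC' into the characters it names."""
    # Each token is 'U+' followed by hex digits; strip the prefix and convert.
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


precomposed = from_codepoints("U+09DF")
decomposed = from_codepoints("U+09AF U+09BC")

print(describe(precomposed))
print(describe(decomposed))
# The two strings look the same on screen but are different sequences.
print(precomposed == decomposed)  # False
```

Printing the strings and copying them from the terminal should give the exact sequence listed in each row, assuming the terminal and clipboard do not normalize the text on the way.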
Oddly, the opposite behavior happens for Latin, Cyrillic, and Greek characters—the precomposed versions work and the decomposed versions hang:
Latin | ñ | U+00F1 | precomposed | works |
Latin | ñ | U+006E U+0303 | decomposed | hangs |
Latin | é | U+00E9 | precomposed | works |
Latin | é | U+0065 U+0301 | decomposed | hangs |
Latin | ở | U+1EDF | precomposed | works |
Latin | ở | U+01A1 U+0309 | decomposed | hangs |
Cyrillic | Ѓ | U+0403 | precomposed | works |
Cyrillic | Ѓ | U+0413 U+0301 | decomposed | hangs |
Cyrillic | Ѐ | U+0400 | precomposed | works |
Cyrillic | Ѐ | U+0415 U+0300 | decomposed | hangs |
Cyrillic | Ѝ | U+040D | precomposed | works |
Cyrillic | Ѝ | U+0418 U+0300 | decomposed | hangs |
Greek | ἆ | U+1F06 | precomposed | works |
Greek | ἆ | U+1F00 U+0342 | decomposed | hangs |
However, when there is no precomposed alternative, the decomposed version works fine (depending on your fonts, the mixed script versions may or may not look right):
Latin | q́ | U+0071 U+0301 | decomposed | works |
Latin | q̀ | U+0071 U+0300 | decomposed | works |
Latin | q̃ | U+0071 U+0303 | decomposed | works |
Latin | q̉ | U+0071 U+0309 | decomposed | works |
Latin | q͂ | U+0071 U+0342 | decomposed | works |
Latin + Bengali | q় | U+0071 U+09BC | decomposed | works |
Latin + Devanagari | q़ | U+0071 U+093C | decomposed | works |
Latin + Gurmukhi | q਼ | U+0071 U+0A3C | decomposed | works |
So, I’m really not sure what’s going on here, but it looks like more than just Indic languages have the problem. There seems to be an “expected” form that works and an “unexpected” form that doesn’t, and the (pre|de)composition difference can break in either direction depending on the script.
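One pattern that does fit every row above: the form that works is the one that Unicode NFC normalization leaves unchanged. The precomposed Bengali, Devanagari, and Gurmukhi letters are composition exclusions, so NFC decomposes them, while the Latin, Cyrillic, and Greek pairs compose. The sketch below checks that against the tables using Python's `unicodedata`. It is only an observation about the data, not a confirmed diagnosis of what the search code does; `ROWS`, `from_codepoints`, and `is_nfc` are names made up here.

```python
import unicodedata

# (code points, observed behavior) copied from the tables above.
ROWS = [
    ("U+09DF", "hangs"), ("U+09AF U+09BC", "works"),
    ("U+095A", "hangs"), ("U+0917 U+093C", "works"),
    ("U+0A5A", "hangs"), ("U+0A17 U+0A3C", "works"),
    ("U+00F1", "works"), ("U+006E U+0303", "hangs"),
    ("U+00E9", "works"), ("U+0065 U+0301", "hangs"),
    ("U+1EDF", "works"), ("U+01A1 U+0309", "hangs"),
    ("U+0403", "works"), ("U+0413 U+0301", "hangs"),
    ("U+0400", "works"), ("U+0415 U+0300", "hangs"),
    ("U+040D", "works"), ("U+0418 U+0300", "hangs"),
    ("U+1F06", "works"), ("U+1F00 U+0342", "hangs"),
    ("U+0071 U+0301", "works"), ("U+0071 U+0300", "works"),
    ("U+0071 U+0303", "works"), ("U+0071 U+0309", "works"),
    ("U+0071 U+0342", "works"), ("U+0071 U+09BC", "works"),
    ("U+0071 U+093C", "works"), ("U+0071 U+0A3C", "works"),
]


def from_codepoints(notation):
    """Turn a string like 'U+09AF U+09BC' into the characters it names."""
    return "".join(chr(int(token[2:], 16)) for token in notation.split())


def is_nfc(text):
    """True if NFC normalization would leave the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


# Hypothesis: input already in NFC works, anything else hangs.
mismatches = []
for notation, observed in ROWS:
    predicted = "works" if is_nfc(from_codepoints(notation)) else "hangs"
    if predicted != observed:
        mismatches.append((notation, observed, predicted))

print("rows that do not fit the NFC hypothesis:", mismatches)
```

If the list comes back empty, a plausible next step would be to look for a place where the input is normalized on one side of a comparison but not the other, though that part is a guess.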