[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes

2017-08-31 Thread Addshore
Addshore added a comment.
@daniel is there something further we want to action here as a result of the investigation?

TASK DETAIL
https://phabricator.wikimedia.org/T165311

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: Ladsgroup, PokestarFan, Addshore, Lydia_Pintscher, Lea_Lacroix_WMDE, Aklapper, daniel, Cinemantique, GoranSMilovanovic, QZanden, Thibaut120094, Izno, Wikidata-bugs, aude, GPHemsley, Shizhao, Nemo_bis, Darkdadaah, Mbch331, Krenair
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


2017-05-21 Thread daniel
daniel added a comment.
ruwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = -235854953179375905
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
->   WHERE a.cgti_raw_key < b.cgti_raw_key
-> limit 30;
+-----------+-----------+
| cgti_raw  | cgti_raw  |
+-----------+-----------+
| misk'i    | misk’i    |
| sil'm     | sil’m     |
| arc’hant  | arc'hant  |
| маловір’я | маловір'я |
| erc’h     | erc'h     |
| мар’      | мар'      |
| п’ятниця  | п'ятниця  |
| saba’     | saba'     |
| хэм'      | хэм’      |
+-----------+-----------+
9 rows in set (35.87 sec)

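The self-join above can be reproduced on a toy in-memory database to see what each condition does. This is a minimal sketch using Python's built-in sqlite3 module: the table and column names are copied from the query, but the schema, the sample rows and the site ids are hypothetical, and the raw and normalized keys are plain strings here rather than whatever key values the production tables store. Requiring both pages to be on the same site excludes cross-wiki matches, and `a.cgti_raw_key < b.cgti_raw_key` drops self-matches so each pair is reported once.

```python
import sqlite3

# Hypothetical toy schema; only the table and column names come from the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cognate_titles (
        cgti_raw TEXT, cgti_raw_key TEXT, cgti_normalized_key TEXT);
    CREATE TABLE cognate_pages (
        cgpa_site INTEGER, cgpa_namespace INTEGER, cgpa_title TEXT);
""")

# (raw title, normalized key); in this sketch the raw key is just the raw title.
titles = [
    ("Ohm's_law", "Ohm's_law"),
    ("Ohm\u2019s_law", "Ohm's_law"),
    ("Joule's_law", "Joule's_law"),
    ("Joule\u2019s_law", "Joule's_law"),
    ("tea", "tea"),
]
conn.executemany("INSERT INTO cognate_titles VALUES (?, ?, ?)",
                 [(raw, raw, norm) for raw, norm in titles])

# Hypothetical sites 1 and 2: site 1 has both spellings of Ohm's law,
# while the two spellings of Joule's law live on different sites.
pages = [(1, "Ohm's_law"), (1, "Ohm\u2019s_law"), (1, "Joule's_law"),
         (1, "tea"), (2, "Joule\u2019s_law")]
conn.executemany("INSERT INTO cognate_pages VALUES (?, 0, ?)", pages)

CLASHES = """
    SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles AS a
    JOIN cognate_titles AS b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages AS p ON p.cgpa_title = a.cgti_raw_key
         AND p.cgpa_namespace = 0 AND p.cgpa_site = ?
    JOIN cognate_pages AS q ON q.cgpa_title = b.cgti_raw_key
         AND q.cgpa_namespace = 0 AND q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key
"""

def find_clashes(site):
    """Pairs of main-namespace pages on `site` whose titles share a normalized key."""
    return conn.execute(CLASHES, (site,)).fetchall()

print(find_clashes(1))
print(find_clashes(2))
```

Here site 1 reports only the Ohm's law pair; the Joule's law spellings share a normalized key but are not reported, because their pages sit on different sites.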

2017-05-21 Thread daniel
daniel added a comment.
zhwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = 396207730596080646
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
->   WHERE a.cgti_raw_key < b.cgti_raw_key
-> limit 30;
+--------------------------------+--------------------------------+
| cgti_raw                       | cgti_raw                       |
+--------------------------------+--------------------------------+
| D'Arsonval_galvanometer        | D’Arsonval_galvanometer        |
| earth's_magnetic_field         | earth’s_magnetic_field         |
| subscriber's_extension_station | subscriber’s_extension_station |
| Maxwell’s_equation             | Maxwell's_equation             |
| 9’s_complement                 | 9's_complement                 |
| driller’s_log                  | driller's_log                  |
| Joule’s_law                    | Joule's_law                    |
| Fick’s_equation                | Fick's_equation                |
| Babbage’s_analytical_engine    | Babbage's_analytical_engine    |
| Maxwell's_law                  | Maxwell’s_law                  |
| Coulomb’s_law                  | Coulomb's_law                  |
| Cramer’s_rule                  | Cramer's_rule                  |
| Joule's_equivalent             | Joule’s_equivalent             |
| Loschmidt's_numeral            | Loschmidt’s_numeral            |
| Avogadro's_number              | Avogadro’s_number              |
| Ruhmkorff’s_coil               | Ruhmkorff's_coil               |
| Duddell's_thermo-galvanometer  | Duddell’s_thermo-galvanometer  |
| McMillan's_inequality          | McMillan’s_inequality          |
| Lenz's_law                     | Lenz’s_law                     |
| Kirchhoff’s_law                | Kirchhoff's_law                |
| Ampere's_law                   | Ampere’s_law                   |
| Ohm’s_law                      | Ohm's_law                      |
| Weber’s_theory_of_magnetism    | Weber's_theory_of_magnetism    |
| 10's_complement                | 10’s_complement                |
| 1’s_complement                 | 1's_complement                 |
| Kelvin’s_law                   | Kelvin's_law                   |
| Steinmetz's_law                | Steinmetz’s_law                |
| 2's_complement                 | 2’s_complement                 |
+--------------------------------+--------------------------------+
28 rows in set (37.23 sec)


2017-05-21 Thread daniel
daniel added a comment.
eswiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = 2916682937954058841
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
->   WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+----------+
| cgti_raw | cgti_raw |
+----------+----------+
| ...      | …        |
| ik’      | ik'      |
+----------+----------+
2 rows in set (37.16 sec)

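Judging only from the pairs reported in this thread, the normalization folds U+2019 RIGHT SINGLE QUOTATION MARK into the ASCII apostrophe and U+2026 HORIZONTAL ELLIPSIS into three full stops. The sketch below assumes exactly that folding (Cognate's actual normalizer may do more, or differ in detail) and groups a list of titles by normalized form, keeping only the groups that would clash. `normalize_title` and `clash_groups` are hypothetical names.

```python
from collections import defaultdict

# Assumed folding, inferred from the clash pairs in this thread only:
# U+2019 RIGHT SINGLE QUOTATION MARK -> "'" and U+2026 HORIZONTAL ELLIPSIS -> "...".
_FOLD = str.maketrans({"\u2019": "'", "\u2026": "..."})

def normalize_title(title: str) -> str:
    """Hypothetical stand-in for the normalization behind cgti_normalized_key."""
    return title.translate(_FOLD)

def clash_groups(titles):
    """Map each normalized form shared by two or more raw titles to those titles."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return {norm: sorted(raws) for norm, raws in groups.items() if len(raws) > 1}

# The two eswiktionary pairs above, plus one title that does not clash.
print(clash_groups(["...", "\u2026", "ik\u2019", "ik'", "agua"]))
```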

2017-05-21 Thread daniel
daniel added a comment.
Count for shwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT count(*)
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = 4903199207837476164
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
->   WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (45.44 sec)


2017-05-21 Thread daniel
daniel added a comment.
Count for mgwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT count(*)
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = 8120841685256385134
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
->   WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+
| count(*) |
+----------+
|      146 |
+----------+
1 row in set (43.34 sec)

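The count(*) variant counts pairs, not pages: because `a.cgti_raw_key < b.cgti_raw_key` keeps each unordered pair once and drops self-matches, a normalized key shared by n pages on the same site contributes n(n-1)/2 rows. So the 146 reported for mgwiktionary equals the number of pages involved divided by two only if every group is a simple pair. A minimal sketch of that arithmetic, with a hypothetical helper that takes the normalized keys of one site's main-namespace pages:

```python
from collections import Counter
from math import comb

def count_clash_pairs(normalized_keys):
    """Rows the count(*) self-join would return for one site.

    `normalized_keys` holds one entry per main-namespace page on that site
    (hypothetical input shape). A key shared by n pages yields comb(n, 2) rows:
    the strict `<` on the raw key drops self-matches and keeps each pair once.
    """
    return sum(comb(n, 2) for n in Counter(normalized_keys).values())

# Hypothetical keys: one pair, one unique title, and a triple
# (e.g. '_', ’_’ and '_’ would all fold to the same key '_').
keys = ["ik'", "ik'", "agua", "'_'", "'_'", "'_'"]
print(count_clash_pairs(keys))
```

The triple alone contributes three rows, which is why a count of clashing pairs can exceed the number of distinct normalized keys affected.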

2017-05-20 Thread daniel
daniel added a comment.
viwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT count(*)
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = 4760335324028501060
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = 4760335324028501060
->   WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (43.01 sec)

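Each per-wiki query in this thread took about 35 to 45 seconds. One possible alternative, sketched here and untested against the production tables, is to drop the fixed cgpa_site value and group by p.cgpa_site, so every wiki is counted in a single query; whether that is actually faster on the real database is unknown. Sites with no clashes simply do not appear in the result. The SQLite schema, rows and site ids below are hypothetical.

```python
import sqlite3

# Same joins as the per-wiki count queries, but grouped by site instead of
# filtering on one cgpa_site value (an untested alternative).
CLASH_COUNTS_BY_SITE = """
    SELECT p.cgpa_site, COUNT(*)
    FROM cognate_titles AS a
    JOIN cognate_titles AS b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages AS p ON p.cgpa_title = a.cgti_raw_key
         AND p.cgpa_namespace = 0
    JOIN cognate_pages AS q ON q.cgpa_title = b.cgti_raw_key
         AND q.cgpa_namespace = 0 AND q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key
    GROUP BY p.cgpa_site
"""

def clash_counts_by_site(conn):
    """Map site id -> number of clashing title pairs; sites without clashes are absent."""
    return dict(conn.execute(CLASH_COUNTS_BY_SITE).fetchall())

# Hypothetical toy data: raw keys are the raw titles, site ids are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cognate_titles (
        cgti_raw TEXT, cgti_raw_key TEXT, cgti_normalized_key TEXT);
    CREATE TABLE cognate_pages (
        cgpa_site INTEGER, cgpa_namespace INTEGER, cgpa_title TEXT);
""")
conn.executemany(
    "INSERT INTO cognate_titles VALUES (?, ?, ?)",
    [("lu'um", "lu'um", "lu'um"), ("lu\u2019um", "lu\u2019um", "lu'um"),
     ("ni'", "ni'", "ni'"), ("ni\u2019", "ni\u2019", "ni'"), ("tea", "tea", "tea")])
conn.executemany(
    "INSERT INTO cognate_pages VALUES (?, 0, ?)",
    [(10, "lu'um"), (10, "lu\u2019um"), (10, "ni'"), (10, "ni\u2019"),
     (20, "lu'um"), (20, "lu\u2019um"), (20, "tea"), (30, "ni'")])

print(clash_counts_by_site(conn))
```

In this toy data site 10 has two clashing pairs, site 20 has one, and site 30 is absent because its single page has nothing to clash with.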

2017-05-20 Thread daniel
daniel added a comment.
For frwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = 2097444195020099748
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = 2097444195020099748
->   WHERE a.cgti_raw_key < b.cgti_raw_key;
+-------------------------+-------------------------+
| cgti_raw                | cgti_raw                |
+-------------------------+-------------------------+
| Palazzolo_sull’Oglio    | Palazzolo_sull'Oglio    |
| Urago_d'Oglio           | Urago_d’Oglio           |
| ...                     | …                       |
| Vezza_d’Oglio           | Vezza_d'Oglio           |
| sms’en                  | sms'en                  |
| 'e                      | ’e                      |
| 'o                      | ’o                      |
| Monteleone_d'Orvieto    | Monteleone_d’Orvieto    |
| ’                       | '                       |
| Quinzano_d'Oglio        | Quinzano_d’Oglio        |
| o’                      | o'                      |
| ’a                      | 'a                      |
| Robecco_d'Oglio         | Robecco_d’Oglio         |
| Scandolara_Ripa_d’Oglio | Scandolara_Ripa_d'Oglio |
+-------------------------+-------------------------+
14 rows in set (37.42 sec)


2017-05-20 Thread daniel
daniel added a comment.
I checked dewiktionary and found 45 clashes. All the ones I checked are duplicates and should be fixed. Most seem to be romanized Korean syllables, where the apostrophe marks an aspirated consonant.

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
-> and p.cgpa_namespace = 0 and p.cgpa_site = -3742436511788647340
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
-> and q.cgpa_namespace = 0 and q.cgpa_site = -3742436511788647340
->   WHERE a.cgti_raw_key < b.cgti_raw_key
-> ;
+--------------+--------------+
| cgti_raw     | cgti_raw     |
+--------------+--------------+
| p’ya         | p'ya         |
| ch'a         | ch’a         |
| ch’ŏ         | ch'ŏ         |
| ch’o         | ch'o         |
| p'ae         | p’ae         |
| yujach'a     | yujach’a     |
| p'u          | p’u          |
| ch'e         | ch’e         |
| p’yo         | p'yo         |
| p'e          | p’e          |
| ch'ŏl        | ch’ŏl        |
| ch’i         | ch'i         |
| p'urŭda      | p’urŭda      |
| mach’ŏllu    | mach'ŏllu    |
| t’i          | t'i          |
| p'yŏ         | p’yŏ         |
| t’a          | t'a          |
| t'yu         | t’yu         |
| ch’ŏngdong   | ch'ŏngdong   |
| p'yu         | p’yu         |
| ch’ae        | ch'ae        |
| ch'ŏng       | ch’ŏng       |
| p’o          | p'o          |
| ch’ŏldo      | ch'ŏldo      |
| p'wi         | p’wi         |
| ch’ŏlto      | ch'ŏlto      |
| p’ŏ          | p'ŏ          |
| t'u          | t’u          |
| Saint_John’s | Saint_John's |
| p'oe         | p’oe         |
| t'o          | t’o          |
| p’a          | p'a          |
| ellibeit'ŏ   | ellibeit’ŏ   |
| t'anso       | t’anso       |
| t'wi         | t’wi         |
| t'ae         | t’ae         |
| t'ŏ          | t’ŏ          |
| p'ŭ          | p’ŭ          |
| p’i          | p'i          |
| ch’u         | ch'u         |
| t'ŭ          | t’ŭ          |
| kimch'i      | kimch’i      |
| t’oe         | t'oe         |
| ch'ŭ         | ch’ŭ         |
| t'e          | t’e          |
+--------------+--------------+
45 rows in set (36.65 sec)

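To confirm that a reported pair really differs only by the apostrophe variant, as with the romanized Korean entries above, a small helper can list the differing code points by their Unicode names. This is a hypothetical helper, not part of Cognate; it only handles titles of equal length, which covers the apostrophe pairs but not the "..." / "…" pairs.

```python
import unicodedata

def differing_chars(a: str, b: str):
    """List (index, name in a, name in b) for each position where two titles differ.

    Only equal-length titles are supported: the apostrophe pairs in this thread
    qualify, the ellipsis pairs do not.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {a!r} vs {b!r}")
    return [(i, unicodedata.name(x), unicodedata.name(y))
            for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Three of the dewiktionary pairs listed above.
for a, b in [("p\u2019ya", "p'ya"), ("kimch'i", "kimch\u2019i"),
             ("Saint_John\u2019s", "Saint_John's")]:
    print(a, b, differing_chars(a, b))
```

A pair whose only difference is APOSTROPHE versus RIGHT SINGLE QUOTATION MARK is a pure typographic duplicate; any other difference would call for a closer look before merging.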

2017-05-20 Thread daniel
daniel added a comment.
Checked enwiktionary and found 6 pairs of pages:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key 
-> and p.cgpa_namespace = 0 and p.cgpa_site = 8711873510529828948
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key 
-> and q.cgpa_namespace = 0 and q.cgpa_site = 8711873510529828948
->   WHERE a.cgti_raw_key < b.cgti_raw_key 
-> LIMIT 10;
+---------------+---------------+
| cgti_raw      | cgti_raw      |
+---------------+---------------+
| дев'ятнадцять | дев’ятнадцять |
| ...           | …             |
| '_'           | ’_’           |
| ’             | '             |
| lu’um         | lu'um         |
| ni'           | ni’           |
+---------------+---------------+
6 rows in set (39.45 sec)