[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-06-01 Thread daniel
daniel added a comment.
@Nemo_bis whether or not they need to change something depends on their local conventions. We have no way to know and understand such local conventions for all Wiktionaries.

We did our best to make this non-disruptive, but there are always edge-cases. The best we can do is inform people of what we intend to do, and listen to their feedback. Quite often, people only notice issues once a feature has gone live.

TASK DETAIL
https://phabricator.wikimedia.org/T165061

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: daniel
Cc: Nemo_bis, Udo_T, -sche, Aklapper, Thibaut120094, Lea_Lacroix_WMDE, Addshore, Wikitiki89, daniel, Darkdadaah, WMDE-leszek, Octahedron80, Lydia_Pintscher, GoranSMilovanovic, QZanden, Izno, Wikidata-bugs, aude, GPHemsley, Mbch331, Krenair

Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-06-01 Thread Nemo_bis
Nemo_bis added a comment.

In T165061#3307061, @Lea_Lacroix_WMDE wrote:
Hello all, just to mention that I created a discussion topic here, so we can find a consensus within the different communities.


If some wikis need to change something, they should be notified individually.


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-06-01 Thread Lea_Lacroix_WMDE
Lea_Lacroix_WMDE added a comment.
Hello all, just to mention that I created a discussion topic here, so we can find a consensus within the different communities. Feel free to summarize your point of view and your concerns there. Thanks a lot!


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-31 Thread daniel
daniel added a comment.

In T165061#3281823, @Thibaut120094 wrote:
Showing two links for the same language doesn't make any sense for the reader (see capture).


True. Cognate could detect this situation, and only show the link that exactly matches the local page's title.

However, this would hide a potential error. Maybe it's better to have this visible, so people can notice and fix it?
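
The trade-off described above (show only the exact match, but do not lose sight of the conflict) could be sketched roughly as follows. This is a Python illustration only, not Cognate's code: `pick_links` and its return shape are hypothetical, and it assumes the candidate (site, title) pairs have already been matched through the normalization rules.

```python
def pick_links(local_title, candidates):
    """Choose which language links to display for one local page.

    candidates: (site, title) pairs whose normalized title equals the
    normalized local title (assumed to be computed elsewhere).
    Returns (links, hidden): the titles to show per site, and the titles
    suppressed per site so that the conflict can still be reported.
    """
    by_site = {}
    for site, title in candidates:
        by_site.setdefault(site, []).append(title)

    links, hidden = {}, {}
    for site, titles in by_site.items():
        exact = [t for t in titles if t == local_title]
        if exact and len(titles) > 1:
            # Several pages on one site: show only the exact title match.
            links[site] = exact
            hidden[site] = [t for t in titles if t != local_title]
        else:
            # Zero or one candidate, or no exact match: show everything.
            links[site] = titles
    return links, hidden


if __name__ == "__main__":
    candidates = [("en", "\u2019"), ("en", "'"), ("de", "'")]
    links, hidden = pick_links("\u2019", candidates)
    print(links)
    print(hidden)
```

Returning the suppressed titles separately is one way to keep the potential error discoverable (for example on a maintenance list) even though readers no longer see two links for the same language.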


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-22 Thread Darkdadaah
Darkdadaah added a comment.

In T165061#3281823, @Thibaut120094 wrote:
Btw, [[’]] should only show interwiki links to [[’]], not links to [[’]] and [[']] like on https://fr.wiktionary.org/w/index.php?title=%E2%80%99=23047369
 [...]
 Showing two links for the same language doesn't make any sense for the reader (see capture).


I added manual interwiki links in the linked page: as a result all language links are unique (just like for [[...]]).


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-21 Thread Thibaut120094
Thibaut120094 added a comment.
Btw, [[’]] should only show interwiki links to [[’]], not links to [[’]] and [[']] like on https://fr.wiktionary.org/w/index.php?title=%E2%80%99=23047369

Same for [[...]], [[…]] [[']]

Having two links for the same language doesn't make any sense for the reader.

F8145905: image.png


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-21 Thread Addshore
Addshore added a comment.
The data for enwiki now looks as below:

+----------+----------+
| cgti_raw | cgti_raw |
+----------+----------+
| ...      | …        |
| '_'      | ’_’      |
| ’        | '        |
+----------+----------+
3 rows in set (35.95 sec)

Is this something that would be useful to have for all wikis?


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-20 Thread -sche
-sche added a comment.
I've almost finished standardizing the Maya entries I mentioned, consolidating ~4 pairs of entries prior to your post. :-) Now I just consolidated 3 of the pairs you mention. The other 6 entries, for the individual characters, will probably be kept separate.

I think no Latin-script entries on en.Wikt are supposed to use ’ except the entry ’ itself, so new pairs should not arise in Latin script. En.Wikt's Macedonian entries are currently standardized on ’ (though this could be changed) while Russian entries use ', so there is the potential for a few conflicting pairs to arise as our coverage of Macedonian and Russian increases, but the number of such pairs should always be small. Mg.Wikt and fr.Wikt may also have a few pairs of conflicting entries which you might alert them to fix.

Perhaps there should be a discussion/poll on Meta, advertised on all Wiktionaries, about whether or not to enable this feature, to ensure all wikis have their say?

Will the normalization/linking function keep track (accessibly) of cases where it encounters too many pages, e.g. encounters both curly ’ and straight ' on one wiki, so that wikis can know which pages they need to maintain manual interwiki links for?


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-20 Thread daniel
daniel added a comment.
The following database query shows all pairs of page names on English Wiktionary that would conflict under these normalization rules:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
-> FROM cognate_titles as a
->   JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
->   JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key 
-> and p.cgpa_namespace = 0 and p.cgpa_site = 8711873510529828948
->   JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key 
-> and q.cgpa_namespace = 0 and q.cgpa_site = 8711873510529828948
->   WHERE a.cgti_raw_key < b.cgti_raw_key 
-> LIMIT 10;
+---------------+---------------+
| cgti_raw      | cgti_raw      |
+---------------+---------------+
| дев'ятнадцять | дев’ятнадцять |
| ...           | …             |
| '_'           | ’_’           |
| ’             | '             |
| lu’um         | lu'um         |
| ni'           | ni’           |
+---------------+---------------+
6 rows in set (39.45 sec)

I suppose it would be ok to manage the language links for 12 pages manually.
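
For readers who want to try the self-join without access to the production database, here is a small Python sketch that runs the same query shape against an in-memory SQLite database. The column names are taken from the query above; everything else is an assumption made for illustration: the table definitions are simplified, the keys are small made-up integers rather than real hashes, and `SITE = 1` is a placeholder site id.

```python
import sqlite3

SITE = 1  # made-up site id standing in for one Wiktionary

conn = sqlite3.connect(":memory:")
# Simplified toy schema; the real Cognate tables may differ.
conn.executescript("""
    CREATE TABLE cognate_titles (
        cgti_raw_key INTEGER PRIMARY KEY,
        cgti_raw TEXT,
        cgti_normalized_key INTEGER
    );
    CREATE TABLE cognate_pages (
        cgpa_site INTEGER,
        cgpa_namespace INTEGER,
        cgpa_title INTEGER
    );
""")

# Two titles share a normalized key (curly vs. straight apostrophe).
titles = [
    (1, "lu\u2019um", 100),
    (2, "lu'um", 100),
    (3, "color", 200),
]
conn.executemany("INSERT INTO cognate_titles VALUES (?, ?, ?)", titles)
conn.executemany(
    "INSERT INTO cognate_pages VALUES (?, 0, ?)",
    [(SITE, raw_key) for raw_key, _, _ in titles],
)

# Same shape as the query in the comment: join titles to themselves on the
# normalized key, and require both raw titles to be main-namespace pages
# on the same site. The "<" comparison reports each pair only once.
CONFLICT_QUERY = """
    SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles AS a
    JOIN cognate_titles AS b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages AS p ON p.cgpa_title = a.cgti_raw_key
        AND p.cgpa_namespace = 0 AND p.cgpa_site = :site
    JOIN cognate_pages AS q ON q.cgpa_title = b.cgti_raw_key
        AND q.cgpa_namespace = 0 AND q.cgpa_site = :site
    WHERE a.cgti_raw_key < b.cgti_raw_key
"""

conflicts = conn.execute(CONFLICT_QUERY, {"site": SITE}).fetchall()
print(conflicts)
```

Each returned row is one pair of titles, which is why the six rows above correspond to twelve pages.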


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-19 Thread daniel
daniel added a comment.
Here is the current normalization map from Cognate\StringNormalizer:

	private $replacements = [
		'’' => '\'',
		'…' => '...',
		' ' => '_',
	];

This maps:


right single quotation mark (code point U+2019) to the ASCII apostrophe
horizontal ellipsis (code point U+2026) to three dots
spaces to underscores, as MediaWiki always does.


According to our analysis of existing language links, these normalization rules seem to cover nearly all cases in which the link is between pages that don't have exactly the same title. The remaining handful of pages can be linked manually.

However, the question has now been raised whether these rules will cause too many language links to be inferred. This would happen if there are two (non-redirect) pages on the same wiki that would have the same title after applying these rules.
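
To make the rules and the concern concrete, here is a minimal Python sketch. It mirrors the three replacements from the PHP map quoted above, but how Cognate actually applies the map is not shown in this thread, so the plain sequential string replacement is an assumption, and `normalize_title` and `find_conflicts` are hypothetical helper names.

```python
from collections import defaultdict

# Mirrors the replacement map quoted from Cognate\StringNormalizer.
REPLACEMENTS = {
    "\u2019": "'",    # right single quotation mark -> ASCII apostrophe
    "\u2026": "...",  # horizontal ellipsis -> three dots
    " ": "_",         # space -> underscore
}


def normalize_title(title):
    """Apply every replacement in turn (assumed behaviour)."""
    for search, replacement in REPLACEMENTS.items():
        title = title.replace(search, replacement)
    return title


def find_conflicts(titles):
    """Return groups of titles from ONE wiki that normalize to the same string.

    titles is assumed to contain only non-redirect pages of a single wiki.
    """
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return {key: raws for key, raws in groups.items() if len(raws) > 1}


if __name__ == "__main__":
    sample = ["lu\u2019um", "lu'um", "Foo\u2026", "bar baz"]
    print(find_conflicts(sample))
```

Any group returned by `find_conflicts` is a set of pages for which too many language links would be inferred, and which would therefore need manual links.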


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-18 Thread Wikitiki89
Wikitiki89 added a comment.

In T165061#3263807, @Darkdadaah wrote:
@Wikitiki89 Do you have examples of a Foo... <-> Foo… article (or with different apostrophes)? The only articles I can think of would be those about the characters themselves.


Is there a list somewhere of the characters that would be normalized? That would help me find examples and assess their frequency.

If this is a problem for a relatively large number of articles, then we need to discuss it further. If not, then we can just override the interwikis manually in the handful of articles involved.

NB: fr did this normalization by bot.

This certainly needs to be discussed. It has been a longstanding policy on the English Wiktionary specifically not to allow such normalization and we do not appreciate having a feature like this forced down our throats without discussion.


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-15 Thread Darkdadaah
Darkdadaah added a comment.
@Wikitiki89 Do you have examples of a Foo... <-> Foo… article (or with different apostrophes)? The only articles I can think of would be those about the characters themselves.

If this is a problem for a relatively large number of articles, then we need to discuss it further. If not, then we can just override the interwikis manually in the handful of articles involved.

NB: fr did this normalization by bot.


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-15 Thread Lea_Lacroix_WMDE
Lea_Lacroix_WMDE added a comment.
Thanks for your feedback. I'd like to start a discussion across the three main Wiktionaries (en, fr and de) to resolve these questions about redirects. I'm sure we can find a common solution that will allow Cognate to be efficient while fitting the rules and processes of the Wiktionaries.
As the Wikimedia hackathon and Wikicite are happening this week and I'll be quite busy during this period, I'll start this discussion on June 1st. In the meantime, no changes other than bug fixes will be made.


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-12 Thread daniel
daniel added a comment.
There is one thing to be careful about here: The combination of redirects and normalization.

As far as I know, it's quite frequent to have redirects to the normalized version of a title. For instance, if wiki 2 follows the convention of using the ellipsis character ("…") in titles instead of three dots ("..."), they may have a redirect from "Foo..." (with three dots) to "Foo…" with an ellipsis.

Cognate will also recognize these two titles as equivalent (redirect or not) because of the normalization rules. So, if wiki 1 has a page called "Foo..." (with dots), Cognate will add language links to both the actual page on wiki 2 ("Foo…" with an ellipsis) and the redirect on wiki 2 ("Foo..." with dots). That's the consequence of Cognate applying normalization while at the same time treating redirects like normal pages.

Ideally, there would be a rule like "if you find an actual page to link to, ignore all the redirects to that page". But I currently do not see a way to do this efficiently, without asking each client database for redirect information. Cognate would have to track redirects in its own central database table - possible, but not trivial. And database changes need time.

I seem to recall that this issue was the original reason for ignoring redirects.
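
The "ignore all the redirects to that page" rule can be illustrated with a short Python sketch. It assumes exactly what the comment says is currently missing, namely that redirect targets are available centrally; the `Page` record, its `redirect_target` field and the `language_links` function are hypothetical and only serve to show the two behaviours side by side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    site: str
    title: str
    redirect_target: Optional[str] = None  # None means a real page


def normalize_title(title):
    # Same three replacements as the normalization map in this thread.
    return title.replace("\u2019", "'").replace("\u2026", "...").replace(" ", "_")


def language_links(local_title, remote_pages, skip_redirects_to_linked_pages):
    """Return the remote pages that would be linked from local_title."""
    key = normalize_title(local_title)
    matches = [p for p in remote_pages if normalize_title(p.title) == key]
    if not skip_redirects_to_linked_pages:
        # Redirects are treated like normal pages.
        return matches
    # Proposed rule: drop a redirect if its target is already linked.
    real = {(p.site, p.title) for p in matches if p.redirect_target is None}
    return [
        p for p in matches
        if p.redirect_target is None or (p.site, p.redirect_target) not in real
    ]


wiki2 = [
    Page("wiki2", "Foo\u2026"),                            # actual page
    Page("wiki2", "Foo...", redirect_target="Foo\u2026"),  # redirect to it
]

without_rule = [p.title for p in language_links("Foo...", wiki2, False)]
with_rule = [p.title for p in language_links("Foo...", wiki2, True)]
print(without_rule)
print(with_rule)
```

Without the rule both the page and its redirect are linked; with it only the real page remains. A redirect whose target is not among the matches is still kept, so a link through a redirect alone continues to work.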


[Wikidata-bugs] [Maniphest] [Commented On] T165061: Fixing redirects for Cognate (step 1)

2017-05-11 Thread Wikitiki89
Wikitiki89 added a comment.
Just to clarify, we don't want "color" in Wiki 1 to link directly to "colour" in Wiki 2, but rather it should link to "color" in Wiki 2 and then the redirect should take place as usual.