daniel added a comment.

Here is the current normalization map from Cognate\StringNormalizer:

	private $replacements = [
		'’' => '\'',
		'…' => '...',
		' ' => '_',
	];

This maps:

  • right single quotation mark (U+2019) to the ASCII apostrophe
  • horizontal ellipsis (U+2026) to three dots
  • spaces to underscores, as MediaWiki always does.
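
As a rough illustration of how such a map could be applied to a title, here is a minimal sketch. The function name normalizeTitle() is hypothetical, and the use of strtr() is an assumption; the actual Cognate\StringNormalizer may apply the replacements differently.

```php
<?php
// Hypothetical standalone helper, not the actual Cognate implementation.
function normalizeTitle(string $title): string {
	$replacements = [
		"\u{2019}" => "'",   // RIGHT SINGLE QUOTATION MARK -> ASCII apostrophe
		"\u{2026}" => '...', // HORIZONTAL ELLIPSIS -> three ASCII dots
		' ' => '_',          // space -> underscore
	];
	// strtr() with an array replaces each key by its value in a single pass
	// and never rescans text it has already replaced.
	return strtr($title, $replacements);
}

echo normalizeTitle("l\u{2019}eau de vie\u{2026}"), "\n"; // prints: l'eau_de_vie...
```

Titles that contain none of the three mapped characters pass through unchanged.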

According to our analysis of existing language links, these normalization rules seem to cover nearly all cases in which the link is between pages that don't have exactly the same title. The remaining handful of pages can be linked manually.

However, the question has now been raised whether these rules will cause too many language links to be inferred. This would happen if two (non-redirect) pages on the same wiki end up with the same title after the rules are applied.
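
One way to check for such collisions would be to group the (non-redirect) titles of a wiki by their normalized form and report every group with more than one member. The sketch below assumes the titles are already available as a plain array of distinct strings; the function name findCollisions() and the inlined replacement map are illustrative only, not part of Cognate.

```php
<?php
// Hypothetical helper: returns normalized title => list of original titles,
// restricted to normalized forms shared by more than one original title.
function findCollisions(array $titles): array {
	// Same three rules as the normalization map quoted above.
	$map = ["\u{2019}" => "'", "\u{2026}" => '...', ' ' => '_'];
	$groups = [];
	foreach ($titles as $title) {
		$groups[strtr($title, $map)][] = $title;
	}
	// Keep only the groups in which two or more titles collapsed together.
	return array_filter($groups, function (array $group): bool {
		return count($group) > 1;
	});
}

$collisions = findCollisions(["can't", "can\u{2019}t", "etc\u{2026}", "foo"]);
// Only "can't" is reported: it is shared by "can't" and "can’t".
```

Running something like this against each wiki's title list would show how often the rules merge distinct pages in practice.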


TASK DETAIL
https://phabricator.wikimedia.org/T165061
