Subramanya Sastry has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/356971 )

Change subject: WIP: Handle url-encoded hrefs properly while detecting interwiki shortcuts
......................................................................

WIP: Handle url-encoded hrefs properly while detecting interwiki shortcuts

Quick Friday evening patch. Still to be verified: that the approach
is sane, and whether it is okay to url-decode everything instead of
just pieces[0].

Change-Id: I2cb64905d890d92c2b06bafa3da7d45a6b4a870d
---
M lib/html2wt/LinkHandler.js
M tests/parserTests.txt
2 files changed, 29 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/services/parsoid refs/changes/71/356971/1

diff --git a/lib/html2wt/LinkHandler.js b/lib/html2wt/LinkHandler.js
index b2db533..9a8a620 100644
--- a/lib/html2wt/LinkHandler.js
+++ b/lib/html2wt/LinkHandler.js
@@ -182,8 +182,29 @@
                        return rtData;
                }
 
+               var processHref = function(s, sep) {
+                       var pieces = s.split(sep);
+                       if (pieces.length === 1) {
+                               return s;
+                       }
+
+                       // Decode url-encoded entities in the first piece so we
+                       // can more accurately detect interwiki prefixes.
+                       return Util.decodeURI(pieces[0]) + sep + pieces.slice(1).join(sep);
+               };
+
+               var iwMatchStr;
+               if (/^\.\//.test(href)) {
+                       // With a leading './', 'foo:' is an interwiki prefix
+                       iwMatchStr = processHref(href, ':');
+               } else {
+                       // Without a leading './', we treat 'foo:' prefix as a url protocol.
+                       // But, 'foo%3A' is a valid interwiki prefix.
+                       iwMatchStr = processHref(href, '%3A');
+               }
+
                // Check if the href matches any of our interwiki URL patterns
-               var interWikiMatch = wiki.interWikiMatcher().match(href);
+               var interWikiMatch = wiki.interWikiMatcher().match(iwMatchStr);
                if (interWikiMatch
                                // Remaining target
                                // 1) is not just a fragment id (#foo), and
diff --git a/tests/parserTests.txt b/tests/parserTests.txt
index df98feb..34e69ca 100644
--- a/tests/parserTests.txt
+++ b/tests/parserTests.txt
@@ -25922,6 +25922,9 @@
 <p><a rel='mw:WikiLink' href='fr%3AFoo'>Foo</a>
 <a rel='mw:ExtLink' href='fr%3AFoo'>Foo</a>
 <a href='fr%3AFoo'>Foo</a></p>
+<p><a rel='mw:WikiLink' href='%66%72%3AFoo'>Foo</a>
+<a rel='mw:ExtLink' href='%66%72%3AFoo'>Foo</a>
+<a href='%66%72%3AFoo'>Foo</a></p>
 !! wikitext
 [[:fr:Foo|Foo]]
 [[:fr:Foo|Foo]]
@@ -25930,6 +25933,10 @@
 [[:fr:Foo|Foo]]
 [[:fr:Foo|Foo]]
 [[:fr:Foo|Foo]]
+
+[[:fr:Foo|Foo]]
+[[:fr:Foo|Foo]]
+[[:fr:Foo|Foo]]
 !! end
 
 !! test
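
For anyone who wants to try the prefix handling outside Parsoid, here is a
minimal standalone sketch of what processHref does. It is not the patch
itself. The decode() helper is an assumption that stands in for
Util.decodeURI using plain decodeURIComponent, and iwMatchStr() is a
hypothetical wrapper around the separator choice made in the patch. The
real helper may behave differently on malformed input.

```javascript
// Stand-in for Parsoid's Util.decodeURI (assumption: plain percent-decoding
// is close enough for illustration; the real helper may differ).
function decode(s) {
	try {
		return decodeURIComponent(s);
	} catch (e) {
		return s; // leave malformed escapes untouched
	}
}

// Same shape as processHref in the patch: decode only the text before the
// first separator and leave the rest of the href as it was.
function processHref(s, sep) {
	var pieces = s.split(sep);
	if (pieces.length === 1) {
		return s;
	}
	return decode(pieces[0]) + sep + pieces.slice(1).join(sep);
}

// Hypothetical wrapper: picks the separator the way the patch does.
function iwMatchStr(href) {
	return /^\.\//.test(href) ? processHref(href, ':') : processHref(href, '%3A');
}

console.log(iwMatchStr('%66%72%3AFoo'));   // fr%3AFoo
console.log(iwMatchStr('./%66%72:Foo'));   // ./fr:Foo
console.log(iwMatchStr('fr%3AFoo%20Bar')); // fr%3AFoo%20Bar
console.log(decode('fr%3AFoo%20Bar'));     // fr:Foo Bar
```

The last two lines bear on the open question in the commit message: decoding
only pieces[0] keeps the '%3A' separator and the target encoded, whereas
decoding the whole href also turns '%3A' into ':' and '%20' into a space. In
this sketch, as in the patch, the split is on uppercase '%3A' only, so an
href such as 'fr%3aFoo' is returned unchanged.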

-- 
To view, visit https://gerrit.wikimedia.org/r/356971
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I2cb64905d890d92c2b06bafa3da7d45a6b4a870d
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/services/parsoid
Gerrit-Branch: master
Gerrit-Owner: Subramanya Sastry <ssas...@wikimedia.org>
