Title: [208902] trunk
Revision
208902
Author
achristen...@apple.com
Date
2016-11-18 14:47:24 -0800 (Fri, 18 Nov 2016)

Log Message

Support IDN2008 with UTS #46 instead of IDN2003
https://bugs.webkit.org/show_bug.cgi?id=144194

Reviewed by Darin Adler.

Source/WebCore:

Use uidna_nameToASCII instead of the deprecated uidna_IDNToASCII.
It uses IDN2008 instead of IDN2003, and it uses UTS #46 when used with a UIDNA opened with uidna_openUTS46.
This follows https://url.spec.whatwg.org/#concept-domain-to-ascii, except that we do not use Transitional_Processing.
Skipping it prevents homograph attacks on German domain names with "ß" and "ss" in them.  These are now treated as separate domains.
Firefox also doesn't use Transitional_Processing. Chrome and the current specification use Transitional_Processing,
but https://github.com/whatwg/url/issues/110 might change the spec.
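
To make the "ß" versus "ss" distinction concrete, here is a minimal, self-contained sketch of the two outcomes. It is not the WebKit or ICU code: mapSharpS is a hypothetical stand-in for the UTS #46 mapping step that handles only U+00DF, and toASCIILabel is a plain RFC 3492 Punycode encoder without overflow checks. Both names are assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>

// Punycode parameters from RFC 3492, section 5.
constexpr uint32_t kBase = 36, kTMin = 1, kTMax = 26, kSkew = 38, kDamp = 700;
constexpr uint32_t kInitialBias = 72, kInitialN = 128;

// Digit values 0-25 map to 'a'-'z' and 26-35 map to '0'-'9'.
static char encodeDigit(uint32_t digit)
{
    return static_cast<char>(digit < 26 ? 'a' + digit : '0' + (digit - 26));
}

// Bias adaptation from RFC 3492, section 6.1.
static uint32_t adaptBias(uint32_t delta, uint32_t numPoints, bool firstTime)
{
    delta = firstTime ? delta / kDamp : delta / 2;
    delta += delta / numPoints;
    uint32_t k = 0;
    while (delta > ((kBase - kTMin) * kTMax) / 2) {
        delta /= kBase - kTMin;
        k += kBase;
    }
    return k + (kBase - kTMin + 1) * delta / (delta + kSkew);
}

// Hypothetical stand-in for the UTS #46 mapping step: it handles only U+00DF.
std::u32string mapSharpS(const std::u32string& label, bool transitional)
{
    std::u32string mapped;
    for (char32_t c : label) {
        if (c == 0x00DF && transitional)
            mapped += U"ss"; // Transitional processing folds "ß" into "ss".
        else
            mapped += c;
    }
    return mapped;
}

// Converts one already-mapped label to its ASCII form. Overflow checks are omitted.
std::string toASCIILabel(const std::u32string& label)
{
    std::string output;
    for (char32_t c : label) {
        if (c < 0x80)
            output += static_cast<char>(c);
    }
    const uint32_t basicCount = static_cast<uint32_t>(output.size());
    if (basicCount == label.size())
        return output; // Pure ASCII labels are left as they are.
    if (basicCount)
        output += '-';

    uint32_t n = kInitialN, delta = 0, bias = kInitialBias, handled = basicCount;
    while (handled < label.size()) {
        // Find the smallest code point that has not been encoded yet.
        uint32_t next = UINT32_MAX;
        for (char32_t c : label) {
            if (c >= n && c < next)
                next = c;
        }
        delta += (next - n) * (handled + 1);
        n = next;
        for (char32_t c : label) {
            if (c < n)
                ++delta;
            if (c != n)
                continue;
            // Emit delta as a variable-length integer.
            uint32_t q = delta;
            for (uint32_t k = kBase; ; k += kBase) {
                uint32_t t = k <= bias ? kTMin : (k >= bias + kTMax ? kTMax : k - bias);
                if (q < t)
                    break;
                output += encodeDigit(t + (q - t) % (kBase - t));
                q = (q - t) / (kBase - t);
            }
            output += encodeDigit(q);
            bias = adaptBias(delta, handled + 1, handled == basicCount);
            delta = 0;
            ++handled;
        }
        ++delta;
        ++n;
    }
    return "xn--" + output;
}
```

With transitional set to false, toASCIILabel(mapSharpS(U"fa\u00DF", false)) yields "xn--fa-hia", the host the rebased idna2008 test expects for faß.de. With transitional set to true the same label becomes "fass", which is why the two spellings previously resolved to one domain.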
        
In addition, http://unicode.org/reports/tr46/ says:
"implementations are encouraged to apply the Bidi and ContextJ validity criteria"
Bidi checks prevent domain names with bidirectional text, such as Latin and Hebrew characters in the same domain.  Chrome and Firefox do this.
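
As a rough illustration of what such a check rejects, the sketch below flags a label that mixes ASCII letters with Hebrew or Thaana letters. This is a deliberately coarse approximation, not the Bidi rule of RFC 5893, which works on the Unicode Bidi_Class of every code point and is what ICU applies under UIDNA_CHECK_BIDI. The function names and the two hard-coded ranges are assumptions chosen for the example.

```cpp
#include <string>

// Hebrew letters (Bidi class R) and Thaana letters (Bidi class AL) only.
// This is a partial list picked for the example, not full Unicode data.
static bool isRightToLeftLetter(char32_t c)
{
    return (c >= 0x05D0 && c <= 0x05EA) || (c >= 0x0780 && c <= 0x07A5);
}

static bool isASCIILetter(char32_t c)
{
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}

// Returns true when a single label contains both left-to-right ASCII letters
// and right-to-left letters, which the real Bidi rule would reject.
bool mixesLatinAndRightToLeft(const std::u32string& label)
{
    bool sawLatin = false;
    bool sawRightToLeft = false;
    for (char32_t c : label) {
        sawLatin = sawLatin || isASCIILetter(c);
        sawRightToLeft = sawRightToLeft || isRightToLeftLetter(c);
    }
    return sawLatin && sawRightToLeft;
}
```

A label made only of Hebrew letters and points, such as the one in the new API test, is not flagged, while "bidirectional" followed by Thaana letters is.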

ContextJ checks prevent code points such as U+200D, a zero-width joiner that users would not see when looking at the domain name.
Firefox currently enables ContextJ checks, and UTS #46 suggests them, so we'll do the same.
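
The ContextJ rule for U+200D in RFC 5892 (Appendix A.2) allows the joiner only directly after a virama. A minimal sketch of that one rule follows, assuming a short hard-coded virama list instead of the Canonical_Combining_Class lookup that a real implementation such as ICU performs. The function names are hypothetical, and the separate rule for U+200C is not covered.

```cpp
#include <cstddef>
#include <string>

// A few code points whose Canonical_Combining_Class is Virama.
// The list is illustrative and far from complete.
static bool isVirama(char32_t c)
{
    return c == 0x094D    // DEVANAGARI SIGN VIRAMA
        || c == 0x09CD    // BENGALI SIGN VIRAMA
        || c == 0x0DCA;   // SINHALA SIGN AL-LAKUNA
}

// Returns false if any ZERO WIDTH JOINER (U+200D) in the label is not
// immediately preceded by a virama.
bool zeroWidthJoinersAreAllowed(const std::u32string& label)
{
    for (std::size_t i = 0; i < label.size(); ++i) {
        if (label[i] != 0x200D)
            continue;
        if (!i || !isVirama(label[i - 1]))
            return false;
    }
    return true;
}
```

Under this sketch "contextj" followed by U+200D is rejected, while the Sinhala label from fast/url/idna2008 (U+0DC1 U+0DCA U+200D U+0DBB U+0DD3) is accepted because the joiner follows U+0DCA.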

ContextO checks, which neither we, any other browser, nor the spec use, would fail if a domain contains code points such as U+30FB,
which looks somewhat like a dot.  We can investigate enabling these checks later.
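
For reference, the ContextO rule for U+30FB in RFC 5892 (Appendix A.7) accepts the middle dot only when the label also contains at least one Hiragana, Katakana, or Han character. The sketch below shows what enabling that rule would reject. It approximates the Script property with three block ranges, which is an assumption of this example and covers only part of each script, and the function names are hypothetical.

```cpp
#include <string>

// Approximates Script in {Hiragana, Katakana, Han} with block ranges.
// U+30FB itself has Script=Common and falls outside all three ranges.
static bool isJapaneseScriptCharacter(char32_t c)
{
    return (c >= 0x3041 && c <= 0x3096)     // Hiragana letters
        || (c >= 0x30A1 && c <= 0x30FA)     // Katakana letters
        || (c >= 0x4E00 && c <= 0x9FFF);    // CJK Unified Ideographs
}

// Returns false if the label contains KATAKANA MIDDLE DOT (U+30FB) without
// any Hiragana, Katakana, or Han character to give it context.
bool katakanaMiddleDotIsAllowed(const std::u32string& label)
{
    bool sawMiddleDot = false;
    bool sawJapaneseScript = false;
    for (char32_t c : label) {
        sawMiddleDot = sawMiddleDot || c == 0x30FB;
        sawJapaneseScript = sawJapaneseScript || isJapaneseScriptCharacter(c);
    }
    return !sawMiddleDot || sawJapaneseScript;
}
```

With this rule, "contexto" followed by U+30FB would be rejected. Because ContextO checks are not enabled, the new API test instead expects that host to encode as xn--contexto-wg5g.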

Covered by new API tests and rebased LayoutTests.
The new API tests verify that we do not use transitional processing and that we apply the Bidi and ContextJ checks but not the ContextO checks.

* platform/URLParser.cpp:
(WebCore::URLParser::domainToASCII):
(WebCore::URLParser::internationalDomainNameTranscoder):
* platform/URLParser.h:
* platform/mac/WebCoreNSURLExtras.mm:
(WebCore::mapHostNameWithRange):

Tools:

* TestWebKitAPI/Tests/WebCore/URLParser.cpp:
(TestWebKitAPI::TEST_F):
Add some tests from http://unicode.org/faq/idn.html verifying that we follow UTS #46's deviations from IDN2008.
Add some tests based on https://tools.ietf.org/html/rfc5893 verifying that we check for bidirectional text.
Add a test based on https://tools.ietf.org/html/rfc5892 verifying that we do not perform ContextO checks.
Add a test for U+321D and U+321E which have particularly interesting punycode encodings.  We match Firefox here now.
Also add a test from http://www.unicode.org/reports/tr46/#IDNAComparison verifying we are not using IDN2003.
We should consider importing all of http://www.unicode.org/Public/idna/9.0.0/IdnaTest.txt as URL domain tests.

LayoutTests:

* fast/encoding/idn-security.html:
Move some characters with changed IDN encodings to inside the check for old ICU.
* fast/url/idna2003-expected.txt:
* fast/url/idna2008-expected.txt:
Update expected results.  We are now more compliant with IDN2008.

Diff

Modified: trunk/LayoutTests/ChangeLog (208901 => 208902)


--- trunk/LayoutTests/ChangeLog	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/LayoutTests/ChangeLog	2016-11-18 22:47:24 UTC (rev 208902)
@@ -1,3 +1,16 @@
+2016-11-17  Alex Christensen  <achristen...@webkit.org>
+
+        Support IDN2008 with UTS #46 instead of IDN2003
+        https://bugs.webkit.org/show_bug.cgi?id=144194
+
+        Reviewed by Darin Adler.
+
+        * fast/encoding/idn-security.html:
+        Move some characters with changed IDN encodings to inside the check for old ICU.
+        * fast/url/idna2003-expected.txt:
+        * fast/url/idna2008-expected.txt:
+        Update expected results.  We are now more compliant with IDN2008.
+
 2016-11-18  Ryan Haddad  <ryanhad...@apple.com>
 
         Marking two js/dom/domjit-function-get-element-by-id-* tests as flaky.

Modified: trunk/LayoutTests/fast/encoding/idn-security-expected.txt (208901 => 208902)


--- trunk/LayoutTests/fast/encoding/idn-security-expected.txt	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/LayoutTests/fast/encoding/idn-security-expected.txt	2016-11-18 22:47:24 UTC (rev 208902)
@@ -34,10 +34,6 @@
 PASS testIDNRoundTripNotFirstCharacter(0xe01) is '%u0E01'
 PASS testIDNRoundTrip(0xa000) is '%uA000'
 PASS testIDNRoundTripNotFirstCharacter(0xa000) is '%uA000'
-PASS testIDNRoundTrip(0x2024) is '.'
-PASS testIDNRoundTripNotFirstCharacter(0x2024) is '.'
-PASS testIDNRoundTrip(0xfe52) is '.'
-PASS testIDNRoundTripNotFirstCharacter(0xfe52) is '.'
 PASS testIDNRoundTrip(0xff0f) is '/'
 PASS testIDNRoundTripNotFirstCharacter(0xff0f) is '/'
 PASS testIDNRoundTrip(0xfe68) is '%5C'
@@ -86,26 +82,6 @@
 PASS testIDNRoundTripNotFirstCharacter(0x251) is 'punycode'
 PASS testIDNRoundTrip(0x261) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0x261) is 'punycode'
-PASS testIDNRoundTrip(0x337) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
-PASS testIDNRoundTrip(0x337) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
-PASS testIDNRoundTrip(0x338) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
-PASS testIDNRoundTrip(0x338) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
-PASS testIDNRoundTrip(0x5b4) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x5b4) is 'punycode'
-PASS testIDNRoundTrip(0x5bc) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x5bc) is 'punycode'
-PASS testIDNRoundTrip(0x660) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x660) is 'punycode'
-PASS testIDNRoundTrip(0x6f0) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x6f0) is 'punycode'
-PASS testIDNRoundTrip(0x115f) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x115f) is 'punycode'
-PASS testIDNRoundTrip(0x1160) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x1160) is 'punycode'
 PASS testIDNRoundTrip(0x2027) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0x2027) is 'punycode'
 PASS testIDNRoundTrip(0x2039) is 'punycode'
@@ -162,12 +138,6 @@
 PASS testIDNRoundTripNotFirstCharacter(0x3033) is 'punycode'
 PASS testIDNRoundTrip(0x3035) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0x3035) is 'punycode'
-PASS testIDNRoundTrip(0x3164) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x3164) is 'punycode'
-PASS testIDNRoundTrip(0x321d) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x321d) is 'punycode'
-PASS testIDNRoundTrip(0x321e) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0x321e) is 'punycode'
 PASS testIDNRoundTrip(0x33ae) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0x33ae) is 'punycode'
 PASS testIDNRoundTrip(0x33af) is 'punycode'
@@ -176,10 +146,6 @@
 PASS testIDNRoundTripNotFirstCharacter(0x33c6) is 'punycode'
 PASS testIDNRoundTrip(0x33df) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0x33df) is 'punycode'
-PASS testIDNRoundTrip(0xfe14) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0xfe14) is 'punycode'
-PASS testIDNRoundTrip(0xfe15) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0xfe15) is 'punycode'
 PASS testIDNRoundTrip(0xfe3f) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0xfe3f) is 'punycode'
 PASS testIDNRoundTrip(0xfe5d) is 'punycode'
@@ -186,8 +152,6 @@
 PASS testIDNRoundTripNotFirstCharacter(0xfe5d) is 'punycode'
 PASS testIDNRoundTrip(0xfe5e) is 'punycode'
 PASS testIDNRoundTripNotFirstCharacter(0xfe5e) is 'punycode'
-PASS testIDNRoundTrip(0xffa0) is 'punycode'
-PASS testIDNRoundTripNotFirstCharacter(0xffa0) is 'punycode'
 PASS testIDNEncode(0x2028) is '%u2028'
 PASS testIDNEncodeNotFirstCharacter(0x2028) is '%u2028'
 PASS testIDNEncode(0x2029) is '%u2029'
@@ -244,4 +208,40 @@
 PASS testIDNRoundTripNotFirstCharacter(0xff61) is '.'
 PASS testIDNEncode(0xfeff) is '%uFEFF'
 PASS testIDNRoundTripNotFirstCharacter(0xfeff) is ''
+PASS testIDNRoundTrip(0x2024) is '%u2024'
+PASS testIDNRoundTripNotFirstCharacter(0x2024) is '%u2024'
+PASS testIDNRoundTrip(0xfe52) is '%uFE52'
+PASS testIDNRoundTripNotFirstCharacter(0xfe52) is '%uFE52'
+PASS testIDNRoundTrip(0x337) is '%u0337'
+PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
+PASS testIDNRoundTrip(0x337) is '%u0337'
+PASS testIDNRoundTripNotFirstCharacter(0x337) is 'punycode'
+PASS testIDNRoundTrip(0x338) is '%u0338'
+PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
+PASS testIDNRoundTrip(0x338) is '%u0338'
+PASS testIDNRoundTripNotFirstCharacter(0x338) is 'punycode'
+PASS testIDNRoundTrip(0x5b4) is '%u05B4'
+PASS testIDNRoundTripNotFirstCharacter(0x5b4) is 'punycode'
+PASS testIDNRoundTrip(0x5bc) is '%u05BC'
+PASS testIDNRoundTripNotFirstCharacter(0x5bc) is 'punycode'
+PASS testIDNRoundTrip(0x660) is '%u0660'
+PASS testIDNRoundTripNotFirstCharacter(0x660) is '%u0660'
+PASS testIDNRoundTrip(0x6f0) is 'punycode'
+PASS testIDNRoundTripNotFirstCharacter(0x6f0) is 'punycode'
+PASS testIDNRoundTrip(0x115f) is '%u115F'
+PASS testIDNRoundTripNotFirstCharacter(0x115f) is '%u115F'
+PASS testIDNRoundTrip(0x1160) is '%u1160'
+PASS testIDNRoundTripNotFirstCharacter(0x1160) is '%u1160'
+PASS testIDNRoundTrip(0x3164) is '%u3164'
+PASS testIDNRoundTripNotFirstCharacter(0x3164) is '%u3164'
+PASS testIDNRoundTrip(0x321d) is '%28%uC624%uC804%29'
+PASS testIDNRoundTripNotFirstCharacter(0x321d) is '%28%uC624%uC804%29'
+PASS testIDNRoundTrip(0x321e) is '%28%uC624%uD6C4%29'
+PASS testIDNRoundTripNotFirstCharacter(0x321e) is '%28%uC624%uD6C4%29'
+PASS testIDNRoundTrip(0xfe14) is '%3B'
+PASS testIDNRoundTripNotFirstCharacter(0xfe14) is '%3B'
+PASS testIDNRoundTrip(0xfe15) is '%21'
+PASS testIDNRoundTripNotFirstCharacter(0xfe15) is '%21'
+PASS testIDNRoundTrip(0xffa0) is '%uFFA0'
+PASS testIDNRoundTripNotFirstCharacter(0xffa0) is '%uFFA0'
 

Modified: trunk/LayoutTests/fast/encoding/idn-security.html (208901 => 208902)


--- trunk/LayoutTests/fast/encoding/idn-security.html	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/LayoutTests/fast/encoding/idn-security.html	2016-11-18 22:47:24 UTC (rev 208902)
@@ -134,8 +134,6 @@
 testIDNCharacter(0xA000, "allowed");
 
 /* ICU converts these to other allowed characters, so the original character can't be used to get to a phishy domain name */
-testIDNCharacter(0x2024, ".");
-testIDNCharacter(0xFE52, ".");
 testIDNCharacter(0xFF0F, "/");
 
 /* ICU converts these characters to backslash, so the original character can't be used to get to a phishy domain name */
@@ -168,16 +166,6 @@
 testIDNCharacter(0x01C3, "disallowed");
 testIDNCharacter(0x0251, "disallowed");
 testIDNCharacter(0x0261, "disallowed");
-testIDNCharacter(0x0337, "disallowed");
-testIDNCharacter(0x0337, "disallowed");
-testIDNCharacter(0x0338, "disallowed");
-testIDNCharacter(0x0338, "disallowed");
-testIDNCharacter(0x05B4, "disallowed");
-testIDNCharacter(0x05BC, "disallowed");
-testIDNCharacter(0x0660, "disallowed");
-testIDNCharacter(0x06F0, "disallowed");
-testIDNCharacter(0x115F, "disallowed");
-testIDNCharacter(0x1160, "disallowed");
 testIDNCharacter(0x2027, "disallowed");
 testIDNCharacter(0x2039, "disallowed");
 testIDNCharacter(0x203A, "disallowed");
@@ -206,19 +194,13 @@
 testIDNCharacter(0x3015, "disallowed");
 testIDNCharacter(0x3033, "disallowed");
 testIDNCharacter(0x3035, "disallowed");
-testIDNCharacter(0x3164, "disallowed");
-testIDNCharacter(0x321D, "disallowed");
-testIDNCharacter(0x321E, "disallowed");
 testIDNCharacter(0x33AE, "disallowed");
 testIDNCharacter(0x33AF, "disallowed");
 testIDNCharacter(0x33C6, "disallowed");
 testIDNCharacter(0x33DF, "disallowed");
-testIDNCharacter(0xFE14, "disallowed");
-testIDNCharacter(0xFE15, "disallowed");
 testIDNCharacter(0xFE3F, "disallowed");
 testIDNCharacter(0xFE5D, "disallowed");
 testIDNCharacter(0xFE5E, "disallowed");
-testIDNCharacter(0xFFA0, "disallowed");
 
 /* ICU won't encode these characters in IDN, thus we should always get 'host not found'. */
 testIDNCharacter(0x2028, "does not encode");
@@ -258,6 +240,24 @@
     testIDNCharacter(0xFF0E, ".");
     testIDNCharacter(0xFF61, ".");
     testIDNCharacter(0xFEFF, "");
+    testIDNCharacter(0x2024, ".");
+    testIDNCharacter(0xFE52, ".");
+    testIDNCharacter(0x0337, "disallowed");
+    testIDNCharacter(0x0337, "disallowed");
+    testIDNCharacter(0x0338, "disallowed");
+    testIDNCharacter(0x0338, "disallowed");
+    testIDNCharacter(0x05B4, "disallowed");
+    testIDNCharacter(0x05BC, "disallowed");
+    testIDNCharacter(0x0660, "disallowed");
+    testIDNCharacter(0x06F0, "disallowed");
+    testIDNCharacter(0x115F, "disallowed");
+    testIDNCharacter(0x1160, "disallowed");
+    testIDNCharacter(0x3164, "disallowed");
+    testIDNCharacter(0x321D, "disallowed");
+    testIDNCharacter(0x321E, "disallowed");
+    testIDNCharacter(0xFE14, "disallowed");
+    testIDNCharacter(0xFE15, "disallowed");
+    testIDNCharacter(0xFFA0, "disallowed");
 } else {
     testIDNCharacter(0x200B, "does not encode", "");
     testIDNCharacter(0x3002, "does not encode", ".");
@@ -264,6 +264,24 @@
     testIDNCharacter(0xFF0E, "does not encode", ".");
     testIDNCharacter(0xFF61, "does not encode", ".");
     testIDNCharacter(0xFEFF, "does not encode", "");
+    testIDNCharacter(0x2024, "%u2024");
+    testIDNCharacter(0xFE52, "%uFE52");
+    testIDNCharacter(0x0337, "%u0337", "punycode");
+    testIDNCharacter(0x0337, "%u0337", "punycode");
+    testIDNCharacter(0x0338, "%u0338", "punycode");
+    testIDNCharacter(0x0338, "%u0338", "punycode");
+    testIDNCharacter(0x05B4, "%u05B4", "punycode");
+    testIDNCharacter(0x05BC, "%u05BC", "punycode");
+    testIDNCharacter(0x0660, "%u0660");
+    testIDNCharacter(0x06F0, "disallowed");
+    testIDNCharacter(0x115F, "%u115F");
+    testIDNCharacter(0x1160, "%u1160");
+    testIDNCharacter(0x3164, "%u3164");
+    testIDNCharacter(0x321D, "%28%uC624%uC804%29");
+    testIDNCharacter(0x321E, "%28%uC624%uD6C4%29");
+    testIDNCharacter(0xFE14, "%3B");
+    testIDNCharacter(0xFE15, "%21");
+    testIDNCharacter(0xFFA0, "%uFFA0");
 }
 
 </script>

Modified: trunk/LayoutTests/fast/url/idna2003-expected.txt (208901 => 208902)


--- trunk/LayoutTests/fast/url/idna2003-expected.txt	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/LayoutTests/fast/url/idna2003-expected.txt	2016-11-18 22:47:24 UTC (rev 208902)
@@ -4,15 +4,15 @@
 
 
 The PASS/FAIL results of this test are set to the behavior in IDNA2003.
-PASS canonicalize('http://faß.de/') is 'http://fass.de/'
-PASS canonicalize('http://βόλος.com/') is 'http://xn--nxasmq6b.com/'
-PASS canonicalize('http://ශ්‍රී.com/') is 'http://xn--10cl1a0b.com/'
-PASS canonicalize('http://نامه‌ای.com/') is 'http://xn--mgba3gch31f.com/'
+FAIL canonicalize('http://faß.de/') should be http://fass.de/. Was http://xn--fa-hia.de/.
+FAIL canonicalize('http://βόλος.com/') should be http://xn--nxasmq6b.com/. Was http://xn--nxasmm1c.com/.
+FAIL canonicalize('http://ශ්‍රී.com/') should be http://xn--10cl1a0b.com/. Was http://xn--10cl1a0b660p.com/.
+FAIL canonicalize('http://نامه‌ای.com/') should be http://xn--mgba3gch31f.com/. Was http://xn--mgba3gch31f060k.com/.
 PASS canonicalize('http://www.looĸout.net/') is 'http://www.xn--looout-5bb.net/'
 PASS canonicalize('http://ᗯᗯᗯ.lookout.net/') is 'http://xn--1qeaa.lookout.net/'
 PASS canonicalize('http://www.lookout.сом/') is 'http://www.lookout.xn--l1adi/'
 FAIL canonicalize('http://www.lookout.net:80/') should be http://www.lookout.net:80/. Was http://www.lookout.net:80/.
-PASS canonicalize('http://www‥lookout.net/') is 'http://www..lookout.net/'
+FAIL canonicalize('http://www‥lookout.net/') should be http://www..lookout.net/. Was http://www‥lookout.net/.
 PASS canonicalize('http://www.lookout‧net/') is 'http://www.xn--lookoutnet-406e/'
 PASS canonicalize('http://www.looĸout.net/') is 'http://www.xn--looout-5bb.net/'
 FAIL canonicalize('http://www.lookout.net⩴80/') should be http://www.lookout.net::%3D80/. Was http://www.lookout.net⩴80/.

Modified: trunk/LayoutTests/fast/url/idna2008-expected.txt (208901 => 208902)


--- trunk/LayoutTests/fast/url/idna2008-expected.txt	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/LayoutTests/fast/url/idna2008-expected.txt	2016-11-18 22:47:24 UTC (rev 208902)
@@ -5,20 +5,20 @@
 
 The PASS/FAIL results of this test are set to the behavior in IDNA2008.
 PASS canonicalize('http://Bücher.de/') is 'http://xn--bcher-kva.de/'
-FAIL canonicalize('http://faß.de/') should be http://xn--fa-hia.de/. Was http://fass.de/.
-FAIL canonicalize('http://βόλος.com/') should be http://xn--nxasmm1c.com/. Was http://xn--nxasmq6b.com/.
-FAIL canonicalize('http://ශ්‍රී.com/') should be http://xn--10cl1a0b660p.com/. Was http://xn--10cl1a0b.com/.
-FAIL canonicalize('http://نامه‌ای.com/') should be http://xn--mgba3gch31f060k.com/. Was http://xn--mgba3gch31f.com/.
+PASS canonicalize('http://faß.de/') is 'http://xn--fa-hia.de/'
+PASS canonicalize('http://βόλος.com/') is 'http://xn--nxasmm1c.com/'
+PASS canonicalize('http://ශ්‍රී.com/') is 'http://xn--10cl1a0b660p.com/'
+PASS canonicalize('http://نامه‌ای.com/') is 'http://xn--mgba3gch31f060k.com/'
 FAIL canonicalize('http://♥.net/') should be http://�.net/. Was http://xn--g6h.net/.
-FAIL canonicalize('http://͸.net/') should be http://�.net/. Was http://xn--zva.net/.
-FAIL canonicalize('http://Ӏ.com/') should be http://�.com/. Was http://xn--d5a.com/.
-FAIL canonicalize('http://㛼.com/') should be http://�.com/. Was http://xn--j74i.com/.
-FAIL canonicalize('http://Ↄ.com/') should be http://�.com/. Was http://xn--q5g.com/.
+FAIL canonicalize('http://͸.net/') should be http://�.net/. Was http://͸.net/.
+FAIL canonicalize('http://Ӏ.com/') should be http://�.com/. Was http://Ӏ.com/.
+FAIL canonicalize('http://㛼.com/') should be http://�.com/. Was http://㛼.com/.
+FAIL canonicalize('http://Ↄ.com/') should be http://�.com/. Was http://Ↄ.com/.
 PASS canonicalize('http://look͏out.net/') is 'http://lookout.net/'
 PASS canonicalize('http://gOoGle.com/') is 'http://google.com/'
 FAIL canonicalize('http://ড়.com/') should be http://ড়.com/. Was http://xn--15b8c.com/.
-FAIL canonicalize('http://ẞ.com/') should be http://ss.com/. Was http://xn--kkg.com/.
-FAIL canonicalize('http://ẞ.foo.com/') should be http://ss.foo.com/. Was http://xn--kkg.foo.com/.
+PASS canonicalize('http://ẞ.com/') is 'http://ss.com/'
+PASS canonicalize('http://ẞ.foo.com/') is 'http://ss.foo.com/'
 FAIL canonicalize('http://-foo.bar.com/') should be http:///. Was http://-foo.bar.com/.
 FAIL canonicalize('http://foo-.bar.com/') should be http:///. Was http://foo-.bar.com/.
 FAIL canonicalize('http://ab--cd.com/') should be http:///. Was http://ab--cd.com/.

Modified: trunk/Source/WebCore/ChangeLog (208901 => 208902)


--- trunk/Source/WebCore/ChangeLog	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/Source/WebCore/ChangeLog	2016-11-18 22:47:24 UTC (rev 208902)
@@ -1,3 +1,37 @@
+2016-11-17  Alex Christensen  <achristen...@webkit.org>
+
+        Support IDN2008 with UTS #46 instead of IDN2003
+        https://bugs.webkit.org/show_bug.cgi?id=144194
+
+        Reviewed by Darin Adler.
+
+        Use uidna_nameToASCII instead of the deprecated uidna_IDNToASCII.
+        It uses IDN2008 instead of IDN2003, and it uses UTF #46 when used with a UIDNA opened with uidna_openUTS46.
+        This follows https://url.spec.whatwg.org/#concept-domain-to-ascii except we do not use Transitional_Processing
+        to prevent homograph attacks on german domain names with "ß" and "ss" in them.  These are now treated as separate domains.
+        Firefox also doesn't use Transitional_Processing. Chrome and the current specification use Transitional_processing,
+        but https://github.com/whatwg/url/issues/110 might change the spec.
+        
+        In addition, http://unicode.org/reports/tr46/ says:
+        "implementations are encouraged to apply the Bidi and ContextJ validity criteria"
+        Bidi checks prevent domain names with bidirectional text, such as latin and hebrew characters in the same domain.  Chrome and Firefox do this.
+
+        ContextJ checks prevent code points such as U+200D, which is a zero-width joiner which users would not see when looking at the domain name.
+        Firefox currently enables ContextJ checks and it is suggested by UTS #46, so we'll do it.
+
+        ContextO checks, which we do not use and neither does any other browser nor the spec, would fail if a domain contains code points such as U+30FB,
+        which looks somewhat like a dot.  We can investigate enabling these checks later.
+
+        Covered by new API tests and rebased LayoutTests.
+        The new API tests verify that we do not use transitional processing, that we do apply the Bidi and ContextJ checks, but not ContextO checks.
+
+        * platform/URLParser.cpp:
+        (WebCore::URLParser::domainToASCII):
+        (WebCore::URLParser::internationalDomainNameTranscoder):
+        * platform/URLParser.h:
+        * platform/mac/WebCoreNSURLExtras.mm:
+        (WebCore::mapHostNameWithRange):
+
 2016-11-18  Dean Jackson  <d...@apple.com>
 
         Better testing for accessibility media queries

Modified: trunk/Source/WebCore/platform/URLParser.cpp (208901 => 208902)


--- trunk/Source/WebCore/platform/URLParser.cpp	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/Source/WebCore/platform/URLParser.cpp	2016-11-18 22:47:24 UTC (rev 208902)
@@ -29,6 +29,7 @@
 #include "Logging.h"
 #include "RuntimeApplicationChecks.h"
 #include <array>
+#include <mutex>
 #include <unicode/uidna.h>
 #include <unicode/utypes.h>
 
@@ -2479,19 +2480,11 @@
     
     UChar hostnameBuffer[defaultInlineBufferSize];
     UErrorCode error = U_ZERO_ERROR;
-
-#if COMPILER(GCC) || COMPILER(CLANG)
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
-#endif
-    // FIXME: This should use uidna_openUTS46 / uidna_close instead
-    int32_t numCharactersConverted = uidna_IDNToASCII(StringView(domain).upconvertedCharacters(), domain.length(), hostnameBuffer, defaultInlineBufferSize, UIDNA_ALLOW_UNASSIGNED, nullptr, &error);
-#if COMPILER(GCC) || COMPILER(CLANG)
-#pragma GCC diagnostic pop
-#endif
+    UIDNAInfo processingDetails = UIDNA_INFO_INITIALIZER;
+    int32_t numCharactersConverted = uidna_nameToASCII(&internationalDomainNameTranscoder(), StringView(domain).upconvertedCharacters(), domain.length(), hostnameBuffer, defaultInlineBufferSize, &processingDetails, &error);
     ASSERT(numCharactersConverted <= static_cast<int32_t>(defaultInlineBufferSize));
 
-    if (error == U_ZERO_ERROR) {
+    if (U_SUCCESS(error) && !processingDetails.errors) {
         for (int32_t i = 0; i < numCharactersConverted; ++i) {
             ASSERT(isASCII(hostnameBuffer[i]));
             ASSERT(!isASCIIUpper(hostnameBuffer[i]));
@@ -2760,6 +2753,19 @@
     return String::adopt(WTFMove(output));
 }
 
+const UIDNA& URLParser::internationalDomainNameTranscoder()
+{
+    static UIDNA* encoder;
+    static std::once_flag onceFlag;
+    std::call_once(onceFlag, [] {
+        UErrorCode error = U_ZERO_ERROR;
+        encoder = uidna_openUTS46(UIDNA_CHECK_BIDI | UIDNA_CHECK_CONTEXTJ | UIDNA_NONTRANSITIONAL_TO_UNICODE | UIDNA_NONTRANSITIONAL_TO_ASCII, &error);
+        RELEASE_ASSERT(U_SUCCESS(error));
+        RELEASE_ASSERT(encoder);
+    });
+    return *encoder;
+}
+
 bool URLParser::allValuesEqual(const URL& a, const URL& b)
 {
     // FIXME: m_cannotBeABaseURL is not compared because the old URL::parse did not use it,

Modified: trunk/Source/WebCore/platform/URLParser.h (208901 => 208902)


--- trunk/Source/WebCore/platform/URLParser.h	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/Source/WebCore/platform/URLParser.h	2016-11-18 22:47:24 UTC (rev 208902)
@@ -29,6 +29,8 @@
 #include "URL.h"
 #include <wtf/Forward.h>
 
+struct UIDNA;
+
 namespace WebCore {
 
 template<typename CharacterType> class CodePointIterator;
@@ -48,6 +50,8 @@
     static URLEncodedForm parseURLEncodedForm(StringView);
     static String serialize(const URLEncodedForm&);
 
+    static const UIDNA& internationalDomainNameTranscoder();
+
 private:
     static Optional<uint16_t> defaultPortForProtocol(StringView);
     friend Optional<uint16_t> defaultPortForProtocol(StringView);

Modified: trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm (208901 => 208902)


--- trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/Source/WebCore/platform/mac/WebCoreNSURLExtras.mm	2016-11-18 22:47:24 UTC (rev 208902)
@@ -27,6 +27,7 @@
  */
 
 #import "config.h"
+#import "URLParser.h"
 #import "WebCoreObjCExtras.h"
 #import "WebCoreNSStringExtras.h"
 #import "WebCoreNSURLExtras.h"
@@ -478,8 +479,9 @@
     [string getCharacters:sourceBuffer range:range];
     
     UErrorCode uerror = U_ZERO_ERROR;
-    int32_t numCharactersConverted = (encode ? uidna_IDNToASCII : uidna_IDNToUnicode)(sourceBuffer, length, destinationBuffer, HOST_NAME_BUFFER_LENGTH, UIDNA_ALLOW_UNASSIGNED, NULL, &uerror);
-    if (U_FAILURE(uerror)) {
+    UIDNAInfo processingDetails = UIDNA_INFO_INITIALIZER;
+    int32_t numCharactersConverted = (encode ? uidna_nameToASCII : uidna_nameToUnicode)(&URLParser::internationalDomainNameTranscoder(), sourceBuffer, length, destinationBuffer, HOST_NAME_BUFFER_LENGTH, &processingDetails, &uerror);
+    if (U_FAILURE(uerror) || processingDetails.errors) {
         *error = YES;
         return nil;
     }

Modified: trunk/Tools/ChangeLog (208901 => 208902)


--- trunk/Tools/ChangeLog	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/Tools/ChangeLog	2016-11-18 22:47:24 UTC (rev 208902)
@@ -1,3 +1,19 @@
+2016-11-17  Alex Christensen  <achristen...@webkit.org>
+
+        Support IDN2008 with UTS #46 instead of IDN2003
+        https://bugs.webkit.org/show_bug.cgi?id=144194
+
+        Reviewed by Darin Adler.
+
+        * TestWebKitAPI/Tests/WebCore/URLParser.cpp:
+        (TestWebKitAPI::TEST_F):
+        Add some tests from http://unicode.org/faq/idn.html verifying that we follow UTS46's deviations from IDN2008.
+        Add some tests based on https://tools.ietf.org/html/rfc5893 verifying that we check for bidirectional text.
+        Add a test based on https://tools.ietf.org/html/rfc5892 verifying that we do not do ContextO check.
+        Add a test for U+321D and U+321E which have particularly interesting punycode encodings.  We match Firefox here now.
+        Also add a test from http://www.unicode.org/reports/tr46/#IDNAComparison verifying we are not using IDN2003.
+        We should consider importing all of http://www.unicode.org/Public/idna/9.0.0/IdnaTest.txt as URL domain tests.
+
 2016-11-17  Carlos Garcia Campos  <cgar...@igalia.com>
 
         Downloads started by context menu actions should also have a web view associated

Modified: trunk/Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp (208901 => 208902)


--- trunk/Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp	2016-11-18 22:32:01 UTC (rev 208901)
+++ trunk/Tools/TestWebKitAPI/Tests/WebCore/URLParser.cpp	2016-11-18 22:47:24 UTC (rev 208902)
@@ -1093,6 +1093,32 @@
     checkRelativeURLDifferences("a://b", "//[aBc]",
         {"a", "", "", "b", 0, "", "", "", "a://b"},
         {"", "", "", "", 0, "", "", "", "a://b"});
+    checkURL(utf16String(u"http://öbb.at"), {"http", "", "", "xn--bb-eka.at", 0, "/", "", "", "http://xn--bb-eka.at/"});
+    checkURL(utf16String(u"http://ÖBB.at"), {"http", "", "", "xn--bb-eka.at", 0, "/", "", "", "http://xn--bb-eka.at/"});
+    checkURL(utf16String(u"http://√.com"), {"http", "", "", "xn--19g.com", 0, "/", "", "", "http://xn--19g.com/"});
+    checkURLDifferences(utf16String(u"http://faß.de"),
+        {"http", "", "", "xn--fa-hia.de", 0, "/", "", "", "http://xn--fa-hia.de/"},
+        {"http", "", "", "fass.de", 0, "/", "", "", "http://fass.de/"});
+    checkURL(utf16String(u"http://ԛәлп.com"), {"http", "", "", "xn--k1ai47bhi.com", 0, "/", "", "", "http://xn--k1ai47bhi.com/"});
+    checkURLDifferences(utf16String(u"http://Ⱥbby.com"),
+        {"http", "", "", "xn--bby-iy0b.com", 0, "/", "", "", "http://xn--bby-iy0b.com/"},
+        {"http", "", "", "xn--bby-spb.com", 0, "/", "", "", "http://xn--bby-spb.com/"});
+    checkURLDifferences(utf16String(u"http://\u2132"),
+        {"", "", "", "", 0, "", "", "", utf16String(u"http://Ⅎ")},
+        {"http", "", "", "xn--f3g", 0, "/", "", "", "http://xn--f3g/"});
+    checkURLDifferences(utf16String(u"http://\u05D9\u05B4\u05D5\u05D0\u05B8/"),
+        {"http", "", "", "xn--cdbi5etas", 0, "/", "", "", "http://xn--cdbi5etas/"},
+        {"", "", "", "", 0, "", "", "", "about:blank"}, TestTabs::No);
+    checkURLDifferences(utf16String(u"http://bidirectional\u0786\u07AE\u0782\u07B0\u0795\u07A9\u0793\u07A6\u0783\u07AA/"),
+        {"", "", "", "", 0, "", "", "", utf16String(u"http://bidirectionalކޮންޕީޓަރު/")},
+        {"", "", "", "", 0, "", "", "", "about:blank"}, TestTabs::No);
+    checkURLDifferences(utf16String(u"http://contextj\u200D"),
+        {"", "", "", "", 0, "", "", "", utf16String(u"http://contextj\u200D")},
+        {"http", "", "", "contextj", 0, "/", "", "", "http://contextj/"});
+    checkURL(utf16String(u"http://contexto\u30FB"), {"http", "", "", "xn--contexto-wg5g", 0, "/", "", "", "http://xn--contexto-wg5g/"});
+    checkURLDifferences(utf16String(u"http://\u321D\u321E/"),
+        {"http", "", "", "xn--()()-bs0sc174agx4b", 0, "/", "", "", "http://xn--()()-bs0sc174agx4b/"},
+        {"http", "", "", "xn--5mkc", 0, "/", "", "", "http://xn--5mkc/"});
 }
 
 TEST_F(URLParserTest, DefaultPort)