Title: [287592] trunk
Revision: 287592
Author: wenson_hs...@apple.com
Date: 2022-01-04 15:31:23 -0800 (Tue, 04 Jan 2022)

Log Message

Use ICU instead of relying on hard-coded string equality checks in ModalContainerControlClassifier
https://bugs.webkit.org/show_bug.cgi?id=234677

Reviewed by Tim Horton.

Source/WebKit:

Follow-up to r287420: use ICU to check for more strings that resemble either the lowercase or uppercase letter
"x", rather than relying on a hard-coded set of symbols. Note that ICU's "confusables" list currently does not
consider either ✖ or ✕ to be a lookalike of the letter "x"; since these symbols are known to appear in
modal containers on several websites, we still need to check for these two symbols separately.
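The decision described above can be sketched in standard C++ as follows. This is a simplified illustration, not WebKit's code: `shouldTreatAsLetterX` and `ConfusablePredicate` are hypothetical names, and the confusability check is injected as a callback standing in for the ICU-backed `SpoofChecker::areConfusable` from the diff. The two explicit exceptions, ✕ (U+2715) and ✖ (U+2716), are tested before the confusables check.

```cpp
#include <functional>
#include <string_view>

// Hypothetical callback type; in the commit, this role is played by
// SpoofChecker::areConfusable, which calls uspoof_areConfusableUTF8.
using ConfusablePredicate = std::function<bool(std::string_view token, std::string_view target)>;

// Returns true if a symbolic token should be normalized to the token "x".
bool shouldTreatAsLetterX(std::string_view token, const ConfusablePredicate& areConfusable)
{
    // U+2715 and U+2716 are checked explicitly because, per the log message,
    // ICU's confusables data does not treat them as lookalikes of "x".
    static constexpr std::string_view explicitExceptions[] = { "\u2715", "\u2716" };
    for (auto exception : explicitExceptions) {
        if (token == exception)
            return true;
    }
    // Everything else is delegated to the confusability check, against both cases.
    return areConfusable(token, "x") || areConfusable(token, "X");
}
```

Injecting the predicate keeps the exception handling testable without ICU: even a stub that only reports exact matches still lets ✕ and ✖ through while rejecting a symbol such as `+`.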

Test: ModalContainerObservation.ClassifyMultiplySymbol

* UIProcess/Cocoa/ModalContainerControlClassifier.mm:
(WebKit::SpoofChecker::~SpoofChecker):

Add a helper class that wraps calls to `uspoof_areConfusableUTF8` and ensures balanced calls to
`uspoof_open` and `uspoof_close` when creating a new ICU spoof checker. Use it in the
WKModalContainerClassifierInput class below to check for more kinds of strings that look like the letter "x".

(WebKit::SpoofChecker::areConfusable):
(WebKit::SpoofChecker::checker):
(-[WKModalContainerClassifierInput initWithTokenizer:rawInput:]):
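The ownership pattern used by `SpoofChecker` (open lazily on first use, do not retry after a failed open, close only if something was actually opened) is sketched below in standalone C++. `FakeHandle`, `fake_open`, `fake_close`, and `LazyHandle` are hypothetical stand-ins that only count calls, so the balance of open and close can be checked without linking ICU; an integer status plays the role of `UErrorCode`.

```cpp
// Hypothetical stand-ins for a C API with an open/close pair
// (such as ICU's uspoof_open/uspoof_close). They only count calls.
struct FakeHandle { int id; };
inline int gOpenCount = 0;
inline int gCloseCount = 0;

FakeHandle* fake_open(int* status)
{
    // Mirror ICU's convention: do nothing if the status already signals an error.
    if (*status != 0)
        return nullptr;
    ++gOpenCount;
    return new FakeHandle { gOpenCount };
}

void fake_close(FakeHandle* handle)
{
    ++gCloseCount;
    delete handle;
}

class LazyHandle {
public:
    LazyHandle() = default;
    // Copying would let two objects close the same handle, so forbid it.
    LazyHandle(const LazyHandle&) = delete;
    LazyHandle& operator=(const LazyHandle&) = delete;

    ~LazyHandle()
    {
        // Close only if an open actually succeeded, keeping calls balanced.
        if (m_handle)
            fake_close(m_handle);
    }

    FakeHandle* get()
    {
        // Open on first use; a recorded error status prevents further attempts.
        if (!m_handle && m_status == 0)
            m_handle = fake_open(&m_status);
        return m_handle;
    }

private:
    int m_status { 0 };
    FakeHandle* m_handle { nullptr };
};
```

Unlike the class in the diff, the sketch deletes its copy operations; the `SpoofChecker` in the diff is only ever used as a local variable, so it does not run into the double-close case.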

Tools:

Augment the existing API test so that it additionally tests a symbol ("small roman numeral ten") that would not
have matched any of the three hard-coded symbol strings in the earlier fix.

* TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm:
(TestWebKitAPI::TEST):

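Modified Paths

trunk/Source/WebKit/ChangeLog
trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm
trunk/Tools/ChangeLog
trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm
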
Diff

Modified: trunk/Source/WebKit/ChangeLog (287591 => 287592)


--- trunk/Source/WebKit/ChangeLog	2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Source/WebKit/ChangeLog	2022-01-04 23:31:23 UTC (rev 287592)
@@ -1,3 +1,28 @@
+2022-01-04  Wenson Hsieh  <wenson_hs...@apple.com>
+
+        Use ICU instead of relying on hard-coded string equality checks in ModalContainerControlClassifier
+        https://bugs.webkit.org/show_bug.cgi?id=234677
+
+        Reviewed by Tim Horton.
+
+        Followup to r287420 - use ICU to check for more strings that resemble either the lowercase or uppercase letter
+        "x", rather than relying on a hard-coded set of symbols. Note that ICU's "confusables" list currently does not
+        consider both ✖ and ✕ to be lookalikes to the letter "x"; since these symbols are actually known to appear in
+        modal containers on several websites, we'll still need to check for these two symbols separately.
+
+        Test: ModalContainerObservation.ClassifyMultiplySymbol
+
+        * UIProcess/Cocoa/ModalContainerControlClassifier.mm:
+        (WebKit::SpoofChecker::~SpoofChecker):
+
+        Add a helper class that wraps calls to `uspoof_areConfusableUTF8`, and also ensures balanced calls to
+        `uspoof_open` and `uspoof_close` when creating a new ICU spoof checker. Use this in the
+        WKModalContainerClassifierInput class below to check for more types of strings that look like the letter "x".
+
+        (WebKit::SpoofChecker::areConfusable):
+        (WebKit::SpoofChecker::checker):
+        (-[WKModalContainerClassifierInput initWithTokenizer:rawInput:]):
+
 2022-01-04  Per Arne Vollan  <pvol...@apple.com>
 
         [iOS][WP] Add telemetry for syscall violations

Modified: trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm (287591 => 287592)


--- trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm	2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Source/WebKit/UIProcess/Cocoa/ModalContainerControlClassifier.mm	2022-01-04 23:31:23 UTC (rev 287592)
@@ -27,6 +27,8 @@
 #import "ModalContainerControlClassifier.h"
 
 #import <WebCore/ModalContainerTypes.h>
+#import <unicode/uspoof.h>
+
 #import <pal/cocoa/CoreMLSoftLink.h>
 #import <pal/cocoa/NaturalLanguageSoftLink.h>
 
@@ -74,6 +76,36 @@
 
 @end
 
+namespace WebKit {
+
+class SpoofChecker {
+    WTF_MAKE_FAST_ALLOCATED;
+public:
+    ~SpoofChecker()
+    {
+        if (m_checker)
+            uspoof_close(m_checker);
+    }
+
+    bool areConfusable(NSString *potentialSpoofString, const char* stringToSpoof)
+    {
+        return checker() && uspoof_areConfusableUTF8(checker(), potentialSpoofString.UTF8String, -1, stringToSpoof, -1, &m_status);
+    }
+
+private:
+    USpoofChecker* checker()
+    {
+        if (!m_checker && m_status == U_ZERO_ERROR)
+            m_checker = uspoof_open(&m_status);
+        return m_checker;
+    }
+
+    UErrorCode m_status { U_ZERO_ERROR };
+    USpoofChecker* m_checker { nullptr };
+};
+
+} // namespace WebKit
+
 @implementation WKModalContainerClassifierInput {
     RetainPtr<NSString> _canonicalInput;
 }
@@ -95,10 +127,11 @@
             return;
 
         if (attributes & (NLTokenizerAttributeSymbolic | NLTokenizerAttributeEmoji)) {
-            // We should consider using a memory-compact hash map if we need to add a large number of entries here in the future.
-            // For now, we only make an exception for the following symbols, so simply checking each string is sufficient.
-            if ([lowercaseToken isEqualToString:@"×"] || [lowercaseToken isEqualToString:@"✕"] || [lowercaseToken isEqualToString:@"✖"])
+            WebKit::SpoofChecker checker;
+            if ([lowercaseToken isEqualToString:@"✕"] || [lowercaseToken isEqualToString:@"✖"] || checker.areConfusable(lowercaseToken, "x") || checker.areConfusable(lowercaseToken, "X")) {
+                // ICU does not consider two unicode symbols to be confusable with the letter x, but for the purposes of the classifier we need to treat them as if they were.
                 [tokens addObject:@"x"];
+            }
             return;
         }
 

Modified: trunk/Tools/ChangeLog (287591 => 287592)


--- trunk/Tools/ChangeLog	2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Tools/ChangeLog	2022-01-04 23:31:23 UTC (rev 287592)
@@ -1,5 +1,18 @@
 2022-01-04  Wenson Hsieh  <wenson_hs...@apple.com>
 
+        Use ICU instead of relying on hard-coded string equality checks in ModalContainerControlClassifier
+        https://bugs.webkit.org/show_bug.cgi?id=234677
+
+        Reviewed by Tim Horton.
+
+        Augment the existing API test so that it additionally tests a symbol ("small roman numeral ten") that would not
+        have been covered as one of the three hard-coded symbol strings in the earlier fix.
+
+        * TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm:
+        (TestWebKitAPI::TEST):
+
+2022-01-04  Wenson Hsieh  <wenson_hs...@apple.com>
+
         ModalContainerObserver should search for text in subframes
         https://bugs.webkit.org/show_bug.cgi?id=234446
         rdar://86897770

Modified: trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm (287591 => 287592)


--- trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm	2022-01-04 23:05:44 UTC (rev 287591)
+++ trunk/Tools/TestWebKitAPI/Tests/WebKitCocoa/ModalContainerObservation.mm	2022-01-04 23:31:23 UTC (rev 287592)
@@ -234,11 +234,16 @@
 TEST(ModalContainerObservation, ClassifyMultiplySymbol)
 {
     auto webView = createModalContainerWebView();
-    [webView loadBundlePage:@"modal-container-custom"];
-    [webView evaluate:@"show(`<p>Hello world</p><button>×</button>`)" andDecidePolicy:_WKModalContainerDecisionHideAndIgnore];
+    auto runTest = [&] (NSString *symbol) {
+        [webView loadBundlePage:@"modal-container-custom"];
+        NSString *scriptToEvaluate = [NSString stringWithFormat:@"show(`<p>Hello world</p><button>%@</button>`)", symbol];
+        [webView evaluate:scriptToEvaluate andDecidePolicy:_WKModalContainerDecisionHideAndIgnore];
 
-    EXPECT_FALSE([[webView contentsAsString] containsString:@"Hello world"]);
-    EXPECT_EQ([webView lastModalContainerInfo].availableTypes, _WKModalContainerControlTypeNeutral);
+        EXPECT_FALSE([[webView contentsAsString] containsString:@"Hello world"]);
+        EXPECT_EQ([webView lastModalContainerInfo].availableTypes, _WKModalContainerControlTypeNeutral);
+    };
+    runTest(@"✕");
+    runTest(@"⨯");
 }
 
 TEST(ModalContainerObservation, DetectSearchTermInBoldTag)
_______________________________________________
webkit-changes mailing list
webkit-changes@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-changes
