Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 0f81753fdddc1035077ec0ed7a8c21b23477ee27
https://github.com/WebKit/WebKit/commit/0f81753fdddc1035077ec0ed7a8c21b23477ee27
Author: Sosuke Suzuki <[email protected]>
Date: 2026-02-03 (Tue, 03 Feb 2026)
Changed paths:
A JSTests/microbenchmarks/string-trim-end-unicode-whitespace.js
A JSTests/microbenchmarks/string-trim-start-unicode-whitespace.js
A JSTests/microbenchmarks/string-trim-unicode-whitespace.js
A JSTests/stress/trim-unicode-whitespace.js
M Source/JavaScriptCore/CMakeLists.txt
M Source/JavaScriptCore/DerivedSources-input.xcfilelist
M Source/JavaScriptCore/DerivedSources-output.xcfilelist
M Source/JavaScriptCore/DerivedSources.make
M Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
M Source/JavaScriptCore/Sources.txt
M Source/JavaScriptCore/parser/Lexer.h
A Source/JavaScriptCore/parser/LexerUnicodeProperties.cpp
A Source/JavaScriptCore/parser/LexerUnicodeProperties.h
A Source/JavaScriptCore/parser/generateLexerUnicodePropertyTables.py
Log Message:
-----------
[JSC] Optimize `Lexer<char16_t>::isWhiteSpace` to avoid ICU calls
https://bugs.webkit.org/show_bug.cgi?id=306741
Reviewed by Yusuke Suzuki.
This patch optimizes Lexer<char16_t>::isWhiteSpace by replacing the ICU
u_charType() call with direct comparisons for non-Latin1 whitespace characters.
ECMAScript WhiteSpace includes Unicode Zs (Space_Separator) category characters[1],
which is a fixed set of 17 code points[2]. Since this set is stable, we can enumerate
them directly instead of calling ICU at runtime.
Non-Latin1 WhiteSpace characters handled:
- U+1680 OGHAM SPACE MARK
- U+2000..U+200A EN QUAD through HAIR SPACE
- U+202F NARROW NO-BREAK SPACE
- U+205F MEDIUM MATHEMATICAL SPACE
- U+3000 IDEOGRAPHIC SPACE
- U+FEFF BOM (not Zs but ECMAScript WhiteSpace)
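For illustration only, the direct-comparison idea for the code points listed above could be sketched as follows. This is a minimal sketch, not the code from the patch: the function name isNonLatin1WhiteSpace is hypothetical, and the real Lexer<char16_t>::isWhiteSpace and the generated tables in LexerUnicodeProperties.cpp may be structured differently.

```cpp
// Hypothetical stand-in for the non-Latin1 branch of a whitespace check.
// It covers exactly the code points listed in this log message and
// deliberately ignores Latin1 whitespace (U+0020, U+00A0, tab, etc.),
// which is assumed to be handled by a separate Latin1 path.
constexpr bool isNonLatin1WhiteSpace(char16_t ch)
{
    return ch == 0x1680                      // OGHAM SPACE MARK
        || (ch >= 0x2000 && ch <= 0x200A)    // EN QUAD through HAIR SPACE
        || ch == 0x202F                      // NARROW NO-BREAK SPACE
        || ch == 0x205F                      // MEDIUM MATHEMATICAL SPACE
        || ch == 0x3000                      // IDEOGRAPHIC SPACE
        || ch == 0xFEFF;                     // BOM: not Zs, but ECMAScript WhiteSpace
}

// Compile-time spot checks: U+200B ZERO WIDTH SPACE is General_Category Cf,
// not Zs, so it must not be treated as whitespace.
static_assert(isNonLatin1WhiteSpace(0x2000), "EN QUAD is whitespace");
static_assert(!isNonLatin1WhiteSpace(0x200B), "ZERO WIDTH SPACE is not whitespace");
```

Because the predicate reduces to a handful of integer comparisons, it needs no runtime call into ICU and can be evaluated in a constant expression.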
                                           TipOfTree                 Patched

string-trim-start-unicode-whitespace    74.4752+-7.1193     ^    37.1204+-14.8022      ^ definitely 2.0063x faster
string-trim-end-unicode-whitespace      62.3887+-20.0288         36.3405+-12.6097        might be 1.7168x faster
string-trim-unicode-whitespace         118.3358+-37.3310    ^    60.7765+-19.8147      ^ definitely 1.9471x faster
[1]: https://tc39.es/ecma262/#sec-trimstring
[2]: https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AGeneral_Category%3DSpace_Separator%3A%5D
Tests: JSTests/microbenchmarks/string-raw.js
JSTests/microbenchmarks/string-trim-end-unicode-whitespace.js
JSTests/microbenchmarks/string-trim-start-unicode-whitespace.js
JSTests/microbenchmarks/string-trim-unicode-whitespace.js
JSTests/stress/trim-unicode-whitespace.js
* JSTests/microbenchmarks/string-trim-end-unicode-whitespace.js: Added.
* JSTests/microbenchmarks/string-trim-start-unicode-whitespace.js: Added.
* JSTests/microbenchmarks/string-trim-unicode-whitespace.js: Added.
* JSTests/stress/trim-unicode-whitespace.js: Added.
(shouldBe):
(of.zsCharacters.code.toString.16.toUpperCase.padStart):
(code.toString.16.toUpperCase.padStart):
(of.otherWhitespace.code.toString.16.toUpperCase.padStart):
(of.nonWhitespace.code.toString.16.toUpperCase.padStart):
* Source/JavaScriptCore/CMakeLists.txt:
* Source/JavaScriptCore/DerivedSources-input.xcfilelist:
* Source/JavaScriptCore/DerivedSources-output.xcfilelist:
* Source/JavaScriptCore/DerivedSources.make:
* Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj:
* Source/JavaScriptCore/Sources.txt:
* Source/JavaScriptCore/parser/Lexer.h:
(JSC::Lexer<char16_t>::isWhiteSpace):
* Source/JavaScriptCore/parser/LexerUnicodeProperties.cpp: Added.
* Source/JavaScriptCore/parser/LexerUnicodeProperties.h: Added.
* Source/JavaScriptCore/parser/generateLexerUnicodePropertyTables.py: Added.
(parse_unicode_data):
(group_code_points):
(generate_output):
(main):
Canonical link: https://commits.webkit.org/306702@main