Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: e2cf0b9d1b39b4d2adc69adcafc8346f385bba25
https://github.com/WebKit/WebKit/commit/e2cf0b9d1b39b4d2adc69adcafc8346f385bba25
Author: Chris Dumez <[email protected]>
Date: 2026-01-04 (Sun, 04 Jan 2026)
Changed paths:
M Source/WTF/wtf/text/StringCommon.h
Log Message:
-----------
Use faster algorithm in WTF::copyElements() for better performance
https://bugs.webkit.org/show_bug.cgi?id=304835
Reviewed by Yusuke Suzuki.
The old code used interleaved stores (vst2q_u8) to perform the 8-bit to
16-bit upconversion, interleaving the source bytes with zeros. The new code
uses vmovl_u8 (widening move), which is designed specifically for
zero-extending 8-bit values to 16-bit values. This is exactly the operation
the instruction exists for, so the code expresses its intent more directly.
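For illustration, the two strategies might look roughly like this. This is a minimal sketch, not the actual WTF::copyElements code (the diff itself is not included in this message). The function names copyInterleaved and copyWidening, the 16-characters-per-iteration loop shape, and the scalar tail are assumptions. The interleaving variant relies on a little-endian target. Without NEON, only the scalar loop runs, so the sketch compiles on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical sketch, not the actual WTF::copyElements code. Both functions
// upconvert `length` 8-bit characters from `src` into 16-bit units in `dst`,
// 16 characters per NEON iteration, then finish with a scalar loop.

// Old approach: interleave the source bytes with a zero vector via vst2q_u8.
// Only correct on little-endian targets, where {byte, 0} in memory reads back
// as the zero-extended 16-bit value.
inline void copyInterleaved(uint16_t* dst, const uint8_t* src, size_t length)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x16_t zeros = vdupq_n_u8(0);
    for (; i + 16 <= length; i += 16) {
        uint8x16x2_t pair;
        pair.val[0] = vld1q_u8(src + i); // low byte of each output unit
        pair.val[1] = zeros;             // high byte of each output unit
        vst2q_u8(reinterpret_cast<uint8_t*>(dst + i), pair);
    }
#endif
    for (; i < length; ++i) // tail, or the whole copy without NEON
        dst[i] = src[i];
}

// New approach: zero-extend each 8-byte half with vmovl_u8 and write the two
// 16-bit vectors with plain sequential vst1q_u16 stores.
inline void copyWidening(uint16_t* dst, const uint8_t* src, size_t length)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 16 <= length; i += 16) {
        const uint8x16_t bytes = vld1q_u8(src + i);
        vst1q_u16(dst + i, vmovl_u8(vget_low_u8(bytes)));
        vst1q_u16(dst + i + 8, vmovl_u8(vget_high_u8(bytes)));
    }
#endif
    for (; i < length; ++i) // tail, or the whole copy without NEON
        dst[i] = src[i];
}
```

Both variants write the same values. The widening version needs no zero register and issues one contiguous 16-byte store per half instead of a two-register interleaved store.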
The old code used vst2q_u8, which writes two registers to memory in an
interleaved pattern (alternating bytes from each register). The new code
instead issues plain sequential vst1q_u16 stores. Sequential stores are
generally more cache-friendly and easier for the CPU's store buffer to handle.
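To make the store-pattern difference concrete, the following portable C++ models where each operation places its lanes in memory. The model function names are hypothetical, and the loops only describe lane placement; the real instructions store a full register at once. On a little-endian host, interleaving source bytes with zeros and storing two zero-extended 16-bit vectors sequentially produce identical bytes. That is why the two approaches are interchangeable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical portable models of the NEON operations, for illustration only:
// they describe where each lane lands in memory, not how the hardware does it.

// vst2q_u8: lane i of `first` goes to byte 2*i and lane i of `second` to byte
// 2*i + 1, so the 32 bytes written alternate between the two registers.
inline void modelVst2qU8(uint8_t* dst, const uint8_t (&first)[16], const uint8_t (&second)[16])
{
    for (size_t i = 0; i < 16; ++i) {
        dst[2 * i] = first[i];
        dst[2 * i + 1] = second[i];
    }
}

// vmovl_u8: zero-extend eight 8-bit lanes to eight 16-bit lanes.
inline void modelVmovlU8(uint16_t (&out)[8], const uint8_t* in)
{
    for (size_t i = 0; i < 8; ++i)
        out[i] = in[i];
}

// vst1q_u16: eight 16-bit lanes written to consecutive addresses.
inline void modelVst1qU16(uint16_t* dst, const uint16_t (&lanes)[8])
{
    std::memcpy(dst, lanes, sizeof(lanes));
}

// Returns true if both strategies produce the same 32 bytes for `src`.
// Expected to hold on little-endian hosts only.
inline bool layoutsMatch(const uint8_t (&src)[16])
{
    const uint8_t zeros[16] = {};
    uint16_t oldOut[16];
    modelVst2qU8(reinterpret_cast<uint8_t*>(oldOut), src, zeros);

    uint16_t low[8], high[8], newOut[16];
    modelVmovlU8(low, src);
    modelVmovlU8(high, src + 8);
    modelVst1qU16(newOut, low);
    modelVst1qU16(newOut + 8, high);
    return std::memcmp(oldOut, newOut, sizeof(oldOut)) == 0;
}
```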
Micro-benchmark results:
====================================
Size          | Before (ns) | After (ns) | Speedup | Before (MB/s) | After (MB/s)
---------------------------------------------------------------------------------
16 bytes      |        2.27 |       1.87 |   1.21x |       7041.40 |      8540.33
32 bytes      |        2.37 |       2.33 |   1.02x |      13492.26 |     13751.07
48 bytes      |        2.83 |       2.85 |   0.99x |      16987.61 |     16851.08
63 bytes      |        5.21 |       5.15 |   1.01x |      12086.38 |     12229.38
64 bytes      |        1.64 |       1.25 |   1.32x |      39018.24 |     51383.06
65 bytes      |        1.94 |       1.48 |   1.31x |      33554.37 |     43843.30
128 bytes     |        2.89 |       2.04 |   1.42x |      44259.87 |     62891.60
256 bytes     |        5.31 |       4.19 |   1.27x |      48216.11 |     61029.10
512 bytes     |       10.18 |       8.30 |   1.23x |      50310.98 |     61708.24
1024 bytes    |       19.43 |      16.52 |   1.18x |      52707.15 |     61987.44
4096 bytes    |      125.92 |      69.52 |   1.81x |      32528.44 |     58914.64
8192 bytes    |      159.90 |     149.73 |   1.07x |      51233.31 |     54712.31
16384 bytes   |      416.98 |     528.18 |   0.79x |      39291.73 |     31019.66
32768 bytes   |      963.62 |     808.52 |   1.19x |      34005.25 |     40528.20
65536 bytes   |     1529.74 |    1484.54 |   1.03x |      42841.39 |     44145.55
131072 bytes  |     2452.53 |    1982.41 |   1.24x |      53443.64 |     66117.38
262144 bytes  |     6291.20 |    6126.99 |   1.03x |      41668.38 |     42785.13
524288 bytes  |    12640.27 |   12525.43 |   1.01x |      41477.59 |     41857.88
1048576 bytes |    24742.13 |   23726.78 |   1.04x |      42380.18 |     44193.77
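As a sanity check, the derived columns can be recomputed from the timing columns. Speedup is the before time divided by the after time. The throughput figures are consistent with source bytes divided by nanoseconds, times 1000, which gives decimal megabytes per second. The struct and helper names below are hypothetical. The published timings are rounded to 0.01 ns, so recomputed throughput can differ slightly from the table. For example, 16 / 2.27 × 1000 ≈ 7048, versus the listed 7041.40.

```cpp
#include <cassert>

// Hypothetical helpers for checking the benchmark table's derived columns.
struct BenchmarkRow {
    double bytes;    // source size in bytes
    double beforeNs; // time per copy before the change
    double afterNs;  // time per copy after the change
};

// Speedup > 1.0 means the new code is faster.
constexpr double speedup(const BenchmarkRow& row)
{
    return row.beforeNs / row.afterNs;
}

// bytes / ns is decimal GB/s; multiplying by 1000 gives MB/s.
constexpr double throughputMBps(double bytes, double ns)
{
    return bytes / ns * 1000.0;
}

// A few rows copied from the benchmark table.
constexpr BenchmarkRow kSampleRows[] = {
    { 16.0, 2.27, 1.87 },
    { 4096.0, 125.92, 69.52 },
    { 16384.0, 416.98, 528.18 }, // the one size where the new code is clearly slower
    { 1048576.0, 24742.13, 23726.78 },
};
```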
This appears to yield a 0.56% to 0.74% progression on Speedometer 3 on macOS,
depending on the hardware model. It is performance neutral on iOS and on
JetStream.
I used Claude AI to assist with this optimization.
* Source/WTF/wtf/text/StringCommon.h:
(WTF::copyElements):
Canonical link: https://commits.webkit.org/305095@main