Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: d17da45094f8eb45a42da3807fe417cead594374
https://github.com/WebKit/WebKit/commit/d17da45094f8eb45a42da3807fe417cead594374
Author: Marcus Plutowski <[email protected]>
Date: 2026-05-11 (Mon, 11 May 2026)
Changed paths:
M Source/WTF/wtf/LockAlgorithmInlines.h
Log Message:
-----------
Rework ParkingLot's spinloop
https://bugs.webkit.org/show_bug.cgi?id=314101
rdar://176237718
Reviewed by Yusuke Suzuki.
This includes three main changes:
1. Only call sched_yield once every N
loop iterations, and don't call it at all
for the first N.
2. Add a blind inner loop which unconditionally
executes M yields per outer loop iteration.
3. Tune the above M, N, and the overall spin count
O on a per-platform basis.
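The three changes above can be sketched roughly as follows. This is a minimal illustration, not the code in LockAlgorithmInlines.h: the names SpinConfig, SpinResult, cpuRelax and spinTryLock are hypothetical, the lock is reduced to a single std::atomic<bool>, and the "yields" of the blind inner loop are assumed to be CPU-level hint instructions, modelled here by a compiler barrier.

```cpp
#include <atomic>
#include <cassert>
#include <sched.h>

// Hypothetical tuning knobs; the real per-platform values are not stated in this message.
struct SpinConfig {
    unsigned spinLimit;     // O: outer iterations before giving up and parking
    unsigned yieldInterval; // N: sched_yield once every N iterations, never during the first N
    unsigned nopCount;      // M: blind inner-loop iterations per outer iteration
};

struct SpinResult {
    bool acquired;
    unsigned polls;  // how many times the lock word was tried
    unsigned yields; // how many times sched_yield was called
};

// Stand-in for a CPU hint such as x86 `pause` or ARM `yield` (assumption).
inline void cpuRelax()
{
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

inline SpinResult spinTryLock(std::atomic<bool>& locked, const SpinConfig& config)
{
    assert(config.yieldInterval > 0);
    SpinResult result { false, 0, 0 };
    for (unsigned i = 0; i < config.spinLimit; ++i) {
        ++result.polls;
        bool expected = false;
        if (locked.compare_exchange_strong(expected, true, std::memory_order_acquire)) {
            result.acquired = true;
            return result;
        }

        // Blind inner loop: burn M iterations without touching the lock word.
        for (unsigned j = 0; j < config.nopCount; ++j)
            cpuRelax();

        // Skip the first N iterations entirely, then yield once every N.
        if (i >= config.yieldInterval && !(i % config.yieldInterval)) {
            sched_yield();
            ++result.yields;
        }
    }
    return result; // Not acquired: the caller would fall back to parking.
}
```

With spinLimit = 10 and yieldInterval = 4, a spin that never acquires the lock polls 10 times and yields to the scheduler only at iterations 4 and 8.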
Together, these changes improve performance in the semi-contended
regime without impairing it in the heavily-contended case.
This tuning was not exhaustive, so there are likely still
opportunities for further improvement, e.g. via some sort
of exponential backoff, or even by tuning the parameters
per-SoC, rather than per-platform.
There are a few variables to consider (below, spinLimit is O,
yieldInterval is N, and nopCount is M):
* Time-to-park: the total CPU time of a full spinloop
~= spinLimit * (nopCount + 1/yieldInterval)
Increases with spinLimit, nopCount; decreases with yieldInterval.
Higher is better (to a point) for semi-contended,
but significantly worse for heavily-contended, as we pay
the full cost of the spinloop on ~every attempt to acquire.
* Niceness: (vaguely) how often we yield vs. run on core
~= 1 / (yieldInterval * nopCount)
Higher is better for heavily-contended, as it means that high-
priority threads will 'make room' for other threads as they
spin, rather than taking up high-priority CPU time on a spinloop.
However, it's worse for the semi-contended case: once we
do acquire the spinlock, the priority depression can last
for some time, so it could take a few quanta to get
back to 'full speed'.
* Poll-rate: the rate at which we read the atomic lock bit
~= 1 / (nopCount + 1/yieldInterval)
This affects performance in two different ways.
The first is that, if the lock does become available, we
may be in the middle of a nop-spin, and therefore have to
execute the remaining nops before we check again.
Therefore, in the semi-contended case we want a higher frequency.
The second is that the higher the frequency, the more often we hammer the
lock's cache line. In sparse contention regimes this is relatively
OK: e.g. if there's only a single waiter, then the cache line
stays local. With multiple waiters, however, the line
can ping-pong between cores, hurting performance.
Therefore, in the heavily-contended case it's better for this
to be lower.
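To make the trade-offs concrete, the three approximations above can be evaluated directly. The sketch below only transcribes those formulas; the struct and function names are hypothetical, the results are unitless proxies, and the parameter values in the usage note are made up for illustration rather than taken from the patch.

```cpp
#include <cassert>

// Names follow the formulas in this message; values are supplied by the caller.
struct SpinParameters {
    double spinLimit;
    double nopCount;
    double yieldInterval;
};

// ~= spinLimit * (nopCount + 1/yieldInterval)
inline double timeToPark(const SpinParameters& p)
{
    assert(p.yieldInterval > 0);
    return p.spinLimit * (p.nopCount + 1.0 / p.yieldInterval);
}

// ~= 1 / (yieldInterval * nopCount)
inline double niceness(const SpinParameters& p)
{
    assert(p.yieldInterval > 0 && p.nopCount > 0);
    return 1.0 / (p.yieldInterval * p.nopCount);
}

// ~= 1 / (nopCount + 1/yieldInterval)
inline double pollRate(const SpinParameters& p)
{
    assert(p.yieldInterval > 0);
    return 1.0 / (p.nopCount + 1.0 / p.yieldInterval);
}
```

For example, with spinLimit = 40 and nopCount = 4, raising yieldInterval from 2 to 8 lowers time-to-park from 180 to 165 and niceness from 0.125 to about 0.031, while the poll-rate rises slightly.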
In general, the gains for the semi-contended case are modest, but
show up across the board. On the flip side, hits to the
heavily-contended case tend to be localized to a few scenarios, but have
a very large effect size. Heavy contention is very rare
(by design, from how WebKit uses locks), but very sensitive to this
tuning because spinlocks are poorly adapted to that regime. E.g.
omitting sched_yield entirely can more than double the runtime
of certain benchmarks!
No new tests because existing tests cover changes to lock behavior.
Canonical link: https://commits.webkit.org/313051@main