Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: d17da45094f8eb45a42da3807fe417cead594374
      
https://github.com/WebKit/WebKit/commit/d17da45094f8eb45a42da3807fe417cead594374
  Author: Marcus Plutowski <[email protected]>
  Date:   2026-05-11 (Mon, 11 May 2026)

  Changed paths:
    M Source/WTF/wtf/LockAlgorithmInlines.h

  Log Message:
  -----------
  Rework ParkingLot's spinloop
https://bugs.webkit.org/show_bug.cgi?id=314101
rdar://176237718

Reviewed by Yusuke Suzuki.

This includes three main changes:
 1. Only call sched_yield once every N
    loop iterations, and don't call it at all
    for the first N.
 2. Add a blind inner loop which unconditionally
    executes M yields per outer loop iteration.
 3. Tune the above M, N, and the overall spin-count
    O on a per-platform basis.
Together, these changes improve performance in the semi-contended
regime without impairing it in the heavily-contended case.
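
As a rough illustration of how the three changes fit together, here is a
minimal sketch of such a spinloop. It is not the code in
LockAlgorithmInlines.h: the names SpinParams, SpinResult, cpuRelax and
spinThenGiveUp are hypothetical, std::this_thread::yield() stands in for
sched_yield, and a compiler fence stands in for whatever hardware hint
(e.g. x86 pause or ARM yield) the blind inner loop really executes. It
also assumes "don't call it at all for the first N" means the first
scheduler yield happens at iteration N.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical parameter bundle; the real constants and their names may differ.
struct SpinParams {
    std::size_t spinLimit;     // O: outer iterations before giving up (and parking)
    std::size_t yieldInterval; // N: yield to the scheduler once every N iterations
    std::size_t nopCount;      // M: blind inner-loop iterations per outer iteration
};

struct SpinResult {
    bool acquired;               // true if the lock bit was taken while spinning
    std::size_t schedulerYields; // how many times we yielded to the scheduler
};

inline void cpuRelax()
{
    // Placeholder for a hardware spin hint; the fence only keeps the
    // compiler from deleting the otherwise empty inner loop.
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

inline SpinResult spinThenGiveUp(std::atomic<bool>& locked, const SpinParams& params)
{
    SpinResult result { false, 0 };
    for (std::size_t i = 0; i < params.spinLimit; ++i) {
        // Poll the lock bit once per outer iteration.
        bool expected = false;
        if (locked.compare_exchange_strong(expected, true, std::memory_order_acquire)) {
            result.acquired = true;
            return result;
        }

        // Change 2: blind inner loop that never looks at the lock.
        for (std::size_t j = 0; j < params.nopCount; ++j)
            cpuRelax();

        // Change 1: yield only at iterations N, 2N, 3N, ... so the
        // first N iterations (0 .. N-1) never reach the scheduler.
        if (i && params.yieldInterval && !(i % params.yieldInterval)) {
            std::this_thread::yield();
            ++result.schedulerYields;
        }
    }
    // Spin budget exhausted; a real caller would now park the thread.
    return result;
}
```

Change 3 then amounts to choosing a different SpinParams value per
platform.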

This tuning was not exhaustive, so there are likely still
opportunities for further improvement, e.g. via some sort
of exponential backoff, or even by tuning the parameters
per-SoC, rather than per-platform.

There are a few variables to consider:
  * Time-to-park: the total CPU time of a full spinloop
    ~= spinLimit * (nopCount + 1/yieldInterval)
    Increases with spinLimit, nopCount; decreases with yieldInterval.
    Higher is better (to a point) for semi-contended,
    but significantly worse for heavily-contended, as we pay
    the full cost of the spinloop on ~every attempt to acquire.
  * Niceness: (vaguely) how often we yield vs. run on core
    ~= 1 / (yieldInterval * nopCount)
    Higher is better for heavily-contended, as it means that high-
    priority threads will 'make room' for other threads as they
    spin, rather than taking up high-priority CPU time on a spinloop.
    However, it's worse for the semi-contended case, as when we
    do acquire the spinlock the priority depression can last
    for some time, meaning it could take a few quanta to get
    back to 'full speed'.
  * Poll-rate: the rate at which we read the atomic lock bit
    ~= 1 / (nopCount + 1/yieldInterval)
    This affects performance in two different ways.
    The first is that, if the lock does become available, we
    may be in the middle of a nop-spin, and therefore have to
    execute the remaining nops before we check again.
    Therefore, in the semi-contended case we want a higher frequency.
    However, the higher the frequency, the more often we hammer the
    lock's cache line. In sparse contention regimes this is relatively
    OK: e.g. if there's only a single waiter, then the cache-line
    stays local. With multiple waiters, however, the line
    can ping-pong between cores, hurting performance.
    Therefore, in the heavily-contended case it's better for this
    to be lower.
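
To make the trade-offs above easier to compare, the three approximations
can be evaluated side by side for a candidate tuning. This is only a
sketch of the formulas as written in this message; SpinTuning,
SpinProxies and estimateProxies are hypothetical names, the units are
arbitrary (one nop and one amortized yield are both counted as "1"), and
the parameter values in the usage note are invented for illustration,
not measured or taken from the patch.

```cpp
// Hypothetical tuning knobs, stored as doubles to keep the arithmetic simple.
struct SpinTuning {
    double spinLimit;     // O: outer loop iterations before parking
    double yieldInterval; // N: outer iterations between scheduler yields
    double nopCount;      // M: blind inner-loop iterations per outer iteration
};

// The three proxy quantities described above, in arbitrary units.
struct SpinProxies {
    double timeToPark; // ~= spinLimit * (nopCount + 1/yieldInterval)
    double niceness;   // ~= 1 / (yieldInterval * nopCount)
    double pollRate;   // ~= 1 / (nopCount + 1/yieldInterval)
};

inline SpinProxies estimateProxies(const SpinTuning& tuning)
{
    // Approximate cost of one outer iteration: the blind nops plus the
    // amortized share of one scheduler yield.
    const double costPerIteration = tuning.nopCount + 1.0 / tuning.yieldInterval;

    SpinProxies proxies;
    proxies.timeToPark = tuning.spinLimit * costPerIteration;
    proxies.niceness = 1.0 / (tuning.yieldInterval * tuning.nopCount);
    proxies.pollRate = 1.0 / costPerIteration;
    return proxies;
}
```

For example, estimateProxies({ 40, 4, 8 }) gives a time-to-park of 330
and a niceness of 1/32. Since time-to-park times poll-rate always equals
spinLimit under these formulas, raising nopCount lowers the poll rate
and lengthens the time to park at the same time, which is why spinLimit
has to be tuned together with the other two.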

In general, the gains for the semi-contended case are modest, but
show up across the board. On the flip side, hits to the heavily-
contended case tend to be localized to a few scenarios, but have
a very large effect size; heavy contention is very rare
(by design, from how WebKit uses locks), but very sensitive
because spinlocks are poorly adapted to that regime. E.g.
omitting sched_yield entirely can more than double the runtime
of certain benchmarks!

No new tests because existing tests cover changes to lock behavior.

Canonical link: https://commits.webkit.org/313051@main


