Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 62596b85906b6d10e97da86663eadc5ea8f41e46
https://github.com/WebKit/WebKit/commit/62596b85906b6d10e97da86663eadc5ea8f41e46
Author: Keith Miller <[email protected]>
Date: 2026-05-06 (Wed, 06 May 2026)
Changed paths:
M Source/JavaScriptCore/assembler/MacroAssembler.h
M Source/JavaScriptCore/wasm/WasmOMGIRGenerator.cpp
Log Message:
-----------
[OMG] Tail call shuffler should be able to handle registers
https://bugs.webkit.org/show_bug.cgi?id=313820
rdar://176022434
Reviewed by Yusuke Suzuki.
prepareForTailCallImpl shuffles arguments, callee-save restores, and the
return PC into position for Wasm tail calls. The old algorithm sorted
stack moves by source offset and used a "danger zone" heuristic, but it
could only handle stack-to-stack moves -- register-destination arguments
had to be skipped entirely, relying on B3 patchpoint constraints to place
them. This made the patchpoint early-clobber all callee-save registers,
preventing B3 from using callee-saves for any patchpoint input.
This left B3 *extremely* short on registers on X86_64. Now that B3 is
free to assign values to callee-saves, the tail call patchpoint is much
less constrained. This will be more important with the lazy restore
frame in https://bugs.webkit.org/show_bug.cgi?id=313732
Replace the old algorithm with the same parallel move algorithm used by
BBQ's emitShuffle. The algorithm resolves dependencies between moves
recursively and breaks cycles by spilling to scratch slots. This handles
all source/destination combinations (GPR, FPR, Stack) uniformly. A few
changes were needed for OMG though:
1. Callee-save restoration is now part of the parallel move rather than a
separate emitRestore() call before the shuffle. The algorithm correctly
orders reads from callee-save registers relative to their restores, so
B3 is free to place inputs in callee-save registers.
2. The patchpoint early-clobber is reduced from all callee-saves to just
macroClobberedGPRs (the scratch register). The late clobber is removed
entirely since the patchpoint is terminal. This gives B3 more freedom
in register allocation.
3. On ARM/ARM64/RISCV64, the return PC is shuffled directly into the link
register as part of the parallel move, eliminating the separate
load+move after the shuffle. ARM64E untagging still happens before the
SP adjustment.
4. The boxed callee (for indirect and direct tail calls) is now placed
into its frame slot as part of the parallel move. Since B3 can now
place patchpoint inputs in callee-save registers, the boxed callee
must be read before callee-save restores clobber it. The
boxedCalleeCallee Value* is passed through createTailCallPatchpoint
constrained to LateColdAny so prepareForTailCallImpl can include it
in the shuffle targeting CallFrameSlot::callee in the new frame.
5. V128 stack entries are split into two Width64 entries to satisfy the
algorithm's atomic-location invariant (a write to location X must only
conflict with reads from exactly X). FPR sources are pre-spilled to
scratch before splitting.
6. Width32 stack sources at non-8-byte-aligned offsets are pre-spilled
when they conflict with a destination. B3/Air can allocate 4-byte spill
slots at 4-byte alignment, so a Width32 source may sit in the upper
half of an 8-byte slot that a Width64 destination write covers.
7. The patchArgs parameter, wasmCallerInfoAsCallee parameter, and
tmpNeedsSaving machinery are removed since the parallel move handles
everything directly. The duplicate calleeCode patchpoint arg
(previously passed in the initial patchArgs at wasmScratchGPR0) is
also removed.
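For readers unfamiliar with parallel moves, the idea of resolving
dependencies recursively and breaking cycles through scratch can be
sketched roughly as below. This is an illustrative model only, not the
code in WasmOMGIRGenerator.cpp or BBQ's emitShuffle: Location, Move,
State, emitMove and resolveParallelMoves are hypothetical names,
locations are plain array indices standing in for GPRs, FPRs and stack
slots, and a single scratch location is assumed to be enough because
every destination is written by at most one move.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: a location is an index into a flat array.
using Location = std::size_t;

struct Move {
    Location src;
    Location dst;
};

enum class State : std::uint8_t { Pending, InProgress, Done };

// Performs moves[index] after first performing every move that still needs
// to read the location this move is about to overwrite.
static void emitMove(std::vector<std::uint64_t>& storage, std::vector<Move>& moves,
    std::vector<State>& states, Location scratch, std::size_t index)
{
    states[index] = State::InProgress;
    const Location dst = moves[index].dst;

    for (std::size_t other = 0; other < moves.size(); ++other) {
        if (other == index || moves[other].src != dst)
            continue;
        if (states[other] == State::Pending) {
            // A reader of dst has not run yet, so run it first.
            emitMove(storage, moves, states, scratch, other);
        } else if (states[other] == State::InProgress) {
            // The reader is further up the recursion, so this is a cycle.
            // Save the value to scratch and make the reader use the copy.
            storage[scratch] = storage[dst];
            moves[other].src = scratch;
        }
    }

    // Read src only now: it may have been redirected to scratch above.
    storage[dst] = storage[moves[index].src];
    states[index] = State::Done;
}

// Assumes destinations are unique and scratch is neither a source nor a
// destination of any move.
void resolveParallelMoves(std::vector<std::uint64_t>& storage, std::vector<Move> moves, Location scratch)
{
    assert(scratch < storage.size());
    std::vector<State> states(moves.size(), State::Pending);
    for (std::size_t i = 0; i < moves.size(); ++i) {
        if (states[i] == State::Pending)
            emitMove(storage, moves, states, scratch, i);
    }
}
```

In this model a callee-save restore is just another move whose
destination is the callee-save register, which is why an input living in
that register gets read before the restore overwrites it.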
I went with splitting V128 stack arguments because I expect them to be
rare and we can't reasonably handle V128s as the atomic size. For 4-byte
values I went with the spill approach instead: I expect 8-byte values to
be common, and splitting them would significantly slow down the shuffle.
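The two stack-entry cases above can be illustrated with a small sketch.
It is an assumption-laden model rather than the actual implementation:
StackEntry, splitToAtomic, overlaps and needsPreSpill are hypothetical
names, and offsets are plain byte offsets from an arbitrary base.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of a stack source or destination.
struct StackEntry {
    std::int32_t offset; // byte offset from an arbitrary base
    std::uint32_t bytes; // 4 (Width32), 8 (Width64) or 16 (V128)
};

// A 16-byte entry becomes two 8-byte entries so that every location the
// shuffle sees is at most 8 bytes wide. Other entries pass through.
std::vector<StackEntry> splitToAtomic(const StackEntry& entry)
{
    if (entry.bytes != 16)
        return { entry };
    return { { entry.offset, 8 }, { entry.offset + 8, 8 } };
}

// True when the two byte ranges share at least one byte.
bool overlaps(const StackEntry& a, const StackEntry& b)
{
    return a.offset < b.offset + static_cast<std::int32_t>(b.bytes)
        && b.offset < a.offset + static_cast<std::int32_t>(a.bytes);
}

// A 4-byte source that is not 8-byte aligned can sit in the upper half of
// an 8-byte slot. If some destination write covers it, copy it to scratch
// before the shuffle starts.
bool needsPreSpill(const StackEntry& source, const std::vector<StackEntry>& destinations)
{
    if (source.bytes != 4 || source.offset % 8 == 0)
        return false;
    for (const StackEntry& destination : destinations) {
        if (overlaps(source, destination))
            return true;
    }
    return false;
}
```

For example, a Width32 source at offset 4 conflicts with a Width64
destination at offset 0, while the same source is untouched by a Width64
destination at offset 8.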
I added a load/store/transferWidth set of instructions to MacroAssembler.
In the future we can adopt this in other dynamic width situations like
BBQ.
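As a rough illustration of what "dynamic width" means here, the sketch
below dispatches on a runtime width value. It is not the MacroAssembler
API: the real loadWidth/storeWidth/transferWidth emit machine
instructions and their signatures are not shown in this message. The
Width enum is a local stand-in and the functions operate directly on
host memory purely for demonstration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Local stand-in for a width tag; only two widths are modeled.
enum class Width : std::uint8_t { Width32, Width64 };

// Loads 4 or 8 bytes from address, zero-extending to 64 bits.
std::uint64_t loadWidth(Width width, const void* address)
{
    if (width == Width::Width32) {
        std::uint32_t value = 0;
        std::memcpy(&value, address, sizeof(value));
        return value;
    }
    std::uint64_t value = 0;
    std::memcpy(&value, address, sizeof(value));
    return value;
}

// Stores the low 4 or all 8 bytes of value to address.
void storeWidth(Width width, std::uint64_t value, void* address)
{
    if (width == Width::Width32) {
        const std::uint32_t narrow = static_cast<std::uint32_t>(value);
        std::memcpy(address, &narrow, sizeof(narrow));
        return;
    }
    std::memcpy(address, &value, sizeof(value));
}

// Memory-to-memory copy of the given width, built from the two helpers.
void transferWidth(Width width, const void* source, void* destination)
{
    storeWidth(width, loadWidth(width, source), destination);
}
```

The point of the shape is that callers pick the width at runtime and do
not need a separate switch at every call site.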
No new tests, covered by existing tests.
* Source/JavaScriptCore/assembler/MacroAssembler.h:
(JSC::MacroAssembler::loadWidth):
(JSC::MacroAssembler::storeWidth):
(JSC::MacroAssembler::transferWidth):
* Source/JavaScriptCore/wasm/WasmOMGIRGenerator.cpp:
(JSC::Wasm::OMGIRGenerator::emitIndirectCall):
(JSC::Wasm::prepareForTailCallImpl):
(JSC::Wasm::OMGIRGenerator::createTailCallPatchpoint):
(JSC::Wasm::OMGIRGenerator::emitDirectCall):
Canonical link: https://commits.webkit.org/312691@main