Andy Trick and I had this conversation Thursday, and I thought I'd capture it
here.
The Problem
It's a longstanding complaint that SIL uses drastically different code patterns
for the same sequence of operations based on whether the types of the values
involved are loadable or "address-only". For example, suppose I have Swift
code like this:
let x, y : T
let z = x + y
If T is a loadable type, this will generate SIL somewhat like this:
// %x and %y are values of type $T
%lhs = copy_value %x
%rhs = copy_value %y
%operator = function_ref T.+
%result = apply %operator(%lhs, %rhs)
%z = %result
(copy_value doesn't currently exist, but it's easier to understand, and as it
happens we're thinking of adding it back.)
If T is an address-only type, this will generate SIL somewhat like this:
// %x and %y are values of type $*T
%z = alloc_stack $T
%lhs = alloc_stack $T
copy_addr %x to [initialization] %lhs
%rhs = alloc_stack $T
copy_addr %y to [initialization] %rhs
%operator = function_ref T.+
apply %operator(%z, %lhs, %rhs)
dealloc_stack %rhs
dealloc_stack %lhs
Notably, we're explicitly modeling the memory in which values are stored, which
is both very verbose and, more importantly, loses the SSA use/def properties we
would want for tracking actual values. And this has a bunch of
secondary effects where various high-level operations like dynamic casts end up
needing two different instructions based on whether the value is stored in
memory. This is pretty dumb, and it's a major contributor to the reality that
generic code is currently very poorly optimized.
It does, however, have some significant advantages: since the memory allocation
is explicit, it's quite natural to express optimizations that e.g. hoist or
sink those allocations, and the next level of lowering (IRGen) can be very
simple.
Addresses and address-only types
Swift is an imperative language, and imperative languages make formal memory
locations part of the high-level semantics. Definite initialization (DI) and
SSA formation allow us to eliminate formal memory locations for most local
variables, but class
properties, global variables, escaped local variables, and pointers are all
fundamentally un-SSA-able. Therefore we will always have some concept of an
address.
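To make "fundamentally un-SSA-able" concrete, here's a hedged Swift sketch of the kinds of locations involved (the names are invented for illustration):

```swift
final class Box {
    var value = 0          // class property: lives at a formal address
}

var counter = 0            // global variable: a formal memory location

// An escaped local: once captured by an escaping closure, `local`
// must live in memory that outlasts the function's frame.
func makeCounter() -> () -> Int {
    var local = 0
    return {
        local += 1
        return local
    }
}
```
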
But most values of address-only type aren't really being stored in that sort of
formal memory location; we're just representing them that way in SIL. Why do
we do this? What is it about a type that makes it address-only?
Well, some types are inherently "memory-pinned": something about their
representation only makes sense, or is only implementable, if the value is
always kept in memory:
- The representation of the value may involve interior pointers, as with
LLVM's SmallVector. This isn't currently a thing in Swift, but it's a
possibility someday with the import of non-POD C++ types.
- The address of the value may need to be registered elsewhere, as with weak
references.
- The value may allow internal mutation even from a shared reference, like a
C++ class with a mutable field or a Rust atomic type; you can see weak
references as analogous to this.
Such types are necessarily address-only at a low level.
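The weak-reference case can be seen in ordinary Swift. This is a hedged sketch (the type names are invented): the runtime registers the address of the weak field so it can zero it when the referent dies, which is exactly the kind of bookkeeping that pins the value in memory.

```swift
final class Delegate {}

// The runtime tracks the address of `delegate` so it can nil it out
// when the referenced object is deallocated; that address registration
// is why such a value is naturally kept in memory.
struct Observer {
    weak var delegate: Delegate?
}
```
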
Other types are address-only by abstraction: their representation isn't (fully)
known, and so they have to be kept in memory (1) in case that representation
includes another address-only value and (2) because there's no way to store an
unbounded amount of data except in memory anyway.
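In Swift source terms, this is what happens to an unspecialized generic parameter: inside the body below, values of type T have unknown size and layout, so they're handled indirectly (the function is purely illustrative):

```swift
// T's size and layout are unknown here, so SILGen treats values of
// type T as address-only and passes them around by address.
func duplicate<T>(_ value: T) -> (T, T) {
    return (value, value)
}
```
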
But this sense of address-only types that we're describing is really a property
of the final generated code. It's necessary for LLVM IR generation to know
about it, but it's not obviously necessary for SIL to know about it beyond the
implications of the inherently memory-pinned cases above:
- it is not generally safe to assume that the stored properties of a non-POD
C++ type remain invariant across moves
- weak references *do* represent the same value across moves, but of course
that value can change dynamically at any time anyway, per the rules of weak
references
- mutable fields can be modified even through a shared borrowed reference, and
so (if they are modeled in SIL at all, rather than just leaving the type
opaque) there must be some way to project a mutable address from a shared
borrow, and so on.
Our current representation of address-only types arises directly from the
low-level operations that are required, which does admit some interesting
optimizations on its own, but the disadvantages of having to support wildly
divergent code paths and completely give up SSA use/def chains are crippling.
What would SIL look like if we abandoned this entirely?
All types as SIL scalars
The fundamental issue here is that IR-level lowering does need to place
memory-pinned values into memory. Suppose we take a value, use it as an enum
case payload, inject that enum into an Any, and pass that to a function. In a
future world where we consistently SSA all local values, this ideally looks
something like this:
// %value is a $T
%enum = enum #MyEnum.foo, %value : $T
%any = existential $Any, %enum
%fn = function_ref @bar
apply %fn(%any)
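For reference, here's a hedged sketch of Swift source that could lower to SIL along those lines (MyEnum and bar are the placeholder names from the SIL above):

```swift
enum MyEnum<T> {
    case foo(T)                // the payload injection
}

func bar(_ x: Any) {}          // placeholder callee

func wrap<T>(_ value: T) {
    let e = MyEnum.foo(value)  // enum #MyEnum.foo, %value
    let a: Any = e             // existential $Any, %enum
    bar(a)                     // apply %fn(%any)
}
```
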
If all of these types are loadable, lowering this to IR doesn't impose any
abstraction costs, because we can just manipulate the value as a bit-pattern in
memory, which LLVM (really, any reasonable compiler infrastructure) should be
quite good at analyzing and optimizing. But if all of these types are
address-only, that's not obviously true. Let's look at the details of lowering
to memory.
Because all types are movable — even in a world with Rust-style ownership —
there's a fairly straightforward lowering that works for all possible
functions: every address-only SIL value gets its own allocation which is
deallocated along the dominance frontier of the value. Indirect arguments use
the passed-in buffer. Basic block arguments are copied from the allocation for
the branch argument value to the allocation for the corresponding destination
block parameter. Indirect return values are copied from the allocation for the
return value to the pointer passed in.
This admits some obvious optimizations. If "owned" SIL values are
pseudo-linear — i.e. along every path from their definition point, it is
statically evident that they are (optionally) borrowed multiple times, consumed
exactly once, and then never used again — then the copies can instead be moves.
Standard register-allocation algorithms can recognize when basic block
parameters can use the same location as their arguments. SIL only permits a
single return instruction, so there's no reason not to allocate returned values
directly into the return slot. These optimizations will tend to eliminate a
lot of moves.
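In source terms, a pseudo-linear owned value might look like the following hedged sketch, assuming Swift 5.9's consuming parameter modifier (the function is invented for illustration):

```swift
// `s` is owned by the callee: it may be read (borrowed) any number of
// times along the path, but ownership leaves exactly once, at the
// return, so the final use can be a move rather than a copy.
func finish(_ s: consuming String) -> String {
    let n = s.count            // borrow: non-consuming read
    precondition(n >= 0)
    return s                   // the single consuming use
}
```
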
However, there's another problem, which is injections. Consider the example
above. %any is an opaque existential, and initializing it formally involves
allocating a buffer within the existential and moving the argument into that
buffer. Ideally, we would allocate that buffer first and then simply allocate
%enum in-place into that buffer. In this simple example, that's easy. But if
the control flow were more complex, detecting that this is possible becomes
significantly more challenging, as does ensuring that the buffer is properly
cleaned up along all paths. For example, suppose that %value were computed by
calling a throwing function:
try_apply %someFunction() normal %cont, unwind %handler
cont(%value: $T):
%enum = enum #MyEnum.foo, %value : $T
%any = existential $Any, %enum
%fn = function_ref @bar
apply %fn(%any)
handler(%error: $Error):
throw %error
Naive allocation here is going to introduce a lot of moves. Optimally, we
would receive the return value from %someFunction directly in the payload of
%enum, which we want to build directly into the allocated existential buffer of
%any. But to do this, we actually need to allocate that existential buffer
before executing the try_apply; and if the try_apply throws, we need to
deallocate that existential buffer in the handler block. The need to
retroactively insert this kind of clean-up code adds a lot of complexity to
this allocation approach. Moreover, it's quite possible that complex
intermediate control flow (for example, a loop somewhere between the
definition of a value and its consuming use) will tend to block this kind of
analysis and cause more unnecessary moves.
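A hedged sketch of Swift source producing that kind of CFG (MyEnum, Failure, and wrapThrowing are invented placeholders): on the success edge the result gets injected into the enum and then the Any; if the call throws, control goes to the handler without either injection ever happening.

```swift
struct Failure: Error {}       // placeholder error for the unwind path

enum MyEnum<T> { case foo(T) }

func wrapThrowing<T>(_ make: () throws -> T) throws -> Any {
    let value = try make()     // try_apply: normal vs. unwind edges
    let e = MyEnum.foo(value)  // enum injection, normal path only
    return e as Any            // existential injection, normal path only
}
```
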
In contrast, the current SIL-generation scheme tends to be aware of at least
the immediate local uses of values and therefore emits a lot of these kinds of
initialization "in-place" already. But that said, it does proceed from a
simple recursive examination of the AST, which means it will miss examples that
are just slightly more opaque, like binding a return value as a let and only
then returning it. That kind of thing is currently left to the optimizer for
no particularly good reason.
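The "bind, then return" pattern looks like this in source (the function is illustrative); because the let binding sits between the call and the return, a simple recursive AST walk misses the chance to build the value directly in the return slot:

```swift
func makeValue<T>(_ build: () -> T) -> T {
    let result = build()   // bound as a local let first...
    return result          // ...and only then returned, so in-place
                           // initialization of the return slot is missed
}
```
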
Summary
Overall, I think representing local address-only values as SIL scalars with
full SSA use/def chains is really promising, and I do think we can write an
allocation pass that does an acceptable job eliminating unnecessary moves. In
order to actually do this, though, I think we need two things:
- We need SIL to be "pseudo-linear" as discussed above. We really don't want
the allocation pass to have to worry about keeping values alive past
consumptions, and thus potentially having to insert copies instead of moves.
- We need the allocation pass to handle exits before initialization (like
with try_apply) and other sorts of interfering control flow. It will not be
acceptable for this to be a block-local analysis with only a trivial amount of
cross-block argument merging.
John.
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev