> On Oct 3, 2016, at 9:22 AM, Joe Groff <jgr...@apple.com> wrote:
> 
> I feel like moving in this direction is the right thing to do. Some random 
> comments below:
> 
>> On Oct 1, 2016, at 1:32 AM, John McCall via swift-dev <swift-dev@swift.org> 
>> wrote:
>> 
>> Andy Trick and I had this conversation Thursday, and I thought I'd capture 
>> it here.
>> 
>> The Problem
>> 
>> It's a longstanding complaint that SIL uses drastically different code 
>> patterns for the same sequence of operations based on whether the types of 
>> the values involved are loadable or "address-only".  For example, suppose I 
>> have Swift code like this:
>>  let x, y : T
>>  let z = x + y
>> 
>> If T is a loadable type, this will generate SIL somewhat like this:
>> // %x and %y are values of type $T
>> %lhs = copy_value %x
>> %rhs = copy_value %y
>> %operator = function_ref T.+
>> %result = apply %operator(%lhs, %rhs)
>> %z = %result
>> 
>> (copy_value doesn't currently exist, but it's easier to understand, and as 
>> it happens we're thinking of adding it back.)
>> 
>> If T is an address-only type, this will generate SIL somewhat like this:
>>  // %x and %y are values of type $*T
>>  %z = alloc_stack $T
>>  %lhs = alloc_stack $T
>>  copy_addr %x to [initialization] %lhs
>>  %rhs = alloc_stack $T
>>  copy_addr %y to [initialization] %rhs
>>  %operator = function_ref T.+
>>  apply %operator(%z, %lhs, %rhs)
>>  dealloc_stack %rhs
>>  dealloc_stack %lhs
>> 
>> Notably, we're explicitly modeling the memory in which values are stored, 
>> which is both very verbose and, more importantly, loses the SSA use/def 
>> chains we'd want for tracking the actual values around.  And this has a bunch of 
>> secondary effects where various high-level operations like dynamic casts end 
>> up needing two different instructions based on whether the value is stored 
>> in memory.  This is pretty dumb, and it's a major contributor to the reality 
>> that generic code is currently very poorly optimized.
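>> 
>> (For example, today a dynamic cast is spelled with checked_cast_br when its
>> operand is a loadable SSA value and with checked_cast_addr_br when it lives
>> in memory; sketch only, with the operand details elided:)
>>  checked_cast_br %val : $T to $U, bb_ok, bb_fail
>>  checked_cast_addr_br take_always T in %src : $*T to U in %dest : $*U, bb_ok, bb_fail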
>> 
>> It does, however, have some significant advantages: since the memory 
>> allocation is explicit, it's quite natural to express optimizations that 
>> e.g. hoist or sink those allocations, and the next level of lowering (IRGen) 
>> can be very simple.
>> 
>> Addresses and address-only types
>> 
>> Swift is an imperative language, and imperative languages make formal memory 
>> locations part of the high-level semantics.  DI and SSA formation allow us 
>> to eliminate formal memory locations for most local variables, but class 
>> properties, global variables, escaped local variables, and pointers are all 
>> fundamentally un-SSA-able.  Therefore we will always have some concept of an 
>> address.
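>> 
>> (Class properties and globals are the obvious cases: they can only be reached
>> through an address, e.g. via global_addr and ref_element_addr, so those
>> operations stay address-based no matter what we do elsewhere.  @someGlobal
>> and #C.prop below are just placeholder names:)
>>  %g = global_addr @someGlobal : $*Int
>>  %p = ref_element_addr %obj : $C, #C.prop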
>> 
>> But most values of address-only type aren't really being stored in that sort 
>> of formal memory location; we're just representing them that way in SIL.  
>> Why do we do this?  What is it about a type that makes it address-only?
>> 
>> Well, some types are inherently "memory-pinned": something about their 
>> representation only makes sense, or is only implementable, if the value is 
>> always kept in memory:
>>  - The representation of the value may involve interior pointers, as with 
>> LLVM's SmallVector.  This isn't currently a thing in Swift, but it's a 
>> possibility someday with the import of non-POD C++ types.
>>  - The address of the value may need to be registered elsewhere, as with 
>> weak references.
>>  - The value may allow internal mutation even from a shared reference, like 
>> a C++ class with a mutable field or a Rust atomic type; you can see weak 
>> references as analogous to this.
>> Such types are necessarily address-only at a low level.
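>> 
>> (Weak references make the second case concrete: the runtime registers the
>> storage location itself, so the corresponding SIL operations are inherently
>> address-based.  Rough sketch; the exact spelling of the storage type may
>> differ:)
>>  store_weak %ref to %slot : $*@sil_weak Optional<C>
>>  %v = load_weak %slot : $*@sil_weak Optional<C>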
>> 
>> Other types are address-only by abstraction: their representation isn't 
>> (fully) known, and so they have to be kept in memory (1) in case that 
>> representation includes another address-only value and (2) because there's 
>> no way to store an unbounded amount of data except in memory anyway.
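>> 
>> (This is why fully generic code today takes and returns values of unknown
>> layout indirectly; for something like func identity<T>(_ x: T) -> T the
>> lowered convention is roughly:)
>>  sil @identity : $@convention(thin) <T> (@in T) -> @out T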
>> 
>> But this sense of address-only types that we're describing is really a 
>> property of the final generated code.  It's necessary for LLVM IR generation 
>> to know about it, but it's not obviously necessary for SIL to know about it 
>> beyond the implications of the inherently memory-pinned cases above:
>>  - it is not generally safe to assume that the stored properties of a 
>> non-POD C++ type remain invariant across moves
> 
> Even if we want to accommodate ill-behaved C++ value types in the future, it 
> seems to me that we could avoid penalizing the common case by making 
> "WellBehavedValueSemantics" another opt-out type constraint, in the vein of 
> "Copyable" in the move-only types model. If we do want to accommodate C++ 
> types with semantically loaded move or copy operations in Swift, we'll 
> probably want to make more C++-like guarantees about exactly when move, copy, 
> and assign operations are performed on values of those types, but I don't 
> think we want generic code to be subject to those constraints by default. If 
> an address-only SIL representation ends up being fundamentally necessary to 
> semantically model ill-behaved C++ types, we'd also want to avoid having to 
> emit the SIL for all generics that way.

I completely agree that we'll need to ask non-value-semantics C++ types to opt 
out explicitly in some way.  That wasn't what I had in mind here, though.

My concern is about types that provide a consistent view of their abstract 
"value" but implement that in ways that change storage; as one example, a 
pre-C++11 type for which moves are actually implemented as copies.  We would 
not want to e.g. assume that we can read a property, move the value, and then 
rely on the old load still being valid.

Generic code should be fine regardless, since the optimizer can't make 
assumptions about abstractly-implemented properties.
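
(Schematically, the worry is a sequence like the following, where move_value
is a hypothetical move of an SSA value that might be implemented underneath
as a C++ copy-construct-plus-destroy, and the names are placeholders:
  %f = struct_extract %x : $CxxType, #CxxType.field
  %y = move_value %x : $CxxType
  // cannot assume %f still describes the corresponding field of %y
so the optimizer couldn't blindly forward reads of concrete stored properties
across such a move.)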

>>  - weak references *do* represent the same value across moves, but of course 
>> that value can change dynamically at any time anyway, per the rules of weak 
>> references
>>  - mutable fields can be modified even by a shared borrowed reference, and 
>> so (if they are modeled in SIL at all, rather than just leaving the 
>> type opaque) there must be some way to project a mutable address from a 
>> shared borrow and so on.
> 
> We had briefly discussed the possibility of giving "owned" and "borrowed" 
> variants of a type different low-level representations. There are obvious 
> complexities involved in making this a user-facing feature, but it might 
> suffice to have an abstraction model that accommodates only two 
> possibilities—either the owned and borrowed representations are the same, as 
> for most POD or refcounted types, or the borrowed representation is a pointer 
> to the owned representation for types with sharably mutable fields, such as 
> weak references, atomics, mutexes, and so on. We could distinguish 
> owned/borrowed in the SIL type system to make this possible.

Yeah, this is something I'm still tossing around in my mind.  The problem is 
tuples; is a borrowed tuple a tuple of borrows or just that same rule applied 
to the aggregate?  The latter doesn't handle any sort of translation we might 
want to do.
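
(Concretely: for a pair like (Int, WeakThing), where WeakThing stands in for
some sharably-mutable type, the two readings give different borrowed layouts.
Using * loosely for "pointer to the owned representation":
  a tuple of borrows:           (Int, *WeakThing)
  the rule applied wholesale:   *(Int, WeakThing)
The first keeps the Int element inline and only indirects the piece that needs
it; the second treats the aggregate as opaque.)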

>> Our current representation of address-only types arises directly from the 
>> low-level operations that are required, which does admit some interesting 
>> optimizations on its own, but the disadvantages of having to support wildly 
>> divergent code paths and completely give up SSA use/def chains are 
>> crippling.  What would SIL look like if we abandoned this entirely?
>> 
>> All types as SIL scalars
>> 
>> The fundamental issue here is that IR-level lowering does need to place 
>> memory-pinned values into memory.  Suppose we take a value, use it as an 
>> enum case payload, inject that enum into an Any, and pass that to a 
>> function.  In a future world where we consistently SSA all local values, 
>> this ideally looks something like this:
>> 
>> // %value is a $T
>> %enum = enum #MyEnum.foo, %value : $T
>> %any = existential $Any, %enum
>> %fn = function_ref @bar
>> apply %fn(%any)
>> 
>> If all of these types are loadable, lowering this to IR doesn't impose any 
>> abstraction costs, because we can just manipulate the values as bit-patterns in 
>> memory, which LLVM (really, any reasonable compiler infrastructure) should 
>> be quite good at analyzing and optimizing.  But if all of these types are 
>> address-only, that's not obviously true.  Let's look at the details of 
>> lowering to memory.
>> 
>> Because all types are movable — even in a world with Rust-style ownership — 
>> there's a fairly straightforward lowering that works for all possible 
>> functions: every address-only SIL value gets its own allocation which is 
>> deallocated along the dominance frontier of the value.  Indirect arguments 
>> use the passed-in buffer.  Basic block arguments are copied from the 
>> allocation for the branch argument value to the allocation for the 
>> corresponding destination block parameter.  Indirect return values are 
>> copied from the allocation for the return value to the pointer passed in.
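>> 
>> (Sketched naively for the example above, each value gets its own stack slot
>> and the injections become copies between slots; the instruction spellings
>> are today's address-oriented ones, and destroys of the copied-from slots
>> are elided:)
>>  %value.addr = alloc_stack $T
>>  // ... initialize %value.addr ...
>>  %enum.addr = alloc_stack $MyEnum
>>  %payload = init_enum_data_addr %enum.addr : $*MyEnum, #MyEnum.foo
>>  copy_addr %value.addr to [initialization] %payload
>>  inject_enum_addr %enum.addr : $*MyEnum, #MyEnum.foo
>>  %any.addr = alloc_stack $Any
>>  %buf = init_existential_addr %any.addr : $*Any, $MyEnum
>>  copy_addr %enum.addr to [initialization] %buf
>>  %fn = function_ref @bar
>>  apply %fn(%any.addr)
>>  dealloc_stack %any.addr
>>  dealloc_stack %enum.addr
>>  dealloc_stack %value.addr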
>> 
>> This admits some obvious optimizations.  If "owned" SIL values are 
>> pseudo-linear — i.e. along every path from their definition point, it is 
>> statically evident that they are (optionally) borrowed multiple times, 
>> consumed exactly once, and then never used again — then the copies can 
>> instead be moves.  Standard register-allocation algorithms can recognize 
>> when basic block parameters can use the same location as their arguments.  
>> SIL only permits a single return instruction, so there's no reason not to 
>> allocate returned values directly into the return slot.  These optimizations 
>> will tend to eliminate a lot of moves.
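>> 
>> (In address terms, pseudo-linearity is what lets us lower the transfer of a
>> value between slots as
>>  copy_addr [take] %src to [initialization] %dest
>> instead of
>>  copy_addr %src to [initialization] %dest
>> followed by a later destroy_addr of %src.)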
>> 
>> However, there's another problem, which is injections.  Consider the example 
>> above.  %any is an opaque existential, and initializing it formally involves 
>> allocating a buffer within the existential and moving the argument into that 
>> buffer.  Ideally, we would allocate that buffer first and then simply 
>> allocate %enum in-place into that buffer.  In this simple example, that's 
>> easy.  But if the control flow were more complex, detecting that this is 
>> possible becomes significantly more challenging, as does ensuring that the 
>> buffer is properly cleaned up along all paths.  For example, suppose that 
>> %value were computed by calling a throwing function:
>> 
>>  try_apply %someFunction() normal %cont, unwind %handler
>> cont(%value: $T):
>>  %enum = enum #MyEnum.foo, %value : $T
>>  %any = existential $Any, %enum
>>  %fn = function_ref @bar
>>  apply %fn(%any)
>> handler(%error: $Error):
>>  throw %error
>> 
>> Naive allocation here is going to introduce a lot of moves.  Optimally, we 
>> would receive the return value from %someFunction directly in the payload of 
>> %enum, which we want to build directly into the allocated existential buffer 
>> of %any.  But to do this, we actually need to allocate that existential 
>> buffer before executing the try_apply; and if the try_apply throws, we need 
>> to deallocate that existential buffer in the handler block.  The need to 
>> retroactively insert this kind of clean-up code adds a lot of complexity to 
>> this allocation approach.  Moreover, it's quite possible that complex 
>> intermediate control — for example, if there's a loop somewhere between the 
>> definition of a value and its consuming use — will tend to block this kind 
>> of analysis and cause more unnecessary moves.
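>> 
>> (The allocation we actually want for that example looks roughly like this:
>> the existential buffer is created before the try_apply, the call emits its
>> result straight into the payload slot, and the handler tears the never-filled
>> buffer back down.  deinit_existential_addr is today's spelling for destroying
>> a container whose value was never initialized:)
>>  %any.addr = alloc_stack $Any
>>  %buf = init_existential_addr %any.addr : $*Any, $MyEnum
>>  %payload = init_enum_data_addr %buf : $*MyEnum, #MyEnum.foo
>>  try_apply %someFunction(%payload) normal %cont, unwind %handler
>> cont:
>>  inject_enum_addr %buf : $*MyEnum, #MyEnum.foo
>>  %fn = function_ref @bar
>>  apply %fn(%any.addr)
>>  dealloc_stack %any.addr
>>  ...
>> handler(%error: $Error):
>>  deinit_existential_addr %any.addr : $*Any
>>  dealloc_stack %any.addr
>>  throw %error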
> 
> I think we can do a reasonable job of avoiding unnecessary moves in most 
> injection cases. Perhaps we could keep the address-oriented SIL instructions 
> around so that this late allocation pass can still be done in SIL (and 
> thereby be subject to the correctness analyses we want to implement) instead 
> of needing to be completely done on-the-fly in IRGen.

Yeah, this is probably the right thing to do.  We would probably want to 
introduce a new phase so that ordinary optimizations can assume away the 
address-oriented SIL instructions; it's not clear to me that they mix well 
with a stricter, higher-level SIL, since without them I think we can mostly 
define away uninitialized memory.

John.

> 
> -Joe
> 
>> In contrast, the current SIL-generation scheme tends to be aware of at least 
>> the immediate local uses of values and therefore emits a lot of these kinds 
>> of initialization "in-place" already.  But that said, it does proceed from a 
>> simple recursive examination of the AST, which means it will miss examples 
>> that are just slightly more opaque, like binding a return value as a let and 
>> only then returning it.  That kind of thing is currently left to the 
>> optimizer for no real good reason.
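>> 
>> (Concretely: given something like "let r = makeValue(); return r" for an
>> address-only T, SILGen today materializes r in its own slot and then copies
>> it into the indirect return, rather than evaluating the call directly into
>> the return slot.  @makeValue is a placeholder, and cleanups are elided:)
>>  // %0 : $*T is the indirect return slot
>>  %r = alloc_stack $T
>>  %f = function_ref @makeValue
>>  apply %f(%r)
>>  copy_addr %r to [initialization] %0
>>  dealloc_stack %r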
>> 
>> Summary
>> 
>> Overall, I think representing local address-only values as SIL scalars with 
>> full SSA use/def chains is really promising, and I do think we can write an 
>> allocation pass that does an acceptable job eliminating unnecessary moves.  
>> In order to actually do this, though, I think we need two things:
>>  - We need SIL to be "pseudo-linear" as discussed above.  We really don't 
>> want the allocation pass to have to worry about keeping values alive past 
>> consumptions, and thus potentially having to insert copies instead of moves.
>>  - We need the allocation pass to handle exits before initialization (like 
>> with try_apply) and other sorts of interfering control flow.  It will not be 
>> acceptable for this to be a block-local analysis with only a trivial amount 
>> of cross-block argument merging.
>> 
>> John.
> 

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
