Re: [swift-dev] Proposal: SIL Ownership Model + Verifier

John McCall via swift-dev Thu, 08 Dec 2016 14:01:53 -0800

> On Dec 8, 2016, at 1:53 PM, Andrew Trick <atr...@apple.com> wrote:
>> On Dec 7, 2016, at 11:25 PM, John McCall via swift-dev <swift-dev@swift.org 
>> <mailto:swift-dev@swift.org>> wrote:
>> 
>>> 
>>> On Dec 7, 2016, at 2:13 PM, Michael Gottesman via swift-dev 
>>> <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:
>>> 
>>> This is a proposal for a new SIL Ownership Model and verifier. An online 
>>> HTML version of the document is available here:
>>> 
>>> https://gottesmm.github.io/proposals/sil-ownership-model.html 
>>> <https://gottesmm.github.io/proposals/sil-ownership-model.html>
>>> 
>>> and inline below.
>>> 
>>> Michael
>>> 
>>> ----
>>> 
>>> # Summary
>>> 
>>> This document defines a SIL ownership model and a compile time static 
>>> verifier
>>> for the model. This will allow for static compile time verification that a 
>>> SIL
>>> program satisfies all ownership model constraints.
>>> 
>>> The proposed SIL ownership model embeds ownership into SIL's SSA def-use
>>> edges. This is accomplished by:
>>> 
>>> 1. Formalizing into SIL API's the distinction in between defs
>>>    (e.g. `SILInstruction` and `SILArgument`) and the values produced by defs
>>>    (e.g. `SILValue`). This is needed to model values as having ownership and
>>>    defs as not having ownership. The main implication here is that the two 
>>> can
>>>    no longer be implicitly convertible and the API must enforce this.
> 
> This proposal is simultaneously proposing some specific API details
> while avoiding both the motivation behind it and the eventual features
> that it will support.
> 
> I think the ValueBase/Def renaming is a little
> misunderstanding. "ValueDef" in the proposal was a placeholder (not a
> perfect name) for some perceived need to have a common
> SILArgument/SILInstruction base class. That's probably not necessary
> in which case there's nothing to disagree about.
> 
> John's counter proposal solves that problem by repurposing ValueBase
> to serve as a base class for value definitions (and then goes into
> some detail on implementing multi-result instructions).
> 
>> We currently have two kinds of def: instructions and arguments.  Arguments
>> always define a single value, and I don't see anything in your proposal that
>> changes that.  And the idea that an instruction produces exactly one value is
>> already problematic, because many don't produce a meaningful value at all.
>> All that changes in your proposal is that certain instructions need to be 
>> able
>> to produce multiple values.
> 
> That’s an obvious way of looking at it. I personally am most
> interested in putting an end to the chronic problem of conflating
> instructions and values which literally makes it impossible to have a
> sane discussion about ownership. I insisted that SILInstruction not
> derive from ValueBase, which probably led to the proposed
> renaming. This probably comes across as confusing because the proposal
> is deliberately side-stepping the issue of multi-result instructions.
> 
> What I really want to see at the proposal level is agreement on core concepts:
> - A value has a type, ownership, one def, and 0..n uses
> - Ownership rules are guaranteed simply by evaluating those 0..n uses.
> - A value’s def/use is the point at which it is produced/consumed.
> - Use/def points are demarcated by instructions or block boundaries
> - An instruction produces values via results or consumes them via operands.
> 
> There shouldn't be any debate about these concepts, so why is there so
> much confusion? I think caused by some bad choices in the SIL
> interface, particularly as SILInstruction : ValueBase.
> 
> So we should define a SIL interface that reflects those core
> concepts. We should be able to reimplement multi-result instructions
> without affecting those interfaces and without rewritting SIL passes.


I agree.  I don't think this requires deep changes to how SIL nodes are 
typically
worked with.  As long as single-result instructions can be transparently used as
values, so that e.g. the result of SILBuilder::CreateBitCast can be used as a
SILValue, the number of changes required should be quite modest.

Things like SILCloner will of course need some minor updates to handle the
fact that a single instruction can have multiple uses, but that should be 
straightforward
to do on top of the existing Value->Value mapping.

John.

> 
> -Andy
> 
>> 
>> Moreover, the word "def" clearly suggests that it refers to the definition 
>> of a
>> value that can be used, and that's how the term is employed basically 
>> everywhere.
>> 
>> So allow me to suggest that a much clearer way of saying what you're trying 
>> to
>> say is that we need to distinguish between defs and instructions.  An 
>> instruction
>> may have an arbitrary number of defs, possibly zero, and each def is a value
>> that can be used.  (But the number of defs per instruction is known 
>> statically
>> for most instructions, which is something we can use to make working with
>> defs much less annoying.)
>> 
>> Also this stuff you're saying about values having ownership and defs not 
>> having
>> ownership is, let's say, misleading; it only works if you're using a 
>> meaninglessly
>> broad definition of ownership.  It would be better to simply say that the 
>> verification
>> we want to do for ownership strongly encourages us to allow instructions to 
>> introduce multiple defs whose properties can be verified independently.
>> 
>>> 2. Specifying a set of ownership kinds and specifying a method for mapping a
>>>    `SILValue` to an ownership kind.
>>> 3. Specifying ownership constraints on all `SILInstruction`s and 
>>> `SILArgument`s
>>>    that constrain what ownership kinds their operands and incoming values
>>>    respectively can possess.
>>> 4. Implementing a verifier to ensure that all `SILInstruction` and 
>>> `SILArgument`
>>>    are compatible with the ownership kind propagated by their operand
>>>    `SILValue`s and that pseudo-linear dataflow constraints are maintained.
>> 
>> I'll tackle these other sections in another email.  Let's go one at a time.
>> 
>>> # Cleanly separating Value and Def APIs in SIL
>>> 
>>> All values in SIL are defined via an assignment statement of the form: 
>>> `<foo> = <bar>`.
>>> In English, we say `foo` is a value that is defined by the def
>>> `bar`. Originally, these two concepts were distinctly represented by the 
>>> classes
>>> `SILValue` and `ValueBase`. All `ValueBase` defined a list of `SILValue` 
>>> that
>>> were related, but not equivalent to the defining `ValueBase`. With the 
>>> decision
>>> to represent multiple return values as instruction projections instead of 
>>> as a
>>> list of `SILValue`, this distinction in between a def and the values was 
>>> lost,
>>> resulting in `SILValue` being used interchangeably with `ValueBase` 
>>> throughout
>>> the swift codebase. This exposes a modeling issue when one attempts to add
>>> ownership to SIL, namely that values have ownership, while the defs that 
>>> define
>>> the values do not. This implies that defs and values *should not* be
>>> interchangeable.
>> 
>> Again, I do not understand what you're trying to say about ownership here.
>> Just drop it.
>> 
>>> In order to model that values, not defs, have ownership, we separate the
>>> `SILValue` and `ValueBase` APIs. This is done by:
>> 
>> Almost all of this is based on what I consider to be a really unfortunate use
>> of the term "def".
>> 
>> Allow me to suggest this:
>> 
>> 1. You do not need to rename ValueBase.  Let's stick with the term "Value"
>> instead of "Def" in the source code.
>> 
>> 2. There are only three subclasses of ValueBase for now:
>>   - SILArgument
>>   - SingleInstructionResult
>>   - MultiInstructionResult
>> This means that the Kind can probably be packed into the use-list pointer.
>> 
>> 3. SingleInstructionResult will only be used for instructions that are known 
>> to
>> produce exactly one value.  All such instructions can derive from a single
>> class, of which SingleInstructionResult can be a superclass.  This will make
>> it possible to construct a SILValue directly from such instructions, as well 
>> as
>> making it easy to support dyn_casting back to such instructions.
>> 
>> 4. MultiInstructionResult will have to store the result index as well as 
>> provide
>> some way to get back to the instruction.  I think the most efficient 
>> representation
>> here is to store an array of MultiInstructionResults immediately *before* the
>> address point of the instruction, in reverse order, so that (this + 
>> ResultIndex + 1)
>> gets you back to the instruction.  This also makes it possible to 
>> efficiently index
>> into the results without loading the number of results (except for the
>> bounds-checking assert, of course).
>> 
>> It will not be possible to construct a SILValue directly from one of these 
>> instructions;
>> you'll have to ask the instruction for its Nth result.
>> 
>> It's not clear to me what dyn_casting a SILValue back to a multi-result 
>> instruction
>> should do.  I guess it succeeds and then you just ask the instruction which
>> result you had.
>> 
>> 5. SILInstruction has an accessor to return what's basically an ArrayRef of
>> its instruction results.  It's not quite an ArrayRef because that would 
>> require
>> type-punning, but you can make it work as efficiently as one.
>> 
>> John.
>> 
>> 
>>> 
>>> 1. Renaming `ValueBase` to `ValueDef`. This makes it clear from a naming
>>> perspective that a `ValueDef` is not a value, but may define values.
>>> 2. Eliminate all operator methods on `SILValue` that allow one to work with 
>>> the
>>> `ValueDef` API via a `SILValue` in favor of an explicit method for getting 
>>> the
>>> internal `ValueDef` of a `SILValue`, i.e.:
>>> 
>>>         class SILValue {
>>>           ...
>>>           ValueDef *operator->() const; // deleted
>>>           ValueDef &operator*() const; // deleted
>>>           operator ValueDef *() const; // deleted
>>> 
>>>           bool operator==(ValueDef *RHS) const; // deleted
>>>           bool operator!=(ValueDef *RHS) const; // deleted
>>>           ...
>>>           ValueDef *SILValue::getDef() const; // new
>>>           ...
>>>         };
>>> 
>>> 3. Use private access control and friend classes to only allow for 
>>> `SILValue`s
>>>    to be constructed and vended by their defining `ValueDef` via a new 
>>> method on
>>>    `ValueDef`:
>>> 
>>>         class SILValue {
>>>           friend class SILUndef; // new
>>>           friend class SILArgument; // new
>>>           friend class SILInstruction; // new
>>>         public:
>>>           SILValue(ValueDef *); // deleted
>>>         private:
>>>           SILValue(ValueDef *); // new
>>>         };
>>> 
>>>         class ValueDef {
>>>           ...
>>>           SILValue getValue() const; // new
>>>           ...
>>>         };
>>> 
>>> To see how specific common code patterns change in light of these changes,
>>> please see the [appendix](#changes-to-silvalue-api-for-sil-ownership).
>>> 
>>> # Values and ValueOwnershipKinds
>>> 
>>> Define `ValueOwnershipKind` as the enum with the following cases:
>>> 
>>>     enum class ValueOwnershipKind {
>>>       Trivial,
>>>       Unowned,
>>>       Owned,
>>>       Guaranteed,
>>>     }
>>> 
>>> Our approach to mapping a `SILValue` to a `ValueOwnershipKind` is to use a
>>> `SILVisitor` called `ValueOwnershipKindVisitor`. This works well since if 
>>> one
>>> holds `ValueKind` constant, `SILValue` have well defined ownership
>>> constraints. Thus we can handle each case individually via the visitor. We 
>>> use
>>> SILNodes.def to ensure that all `ValueKind` have a defined visitor method. 
>>> This
>>> will ensure that when a new `ValueKind` is added, the compiler will emit a
>>> warning that the visitor must be updated, ensuring correctness.
>>> 
>>> The visitor will be hidden in a *.cpp file and will expose its output via a 
>>> new
>>> API on `SILValue` :
>>> 
>>>     ValueOwnershipKind SILValue::getOwnershipKind() const;
>>> 
>>> Since the implementation of `SILValue::getOwnershipKind()` will be out of 
>>> line,
>>> none of the visitor code will be exposed to the rest of the compiler.
>>> 
>>> # ValueDefs and Ownership Constraints
>>> 
>>> In order to determine if a `SILInstruction`'s operands are SILValue that 
>>> have
>>> compatible ownership with a `SILInstruction`, we introduce a new API on
>>> `SILInstruction` that returns the ownership constraint of the `i`th operand 
>>> of
>>> the `SILInstruction`:
>>> 
>>>     Optional<ValueOwnershipKind> 
>>> SILInstruction::getOwnershipConstraint(unsigned i) const;
>>> 
>>> Specifically, it returns `.Some(OwnershipKind)` if there is a constraint or
>>> `.None` if the `SILInstruction` will accept any `ValueOwnershipKind`
>>> (e.g. `copy_value`). This API is sufficient since there are no 
>>> `SILInstructions`
>>> that accept only a subset of `ValueOwnershipKind`, all either accept a 
>>> singular
>>> ownership kind or all ownership kinds.
>>> 
>>> For `SILArgument`, we determine compatible ownership by introducing the 
>>> concept
>>> of ownership conventions. The reason to introduce these ownership 
>>> conventions is
>>> that it ensures that we do not need to perform any form of dataflow to 
>>> determine
>>> the convention of a `SILArgument` in a loop. Additionally, it allows for 
>>> greater
>>> readability in the IR and prevents the optimizer from performing implicit
>>> conversions of ownership semantics on branch instructions, for instance,
>>> converting a `switch_enum` from consuming a value at +1 to using a borrowed 
>>> +0
>>> parameter. In terms of API, we introduce a new method on `SILArgument`:
>>> 
>>>     ValueOwnershipKind SILArgument::getOwnershipConstraint() const;
>>> 
>>> which returns the required ownership constraint. In terms of representing 
>>> these
>>> ownership constraint conventions in textual SIL, we print out the ownership
>>> constraint next to the specific argument in the block and the specific 
>>> argument
>>> in the given terminator. For details of how this will look with various
>>> terminators, see the 
>>> [appendix](#sil-argument-terminator-convention-examples).
>>> 
>>> # Verification of Ownership Semantics
>>> 
>>> Since our ownership model is based around SSA form, all verification of
>>> ownership only need consider an individual value (`SILValue`) and the uses 
>>> of
>>> the def (`SILInstruction`, `SILArgument`). Thus for each def/use-set, we:
>>> 
>>> 1. **Verify Use Semantics**: All uses must be compatible with the def's
>>>    `ValueOwnershipKind`.
>>> 2. **Dataflow Verification**: Given any path P in the program and a 
>>> `ValueDef`
>>>    `V` along that path:
>>>    a. There exists only one <a 
>>> href="#footnote-0-lifetime-ending-use">"lifetime ending"</a> use of `V` 
>>> along that path.
>>>    b. After the lifetime ending use of `V`, there are no non-lifetime ending
>>>       uses of `V` along `P`.
>>> 
>>> Since the dataflow verification requires knowledge of a subset of the uses, 
>>> we
>>> perform the use semantic verification first and then use the found uses for 
>>> the
>>> dataflow verification.
>>> 
>>> ## Use Verification
>>> 
>>> Just like with `SILValue`, individual `SILInstruction` kind's have 
>>> well-defined
>>> ownership semantics implying that we can use a `SILVisitor` approach here as
>>> well. Thus we define a `SILVisitor` called
>>> `OwnershipCompatibilityUseChecker`. This checker works by taking in a 
>>> `SILValue`
>>> and visiting all of the `SILInstruction` users of the `SILValue`. Each 
>>> visitor
>>> method returns a pair of booleans, the first stating whether or not the
>>> ownership values were compatible and the second stating whether or not this
>>> specific use should be considered a "lifetime ending" use. The checker then
>>> stores the lifetime ending uses and non-lifetime ending uses in two separate
>>> arrays for processing by the dataflow verifier.
>>> 
>>> ## Dataflow Verification
>>> 
>>> The dataflow verifier takes in as inputs the `SILValue` and lists of
>>> lifetime-ending and non-lifetime ending uses. Since we are using SSA form, 
>>> we
>>> already know that our def must dominate all of our uses implying that a use 
>>> can
>>> never overconsume due to a def not being along a path. On the other hand, we
>>> still must consider issues related to joint post-dominance namely:
>>> 
>>> 1. Double-Consume due to 2+ lifetime ending uses along a path. This implies
>>>    proving that no lifetime ending use is reachable from another lifetime 
>>> ending
>>>    use.
>>> 2. Use-After-Free due to a non-lifetime ending use not being joint-post
>>>    dominated by the lifetime ending uses. This implies proving that all
>>>    non-lifetime ending uses are joint post-dominated by the lifetime ending 
>>> use
>>>    set.
>>> 3. Lifetime-Leaks due to the non-lifetime ending uses not joint-post 
>>> dominating
>>>    the lifetime ending uses.
>>> 
>>> Note how we must consider joint post-dominance in two different places. 
>>> Thus we
>>> must be able to prove joint post-dominance quickly and cheaply. To do so, we
>>> take advantage of dominance implying that all uses are reachable from the 
>>> def to
>>> just needing to prove that when we walk the CFG up to prove reachability, 
>>> that
>>> any block on the reachability path does not have any successors that have 
>>> not
>>> been visited when we finish walking and visit the def.
>>> 
>>> The full code is in the [appendix](#full-dataflow-verification-algorithm).
>>> 
>>> <a id="footnote-0-lifetime-ending-use">[0]</a>: A use after which a def is 
>>> no
>>> longer allowed to be used in any way, e.g. `destroy_value`, `end_borrow`.
>>> 
>>> # Appendix
>>> 
>>> ## Changes to SILValue API for SIL Ownership
>>> 
>>> The common code pattern eliminated by removing implicit uses of 
>>> `ValueDef`'s API
>>> via a `SILValue` are as follows:
>>> 
>>>     SILValue V;
>>>     ValueDef *Def;
>>> 
>>>     if (V != Def) { ... }
>>>     if (V->getParentBlock()) { ... }
>>> 
>>> Our change, causes these code patterns to be rewritten in favor of the 
>>> explicit:
>>> 
>>>     SILValue V;
>>>     ValueDef *Def;
>>> 
>>>     if (V.getDef() != Def) { ... }
>>>     if (V.getDef()->getParentBlock()) { ... }
>>> 
>>> In cases like the above, we want the to increase clarity through the usage 
>>> of
>>> verbosity. In other cases, this verboseness makes convenient APIs more 
>>> difficult
>>> to use. The main example here are the `isa` and `dyn_cast` APIs. We 
>>> introduce
>>> two helper functions that give the same convenience as before but added 
>>> clarity by
>>> making it clear that we are not performing an isa query on the value, but
>>> instead the underlying def of the value:
>>> 
>>>     template <typename ParentTy>
>>>     bool def_isa(SILValue V) { return isa<ParentTy>(V.getDef()); }
>>>     template <typename ParentTy>
>>>     ParentTy *def_dyn_cast(SILValue V) { return 
>>> dyn_cast<ParentTy>(V.getDef()); }
>>> 
>>> Consider the following code using the old API,
>>> 
>>>     SILValue V;
>>> 
>>>     if (isa<ApplyInst>(V)) { ... }
>>>     if (auto *PAI = dyn_cast<PartialApplyInst>(V)) { ... }
>>> 
>>> Notice how it seems like one is casting the SILValue as if it is a
>>> ValueBase. This is due to the misleading usage of the implicit conversion 
>>> from a
>>> `SILValue` to the `SILValue`'s internal `ValueBase`. In comparison the new 
>>> API
>>> makes it absolutely clear that one is reasoning about the ValueKind of the
>>> underlying def of the `SILValue`:
>>> 
>>>     SILValue V;
>>> 
>>>     if (def_isa<ApplyInst>(V)) { ... }
>>>     if (auto *PAI = def_dyn_cast<PartialApplyInst>(V)) { ... }
>>> 
>>> Thus the convenience of the old API is maintained, clarity is improved, and 
>>> the
>>> conceptual API boundary is enforced.
>>> 
>>> The final change that we propose is eliminating the ability to construct a
>>> `SILValue` from a `ValueDef` externally to the `ValueDef` itself. Allowing 
>>> this
>>> to occur violates the modeling notion that a `SILValue` is defined and thus 
>>> is
>>> dependent on the `ValueDef`. To implement this, we propose changing the
>>> constructor `SILValue::SILValue(ValueDef *)` to have private instead of 
>>> public
>>> access control and declaring `ValueDef` subclasses as friends of
>>> `SILValue`. This then allows the `ValueDef` to vend opaquely constructed
>>> `SILValue`, but disallows external users of the API from directly creating
>>> `SILValue` from `ValueDef`, enforcing the value/def distinction in our 
>>> model.
>>> 
>>> ## SILArgument/Terminator OwnershipConvention Examples
>>> 
>>> We present here several examples of how ownership conventions look on
>>> SILArguments and terminators.
>>> 
>>> First we present a simple branch and return example.
>>> 
>>>     class C { ... }
>>> 
>>>     sil @simple_branching : $@convention(thin) : @convention(thin) (@owned 
>>> Builtin.NativeObject, @guaranteed C) -> @owned C {
>>>     bb0(%0 : @owned $Builtin.NativeObject, %1 : @guaranteed $C):
>>>       br bb1(%0 : @owned $Builtin.NativeObject)
>>> 
>>>     bb1(%1 : @owned $Builtin.NativeObject):
>>>       strong_release %1 : $Builtin.NativeObject
>>>       %2 = copy_value %0 : $C
>>>       return %2 : @owned $C
>>>     }
>>> 
>>> Now consider this loop example:
>>> 
>>>     sil @loop : $@convention(thin) : $@convention(thin) (@owned 
>>> Optional<Builtin.NativeObject>) -> () {
>>>     bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>>       br bb1(%0 : @owned $Builtin.NativeObject)
>>> 
>>>     bb1(%1 : @owned $Builtin.NativeObject):
>>>       %2 = alloc_object $C
>>>       strong_release %1 : $Builtin.NativeObject
>>>       %3 = unchecked_ref_cast %2 : $C to $Builtin.NativeObject
>>>       cond_br %cond, bb1(%3 : @owned Builtin.NativeObject), bb2
>>> 
>>>     bb2:
>>>       strong_release %3 : $Builtin.NativeObject
>>>       %result = tuple()
>>>       return %result : @trivial $()
>>>     }
>>> 
>>> Note how it is absolutely clear what convention is being used when passing a
>>> value to the phi node. No dataflow reasoning is required implying we can do 
>>> a
>>> simple pass over the CFG to prove correctness.
>>> 
>>> Now consider two examples of switches:
>>> 
>>>     sil @owned_switch_enum : $@convention(thin) : $@convention(thin) 
>>> (@owned Optional<Builtin.NativeObject>) -> () {
>>>     bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>>       switch_enum %0 : @owned $Builtin.NativeObject, 
>>> #Optional.none.enumelt: bb1, #Optional.some.enumelt.1: bb2
>>> 
>>>     bb1:
>>>       br bb3
>>> 
>>>     bb2(%1 : @owned $Builtin.NativeObject):
>>>       strong_release %1 : $Builtin.NativeObject
>>>       br bb3
>>> 
>>>     bb3:
>>>       %result = tuple()
>>>       return %result : @trivial $()
>>>     }
>>> 
>>>     sil @guaranted_converted_switch_enum : $@convention(thin) : 
>>> $@convention(thin) (@owned Optional<Builtin.NativeObject>) -> () {
>>>     bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>>       %1 = begin_borrow %0 : $Optional<Builtin.NativeObject>
>>>       switch_enum %1 : @guaranteed $Builtin.NativeObject, 
>>> #Optional.none.enumelt: bb1, #Optional.some.enumelt.1: bb2
>>> 
>>>     bb1:
>>>       br bb3
>>> 
>>>     bb2(%2 : @guaranteed $Builtin.NativeObject):
>>>       %3 = function_ref @g_user : $@convention(thin) (@guaranteed 
>>> Builtin.NativeObject) -> ()
>>>       apply %3(%2) : $@convention(thin) (@guaranteed Builtin.NativeObject) 
>>> -> ()
>>>       br bb3
>>> 
>>>     bb3:
>>>       end_borrow %1 from %0 : $Optional<Builtin.NativeObject>, 
>>> $Optional<Builtin.NativeObject>
>>>       %result = tuple()
>>>       return %result : @trivial $()
>>>     }
>>> 
>>> Notice how the lifetime is completely explicit in both cases, so the 
>>> optimizer
>>> can not treat the conversion of switch_enum from +1 to +0 implicitly.
>>> 
>>> ## Full Dataflow Verification Algorithm
>>> 
>>> Define the following book keeping data structures.
>>> 
>>>     // The worklist that we will use for our iterative reachability query.
>>>     llvm::SmallVector<SILBasicBlock *, 32> Worklist;
>>> 
>>>     // The set of blocks with lifetime ending uses.
>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> BlocksWithLifetimeEndingUses;
>>> 
>>>     // The set of blocks with non-lifetime ending uses and the associated
>>>     // non-lifetime ending use SILInstruction.
>>>     llvm::SmallDenseMap<SILBasicBlock *, SILInstruction *, 8> 
>>> BlocksWithNonLifetimeEndingUses;
>>> 
>>>     // The blocks that we have already visited.
>>>     llvm::SmallPtrSet<SILBasicBlock *, 32> VisitedBlocks;
>>> 
>>>     // A list of successor blocks that we must visit by the time the 
>>> algorithm terminates.
>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> SuccessorBlocksThatMustBeVisited;
>>> 
>>> Then for each non-lifetime ending use (found by the
>>> `OwnershipCompatibilityUseChecker`), we add the block and its instruction 
>>> to the
>>> `BlocksWithNonLifetimeEndingUses`. There is a possibility of having multiple
>>> non-lifetime ending uses in the same block, in such a case, we always take 
>>> the
>>> later value since we are performing a liveness-esque dataflow:
>>> 
>>>     for (SILInstruction *User : getNonLifetimeEndingUses()) {
>>>       // First try to associate User with User->getParent().
>>>       auto Result = 
>>> BlocksWithNonLifetimeEndingUses.insert({User->getParent(), User});
>>> 
>>>       // If the insertion succeeds, then we know that there is no more work 
>>> to
>>>       // be done, so process the next use.
>>>       if (Result.second)
>>>         continue;
>>> 
>>>       // If the insertion fails, then we have at least two non-lifetime 
>>> ending uses
>>>       // in the same block. Since we are performing a liveness type of 
>>> dataflow,
>>>       // we only need the last non-lifetime ending use to show that all 
>>> lifetime
>>>       // ending uses post dominate both. Thus, see if Use is after
>>>       // Result.first->second in the use list. If Use is not later, then we 
>>> wish
>>>       // to keep the already mapped value, not use, so continue.
>>>       if (std::find(Result.first->second->getIterator(), Block->end(), 
>>> User) ==
>>>             Block->end()) {
>>>         continue;
>>>       }
>>> 
>>>       // At this point, we know that Use is later in the Block than
>>>       // Result.first->second, so store Use instead.
>>>       Result.first->second = User;
>>>     }
>>> 
>>> Then we visit each each lifetime ending use and its parent block:
>>> 
>>>     for (SILInstruction *User : getLifetimeEndingUses()) {
>>>       SILBasicBlock *UserParentBlock = User->getParent();
>>>       ...
>>>     }
>>> 
>>> We begin by adding `UserParentBlock` to the `BlocksWithLifetimeEndingUses`
>>> set. If a block is already in the set (i.e. insert fails), then we know 
>>> that (1)
>>> has been violated and we error. If we never hit that condition, then we know
>>> that all lifetime ending uses are in different blocks.
>>> 
>>>     if (!BlocksWithLifetimeEndingUses.insert(UserParentBlock).second) {
>>>       double_consume_error(); // ERROR!
>>>     }
>>> 
>>> Then we check if we previously saw any non-lifetime ending uses in
>>> `UserParentBlock` by checking the map `BlocksWithNonLifetimeEndingUses`. If 
>>> we do
>>> find any such uses, we check if the lifetime ending use is earlier in the 
>>> block
>>> that the non-lifetime ending use. If so then (2) is violated and we
>>> error. Once we know that (2) has not been violated, we remove
>>> 
>>>     auto Iter = BlocksWithNonLifetimeEndUses.find(UserParentBlock);
>>>     if (Iter != BlocksWithNonLifetimeEndUses.end()) {
>>>       // Make sure that the non-lifetime ending use is before the lifetime
>>>       // ending use. Otherwise, we have a use after free.
>>>       if (std::find(User->getIterator(), UserParentBlock->end(), 
>>> Iter->second) !=
>>>             UserParentBlock->end()) {
>>>         use_after_free_error(); // ERROR!
>>>       }
>>> 
>>>       // Erase the use since we know that it is properly joint 
>>> post-dominated.
>>>       BlocksWithNonLifetimeEndingUses.erase(Iter);
>>>     }
>>> 
>>> Then we mark `UserParentBlock` as visited to ensure that we do not consider
>>> UserParentBlock to be a successor we must visit and prime the worklist with 
>>> the
>>> predecessor blocks of `UserParentBlock`.
>>> 
>>>     // Then mark UserParentBlock as visited add all predecessors of
>>>     // UserParentBlock to the worklist.
>>>     VisitedBlocks.insert(UserParentBlock);
>>>     copy(UserParentBlock->getPreds(), std::back_inserter(Worklist));
>>> 
>>> The entire block of code is provided in the appendix. Now we know that:
>>> 
>>> 1. All lifetime ending uses are in different basic blocks.
>>> 2. All non-lifetime ending uses are either in different basic blocks.
>>> 
>>> We begin our worklist algorithm by popping off the next element in the 
>>> worklist.
>>> 
>>>     // Until the worklist is empty.
>>>     while (!Worklist.empty()) {
>>>       // Grab the next block to visit.
>>>       const SILBasicBlock *BB = Worklist.pop_back_val();
>>>       ...
>>>     }
>>> 
>>> Since every time we insert objects into the worklist, we first try to 
>>> insert it
>>> into the `VisitedBlock` list, we do not need to check if `BB` has been 
>>> visited
>>> yet, since elements can not be added to the worklist twice.
>>> 
>>> Then make sure that BB is not in `BlocksWithLifetimeEndingUses`. If BB is in
>>> `BlocksWithLifetimeEndingUses`, then we know that there is a path going 
>>> through
>>> the def where the def is consumed twice, an error!
>>> 
>>>     if (BlocksWithLifetimeEndingUses.count(BB)) {
>>>       double_consume_error(); // ERROR!
>>>     }
>>> 
>>> Now that we know we are not double consuming, we need to update our data
>>> structures. First if BB is contained in SuccessorBlocksThatMustBeVisited, we
>>> remove it. This ensures when the algorithm terminates, we know that the path
>>> through the successor was visited as apart of our walk.
>>> 
>>>     SuccessorBlocksThatMustBeVisited.erase(BB);
>>> 
>>> Then if BB is in BlocksWithNonLifetimeEndingUses, we remove it. This ensures
>>> that when the algorithm terminates, we know that the non-lifetime ending 
>>> uses
>>> were properly joint post-dominated by the lifetime ending uses.
>>> 
>>>     BlocksWithNonLifetimeEndingUses.erase(BB);
>>> 
>>> Then add all successors of BB that we have not visited yet to the
>>> `SuccessorBlocksThatMustBeVisited` set. This ensures that if we do not 
>>> visit the
>>> successor as apart of our CFG walk, at the end of the algorithm we will 
>>> assert
>>> that there is a leak.
>>> 
>>>     for (const SILBasicBlock *SuccBlock : BB->getSuccessorBlocks()) {
>>>       // If we already visited the successor, there is nothing to do since
>>>       // we already visited the successor.
>>>       if (VisitedBlocks.count(SuccBlock))
>>>         continue;
>>> 
>>>       // Otherwise, add the successor to our MustVisitBlocks set to ensure 
>>> that
>>>       // we assert if we do not visit it by the end of the algorithm.
>>>       SuccessorBlocksThatMustBeVisited.insert(SuccBlock);
>>>     }
>>> 
>>> Finally add all predecessors of BB that we have not visited yet to the 
>>> Worklist
>>> for processing. We use insert here to ensure that we only ever add a
>>> VisitedBlock once to the Worklist:
>>> 
>>>     for (const SILBasicBlock *PredBlock : BB->getPredecessorBlocks()) {
>>>       if (!VisitedBlocks.insert(PredBlock).second)
>>>         continue;
>>>       Worklist.push_back(PredBlock);
>>>     }
>>> 
>>> This continues until we have exhausted the worklist. Once the worklist is
>>> exhausted, we know that:
>>> 
>>> 1. If `SuccessorBlocksThatMustBeVisited` is non-empty, then the Blocks in 
>>> the
>>>    set are not joint post-dominated by the set of lifetime ending users 
>>> implying
>>>    a leak.
>>> 2. If `BlockSwithNonLifetimeEndingUses` is non-empty, then there was a
>>>    non-lifetime ending use that was not joint post-dominated by the lifetime
>>>    ending use set. This implies a use-after free.
>>> 
>>> Thus we assert that both sets are empty and error accordingly.
>>> 
>>>     if (!SuccessorBlocksThatMustBeVisited.empty()) {
>>>       leak_error(); // ERROR!
>>>     }
>>>     if (!BlocksWithNonLifetimeEndingUses.empty()) {
>>>       use_after_free_error(); // ERROR!
>>>     }
>>> 
>>> The full code:
>>> 
>>>     // The worklist that we will use for our iterative reachability query.
>>>     llvm::SmallVector<SILBasicBlock *, 32> Worklist;
>>> 
>>>     // The set of blocks with lifetime ending uses.
>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> BlocksWithLifetimeEndingUses;
>>> 
>>>     // The set of blocks with non-lifetime ending uses and the associated
>>>     // non-lifetime ending use SILInstruction.
>>>     llvm::SmallDenseMap<SILBasicBlock *, SILInstruction *, 8> 
>>> BlocksWithNonLifetimeEndingUses;
>>> 
>>>     // The blocks that we have already visited.
>>>     llvm::SmallPtrSet<SILBasicBlock *, 32> VisitedBlocks;
>>> 
>>>     // A list of successor blocks that we must visit by the time the 
>>> algorithm terminates.
>>>     llvm::SmallPtrSet<SILBasicBlock *, 8> SuccessorBlocksThatMustBeVisited;
>>> 
>>>     for (SILInstruction *User : getLifetimeEndingUses()) {
>>>       SILBasicBlock *UserParentBlock = User->getParent();
>>>       if (!BlocksWithLifetimeEndingUses.insert(UserParentBlock).second) {
>>>         double_consume_error(); // ERROR!
>>>       }
>>> 
>>>       auto Iter = BlocksWithNonLifetimeEndUses.find(UserParentBlock);
>>>       if (Iter != BlocksWithNonLifetimeEndUses.end()) {
>>>         // TODO: Could this be cached.
>>>         if (std::find(User->getIterator(), UserParentBlock->end(), 
>>> Iter->second) !=
>>>               UserParentBlock->end()) {
>>>           use_after_free_error(); // ERROR!
>>>         }
>>>       }
>>> 
>>>       // Then add all predecessors of the user block to the worklist.
>>>       VisitedBlocks.insert(UserParentBlock);
>>>       copy(UserParentBlock->getPreds(), std::back_inserter(Worklist));
>>>     }
>>> 
>>>     // Until the worklist is empty.
>>>     while (!Worklist.empty()) {
>>>       // Grab the next block to visit.
>>>       const SILBasicBlock *BB = Worklist.pop_back_val();
>>> 
>>>       // Then check that BB is not a lifetime-ending use block. If it is 
>>> error,
>>>       // since we have an overconsume.
>>>       if (BlocksWithLifetimeEndingUses.count(BB)) {
>>>         double_consume_error(); // ERROR!
>>>       }
>>> 
>>>       // Ok, now we know that we are not double-consuming. Update our data
>>>       // structures.
>>> 
>>>       // First remove BB from the SuccessorBlocksThatMustBeVisited list. 
>>> This
>>>       // ensures that when the algorithm terminates, we know that BB was 
>>> not the
>>>       // beginning of a non-covered path to the exit.
>>>       SuccessorBlocksThatMustBeVisited.erase(BB);
>>> 
>>>       // Then remove BB from BlocksWithNonLifetimeEndingUses so we know that
>>>       // this block was properly joint post-dominated by our lifetime ending
>>>       // users.
>>>       BlocksWithNonLifetimeEndingUses.erase(BB);
>>> 
>>>       // Then add all successors of BB that we have not visited yet to the
>>>       // SuccessorBlocksThatMustBeVisited set.
>>>       for (const SILBasicBlock *SuccBlock : BB->getSuccessorBlocks()) {
>>>         // If we already visited the successor, there is nothing to do since
>>>         // we already visited the successor.
>>>         if (VisitedBlocks.count(SuccBlock))
>>>           continue;
>>> 
>>>         // Otherwise, add the successor to our MustVisitBlocks set to 
>>> ensure that
>>>         // we assert if we do not visit it by the end of the algorithm.
>>>         SuccessorBlocksThatMustBeVisited.insert(SuccBlock);
>>>       }
>>> 
>>>       // Finally add all predecessors of BB that have not been visited yet 
>>> to
>>>       // the worklist.
>>>       for (const SILBasicBlock *PredBlock : BB->getPredecessorBlocks()) {
>>>         // Try to insert the PredBlock into the VisitedBlocks list to ensure
>>>         // that we only ever add a block once to the worklist.
>>>         if (!VisitedBlocks.insert(PredBlock).second)
>>>           continue;
>>>         Worklist.push_back(PredBlock);
>>>       }
>>>     }
>>> 
>>>     if (!SuccessorBlocksThatMustBeVisited.empty()) {
>>>       leak_error(); // ERROR!
>>>     }
>>>     if (!BlocksWithNonLifetimeEndingUses.empty()) {
>>>       use_after_free_error(); // ERROR!
>>>     }
>>> 
>>> ----
>>> 
>>>> On Dec 5, 2016, at 6:47 PM, Michael Gottesman <mgottes...@apple.com 
>>>> <mailto:mgottes...@apple.com>> wrote:
>>>> 
>>>>> 
>>>>> On Dec 5, 2016, at 5:26 PM, Andrew Trick <atr...@apple.com 
>>>>> <mailto:atr...@apple.com>> wrote:
>>>>> 
>>>>> 
>>>>>> On Dec 5, 2016, at 3:55 PM, Michael Gottesman <mgottes...@apple.com 
>>>>>> <mailto:mgottes...@apple.com>> wrote:
>>>>>> 
>>>>>> Hello everyone.
>>>>>> 
>>>>>> This is an initial draft of a proposal for the SIL Ownership Model + 
>>>>>> Verifier. I am mainly looking for any high level concerns. If there are 
>>>>>> none, then I am fine moving this straight to swift-dev.
>>>>>> 
>>>>>> It is online here:
>>>>>> 
>>>>>> https://gottesmm.github.io/proposals/sil-ownership-model.html 
>>>>>> <https://gottesmm.github.io/proposals/sil-ownership-model.html>
>>>>>> 
>>>>>> Michael
>>>>> 
>>>>> 
>>>>> Thanks! It looks good.
>>>>> 
>>>>> > `ValueDef` is not a value, but the def of a value.
>>>>> 
>>>>> Should be "but may define (or produce) values.".
>>>> 
>>>> Sure.
>>>> 
>>>>> 
>>>>> > Since our ownership model is based around SSA form
>>>>> 
>>>>> There's an in-memory aspect as well. Ownership of local memory should be 
>>>>> statically verified. We at least need a TBD section for this.
>>>> 
>>>> Ok.
>>>> 
>>>>> 
>>>>> > This checker works by taking in a
>>>>> `ValueDef` and visiting all of the `SILInstruction` users of the
>>>>> `ValueDef`.
>>>>> 
>>>>> This line does not compute for me. A ValueDef does not have ownership. 
>>>>> The verifier is simply checking constraints for each SILValue against all 
>>>>> of its uses. At the verifier's level of abstraction, it doesn't matter 
>>>>> who defined that value.
>>>> 
>>>> That is a typo. If you look earlier I was using explicitly SILValue.
>>>> 
>>>>> 
>>>>> > The dataflow verifier takes in as inputs the `ValueDef` (i.e. def) and 
>>>>> > lists of lifetime-ending and non-lifetime ending uses.
>>>>> 
>>>>> Again, the verifier would not care about the uses of the ValueDef, it 
>>>>> cares about uses of the SILValue. I think this is a misleading way to 
>>>>> formulate the problem and explain the algorithm.
>>>> 
>>>> That is a typo from an earlier version of the algorithm. Everything should 
>>>> be about the SILValue.
>>>> 
>>>>> 
>>>>> >  if (V.getDef()->getParentBlock()) { ... }
>>>>> 
>>>>> I see no reason we can't add getParentBlock() and other very common 
>>>>> idioms to the SILValue API.
>>>> 
>>>> The question is how much should we make that a def only API.
>>>> 
>>>>> 
>>>>> > if (auto *PAI = def_dyn_cast<PartialApplyInst>(V)) { ... }
>>>>> 
>>>>> It's not obvious to be that this is more readable than 
>>>>> 
>>>>> if (auto *PAI = dyn_cast<PartialApplyInst>(V.getDef())) { ... }
>>>>> 
>>>>> But I don't have a strong opinion on it.
>>>> 
>>>> it is just a little shorter. I am also not wedded to this. I was just 
>>>> trying to create a vision of what is possible to stimulate discussion.
>>>> 
>>>>> 
>>>>> > The final change that we propose is eliminating the ability to 
>>>>> > construct a
>>>>> `SILValue` from a `ValueDef` externally to the `ValueDef` itself.
>>>>> 
>>>>> Nice!
>>>>> 
>>>>> I didn't reread the "Full Dataflow Verification Algorithm" because I 
>>>>> don't think it's changed much. I'll review it again after the final 
>>>>> proposal.
>>>> 
>>>> It is exactly the same as the version I showed you before, except I 
>>>> followed your suggestion about putting the visit before things are added 
>>>> to the worklist rather than after it is popped off the worklist.
>>>> 
>>>> Let me fix this real quick.
>>>> 
>>>> Michael
>>>> 
>>>>> 
>>>>> -Andy
>>> 
>>> 
>>> _______________________________________________
>>> swift-dev mailing list
>>> swift-dev@swift.org <mailto:swift-dev@swift.org>
>>> https://lists.swift.org/mailman/listinfo/swift-dev 
>>> <https://lists.swift.org/mailman/listinfo/swift-dev>
>> 
>> _______________________________________________
>> swift-dev mailing list
>> swift-dev@swift.org <mailto:swift-dev@swift.org>
>> https://lists.swift.org/mailman/listinfo/swift-dev 
>> <https://lists.swift.org/mailman/listinfo/swift-dev>

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Re: [swift-dev] Proposal: SIL Ownership Model + Verifier

Reply via email to