> On Dec 8, 2016, at 1:53 PM, Andrew Trick <[email protected]> wrote:
>> On Dec 7, 2016, at 11:25 PM, John McCall via swift-dev <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>>
>>> On Dec 7, 2016, at 2:13 PM, Michael Gottesman via swift-dev
>>> <[email protected] <mailto:[email protected]>> wrote:
>>>
>>> This is a proposal for a new SIL Ownership Model and verifier. An online
>>> HTML version of the document is available here:
>>>
>>> https://gottesmm.github.io/proposals/sil-ownership-model.html
>>> <https://gottesmm.github.io/proposals/sil-ownership-model.html>
>>>
>>> and inline below.
>>>
>>> Michael
>>>
>>> ----
>>>
>>> # Summary
>>>
>>> This document defines a SIL ownership model and a compile time static
>>> verifier
>>> for the model. This will allow for static compile time verification that a
>>> SIL
>>> program satisfies all ownership model constraints.
>>>
>>> The proposed SIL ownership model embeds ownership into SIL's SSA def-use
>>> edges. This is accomplished by:
>>>
>>> 1. Formalizing into SIL API's the distinction in between defs
>>> (e.g. `SILInstruction` and `SILArgument`) and the values produced by defs
>>> (e.g. `SILValue`). This is needed to model values as having ownership and
>>> defs as not having ownership. The main implication here is that the two
>>> can
>>> no longer be implicitly convertible and the API must enforce this.
>
> This proposal is simultaneously proposing some specific API details
> while avoiding both the motivation behind it and the eventual features
> that it will support.
>
> I think the ValueBase/Def renaming is a little
> misunderstanding. "ValueDef" in the proposal was a placeholder (not a
> perfect name) for some perceived need to have a common
> SILArgument/SILInstruction base class. That's probably not necessary
> in which case there's nothing to disagree about.
>
> John's counter proposal solves that problem by repurposing ValueBase
> to serve as a base class for value definitions (and then goes into
> some detail on implementing multi-result instructions).
>
>> We currently have two kinds of def: instructions and arguments. Arguments
>> always define a single value, and I don't see anything in your proposal that
>> changes that. And the idea that an instruction produces exactly one value is
>> already problematic, because many don't produce a meaningful value at all.
>> All that changes in your proposal is that certain instructions need to be
>> able
>> to produce multiple values.
>
> That’s an obvious way of looking at it. I personally am most
> interested in putting an end to the chronic problem of conflating
> instructions and values which literally makes it impossible to have a
> sane discussion about ownership. I insisted that SILInstruction not
> derive from ValueBase, which probably led to the proposed
> renaming. This probably comes across as confusing because the proposal
> is deliberately side-stepping the issue of multi-result instructions.
>
> What I really want to see at the proposal level is agreement on core concepts:
> - A value has a type, ownership, one def, and 0..n uses
> - Ownership rules are guaranteed simply by evaluating those 0..n uses.
> - A value’s def/use is the point at which it is produced/consumed.
> - Use/def points are demarcated by instructions or block boundaries
> - An instruction produces values via results or consumes them via operands.
>
> There shouldn't be any debate about these concepts, so why is there so
> much confusion? I think caused by some bad choices in the SIL
> interface, particularly as SILInstruction : ValueBase.
>
> So we should define a SIL interface that reflects those core
> concepts. We should be able to reimplement multi-result instructions
> without affecting those interfaces and without rewritting SIL passes.
I agree. I don't think this requires deep changes to how SIL nodes are
typically
worked with. As long as single-result instructions can be transparently used as
values, so that e.g. the result of SILBuilder::CreateBitCast can be used as a
SILValue, the number of changes required should be quite modest.
Things like SILCloner will of course need some minor updates to handle the
fact that a single instruction can have multiple uses, but that should be
straightforward
to do on top of the existing Value->Value mapping.
John.
>
> -Andy
>
>>
>> Moreover, the word "def" clearly suggests that it refers to the definition
>> of a
>> value that can be used, and that's how the term is employed basically
>> everywhere.
>>
>> So allow me to suggest that a much clearer way of saying what you're trying
>> to
>> say is that we need to distinguish between defs and instructions. An
>> instruction
>> may have an arbitrary number of defs, possibly zero, and each def is a value
>> that can be used. (But the number of defs per instruction is known
>> statically
>> for most instructions, which is something we can use to make working with
>> defs much less annoying.)
>>
>> Also this stuff you're saying about values having ownership and defs not
>> having
>> ownership is, let's say, misleading; it only works if you're using a
>> meaninglessly
>> broad definition of ownership. It would be better to simply say that the
>> verification
>> we want to do for ownership strongly encourages us to allow instructions to
>> introduce multiple defs whose properties can be verified independently.
>>
>>> 2. Specifying a set of ownership kinds and specifying a method for mapping a
>>> `SILValue` to an ownership kind.
>>> 3. Specifying ownership constraints on all `SILInstruction`s and
>>> `SILArgument`s
>>> that constrain what ownership kinds their operands and incoming values
>>> respectively can possess.
>>> 4. Implementing a verifier to ensure that all `SILInstruction` and
>>> `SILArgument`
>>> are compatible with the ownership kind propagated by their operand
>>> `SILValue`s and that pseudo-linear dataflow constraints are maintained.
>>
>> I'll tackle these other sections in another email. Let's go one at a time.
>>
>>> # Cleanly separating Value and Def APIs in SIL
>>>
>>> All values in SIL are defined via an assignment statement of the form:
>>> `<foo> = <bar>`.
>>> In English, we say `foo` is a value that is defined by the def
>>> `bar`. Originally, these two concepts were distinctly represented by the
>>> classes
>>> `SILValue` and `ValueBase`. All `ValueBase` defined a list of `SILValue`
>>> that
>>> were related, but not equivalent to the defining `ValueBase`. With the
>>> decision
>>> to represent multiple return values as instruction projections instead of
>>> as a
>>> list of `SILValue`, this distinction in between a def and the values was
>>> lost,
>>> resulting in `SILValue` being used interchangeably with `ValueBase`
>>> throughout
>>> the swift codebase. This exposes a modeling issue when one attempts to add
>>> ownership to SIL, namely that values have ownership, while the defs that
>>> define
>>> the values do not. This implies that defs and values *should not* be
>>> interchangeable.
>>
>> Again, I do not understand what you're trying to say about ownership here.
>> Just drop it.
>>
>>> In order to model that values, not defs, have ownership, we separate the
>>> `SILValue` and `ValueBase` APIs. This is done by:
>>
>> Almost all of this is based on what I consider to be a really unfortunate use
>> of the term "def".
>>
>> Allow me to suggest this:
>>
>> 1. You do not need to rename ValueBase. Let's stick with the term "Value"
>> instead of "Def" in the source code.
>>
>> 2. There are only three subclasses of ValueBase for now:
>> - SILArgument
>> - SingleInstructionResult
>> - MultiInstructionResult
>> This means that the Kind can probably be packed into the use-list pointer.
>>
>> 3. SingleInstructionResult will only be used for instructions that are known
>> to
>> produce exactly one value. All such instructions can derive from a single
>> class, of which SingleInstructionResult can be a superclass. This will make
>> it possible to construct a SILValue directly from such instructions, as well
>> as
>> making it easy to support dyn_casting back to such instructions.
>>
>> 4. MultiInstructionResult will have to store the result index as well as
>> provide
>> some way to get back to the instruction. I think the most efficient
>> representation
>> here is to store an array of MultiInstructionResults immediately *before* the
>> address point of the instruction, in reverse order, so that (this +
>> ResultIndex + 1)
>> gets you back to the instruction. This also makes it possible to
>> efficiently index
>> into the results without loading the number of results (except for the
>> bounds-checking assert, of course).
>>
>> It will not be possible to construct a SILValue directly from one of these
>> instructions;
>> you'll have to ask the instruction for its Nth result.
>>
>> It's not clear to me what dyn_casting a SILValue back to a multi-result
>> instruction
>> should do. I guess it succeeds and then you just ask the instruction which
>> result you had.
>>
>> 5. SILInstruction has an accessor to return what's basically an ArrayRef of
>> its instruction results. It's not quite an ArrayRef because that would
>> require
>> type-punning, but you can make it work as efficiently as one.
>>
>> John.
>>
>>
>>>
>>> 1. Renaming `ValueBase` to `ValueDef`. This makes it clear from a naming
>>> perspective that a `ValueDef` is not a value, but may define values.
>>> 2. Eliminate all operator methods on `SILValue` that allow one to work with
>>> the
>>> `ValueDef` API via a `SILValue` in favor of an explicit method for getting
>>> the
>>> internal `ValueDef` of a `SILValue`, i.e.:
>>>
>>> class SILValue {
>>> ...
>>> ValueDef *operator->() const; // deleted
>>> ValueDef &operator*() const; // deleted
>>> operator ValueDef *() const; // deleted
>>>
>>> bool operator==(ValueDef *RHS) const; // deleted
>>> bool operator!=(ValueDef *RHS) const; // deleted
>>> ...
>>> ValueDef *SILValue::getDef() const; // new
>>> ...
>>> };
>>>
>>> 3. Use private access control and friend classes to only allow for
>>> `SILValue`s
>>> to be constructed and vended by their defining `ValueDef` via a new
>>> method on
>>> `ValueDef`:
>>>
>>> class SILValue {
>>> friend class SILUndef; // new
>>> friend class SILArgument; // new
>>> friend class SILInstruction; // new
>>> public:
>>> SILValue(ValueDef *); // deleted
>>> private:
>>> SILValue(ValueDef *); // new
>>> };
>>>
>>> class ValueDef {
>>> ...
>>> SILValue getValue() const; // new
>>> ...
>>> };
>>>
>>> To see how specific common code patterns change in light of these changes,
>>> please see the [appendix](#changes-to-silvalue-api-for-sil-ownership).
>>>
>>> # Values and ValueOwnershipKinds
>>>
>>> Define `ValueOwnershipKind` as the enum with the following cases:
>>>
>>> enum class ValueOwnershipKind {
>>> Trivial,
>>> Unowned,
>>> Owned,
>>> Guaranteed,
>>> }
>>>
>>> Our approach to mapping a `SILValue` to a `ValueOwnershipKind` is to use a
>>> `SILVisitor` called `ValueOwnershipKindVisitor`. This works well since if
>>> one
>>> holds `ValueKind` constant, `SILValue` have well defined ownership
>>> constraints. Thus we can handle each case individually via the visitor. We
>>> use
>>> SILNodes.def to ensure that all `ValueKind` have a defined visitor method.
>>> This
>>> will ensure that when a new `ValueKind` is added, the compiler will emit a
>>> warning that the visitor must be updated, ensuring correctness.
>>>
>>> The visitor will be hidden in a *.cpp file and will expose its output via a
>>> new
>>> API on `SILValue` :
>>>
>>> ValueOwnershipKind SILValue::getOwnershipKind() const;
>>>
>>> Since the implementation of `SILValue::getOwnershipKind()` will be out of
>>> line,
>>> none of the visitor code will be exposed to the rest of the compiler.
>>>
>>> # ValueDefs and Ownership Constraints
>>>
>>> In order to determine if a `SILInstruction`'s operands are SILValue that
>>> have
>>> compatible ownership with a `SILInstruction`, we introduce a new API on
>>> `SILInstruction` that returns the ownership constraint of the `i`th operand
>>> of
>>> the `SILInstruction`:
>>>
>>> Optional<ValueOwnershipKind>
>>> SILInstruction::getOwnershipConstraint(unsigned i) const;
>>>
>>> Specifically, it returns `.Some(OwnershipKind)` if there is a constraint or
>>> `.None` if the `SILInstruction` will accept any `ValueOwnershipKind`
>>> (e.g. `copy_value`). This API is sufficient since there are no
>>> `SILInstructions`
>>> that accept only a subset of `ValueOwnershipKind`, all either accept a
>>> singular
>>> ownership kind or all ownership kinds.
>>>
>>> For `SILArgument`, we determine compatible ownership by introducing the
>>> concept
>>> of ownership conventions. The reason to introduce these ownership
>>> conventions is
>>> that it ensures that we do not need to perform any form of dataflow to
>>> determine
>>> the convention of a `SILArgument` in a loop. Additionally, it allows for
>>> greater
>>> readability in the IR and prevents the optimizer from performing implicit
>>> conversions of ownership semantics on branch instructions, for instance,
>>> converting a `switch_enum` from consuming a value at +1 to using a borrowed
>>> +0
>>> parameter. In terms of API, we introduce a new method on `SILArgument`:
>>>
>>> ValueOwnershipKind SILArgument::getOwnershipConstraint() const;
>>>
>>> which returns the required ownership constraint. In terms of representing
>>> these
>>> ownership constraint conventions in textual SIL, we print out the ownership
>>> constraint next to the specific argument in the block and the specific
>>> argument
>>> in the given terminator. For details of how this will look with various
>>> terminators, see the
>>> [appendix](#sil-argument-terminator-convention-examples).
>>>
>>> # Verification of Ownership Semantics
>>>
>>> Since our ownership model is based around SSA form, all verification of
>>> ownership only need consider an individual value (`SILValue`) and the uses
>>> of
>>> the def (`SILInstruction`, `SILArgument`). Thus for each def/use-set, we:
>>>
>>> 1. **Verify Use Semantics**: All uses must be compatible with the def's
>>> `ValueOwnershipKind`.
>>> 2. **Dataflow Verification**: Given any path P in the program and a
>>> `ValueDef`
>>> `V` along that path:
>>> a. There exists only one <a
>>> href="#footnote-0-lifetime-ending-use">"lifetime ending"</a> use of `V`
>>> along that path.
>>> b. After the lifetime ending use of `V`, there are no non-lifetime ending
>>> uses of `V` along `P`.
>>>
>>> Since the dataflow verification requires knowledge of a subset of the uses,
>>> we
>>> perform the use semantic verification first and then use the found uses for
>>> the
>>> dataflow verification.
>>>
>>> ## Use Verification
>>>
>>> Just like with `SILValue`, individual `SILInstruction` kind's have
>>> well-defined
>>> ownership semantics implying that we can use a `SILVisitor` approach here as
>>> well. Thus we define a `SILVisitor` called
>>> `OwnershipCompatibilityUseChecker`. This checker works by taking in a
>>> `SILValue`
>>> and visiting all of the `SILInstruction` users of the `SILValue`. Each
>>> visitor
>>> method returns a pair of booleans, the first stating whether or not the
>>> ownership values were compatible and the second stating whether or not this
>>> specific use should be considered a "lifetime ending" use. The checker then
>>> stores the lifetime ending uses and non-lifetime ending uses in two separate
>>> arrays for processing by the dataflow verifier.
>>>
>>> ## Dataflow Verification
>>>
>>> The dataflow verifier takes in as inputs the `SILValue` and lists of
>>> lifetime-ending and non-lifetime ending uses. Since we are using SSA form,
>>> we
>>> already know that our def must dominate all of our uses implying that a use
>>> can
>>> never overconsume due to a def not being along a path. On the other hand, we
>>> still must consider issues related to joint post-dominance namely:
>>>
>>> 1. Double-Consume due to 2+ lifetime ending uses along a path. This implies
>>> proving that no lifetime ending use is reachable from another lifetime
>>> ending
>>> use.
>>> 2. Use-After-Free due to a non-lifetime ending use not being joint-post
>>> dominated by the lifetime ending uses. This implies proving that all
>>> non-lifetime ending uses are joint post-dominated by the lifetime ending
>>> use
>>> set.
>>> 3. Lifetime-Leaks due to the non-lifetime ending uses not joint-post
>>> dominating
>>> the lifetime ending uses.
>>>
>>> Note how we must consider joint post-dominance in two different places.
>>> Thus we
>>> must be able to prove joint post-dominance quickly and cheaply. To do so, we
>>> take advantage of dominance implying that all uses are reachable from the
>>> def to
>>> just needing to prove that when we walk the CFG up to prove reachability,
>>> that
>>> any block on the reachability path does not have any successors that have
>>> not
>>> been visited when we finish walking and visit the def.
>>>
>>> The full code is in the [appendix](#full-dataflow-verification-algorithm).
>>>
>>> <a id="footnote-0-lifetime-ending-use">[0]</a>: A use after which a def is
>>> no
>>> longer allowed to be used in any way, e.g. `destroy_value`, `end_borrow`.
>>>
>>> # Appendix
>>>
>>> ## Changes to SILValue API for SIL Ownership
>>>
>>> The common code pattern eliminated by removing implicit uses of
>>> `ValueDef`'s API
>>> via a `SILValue` are as follows:
>>>
>>> SILValue V;
>>> ValueDef *Def;
>>>
>>> if (V != Def) { ... }
>>> if (V->getParentBlock()) { ... }
>>>
>>> Our change, causes these code patterns to be rewritten in favor of the
>>> explicit:
>>>
>>> SILValue V;
>>> ValueDef *Def;
>>>
>>> if (V.getDef() != Def) { ... }
>>> if (V.getDef()->getParentBlock()) { ... }
>>>
>>> In cases like the above, we want the to increase clarity through the usage
>>> of
>>> verbosity. In other cases, this verboseness makes convenient APIs more
>>> difficult
>>> to use. The main example here are the `isa` and `dyn_cast` APIs. We
>>> introduce
>>> two helper functions that give the same convenience as before but added
>>> clarity by
>>> making it clear that we are not performing an isa query on the value, but
>>> instead the underlying def of the value:
>>>
>>> template <typename ParentTy>
>>> bool def_isa(SILValue V) { return isa<ParentTy>(V.getDef()); }
>>> template <typename ParentTy>
>>> ParentTy *def_dyn_cast(SILValue V) { return
>>> dyn_cast<ParentTy>(V.getDef()); }
>>>
>>> Consider the following code using the old API,
>>>
>>> SILValue V;
>>>
>>> if (isa<ApplyInst>(V)) { ... }
>>> if (auto *PAI = dyn_cast<PartialApplyInst>(V)) { ... }
>>>
>>> Notice how it seems like one is casting the SILValue as if it is a
>>> ValueBase. This is due to the misleading usage of the implicit conversion
>>> from a
>>> `SILValue` to the `SILValue`'s internal `ValueBase`. In comparison the new
>>> API
>>> makes it absolutely clear that one is reasoning about the ValueKind of the
>>> underlying def of the `SILValue`:
>>>
>>> SILValue V;
>>>
>>> if (def_isa<ApplyInst>(V)) { ... }
>>> if (auto *PAI = def_dyn_cast<PartialApplyInst>(V)) { ... }
>>>
>>> Thus the convenience of the old API is maintained, clarity is improved, and
>>> the
>>> conceptual API boundary is enforced.
>>>
>>> The final change that we propose is eliminating the ability to construct a
>>> `SILValue` from a `ValueDef` externally to the `ValueDef` itself. Allowing
>>> this
>>> to occur violates the modeling notion that a `SILValue` is defined and thus
>>> is
>>> dependent on the `ValueDef`. To implement this, we propose changing the
>>> constructor `SILValue::SILValue(ValueDef *)` to have private instead of
>>> public
>>> access control and declaring `ValueDef` subclasses as friends of
>>> `SILValue`. This then allows the `ValueDef` to vend opaquely constructed
>>> `SILValue`, but disallows external users of the API from directly creating
>>> `SILValue` from `ValueDef`, enforcing the value/def distinction in our
>>> model.
>>>
>>> ## SILArgument/Terminator OwnershipConvention Examples
>>>
>>> We present here several examples of how ownership conventions look on
>>> SILArguments and terminators.
>>>
>>> First we present a simple branch and return example.
>>>
>>> class C { ... }
>>>
>>> sil @simple_branching : $@convention(thin) : @convention(thin) (@owned
>>> Builtin.NativeObject, @guaranteed C) -> @owned C {
>>> bb0(%0 : @owned $Builtin.NativeObject, %1 : @guaranteed $C):
>>> br bb1(%0 : @owned $Builtin.NativeObject)
>>>
>>> bb1(%1 : @owned $Builtin.NativeObject):
>>> strong_release %1 : $Builtin.NativeObject
>>> %2 = copy_value %0 : $C
>>> return %2 : @owned $C
>>> }
>>>
>>> Now consider this loop example:
>>>
>>> sil @loop : $@convention(thin) : $@convention(thin) (@owned
>>> Optional<Builtin.NativeObject>) -> () {
>>> bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>> br bb1(%0 : @owned $Builtin.NativeObject)
>>>
>>> bb1(%1 : @owned $Builtin.NativeObject):
>>> %2 = alloc_object $C
>>> strong_release %1 : $Builtin.NativeObject
>>> %3 = unchecked_ref_cast %2 : $C to $Builtin.NativeObject
>>> cond_br %cond, bb1(%3 : @owned Builtin.NativeObject), bb2
>>>
>>> bb2:
>>> strong_release %3 : $Builtin.NativeObject
>>> %result = tuple()
>>> return %result : @trivial $()
>>> }
>>>
>>> Note how it is absolutely clear what convention is being used when passing a
>>> value to the phi node. No dataflow reasoning is required implying we can do
>>> a
>>> simple pass over the CFG to prove correctness.
>>>
>>> Now consider two examples of switches:
>>>
>>> sil @owned_switch_enum : $@convention(thin) : $@convention(thin)
>>> (@owned Optional<Builtin.NativeObject>) -> () {
>>> bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>> switch_enum %0 : @owned $Builtin.NativeObject,
>>> #Optional.none.enumelt: bb1, #Optional.some.enumelt.1: bb2
>>>
>>> bb1:
>>> br bb3
>>>
>>> bb2(%1 : @owned $Builtin.NativeObject):
>>> strong_release %1 : $Builtin.NativeObject
>>> br bb3
>>>
>>> bb3:
>>> %result = tuple()
>>> return %result : @trivial $()
>>> }
>>>
>>> sil @guaranted_converted_switch_enum : $@convention(thin) :
>>> $@convention(thin) (@owned Optional<Builtin.NativeObject>) -> () {
>>> bb0(%0 : @owned $Optional<Builtin.NativeObject>):
>>> %1 = begin_borrow %0 : $Optional<Builtin.NativeObject>
>>> switch_enum %1 : @guaranteed $Builtin.NativeObject,
>>> #Optional.none.enumelt: bb1, #Optional.some.enumelt.1: bb2
>>>
>>> bb1:
>>> br bb3
>>>
>>> bb2(%2 : @guaranteed $Builtin.NativeObject):
>>> %3 = function_ref @g_user : $@convention(thin) (@guaranteed
>>> Builtin.NativeObject) -> ()
>>> apply %3(%2) : $@convention(thin) (@guaranteed Builtin.NativeObject)
>>> -> ()
>>> br bb3
>>>
>>> bb3:
>>> end_borrow %1 from %0 : $Optional<Builtin.NativeObject>,
>>> $Optional<Builtin.NativeObject>
>>> %result = tuple()
>>> return %result : @trivial $()
>>> }
>>>
>>> Notice how the lifetime is completely explicit in both cases, so the
>>> optimizer
>>> can not treat the conversion of switch_enum from +1 to +0 implicitly.
>>>
>>> ## Full Dataflow Verification Algorithm
>>>
>>> Define the following book keeping data structures.
>>>
>>> // The worklist that we will use for our iterative reachability query.
>>> llvm::SmallVector<SILBasicBlock *, 32> Worklist;
>>>
>>> // The set of blocks with lifetime ending uses.
>>> llvm::SmallPtrSet<SILBasicBlock *, 8> BlocksWithLifetimeEndingUses;
>>>
>>> // The set of blocks with non-lifetime ending uses and the associated
>>> // non-lifetime ending use SILInstruction.
>>> llvm::SmallDenseMap<SILBasicBlock *, SILInstruction *, 8>
>>> BlocksWithNonLifetimeEndingUses;
>>>
>>> // The blocks that we have already visited.
>>> llvm::SmallPtrSet<SILBasicBlock *, 32> VisitedBlocks;
>>>
>>> // A list of successor blocks that we must visit by the time the
>>> algorithm terminates.
>>> llvm::SmallPtrSet<SILBasicBlock *, 8> SuccessorBlocksThatMustBeVisited;
>>>
>>> Then for each non-lifetime ending use (found by the
>>> `OwnershipCompatibilityUseChecker`), we add the block and its instruction
>>> to the
>>> `BlocksWithNonLifetimeEndingUses`. There is a possibility of having multiple
>>> non-lifetime ending uses in the same block, in such a case, we always take
>>> the
>>> later value since we are performing a liveness-esque dataflow:
>>>
>>> for (SILInstruction *User : getNonLifetimeEndingUses()) {
>>> // First try to associate User with User->getParent().
>>> auto Result =
>>> BlocksWithNonLifetimeEndingUses.insert({User->getParent(), User});
>>>
>>> // If the insertion succeeds, then we know that there is no more work
>>> to
>>> // be done, so process the next use.
>>> if (Result.second)
>>> continue;
>>>
>>> // If the insertion fails, then we have at least two non-lifetime
>>> ending uses
>>> // in the same block. Since we are performing a liveness type of
>>> dataflow,
>>> // we only need the last non-lifetime ending use to show that all
>>> lifetime
>>> // ending uses post dominate both. Thus, see if Use is after
>>> // Result.first->second in the use list. If Use is not later, then we
>>> wish
>>> // to keep the already mapped value, not use, so continue.
>>> if (std::find(Result.first->second->getIterator(), Block->end(),
>>> User) ==
>>> Block->end()) {
>>> continue;
>>> }
>>>
>>> // At this point, we know that Use is later in the Block than
>>> // Result.first->second, so store Use instead.
>>> Result.first->second = User;
>>> }
>>>
>>> Then we visit each each lifetime ending use and its parent block:
>>>
>>> for (SILInstruction *User : getLifetimeEndingUses()) {
>>> SILBasicBlock *UserParentBlock = User->getParent();
>>> ...
>>> }
>>>
>>> We begin by adding `UserParentBlock` to the `BlocksWithLifetimeEndingUses`
>>> set. If a block is already in the set (i.e. insert fails), then we know
>>> that (1)
>>> has been violated and we error. If we never hit that condition, then we know
>>> that all lifetime ending uses are in different blocks.
>>>
>>> if (!BlocksWithLifetimeEndingUses.insert(UserParentBlock).second) {
>>> double_consume_error(); // ERROR!
>>> }
>>>
>>> Then we check if we previously saw any non-lifetime ending uses in
>>> `UserParentBlock` by checking the map `BlocksWithNonLifetimeEndingUses`. If
>>> we do
>>> find any such uses, we check if the lifetime ending use is earlier in the
>>> block
>>> that the non-lifetime ending use. If so then (2) is violated and we
>>> error. Once we know that (2) has not been violated, we remove
>>>
>>> auto Iter = BlocksWithNonLifetimeEndUses.find(UserParentBlock);
>>> if (Iter != BlocksWithNonLifetimeEndUses.end()) {
>>> // Make sure that the non-lifetime ending use is before the lifetime
>>> // ending use. Otherwise, we have a use after free.
>>> if (std::find(User->getIterator(), UserParentBlock->end(),
>>> Iter->second) !=
>>> UserParentBlock->end()) {
>>> use_after_free_error(); // ERROR!
>>> }
>>>
>>> // Erase the use since we know that it is properly joint
>>> post-dominated.
>>> BlocksWithNonLifetimeEndingUses.erase(Iter);
>>> }
>>>
>>> Then we mark `UserParentBlock` as visited to ensure that we do not consider
>>> UserParentBlock to be a successor we must visit and prime the worklist with
>>> the
>>> predecessor blocks of `UserParentBlock`.
>>>
>>> // Then mark UserParentBlock as visited add all predecessors of
>>> // UserParentBlock to the worklist.
>>> VisitedBlocks.insert(UserParentBlock);
>>> copy(UserParentBlock->getPreds(), std::back_inserter(Worklist));
>>>
>>> The entire block of code is provided in the appendix. Now we know that:
>>>
>>> 1. All lifetime ending uses are in different basic blocks.
>>> 2. All non-lifetime ending uses are either in different basic blocks.
>>>
>>> We begin our worklist algorithm by popping off the next element in the
>>> worklist.
>>>
>>> // Until the worklist is empty.
>>> while (!Worklist.empty()) {
>>> // Grab the next block to visit.
>>> const SILBasicBlock *BB = Worklist.pop_back_val();
>>> ...
>>> }
>>>
>>> Since every time we insert objects into the worklist, we first try to
>>> insert it
>>> into the `VisitedBlock` list, we do not need to check if `BB` has been
>>> visited
>>> yet, since elements can not be added to the worklist twice.
>>>
>>> Then make sure that BB is not in `BlocksWithLifetimeEndingUses`. If BB is in
>>> `BlocksWithLifetimeEndingUses`, then we know that there is a path going
>>> through
>>> the def where the def is consumed twice, an error!
>>>
>>> if (BlocksWithLifetimeEndingUses.count(BB)) {
>>> double_consume_error(); // ERROR!
>>> }
>>>
>>> Now that we know we are not double consuming, we need to update our data
>>> structures. First if BB is contained in SuccessorBlocksThatMustBeVisited, we
>>> remove it. This ensures when the algorithm terminates, we know that the path
>>> through the successor was visited as apart of our walk.
>>>
>>> SuccessorBlocksThatMustBeVisited.erase(BB);
>>>
>>> Then if BB is in BlocksWithNonLifetimeEndingUses, we remove it. This ensures
>>> that when the algorithm terminates, we know that the non-lifetime ending
>>> uses
>>> were properly joint post-dominated by the lifetime ending uses.
>>>
>>> BlocksWithNonLifetimeEndingUses.erase(BB);
>>>
>>> Then add all successors of BB that we have not visited yet to the
>>> `SuccessorBlocksThatMustBeVisited` set. This ensures that if we do not
>>> visit the
>>> successor as apart of our CFG walk, at the end of the algorithm we will
>>> assert
>>> that there is a leak.
>>>
>>> for (const SILBasicBlock *SuccBlock : BB->getSuccessorBlocks()) {
>>> // If we already visited the successor, there is nothing to do since
>>> // we already visited the successor.
>>> if (VisitedBlocks.count(SuccBlock))
>>> continue;
>>>
>>> // Otherwise, add the successor to our MustVisitBlocks set to ensure
>>> that
>>> // we assert if we do not visit it by the end of the algorithm.
>>> SuccessorBlocksThatMustBeVisited.insert(SuccBlock);
>>> }
>>>
>>> Finally add all predecessors of BB that we have not visited yet to the
>>> Worklist
>>> for processing. We use insert here to ensure that we only ever add a
>>> VisitedBlock once to the Worklist:
>>>
>>> for (const SILBasicBlock *PredBlock : BB->getPredecessorBlocks()) {
>>> if (!VisitedBlocks.insert(PredBlock).second)
>>> continue;
>>> Worklist.push_back(PredBlock);
>>> }
>>>
>>> This continues until we have exhausted the worklist. Once the worklist is
>>> exhausted, we know that:
>>>
>>> 1. If `SuccessorBlocksThatMustBeVisited` is non-empty, then the Blocks in
>>> the
>>> set are not joint post-dominated by the set of lifetime ending users
>>> implying
>>> a leak.
>>> 2. If `BlockSwithNonLifetimeEndingUses` is non-empty, then there was a
>>> non-lifetime ending use that was not joint post-dominated by the lifetime
>>> ending use set. This implies a use-after free.
>>>
>>> Thus we assert that both sets are empty and error accordingly.
>>>
>>> if (!SuccessorBlocksThatMustBeVisited.empty()) {
>>> leak_error(); // ERROR!
>>> }
>>> if (!BlocksWithNonLifetimeEndingUses.empty()) {
>>> use_after_free_error(); // ERROR!
>>> }
>>>
>>> The full code:
>>>
>>> // The worklist that we will use for our iterative reachability query.
>>> llvm::SmallVector<SILBasicBlock *, 32> Worklist;
>>>
>>> // The set of blocks with lifetime ending uses.
>>> llvm::SmallPtrSet<SILBasicBlock *, 8> BlocksWithLifetimeEndingUses;
>>>
>>> // The set of blocks with non-lifetime ending uses and the associated
>>> // non-lifetime ending use SILInstruction.
>>> llvm::SmallDenseMap<SILBasicBlock *, SILInstruction *, 8>
>>> BlocksWithNonLifetimeEndingUses;
>>>
>>> // The blocks that we have already visited.
>>> llvm::SmallPtrSet<SILBasicBlock *, 32> VisitedBlocks;
>>>
>>> // A list of successor blocks that we must visit by the time the
>>> algorithm terminates.
>>> llvm::SmallPtrSet<SILBasicBlock *, 8> SuccessorBlocksThatMustBeVisited;
>>>
>>> for (SILInstruction *User : getLifetimeEndingUses()) {
>>> SILBasicBlock *UserParentBlock = User->getParent();
>>> if (!BlocksWithLifetimeEndingUses.insert(UserParentBlock).second) {
>>> double_consume_error(); // ERROR!
>>> }
>>>
>>> auto Iter = BlocksWithNonLifetimeEndUses.find(UserParentBlock);
>>> if (Iter != BlocksWithNonLifetimeEndUses.end()) {
>>> // TODO: Could this be cached.
>>> if (std::find(User->getIterator(), UserParentBlock->end(),
>>> Iter->second) !=
>>> UserParentBlock->end()) {
>>> use_after_free_error(); // ERROR!
>>> }
>>> }
>>>
>>> // Then add all predecessors of the user block to the worklist.
>>> VisitedBlocks.insert(UserParentBlock);
>>> copy(UserParentBlock->getPreds(), std::back_inserter(Worklist));
>>> }
>>>
>>> // Until the worklist is empty.
>>> while (!Worklist.empty()) {
>>> // Grab the next block to visit.
>>> const SILBasicBlock *BB = Worklist.pop_back_val();
>>>
>>> // Then check that BB is not a lifetime-ending use block. If it is
>>> error,
>>> // since we have an overconsume.
>>> if (BlocksWithLifetimeEndingUses.count(BB)) {
>>> double_consume_error(); // ERROR!
>>> }
>>>
>>> // Ok, now we know that we are not double-consuming. Update our data
>>> // structures.
>>>
>>> // First remove BB from the SuccessorBlocksThatMustBeVisited list.
>>> This
>>> // ensures that when the algorithm terminates, we know that BB was
>>> not the
>>> // beginning of a non-covered path to the exit.
>>> SuccessorBlocksThatMustBeVisited.erase(BB);
>>>
>>> // Then remove BB from BlocksWithNonLifetimeEndingUses so we know that
>>> // this block was properly joint post-dominated by our lifetime ending
>>> // users.
>>> BlocksWithNonLifetimeEndingUses.erase(BB);
>>>
>>> // Then add all successors of BB that we have not visited yet to the
>>> // SuccessorBlocksThatMustBeVisited set.
>>> for (const SILBasicBlock *SuccBlock : BB->getSuccessorBlocks()) {
>>> // If we already visited the successor, there is nothing to do since
>>> // we already visited the successor.
>>> if (VisitedBlocks.count(SuccBlock))
>>> continue;
>>>
>>> // Otherwise, add the successor to our MustVisitBlocks set to
>>> ensure that
>>> // we assert if we do not visit it by the end of the algorithm.
>>> SuccessorBlocksThatMustBeVisited.insert(SuccBlock);
>>> }
>>>
>>> // Finally add all predecessors of BB that have not been visited yet
>>> to
>>> // the worklist.
>>> for (const SILBasicBlock *PredBlock : BB->getPredecessorBlocks()) {
>>> // Try to insert the PredBlock into the VisitedBlocks list to ensure
>>> // that we only ever add a block once to the worklist.
>>> if (!VisitedBlocks.insert(PredBlock).second)
>>> continue;
>>> Worklist.push_back(PredBlock);
>>> }
>>> }
>>>
>>> if (!SuccessorBlocksThatMustBeVisited.empty()) {
>>> leak_error(); // ERROR!
>>> }
>>> if (!BlocksWithNonLifetimeEndingUses.empty()) {
>>> use_after_free_error(); // ERROR!
>>> }
>>>
>>> ----
>>>
>>>> On Dec 5, 2016, at 6:47 PM, Michael Gottesman <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>>>
>>>>> On Dec 5, 2016, at 5:26 PM, Andrew Trick <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>>
>>>>>> On Dec 5, 2016, at 3:55 PM, Michael Gottesman <[email protected]
>>>>>> <mailto:[email protected]>> wrote:
>>>>>>
>>>>>> Hello everyone.
>>>>>>
>>>>>> This is an initial draft of a proposal for the SIL Ownership Model +
>>>>>> Verifier. I am mainly looking for any high level concerns. If there are
>>>>>> none, then I am fine moving this straight to swift-dev.
>>>>>>
>>>>>> It is online here:
>>>>>>
>>>>>> https://gottesmm.github.io/proposals/sil-ownership-model.html
>>>>>> <https://gottesmm.github.io/proposals/sil-ownership-model.html>
>>>>>>
>>>>>> Michael
>>>>>
>>>>>
>>>>> Thanks! It looks good.
>>>>>
>>>>> > `ValueDef` is not a value, but the def of a value.
>>>>>
>>>>> Should be "but may define (or produce) values.".
>>>>
>>>> Sure.
>>>>
>>>>>
>>>>> > Since our ownership model is based around SSA form
>>>>>
>>>>> There's an in-memory aspect as well. Ownership of local memory should be
>>>>> statically verified. We at least need a TBD section for this.
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>> > This checker works by taking in a
>>>>> `ValueDef` and visiting all of the `SILInstruction` users of the
>>>>> `ValueDef`.
>>>>>
>>>>> This line does not compute for me. A ValueDef does not have ownership.
>>>>> The verifier is simply checking constraints for each SILValue against all
>>>>> of its uses. At the verifier's level of abstraction, it doesn't matter
>>>>> who defined that value.
>>>>
>>>> That is a typo. If you look earlier I was using explicitly SILValue.
>>>>
>>>>>
>>>>> > The dataflow verifier takes in as inputs the `ValueDef` (i.e. def) and
>>>>> > lists of lifetime-ending and non-lifetime ending uses.
>>>>>
>>>>> Again, the verifier would not care about the uses of the ValueDef, it
>>>>> cares about uses of the SILValue. I think this is a misleading way to
>>>>> formulate the problem and explain the algorithm.
>>>>
>>>> That is a typo from an earlier version of the algorithm. Everything should
>>>> be about the SILValue.
>>>>
>>>>>
>>>>> > if (V.getDef()->getParentBlock()) { ... }
>>>>>
>>>>> I see no reason we can't add getParentBlock() and other very common
>>>>> idioms to the SILValue API.
>>>>
>>>> The question is how much should we make that a def only API.
>>>>
>>>>>
>>>>> > if (auto *PAI = def_dyn_cast<PartialApplyInst>(V)) { ... }
>>>>>
>>>>> It's not obvious to be that this is more readable than
>>>>>
>>>>> if (auto *PAI = dyn_cast<PartialApplyInst>(V.getDef())) { ... }
>>>>>
>>>>> But I don't have a strong opinion on it.
>>>>
>>>> it is just a little shorter. I am also not wedded to this. I was just
>>>> trying to create a vision of what is possible to stimulate discussion.
>>>>
>>>>>
>>>>> > The final change that we propose is eliminating the ability to
>>>>> > construct a
>>>>> `SILValue` from a `ValueDef` externally to the `ValueDef` itself.
>>>>>
>>>>> Nice!
>>>>>
>>>>> I didn't reread the "Full Dataflow Verification Algorithm" because I
>>>>> don't think it's changed much. I'll review it again after the final
>>>>> proposal.
>>>>
>>>> It is exactly the same as the version I showed you before, except I
>>>> followed your suggestion about putting the visit before things are added
>>>> to the worklist rather than after it is popped off the worklist.
>>>>
>>>> Let me fix this real quick.
>>>>
>>>> Michael
>>>>
>>>>>
>>>>> -Andy
>>>
>>>
>>> _______________________________________________
>>> swift-dev mailing list
>>> [email protected] <mailto:[email protected]>
>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>> <https://lists.swift.org/mailman/listinfo/swift-dev>
>>
>> _______________________________________________
>> swift-dev mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.swift.org/mailman/listinfo/swift-dev
>> <https://lists.swift.org/mailman/listinfo/swift-dev>
_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev