> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <dabrah...@apple.com> wrote: > > > on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com> wrote: > >> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev >> <swift-dev@swift.org> wrote: >> >>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org >>> <http://swift-dev-at-swift.org/>> wrote: >>> >>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>>>> <swift-dev@swift.org> wrote: >>>>> >>>>> This is a proposal for representing copy-on-write buffers in >>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>> depends on how we move forward with SIL ownership. >>>>> <CopyOnWrite.rst> >>>>> If you have any comments, please let me know. >>>> >>>> The SIL-level design seems sensible to me at a glance. At the language >>>> level, I think it would make more sense to treat this as an attribute >>>> on class types rather than on properties in structs using the class. I >>>> don't think many people reuse class definitions as both shared >>>> reference types and as value type payloads, >>> >>> Foundation does, or would if they could. >>> >>>> but beyond that, I think that making it an attribute of classes would >>>> put us into a better position to leverage the borrow model to enforce >>>> the "mutable-only-when-unique" aspect of COW implementations. John >>>> alluded to this in the "SIL address types and borrowing" thread: >>>> >>>>> I wonder if it would make more sense to make copy-on-write buffer >>>>> references a move-only type, so that as long as you were just >>>>> working with the raw reference (as opposed to the CoW aggregate, >>>>> which would remain copyable) it wouldn't get implicitly copied >>>>> anymore. You could have mutable and immutable buffer reference >>>>> types, both move-only, and there could be a consuming checkUnique >>>>> operation on the immutable one that, I dunno, returned an Either of >>>>> the mutable and immutable versions. >>>>> >>>>> For CoW aggregates, you'd need some @copied attribute on the field >>>>> to make sure that the CoW attribute was still copyable. Within the >>>>> implementation of the type, though, you would be projecting out the >>>>> reference immediately, and thereafter you'd be certain that you were >>>>> borrowing / moving it around as appropriate. >>>> >>>> If 'copy-on-write' were a trait on classes, then we could distinguish >>>> unique and nonunique references to the class. A unique reference would >>>> act like a move-only type to prevent accidental loss of uniqueness. >>> >>> +1 >>> >>>> We can also allow a copy-on-write class to have "mutating" methods, >>>> and only allow mutation on unique references. It seems to me like, >>>> exploring this direction, we could also come up with a way for the >>>> high-level value-semantics operations on the struct to statically >>>> indicate which methods are known to leave the value's buffers in a >>>> unique state, or which return values that are uniquely owned, which >>>> would give the optimizer more ability to avoid uniqueness checks >>>> across calls without relying on inlining and IPO. >>> >>> That's pretty cool. However, I think there's nothing to prevent any >>> mutating method from storing a copy of self in a global, so I think we'd >>> need some participation from the programmer (either an agreement not to >>> do that, or an explicit claim of uniqueness on exit) in order to >>> identify operations that create/preserve uniqueness. >> >> If a mutating reference (like self in a mutating method) is move-only >> then you would not be able to “copy” it to a global. > > Yes, a reference to a move-only type would work for this purpose. > > >>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev >>> <swift-dev@swift.org> wrote: >>> >>> >>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote: >>> >>>> This is a proposal for representing copy-on-write buffers in >>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>> depends on how we move forward with SIL ownership. >>>> >>>> :orphan: >>>> >>>> .. highlight:: sil >>>> >>>> =================================== >>>> Copy-On-Write Representation in SIL >>>> =================================== >>>> >>>> .. contents:: >>>> >>>> Overview >>>> ======== >>>> >>>> This document proposes: >>>> >>>> - An ownership attribute to define copy-on-write (COW) buffers in Swift >>>> data >>>> types. >>>> >>>> - A representation of COW buffers in SIL so that optimizations can take >>>> benefit >>>> of it. >>>> >>>> The basic idea is to enable the SIL optimizer to reason about COW data >>>> types >>>> in the same way as a programmer can do. >>>> This means: a COW buffer can only be modified by its owning SIL value, >>>> because >>>> either it's uniquely referenced or the buffer is copied before modified. >>>> >>>> .. note:: >>>> In the following the term "buffer" refers to a Swift heap object. >>>> It can be any heap object, not necessarily a “buffer” with e.g. >>>> tail-allocated elements. >>>> >>>> COW Types >>>> ========= >>>> >>>> The basic structure of COW data types can be simplified as follows:: >>>> >>>> class COWBuffer { >>>> var someData: Int >>>> ... >>>> } >>>> >>>> struct COWType { >>>> var b : COWBuffer >>>> >>>> mutating func change_it() { >>>> if (!isUniquelyReferenced(b)) { >>>> b = copy_buffer(b) >>>> } >>>> b.someData = ... >>>> } >>>> } >>>> >>>> Currently the COW behavior of such types is just defined by their >>>> implementation. >>>> But there is no representation of this special behavior in the SIL. >>>> So the SIL optimizer has no clue about it and cannot take advantage of it. >>>> >>>> For example:: >>>> >>>> func foo(arr : [Int]) { >>>> x = arr[0] >>>> opaque_function() >>>> y = arr[0] // can RLE replace this with y = x? >>>> } >>>> >>>> If opaque_function() wants to change the contents of the array buffer it >>>> first >>>> has to copy it. >>> >>> ...or determine that it's uniquely-referenced. >> >> In this specific example, if opqaue_function holds a reference to arr’s >> buffer, the buffer is not >> uniquely-referenced. > > Right. > >>> >>>> But the optimizer does not know it so it has to conservatively assume >>>> that opaque_function() will write to the location of arr[0]. >>>> >>>> Copy-on-write Ownership Attribute >>>> ================================= >>>> >>>> This section proposes an ownership attribute to define a copy-on-write >>>> buffer. >>>> >>>> Swift Syntax >>>> ------------ >>>> >>>> A COW buffer reference can be defined with a new ownership attribute for >>>> the >>>> buffer variable declaration (similar to “weak” and “unowned”):: >>>> >>>> struct COWType { >>>> copy_on_write var b : COWBuffer >>>> >>>> // ... >>>> } >>>> >>>> The ``copy_on_write`` attribute is purely used for optimization purposes. >>>> It does not change the semantics of the program. >>> >>> Presumably, it changes what code you can execute on `b` without invoking >>> traps or undefined behavior. Otherwise, the optimizer wouldn't be able >>> to do anything differently to take advantage of the annotation. >> >> That’s true. >> >>> What are the rules for writing code that uses `copy_on_write`? >> >> See below ("The rules for using ``copy_on_write`` and the built-ins are:”) > > Yeah, I got there, eventually. But just saying “doesn't change > semantics” at this point in the proposal leaves a gap, because it does > change semantic *requirements*. You should mention that. > >>>> .. note:: >>>> >>>> “copy_on_write” is a working title. TODO: decide on the name. >>>> Maybe it should be a @-attribute, like @copy_on_write? >>>> Another question is if we should open this attribute for the public or just >>>> use it internally in the library, because violating the implied rules >>>> (see below) could break memory safety. >>>> >>>> Implementation >>>> -------------- >>>> >>>> The ``copy_on_write`` references can be represented in the AST as a special >>>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented. >>>> The canonical type of a ``CopyOnWriteStorageType`` would be the referenced >>>> buffer class type. >>>> >>>> In SIL the buffer reference will have type:: >>>> >>>> $@sil_cow COWBuffer >>>> >>>> where ``COWBuffer`` is the type of the referenced heap object. >>>> >>>> Two conversion instructions are needed to convert from a ``@sil_cow`` >>>> reference >>>> type to a regular reference type:: >>>> >>>> cow_to_ref >>>> ref_to_cow >>>> >>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >>>> >>>> For example the SIL code for:: >>>> >>>> var c: COWType >>>> let x = c.b.someData >>>> >>>> would be:: >>>> >>>> %1 = struct_extract %1 : COWType, #COWType.b >>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >>>> %4 = load %3 : $*Int >>>> >>>> The ``ref_to_cow`` instruction is needed to store a new buffer reference >>>> into a >>>> COW type. >>>> >>>> COW Buffers and the Optimizer >>>> ============================= >>>> >>>> A reference to a COW buffer gives the optimizer additional information: >>>> >>>> *A buffer, referenced by a @sil_cow reference is considered to be immutable >>>> during the lifetime of the reference.* >>> >>> This seems like much too broad a rule to allow inplace mutations of >>> uniquely referenced buffers. >> >> The point is that all mutations must be guarded by an is_unique, which >> takes the _address_ of the buffer reference as argument. >> And the optimizer considers this instruction as a potential write to the >> buffer reference. >> The effect is that the lifetime of a buffer reference (as a SIL value) >> will not outlive a is_unique - regardless if this is inside a called >> function or inlined. > > I don't see how that allows me to mutate a uniquely referenced buffer held > in a @sil_cow reference, given what you wrote above.
You would not be able to get a reference to a mutable buffer by reading the COW type’s @sil_cow field. Instead you would only get such a reference as a result of the is_unique instruction/builtin. Or, of course, by creating a new buffer. I’m not sure if this was the question, though. Plus: we will have an explicit conversion instruction (start_unique) to convert an immutable reference to a mutable referece. A SIL optimization can replace an is_unique with this instruction if it can prove that the reference is already unique at that point. > > >>> Unless you mean the reference is >>> immutable, rather than the storage being referred to by it. >>> >>>> This means any address derived from a ``cow_to_ref`` instruction can be >>>> considered to point to immutable memory. >>>> >>>> Some examples of optimizations which will benefit from copy-on-write >>>> representation in SIL: >>>> >>>> - Redundant load elimination >>>> >>>> RLE can assume that opaque code does not modify a COW buffer. >>> >>> How do you distinguish “opaque code” from “code that is meant to >>> modify the buffer and might do so in place if it's uniquely-referenced?” >> >> Again, the is_unique which takes the address of the reference, will >> guarantee that during the lifetime of a buffer there are no >> modifications of the buffer. > > Again, that sounds like it rules out inplace modification of uniquely > referenced buffers. > >> >> >>> >>>> Example:: >>>> >>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>> %4 = load %3 : $*Int >>>> %5 = apply %foo() // Cannot overwrite memory >>>> location %3 >>>> %6 = load %3 : $*Int // Can be replaced by %4 >>>> >>>> Currently we do some ad-hoc optimizations for array, based on semantics, >>>> like array count propagation. These hacks would not be needed >>>> anymore. >>> >>> W0000000000000000000000t. >>> >>>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a >>>> projected address) escapes. Even if it escapes, it will reference immutable >>>> memory. >>>> >>>> - CSE, loop hoisting >>>> >>>> Similar to RLE: the optimizer can assume that opaque code cannot modify a >>>> COW buffer >>> >>> Same question here as above, then. >>>> >>>> - ARC optimization >>>> >>>> Knowing that some opaque code cannot overwrite a reference in the COW >>>> buffer >>>> can remove retain/release pairs across such code:: >>>> >>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>> %3 = ref_element_addr %2 : $COWBuffer, #someRef >>>> %4 = load_strong %3 : $*MyClass // Can do a load_strong >>>> [guarantee] >>>> %5 = apply %foo() // Cannot overwrite someRef >>>> and dealloc the object >>>> // Use %4 >>>> destroy_value %4 : $MyClass >>>> >>>> Scoping instructions >>>> -------------------- >>>> >>>> To let the optimizer reason about the immutability of the COW buffer, it is >>>> important to *bind* the lifetime of the buffer content to the lifetime of >>>> the >>>> buffer reference. For example:: >>>> >>>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>> // load something from %b1 >>>> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >>>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>> again >>>> // load something from %b2 >>>> >>>> The question is: can RLE forward the load of the buffer reference and >>>> replace >>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies >>>> the >>>> buffer. >>>> >>>> To enforce this restriction, the scope of any buffer modification must be >>>> enclosed in a pair of SIL instructions. Those instructions define the scope >>>> of the mutation. Both instructions take the *address* of the buffer >>>> reference as operand and act as a potential write to the buffer reference. >>>> >>>> The purpose of the scoping instructions is to strictly separate the >>>> liferanges >>>> of references to an immutable buffer and references to the mutable buffer. >>> >>> Looks reasonable. >>> >>>> The following example shows why the scoping instructions (specifically the >>>> end-of-scope instruction) are required to prevent loop-hoisting from >>>> interleaving mutable and immutable liferanges:: >>>> >>>> // there should be a begin-of-scope %baddr >>>> %mut_b = load %baddr >>>> store %x to %mut_b // modification of the buffer >>>> // there should be a end-of-scope %baddr >>>> >>>> loop { >>>> %b = load %baddr >>>> %y = load %b // load from the buffer >>>> ... >>>> } >>>> >>>> If there is no end-of-scope instruction, loop hoisting could do:: >>>> >>>> %mut_b = load %baddr >>>> %b = load %baddr // moved out of the loop >>>> store %x to %mut_b >>>> >>>> loop { >>>> %y = load %b >>>> ... >>>> } >>>> >>>> Now the optimizer assumes that ``%b`` references an immutable buffer, so >>>> it could >>>> also hoist the load:: >>>> >>>> %mut_b = load %baddr >>>> %b = load %baddr >>>> %y = load %b // Wrong! Will be overwritten by the following >>>> store >>>> store %x to %mut_b >>>> >>>> loop { >>>> ... >>>> } >>>> >>>> >>>> The following sections describe two alternatives to implement the scoping. >>>> >>>> Scoping Alternative 1: Explicit Built-ins >>>> ----------------------------------------- >>>> >>>> SIL instructions >>>> ^^^^^^^^^^^^^^^^ >>>> >>>> The existing ``is_unique`` instruction is changed to a terminator >>>> instruction:: >>>> >>>> bb0: >>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the >>>> address of the COWBuffer reference >>>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique >>>> reference. Physically identical to "load %0” >>>> // usually empty >>>> br bb3(%1 : $COWBuffer) >>>> bb2: // the false-block >>>> // usually contains: >>>> %2 = apply %copy_buffer >>>> %3 = cow_to_ref %2 >>>> store_strong %3 to %0 : $*@sil_cow COWBuffer >>>> br bb3(%2 : $COWBuffer) >>>> bb3(%4 : $COWBuffer): >>>> // Modify the buffer referenced by %4 >>>> // ... >>>> >>>> The end-of-scope instruction is:: >>>> >>>> end_unique_addr %0 : $*COWBuffer >>>> >>>> It is important that the references to the unique buffers (``%1``, ``%2``) >>>> must >>>> not outlive ``end_unique_addr``. In most cases this can be check by the SIL >>>> verifier. >>>> >>>> The two instructions must be paired properly but not necessarily in the >>>> same function. >>>> >>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to >>>> separate the lifetimes of mutable and immutable accesses to the COW buffer. >>>> Both instructions take an address to the COW buffer reference and are >>>> considered as potential stores to the reference. >>>> This makes sure that the SIL optimizer cannot mix-up buffer reference >>>> lifetimes >>>> across these instructions. >>>> For example, RLE cannot combine two buffer loads which are interleaved with >>>> a ``is_unique_addr_br``:: >>>> >>>> %1 = load_strong %0 : $*@sil_cow COWBuffer >>>> // do something with %1 >>>> … >>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer >>>> … >>>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this >>>> with %1 >>>> >>>> Another important thing is that the COW buffer can only be mutated by >>>> using the >>>> reference of the ``is_unique_addr_br`` true-block argument. >>>> The COW buffer cannot be modified by simply loading/extracting the >>>> reference >>>> from the COWType. >>>> Example:: >>>> >>>> %1 = load_strong %0 : $*COWBuffer >>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>> store %7 : $Int to %3 : $*Int // Violation! >>>> >>>> Most obvious violations to this constraint can be catched by the >>>> SILVerifier. >>>> >>>> The ``_addr`` variants of the instructions also have a non-addr >>>> counterpart:: >>>> >>>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces the >>>> true-block arg as owned >>>> >>>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned >>>> >>>> These instructions are generated by Mem2reg (or a similar optimization) >>>> in case the COW value is stored (in a temporary alloc_stack location) >>>> just for the sake of passing an address to ``is_unique_addr_br`` and >>>> ``end_unique_addr``. >>>> For example in the following code, where the COW data is passed as-value >>>> and >>>> all the mutating functions are inlined:: >>>> >>>> func foo(arr : [Int], x: Int) { >>>> arr[0] = 27 >>>> … >>>> y = arr[x] >>>> … >>>> } >>>> >>>> Finally it’s probably a good idea to add an instruction for converting an >>>> immutable reference to a mutable reference:: >>>> >>>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >>>> $COWBuffer as owned >>>> >>>> which is basically just a simpler representation of the following pattern:: >>>> >>>> bb0: >>>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >>>> bb1(%1 : $COWBuffer): >>>> … // main control flow continues here >>>> bb2: >>>> unreachable >>>> >>>> An optimizations, which eliminate uniqueness checks, would replace a >>>> ``is_unique_br`` by a ``start_unique``. >>>> >>>> Built-ins >>>> ^^^^^^^^^ >>>> >>>> A COW type implementor can generate the new instructions by using a set of >>>> built-ins:: >>>> >>>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >>>> func endUnique<BufferType>(_ buffer: inout BufferType) >>>> >>>> For example:: >>>> >>>> struct COWType { >>>> copy_on_write var b : COWBuffer >>>> >>>> mutating func makeMutable() -> COWBuffer { >>>> if let uniqueBuffer = isUnique(&self.b) { >>>> return uniqueBuffer >>>> } >>>> let copiedBuffer = copyBuffer(self.b) >>>> self.b = copiedBuffer >>>> return copiedBuffer >>>> } >>>> >>>> mutating func setSomeData(x: Int) { >>>> let uniqueBuffer = makeMutable() >>>> uniqueBuffer.someData = x >>>> endUnique(&self.b) >>>> } >>>> } >>> >>> This seems reasonable, but it also looks like the compiler could do the >>> `endUnique` dance for us based, e.g., on the mutability of methods. >> >> I agree, that would be ideal, e.g. the compiler could insert the endUnique >> at the end of an inout >> scope. >> >>> >>>> The ``isUnique`` built-in returns an optional unique buffer reference. >>>> Physically this is the COW buffer which is passed as the inout argument. >>>> The result is nil if the buffer is not uniquely referenced. >>>> In this case usually the original buffer is copied and the reference to the >>>> copy is written back to the original buffer reference location >>>> (``self.b = copiedBuffer``). >>>> Starting at the point of the write-back, the reference to the copy also >>>> becomes >>>> a unique buffer reference. >>>> >>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` pattern >>>> which >>>> constructs the Optional in the successor blocks. Using ``isUnique`` in an >>>> if-let (as shown above) will end up in two consecutive CFG "diamonds". >>>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` diamond. >>>> >>>> .. note:: >>>> This makes the definition of the unique buffer location lifetime a little >>>> bit >>>> problematic, because the false-branch of ``isUnique`` is not equivalent to >>>> the false-branch of the ``is_unique_addr_br`` instruction (before >>>> SimplifyCFG >>>> can do its job). >>> >>> I don't know what the implications of these diamonds and the problem >>> described above might be, FWIW. >>> >>>> The rules for using ``copy_on_write`` and the built-ins are: >>>> >>>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily in >>>> the >>>> same function. >>>> >>>> 2. The COW buffer may only be mutated by using the unique buffer reference. >>>> >>>> 3. The COW buffer must not be mutated outside the ``isUnique`` - >>>> ``endUnique`` >>>> pair. >>>> >>>> 4. During the lifetime of the unique buffer reference, the original COW >>>> buffer >>>> reference must not be used in any way, e.g. for reading from the buffer. >>>> >>>> Note that the lifetime of the unique buffer reference does not include the >>>> part between the begin of the ``isUnique`` false-branch and the write-back >>>> of the copy. This means is okay to read from the buffer (using ``self.b``) >>>> for the purpose of copying. >>>> >>>> Examples:: >>>> >>>> mutating func setSomeData(x: Int) { >>>> let uniqueBuffer = makeMutable() >>>> uniqueBuffer.someData = x >>>> // violates rule 1 >>>> } >>>> >>>> mutating func setSomeData(x: Int) { >>>> makeMutable() >>>> self.b.someData = x // violates rule 2 >>>> endUnique(&self.b) >>>> } >>>> >>>> mutating func setSomeData(x: Int) { >>>> let uniqueBuffer = makeMutable() >>>> uniqueBuffer.someData = x >>>> endUnique(&self.b) >>>> uniqueBuffer.someData = 27 // violates rule 3 >>>> } >>>> >>>> mutating func incrementSomeData() { >>>> let uniqueBuffer = makeMutable() >>>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >>>> endUnique(&self.b) >>>> } >>> >>> It would be instructive to write down the *correct* code for these >>> operations. >> >> added to my todo list. >> >>> >>>> The intention of the rules is to ensure that there is no overlap of a >>>> "read-only" life-range with a "mutable" life-range of the buffer reference. >>>> It’s the responsibility of the implementor to follow the rules. >>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can >>>> statically detect rule violations in obvious cases (with inter-procedural >>>> analysis maybe even in most cases). >>>> >>>> This approach would require to change some of the internals of our >>>> current COW data structures in the stdlib (Array, Dictionary, etc.). >>>> For example, the Array make_mutable semantic functions currently do not >>>> return >>>> the unique buffer. >>> >>> No big deal. >>> >>>> Scoping Alternative 2: Implicit Inout Scopes >>>> -------------------------------------------- >>>> >>>> There is an idea (proposal?) to change the representation of inout >>>> variables >>>> in SIL. This is independent of this proposal, but can be helpful for the >>>> purpose of defining the scope of a COW mutation. >>>> >>>> The basic idea is that SILGen inserts scoping instructions for *all* inout >>>> variables. And those scoping instructions can be used to define the >>>> mutating >>>> scope of a COW buffer. >>>> >>>> The scoping instructions which are inserted by SILGen for an inout scope >>>> are:: >>>> >>>> begin_exclusive >>>> end_exclusive >>>> >>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those >>>> instructions take the >>>> address of the inout variable as argument. For the optimizer those >>>> instructions >>>> look like potential writes to the inout variable. >>>> >>>> The implementor of a COW type has to follow the rule that the COW buffer >>>> may >>>> only be modified in mutating functions of the COW type. But this is the >>>> case >>>> anyway because any modification needs a uniqueness check and this can only >>>> be >>>> done in mutating functions. >>>> >>>> Example:: >>>> >>>> // > mutating func setSomeData(x: Int) { >>>> // Accepts a unique reference to the array value (avoiding refcount >>>> operations) >>>> sil @setSomeData : $(Int, @inout Array) -> () { >>>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >>>> >>>> // > makeMutable() (inlined) >>>> // Forward the unique reference to the `self` array value, still >>>> avoiding refcount operations. >>>> // Begin the inlined exclusive scope (could be trivially removed). >>>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >>>> >>>> // > if !isUnique(&self._storage) { >>>> // Extract a unique inout reference to the class reference to the array >>>> storage. >>>> // This begins the isUnique() argument's exclusive scope. The memory is >>>> already exclusive >>>> // but the scope helps ensure this is the only alias to _storage. >>>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >>>> #Array._storage >>>> >>>> // Uniqueness checking requires an inout reference to the class >>>> reference. >>>> // The is_unique instruction does not need to create a new storage >>>> reference. >>>> // It's only purpose is to check the RC count, ensure that the checked >>>> reference >>>> // is inout, and prevent the inout scope from being optimized away. >>>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>> >>>> // End the isUnique argument's exclusive scope (can also be trivially >>>> removed). >>>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>> >>>> br %isuniq, bb_continue, bb_slow >>>> >>>> bb_slow: >>>> // > self._storage = copyBuffer(self._storage) >>>> // Produce a new class reference to storage with verifiably unique RC >>>> semantics. >>>> %copied_storage_class = alloc_ref ... >>>> // A begin/end exclusive scope is implicit in store [assign]. >>>> store [assign] %copied_storage_class to %arrayref._storageref >>>> br bb_continue >>>> >>>> bb_continue: >>>> >>>> // This marks the end of makeMutable's inout `self` scope. Because Array >>>> // contains a "copy_on_write" property, the SIL verifier needs to >>>> // prove that %arrayref.#_storage has not escaped at this point. This >>>> // is equivalent to checking that %arrayref itself is not copied, and >>>> // checking each projection of the "copy_on_write" storage property >>>> // (%arrayref._storageref) is not copied. Or, if any copies are present, >>>> // they must be consumed within this scope. >>>> end_exclusive %arrayref : $*Array<T> // End scope #1 >>>> >>>> // > self._storage.someData = x >>>> // An _addr instruction with one load/store use doesn't really need its >>>> own scope. >>>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage >>>> >>>> // ARC optimization can promote this to a borrow, replacing >>>> strong_release with end_borrow. >>>> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow >>>> ArrayStorage >>>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >>>> ArrayStorage >>>> >>>> // Write some data into the CoW buffer. >>>> // (For simplicity, pretend ArrayStorage has a "someData" field). >>>> // A single-use _addr instruction, so no scope. >>>> %somedata_addr = ref_element_addr %arrayref._storage, #someData >>>> // A store with an implicit [exclusive] scope. >>>> store [assign] %x to %somedata_addr >>>> >>>> strong_release %arrayref._storage : $*ArrayStorage<T> >>>> >>>> // End the isUnique argument's exclusive scope. >>>> // The same verification is needed here, but the inner scope would be >>>> eliminated. >>>> end_exclusive %arrayref : $*Array<T> // End scope #0 >>>> >>>> In general this approach looks more "user-friendly" than the first >>>> alternative. >>> >>> Well, I can't really tell, because you haven't shown the Swift code that >>> generates this SIL. >>> >>>> But it depends on implementing the general feature to insert the inout >>>> scoping instructions. Also, we still have to think through all the >>>> details of this approach. >>> >>> FWIW, I am convinced we will need (and get) a stricter inout model that >>> would be conducive to inserting the scoping instructions. >>> >>> >>>> Dependency between a buffer reference to the scope-begin >>>> -------------------------------------------------------- >>> >>> You can only have a dependency between two things, but as phrased “a >>> buffer reference to the scope-begin” sounds like one thing. s/to/and/ >>> would fix it. >>> >>>> With both alternatives there is no explicit dependency from a buffer >>>> reference >>>> to a scope-begin instruction:: >>>> >>>> %b_cow = load %baddr >>>> %b = cow_to_ref %b_cow >>>> %x = load %b // No dependency between this... >>>> ... >>>> begin_exclusive %baddr // ... and this instruction. >>>> ... >>>> >>>> So in theory the optimizer is free to reschedule the instructions:: >>>> >>>> %b_cow = load %baddr >>>> %b = cow_to_ref %b_cow >>>> ... >>>> begin_exclusive %baddr >>>> %x = load %b // Wrong! Buffer could be modified here >>>> ... >>>> >>>> We still have to figure out how to cope with this. >>>> >>>> - We could add an end-of-lifetime instruction for a COW buffer reference, >>>> which >>>> the optimizer may not move over a begin-of-scope instruction. >>>> >>>> - Or we just define the implicit rule for the optimizer that any use of a >>>> COW >>>> reference may not be moved over a begin-of-scope instruction. >>>> >>>> Preconditions >>>> ============= >>>> >>>> To benefit from COW optimizations in the stdlib Array, Set and Dictionary >>>> data >>>> structures we first need eager bridging, meaning getting rid of the bridged >>>> buffer. >>> >>> As you know I'm very much in favor of eager bridging, but I don't see >>> why this would be dependent on it. >> >> We could use copy_on_write with eager bridging, but I don’t think it will >> give any benefits to the >> optimizer. >> For example, the SIL code to get from an Array to a native >> ContiguousArrayStorage reference is pretty hard to understand for the >> optimizer (involves low level bit operations, etc.). > > It wouldn't need to do low-level bit operations if our enums were > capable/controllable enough. I'm just saying, there's no reason we > couldn't give the optimizer something to work with that has higher level > semantics than what we currently do. > >>>> At least for Array this is implemented as low-level bit operations and >>>> optimizations cannot reason about it (e.g. finding a reasonable >>>> RC-root for the buffer reference). >>>> >>>> Another thing is that we currently cannot use ``copy_on_write`` for Array >>>> because of pinning. Array pins it’s buffer when passing an element address >>>> to >>>> an inout parameter. This allows the array buffer to be modified even if its >>>> reference count is > 1. With ``copy_on_write``, a programmer could break >>>> memory >>>> safety when violating the inout rule. Example:: >>>> >>>> var arr = [MyClass()] // a global array >>>> >>>> foo(&arr[0]) // Pins the buffer of arr during the call >>>> >>>> func foo(_ x: inout MyClass) -> Int { >>>> let b = arr // The ref-count of the buffer is not incremented, >>>> because it is pinned! >>>> let r = b[0] // optimizer removes the retain of r because it >>>> thinks the following code cannot modify b >>>> arr.removeAll() // does not copy the array buffer and thus >>>> de-allocates r >>>> return r.i // use-after-free! >>>> } >>> >>> I only know of one way to resolve inout and pinning: >>> >>> * Semantically, references are replaced with a trap value when entering >>> an inout context so that all inout values are provably unique >>> references in the absence of unsafe code. We drop pinning and provide >>> explicit operations that provide simultaneous lvalue accesses to >>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. >>> >>> If there are other ideas out there, I'd like to hear them. If not, we >>> should probably decide that this is what we're doing so that we can move >>> forward without this looming uncertainty. >>> >>> -- >>> -Dave >>> >>> _______________________________________________ >>> swift-dev mailing list >>> swift-dev@swift.org >>> https://lists.swift.org/mailman/listinfo/swift-dev >> > > -- > -Dave _______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev