> On Oct 19, 2016, at 10:13 AM, Dave Abrahams via swift-dev > <swift-dev@swift.org> wrote: > > > on Tue Oct 18 2016, Erik Eckstein <swift-dev-AT-swift.org > <http://swift-dev-at-swift.org/>> wrote: > >>> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <dabrah...@apple.com> wrote: >>> >>> >>> on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com> wrote: >>> >> >>>> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev >>>> <swift-dev@swift.org> wrote: >>>> >>>>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org >>>>> <http://swift-dev-at-swift.org/>> wrote: >>>>> >>>>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>>>>>> <swift-dev@swift.org> wrote: >>>>>>> >>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>> depends on how we move forward with SIL ownership. >>>>>>> <CopyOnWrite.rst> >>>>>>> If you have any comments, please let me know. >>>>>> >>>>>> The SIL-level design seems sensible to me at a glance. At the language >>>>>> level, I think it would make more sense to treat this as an attribute >>>>>> on class types rather than on properties in structs using the class. I >>>>>> don't think many people reuse class definitions as both shared >>>>>> reference types and as value type payloads, >>>>> >>>>> Foundation does, or would if they could. >>>>> >>>>>> but beyond that, I think that making it an attribute of classes would >>>>>> put us into a better position to leverage the borrow model to enforce >>>>>> the "mutable-only-when-unique" aspect of COW implementations. John >>>>>> alluded to this in the "SIL address types and borrowing" thread: >>>>>> >>>>>>> I wonder if it would make more sense to make copy-on-write buffer >>>>>>> references a move-only type, so that as long as you were just >>>>>>> working with the raw reference (as opposed to the CoW aggregate, >>>>>>> which would remain copyable) it wouldn't get implicitly copied >>>>>>> anymore. You could have mutable and immutable buffer reference >>>>>>> types, both move-only, and there could be a consuming checkUnique >>>>>>> operation on the immutable one that, I dunno, returned an Either of >>>>>>> the mutable and immutable versions. >>>>>>> >>>>>>> For CoW aggregates, you'd need some @copied attribute on the field >>>>>>> to make sure that the CoW attribute was still copyable. Within the >>>>>>> implementation of the type, though, you would be projecting out the >>>>>>> reference immediately, and thereafter you'd be certain that you were >>>>>>> borrowing / moving it around as appropriate. >>>>>> >>>>>> If 'copy-on-write' were a trait on classes, then we could distinguish >>>>>> unique and nonunique references to the class. A unique reference would >>>>>> act like a move-only type to prevent accidental loss of uniqueness. >>>>> >>>>> +1 >>>>> >>>>>> We can also allow a copy-on-write class to have "mutating" methods, >>>>>> and only allow mutation on unique references. It seems to me like, >>>>>> exploring this direction, we could also come up with a way for the >>>>>> high-level value-semantics operations on the struct to statically >>>>>> indicate which methods are known to leave the value's buffers in a >>>>>> unique state, or which return values that are uniquely owned, which >>>>>> would give the optimizer more ability to avoid uniqueness checks >>>>>> across calls without relying on inlining and IPO. >>>>> >>>>> That's pretty cool. However, I think there's nothing to prevent any >>>>> mutating method from storing a copy of self in a global, so I think we'd >>>>> need some participation from the programmer (either an agreement not to >>>>> do that, or an explicit claim of uniqueness on exit) in order to >>>>> identify operations that create/preserve uniqueness. >>>> >>>> If a mutating reference (like self in a mutating method) is move-only >>>> then you would not be able to “copy” it to a global. >>> >>> Yes, a reference to a move-only type would work for this purpose. >>> >>> >>>>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev >>>>> <swift-dev@swift.org> wrote: >>>>> >>>>> >>>>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org> wrote: >>>>> >>>>>> This is a proposal for representing copy-on-write buffers in >>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>> depends on how we move forward with SIL ownership. >>>>>> >>>>>> :orphan: >>>>>> >>>>>> .. highlight:: sil >>>>>> >>>>>> =================================== >>>>>> Copy-On-Write Representation in SIL >>>>>> =================================== >>>>>> >>>>>> .. contents:: >>>>>> >>>>>> Overview >>>>>> ======== >>>>>> >>>>>> This document proposes: >>>>>> >>>>>> - An ownership attribute to define copy-on-write (COW) buffers in Swift >>>>>> data >>>>>> types. >>>>>> >>>>>> - A representation of COW buffers in SIL so that optimizations can take >>>>>> benefit >>>>>> of it. >>>>>> >>>>>> The basic idea is to enable the SIL optimizer to reason about COW data >>>>>> types >>>>>> in the same way as a programmer can do. >>>>>> This means: a COW buffer can only be modified by its owning SIL value, >>>>>> because >>>>>> either it's uniquely referenced or the buffer is copied before modified. >>>>>> >>>>>> .. note:: >>>>>> In the following the term "buffer" refers to a Swift heap object. >>>>>> It can be any heap object, not necessarily a “buffer” with e.g. >>>>>> tail-allocated elements. >>>>>> >>>>>> COW Types >>>>>> ========= >>>>>> >>>>>> The basic structure of COW data types can be simplified as follows:: >>>>>> >>>>>> class COWBuffer { >>>>>> var someData: Int >>>>>> ... >>>>>> } >>>>>> >>>>>> struct COWType { >>>>>> var b : COWBuffer >>>>>> >>>>>> mutating func change_it() { >>>>>> if (!isUniquelyReferenced(b)) { >>>>>> b = copy_buffer(b) >>>>>> } >>>>>> b.someData = ... >>>>>> } >>>>>> } >>>>>> >>>>>> Currently the COW behavior of such types is just defined by their >>>>>> implementation. >>>>>> But there is no representation of this special behavior in the SIL. >>>>>> So the SIL optimizer has no clue about it and cannot take advantage of >>>>>> it. >>>>>> >>>>>> For example:: >>>>>> >>>>>> func foo(arr : [Int]) { >>>>>> x = arr[0] >>>>>> opaque_function() >>>>>> y = arr[0] // can RLE replace this with y = x? >>>>>> } >>>>>> >>>>>> If opaque_function() wants to change the contents of the array buffer it >>>>>> first >>>>>> has to copy it. >>>>> >>>>> ...or determine that it's uniquely-referenced. >>>> >>>> In this specific example, if opqaue_function holds a reference to arr’s >>>> buffer, the buffer is not >>>> uniquely-referenced. >>> >>> Right. >>> >>>>> >>>>>> But the optimizer does not know it so it has to conservatively assume >>>>>> that opaque_function() will write to the location of arr[0]. >>>>>> >>>>>> Copy-on-write Ownership Attribute >>>>>> ================================= >>>>>> >>>>>> This section proposes an ownership attribute to define a copy-on-write >>>>>> buffer. >>>>>> >>>>>> Swift Syntax >>>>>> ------------ >>>>>> >>>>>> A COW buffer reference can be defined with a new ownership attribute for >>>>>> the >>>>>> buffer variable declaration (similar to “weak” and “unowned”):: >>>>>> >>>>>> struct COWType { >>>>>> copy_on_write var b : COWBuffer >>>>>> >>>>>> // ... >>>>>> } >>>>>> >>>>>> The ``copy_on_write`` attribute is purely used for optimization purposes. >>>>>> It does not change the semantics of the program. >>>>> >>>>> Presumably, it changes what code you can execute on `b` without invoking >>>>> traps or undefined behavior. Otherwise, the optimizer wouldn't be able >>>>> to do anything differently to take advantage of the annotation. >>>> >>>> That’s true. >>>> >>>>> What are the rules for writing code that uses `copy_on_write`? >>>> >>>> See below ("The rules for using ``copy_on_write`` and the built-ins are:”) >>> >>> Yeah, I got there, eventually. But just saying “doesn't change >>> semantics” at this point in the proposal leaves a gap, because it does >>> change semantic *requirements*. You should mention that. >>> >>>>>> .. note:: >>>>>> >>>>>> “copy_on_write” is a working title. TODO: decide on the name. >>>>>> Maybe it should be a @-attribute, like @copy_on_write? >>>>>> Another question is if we should open this attribute for the public or >>>>>> just >>>>>> use it internally in the library, because violating the implied rules >>>>>> (see below) could break memory safety. >>>>>> >>>>>> Implementation >>>>>> -------------- >>>>>> >>>>>> The ``copy_on_write`` references can be represented in the AST as a >>>>>> special >>>>>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented. >>>>>> The canonical type of a ``CopyOnWriteStorageType`` would be the >>>>>> referenced >>>>>> buffer class type. >>>>>> >>>>>> In SIL the buffer reference will have type:: >>>>>> >>>>>> $@sil_cow COWBuffer >>>>>> >>>>>> where ``COWBuffer`` is the type of the referenced heap object. >>>>>> >>>>>> Two conversion instructions are needed to convert from a ``@sil_cow`` >>>>>> reference >>>>>> type to a regular reference type:: >>>>>> >>>>>> cow_to_ref >>>>>> ref_to_cow >>>>>> >>>>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >>>>>> >>>>>> For example the SIL code for:: >>>>>> >>>>>> var c: COWType >>>>>> let x = c.b.someData >>>>>> >>>>>> would be:: >>>>>> >>>>>> %1 = struct_extract %1 : COWType, #COWType.b >>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >>>>>> %4 = load %3 : $*Int >>>>>> >>>>>> The ``ref_to_cow`` instruction is needed to store a new buffer reference >>>>>> into a >>>>>> COW type. >>>>>> >>>>>> COW Buffers and the Optimizer >>>>>> ============================= >>>>>> >>>>>> A reference to a COW buffer gives the optimizer additional information: >>>>>> >>>>>> *A buffer, referenced by a @sil_cow reference is considered to be >>>>>> immutable >>>>>> during the lifetime of the reference.* >>>>> >>>>> This seems like much too broad a rule to allow inplace mutations of >>>>> uniquely referenced buffers. >>>> >>>> The point is that all mutations must be guarded by an is_unique, which >>>> takes the _address_ of the buffer reference as argument. >>>> And the optimizer considers this instruction as a potential write to the >>>> buffer reference. >>>> The effect is that the lifetime of a buffer reference (as a SIL value) >>>> will not outlive a is_unique - regardless if this is inside a called >>>> function or inlined. >>> >>> I don't see how that allows me to mutate a uniquely referenced buffer held >>> in a @sil_cow reference, given what you wrote above. >> >> You would not be able to get a reference to a mutable buffer by >> reading the COW type’s @sil_cow field. Instead you would only get >> such a reference as a result of the is_unique instruction/builtin. Or, >> of course, by creating a new buffer. >> >> I’m not sure if this was the question, though. > > I think it just comes down to precise phrasing. AFAICT, what you really > mean to say is something like > > A buffer cannot be directly mutated through a @sil_cow reference; > instead one must mutate it indirectly via the result of is_unique or > start_unique. > > Saying that the buffer is “considered to be immmutable during the > lifetime of the reference” could be taken to mean that the compiler will > assume no mutations of the buffer can occur while the reference exists. > IIUC you are not planning to formally end the reference's lifetime at > the moment is_unique/start_unique returns.
To clarify: I proposed an alternate approach in which the @sil_cow reference is only mutable during the Array’s @inout scope—to be automatically enforced by the compiler once @inout scopes are enforced. But the text in question is not referring to that approach, so your comments are on target. -Andy >> >> Plus: we will have an explicit conversion instruction (start_unique) to >> convert an immutable >> reference to a mutable referece. >> A SIL optimization can replace an is_unique with this instruction if it can >> prove that the reference >> is already unique at that point. >> >>> >>> >>>>> Unless you mean the reference is >>>>> immutable, rather than the storage being referred to by it. >>>>> >>>>>> This means any address derived from a ``cow_to_ref`` instruction can be >>>>>> considered to point to immutable memory. >>>>>> >>>>>> Some examples of optimizations which will benefit from copy-on-write >>>>>> representation in SIL: >>>>>> >>>>>> - Redundant load elimination >>>>>> >>>>>> RLE can assume that opaque code does not modify a COW buffer. >>>>> >>>>> How do you distinguish “opaque code” from “code that is meant to >>>>> modify the buffer and might do so in place if it's uniquely-referenced?” >>>> >>>> Again, the is_unique which takes the address of the reference, will >>>> guarantee that during the lifetime of a buffer there are no >>>> modifications of the buffer. >>> >>> Again, that sounds like it rules out inplace modification of uniquely >>> referenced buffers. >>> >>>> >>>> >>>>> >>>>>> Example:: >>>>>> >>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>> %4 = load %3 : $*Int >>>>>> %5 = apply %foo() // Cannot overwrite memory >>>>>> location %3 >>>>>> %6 = load %3 : $*Int // Can be replaced by %4 >>>>>> >>>>>> Currently we do some ad-hoc optimizations for array, based on semantics, >>>>>> like array count propagation. These hacks would not be needed >>>>>> anymore. >>>>> >>>>> W0000000000000000000000t. >>>>> >>>>>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a >>>>>> projected address) escapes. Even if it escapes, it will reference >>>>>> immutable >>>>>> memory. >>>>>> >>>>>> - CSE, loop hoisting >>>>>> >>>>>> Similar to RLE: the optimizer can assume that opaque code cannot modify a >>>>>> COW buffer >>>>> >>>>> Same question here as above, then. >>>>>> >>>>>> - ARC optimization >>>>>> >>>>>> Knowing that some opaque code cannot overwrite a reference in the COW >>>>>> buffer >>>>>> can remove retain/release pairs across such code:: >>>>>> >>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someRef >>>>>> %4 = load_strong %3 : $*MyClass // Can do a load_strong >>>>>> [guarantee] >>>>>> %5 = apply %foo() // Cannot overwrite someRef >>>>>> and dealloc the object >>>>>> // Use %4 >>>>>> destroy_value %4 : $MyClass >>>>>> >>>>>> Scoping instructions >>>>>> -------------------- >>>>>> >>>>>> To let the optimizer reason about the immutability of the COW buffer, it >>>>>> is >>>>>> important to *bind* the lifetime of the buffer content to the lifetime >>>>>> of the >>>>>> buffer reference. For example:: >>>>>> >>>>>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>> // load something from %b1 >>>>>> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >>>>>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>> again >>>>>> // load something from %b2 >>>>>> >>>>>> The question is: can RLE forward the load of the buffer reference and >>>>>> replace >>>>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` modifies >>>>>> the >>>>>> buffer. >>>>>> >>>>>> To enforce this restriction, the scope of any buffer modification must be >>>>>> enclosed in a pair of SIL instructions. Those instructions define the >>>>>> scope >>>>>> of the mutation. Both instructions take the *address* of the buffer >>>>>> reference as operand and act as a potential write to the buffer >>>>>> reference. >>>>>> >>>>>> The purpose of the scoping instructions is to strictly separate the >>>>>> liferanges >>>>>> of references to an immutable buffer and references to the mutable >>>>>> buffer. >>>>> >>>>> Looks reasonable. >>>>> >>>>>> The following example shows why the scoping instructions (specifically >>>>>> the >>>>>> end-of-scope instruction) are required to prevent loop-hoisting from >>>>>> interleaving mutable and immutable liferanges:: >>>>>> >>>>>> // there should be a begin-of-scope %baddr >>>>>> %mut_b = load %baddr >>>>>> store %x to %mut_b // modification of the buffer >>>>>> // there should be a end-of-scope %baddr >>>>>> >>>>>> loop { >>>>>> %b = load %baddr >>>>>> %y = load %b // load from the buffer >>>>>> ... >>>>>> } >>>>>> >>>>>> If there is no end-of-scope instruction, loop hoisting could do:: >>>>>> >>>>>> %mut_b = load %baddr >>>>>> %b = load %baddr // moved out of the loop >>>>>> store %x to %mut_b >>>>>> >>>>>> loop { >>>>>> %y = load %b >>>>>> ... >>>>>> } >>>>>> >>>>>> Now the optimizer assumes that ``%b`` references an immutable buffer, so >>>>>> it could >>>>>> also hoist the load:: >>>>>> >>>>>> %mut_b = load %baddr >>>>>> %b = load %baddr >>>>>> %y = load %b // Wrong! Will be overwritten by the following >>>>>> store >>>>>> store %x to %mut_b >>>>>> >>>>>> loop { >>>>>> ... >>>>>> } >>>>>> >>>>>> >>>>>> The following sections describe two alternatives to implement the >>>>>> scoping. >>>>>> >>>>>> Scoping Alternative 1: Explicit Built-ins >>>>>> ----------------------------------------- >>>>>> >>>>>> SIL instructions >>>>>> ^^^^^^^^^^^^^^^^ >>>>>> >>>>>> The existing ``is_unique`` instruction is changed to a terminator >>>>>> instruction:: >>>>>> >>>>>> bb0: >>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the >>>>>> address of the COWBuffer reference >>>>>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique >>>>>> reference. Physically identical to "load %0” >>>>>> // usually empty >>>>>> br bb3(%1 : $COWBuffer) >>>>>> bb2: // the false-block >>>>>> // usually contains: >>>>>> %2 = apply %copy_buffer >>>>>> %3 = cow_to_ref %2 >>>>>> store_strong %3 to %0 : $*@sil_cow COWBuffer >>>>>> br bb3(%2 : $COWBuffer) >>>>>> bb3(%4 : $COWBuffer): >>>>>> // Modify the buffer referenced by %4 >>>>>> // ... >>>>>> >>>>>> The end-of-scope instruction is:: >>>>>> >>>>>> end_unique_addr %0 : $*COWBuffer >>>>>> >>>>>> It is important that the references to the unique buffers (``%1``, >>>>>> ``%2``) must >>>>>> not outlive ``end_unique_addr``. In most cases this can be check by the >>>>>> SIL >>>>>> verifier. >>>>>> >>>>>> The two instructions must be paired properly but not necessarily in the >>>>>> same function. >>>>>> >>>>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to >>>>>> separate the lifetimes of mutable and immutable accesses to the COW >>>>>> buffer. >>>>>> Both instructions take an address to the COW buffer reference and are >>>>>> considered as potential stores to the reference. >>>>>> This makes sure that the SIL optimizer cannot mix-up buffer reference >>>>>> lifetimes >>>>>> across these instructions. >>>>>> For example, RLE cannot combine two buffer loads which are interleaved >>>>>> with >>>>>> a ``is_unique_addr_br``:: >>>>>> >>>>>> %1 = load_strong %0 : $*@sil_cow COWBuffer >>>>>> // do something with %1 >>>>>> … >>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer >>>>>> … >>>>>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this >>>>>> with %1 >>>>>> >>>>>> Another important thing is that the COW buffer can only be mutated by >>>>>> using the >>>>>> reference of the ``is_unique_addr_br`` true-block argument. >>>>>> The COW buffer cannot be modified by simply loading/extracting the >>>>>> reference >>>>>> from the COWType. >>>>>> Example:: >>>>>> >>>>>> %1 = load_strong %0 : $*COWBuffer >>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>> store %7 : $Int to %3 : $*Int // Violation! >>>>>> >>>>>> Most obvious violations to this constraint can be catched by the >>>>>> SILVerifier. >>>>>> >>>>>> The ``_addr`` variants of the instructions also have a non-addr >>>>>> counterpart:: >>>>>> >>>>>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces >>>>>> the true-block arg as owned >>>>>> >>>>>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned >>>>>> >>>>>> These instructions are generated by Mem2reg (or a similar optimization) >>>>>> in case the COW value is stored (in a temporary alloc_stack location) >>>>>> just for the sake of passing an address to ``is_unique_addr_br`` and >>>>>> ``end_unique_addr``. >>>>>> For example in the following code, where the COW data is passed as-value >>>>>> and >>>>>> all the mutating functions are inlined:: >>>>>> >>>>>> func foo(arr : [Int], x: Int) { >>>>>> arr[0] = 27 >>>>>> … >>>>>> y = arr[x] >>>>>> … >>>>>> } >>>>>> >>>>>> Finally it’s probably a good idea to add an instruction for converting an >>>>>> immutable reference to a mutable reference:: >>>>>> >>>>>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >>>>>> $COWBuffer as owned >>>>>> >>>>>> which is basically just a simpler representation of the following >>>>>> pattern:: >>>>>> >>>>>> bb0: >>>>>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >>>>>> bb1(%1 : $COWBuffer): >>>>>> … // main control flow continues here >>>>>> bb2: >>>>>> unreachable >>>>>> >>>>>> An optimizations, which eliminate uniqueness checks, would replace a >>>>>> ``is_unique_br`` by a ``start_unique``. >>>>>> >>>>>> Built-ins >>>>>> ^^^^^^^^^ >>>>>> >>>>>> A COW type implementor can generate the new instructions by using a set >>>>>> of built-ins:: >>>>>> >>>>>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >>>>>> func endUnique<BufferType>(_ buffer: inout BufferType) >>>>>> >>>>>> For example:: >>>>>> >>>>>> struct COWType { >>>>>> copy_on_write var b : COWBuffer >>>>>> >>>>>> mutating func makeMutable() -> COWBuffer { >>>>>> if let uniqueBuffer = isUnique(&self.b) { >>>>>> return uniqueBuffer >>>>>> } >>>>>> let copiedBuffer = copyBuffer(self.b) >>>>>> self.b = copiedBuffer >>>>>> return copiedBuffer >>>>>> } >>>>>> >>>>>> mutating func setSomeData(x: Int) { >>>>>> let uniqueBuffer = makeMutable() >>>>>> uniqueBuffer.someData = x >>>>>> endUnique(&self.b) >>>>>> } >>>>>> } >>>>> >>>>> This seems reasonable, but it also looks like the compiler could do the >>>>> `endUnique` dance for us based, e.g., on the mutability of methods. >>>> >>>> I agree, that would be ideal, e.g. the compiler could insert the endUnique >>>> at the end of an inout >>>> scope. >>>> >>>>> >>>>>> The ``isUnique`` built-in returns an optional unique buffer reference. >>>>>> Physically this is the COW buffer which is passed as the inout argument. >>>>>> The result is nil if the buffer is not uniquely referenced. >>>>>> In this case usually the original buffer is copied and the reference to >>>>>> the >>>>>> copy is written back to the original buffer reference location >>>>>> (``self.b = copiedBuffer``). >>>>>> Starting at the point of the write-back, the reference to the copy also >>>>>> becomes >>>>>> a unique buffer reference. >>>>>> >>>>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` >>>>>> pattern which >>>>>> constructs the Optional in the successor blocks. Using ``isUnique`` in an >>>>>> if-let (as shown above) will end up in two consecutive CFG "diamonds". >>>>>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` >>>>>> diamond. >>>>>> >>>>>> .. note:: >>>>>> This makes the definition of the unique buffer location lifetime a >>>>>> little bit >>>>>> problematic, because the false-branch of ``isUnique`` is not equivalent >>>>>> to >>>>>> the false-branch of the ``is_unique_addr_br`` instruction (before >>>>>> SimplifyCFG >>>>>> can do its job). >>>>> >>>>> I don't know what the implications of these diamonds and the problem >>>>> described above might be, FWIW. >>>>> >>>>>> The rules for using ``copy_on_write`` and the built-ins are: >>>>>> >>>>>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily >>>>>> in the >>>>>> same function. >>>>>> >>>>>> 2. The COW buffer may only be mutated by using the unique buffer >>>>>> reference. >>>>>> >>>>>> 3. The COW buffer must not be mutated outside the ``isUnique`` - >>>>>> ``endUnique`` >>>>>> pair. >>>>>> >>>>>> 4. During the lifetime of the unique buffer reference, the original COW >>>>>> buffer >>>>>> reference must not be used in any way, e.g. for reading from the buffer. >>>>>> >>>>>> Note that the lifetime of the unique buffer reference does not include >>>>>> the >>>>>> part between the begin of the ``isUnique`` false-branch and the >>>>>> write-back >>>>>> of the copy. This means is okay to read from the buffer (using >>>>>> ``self.b``) >>>>>> for the purpose of copying. >>>>>> >>>>>> Examples:: >>>>>> >>>>>> mutating func setSomeData(x: Int) { >>>>>> let uniqueBuffer = makeMutable() >>>>>> uniqueBuffer.someData = x >>>>>> // violates rule 1 >>>>>> } >>>>>> >>>>>> mutating func setSomeData(x: Int) { >>>>>> makeMutable() >>>>>> self.b.someData = x // violates rule 2 >>>>>> endUnique(&self.b) >>>>>> } >>>>>> >>>>>> mutating func setSomeData(x: Int) { >>>>>> let uniqueBuffer = makeMutable() >>>>>> uniqueBuffer.someData = x >>>>>> endUnique(&self.b) >>>>>> uniqueBuffer.someData = 27 // violates rule 3 >>>>>> } >>>>>> >>>>>> mutating func incrementSomeData() { >>>>>> let uniqueBuffer = makeMutable() >>>>>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >>>>>> endUnique(&self.b) >>>>>> } >>>>> >>>>> It would be instructive to write down the *correct* code for these >>>>> operations. >>>> >>>> added to my todo list. >>>> >>>>> >>>>>> The intention of the rules is to ensure that there is no overlap of a >>>>>> "read-only" life-range with a "mutable" life-range of the buffer >>>>>> reference. >>>>>> It’s the responsibility of the implementor to follow the rules. >>>>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can >>>>>> statically detect rule violations in obvious cases (with inter-procedural >>>>>> analysis maybe even in most cases). >>>>>> >>>>>> This approach would require to change some of the internals of our >>>>>> current COW data structures in the stdlib (Array, Dictionary, etc.). >>>>>> For example, the Array make_mutable semantic functions currently do not >>>>>> return >>>>>> the unique buffer. >>>>> >>>>> No big deal. >>>>> >>>>>> Scoping Alternative 2: Implicit Inout Scopes >>>>>> -------------------------------------------- >>>>>> >>>>>> There is an idea (proposal?) to change the representation of inout >>>>>> variables >>>>>> in SIL. This is independent of this proposal, but can be helpful for the >>>>>> purpose of defining the scope of a COW mutation. >>>>>> >>>>>> The basic idea is that SILGen inserts scoping instructions for *all* >>>>>> inout >>>>>> variables. And those scoping instructions can be used to define the >>>>>> mutating >>>>>> scope of a COW buffer. >>>>>> >>>>>> The scoping instructions which are inserted by SILGen for an inout scope >>>>>> are:: >>>>>> >>>>>> begin_exclusive >>>>>> end_exclusive >>>>>> >>>>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those >>>>>> instructions take the >>>>>> address of the inout variable as argument. For the optimizer those >>>>>> instructions >>>>>> look like potential writes to the inout variable. >>>>>> >>>>>> The implementor of a COW type has to follow the rule that the COW buffer >>>>>> may >>>>>> only be modified in mutating functions of the COW type. But this is the >>>>>> case >>>>>> anyway because any modification needs a uniqueness check and this can >>>>>> only be >>>>>> done in mutating functions. >>>>>> >>>>>> Example:: >>>>>> >>>>>> // > mutating func setSomeData(x: Int) { >>>>>> // Accepts a unique reference to the array value (avoiding refcount >>>>>> operations) >>>>>> sil @setSomeData : $(Int, @inout Array) -> () { >>>>>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >>>>>> >>>>>> // > makeMutable() (inlined) >>>>>> // Forward the unique reference to the `self` array value, still >>>>>> avoiding refcount operations. >>>>>> // Begin the inlined exclusive scope (could be trivially removed). >>>>>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >>>>>> >>>>>> // > if !isUnique(&self._storage) { >>>>>> // Extract a unique inout reference to the class reference to the array >>>>>> storage. >>>>>> // This begins the isUnique() argument's exclusive scope. The memory is >>>>>> already exclusive >>>>>> // but the scope helps ensure this is the only alias to _storage. >>>>>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >>>>>> #Array._storage >>>>>> >>>>>> // Uniqueness checking requires an inout reference to the class >>>>>> reference. >>>>>> // The is_unique instruction does not need to create a new storage >>>>>> reference. >>>>>> // It's only purpose is to check the RC count, ensure that the checked >>>>>> reference >>>>>> // is inout, and prevent the inout scope from being optimized away. >>>>>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>> >>>>>> // End the isUnique argument's exclusive scope (can also be trivially >>>>>> removed). >>>>>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>> >>>>>> br %isuniq, bb_continue, bb_slow >>>>>> >>>>>> bb_slow: >>>>>> // > self._storage = copyBuffer(self._storage) >>>>>> // Produce a new class reference to storage with verifiably unique RC >>>>>> semantics. >>>>>> %copied_storage_class = alloc_ref ... >>>>>> // A begin/end exclusive scope is implicit in store [assign]. >>>>>> store [assign] %copied_storage_class to %arrayref._storageref >>>>>> br bb_continue >>>>>> >>>>>> bb_continue: >>>>>> >>>>>> // This marks the end of makeMutable's inout `self` scope. Because Array >>>>>> // contains a "copy_on_write" property, the SIL verifier needs to >>>>>> // prove that %arrayref.#_storage has not escaped at this point. This >>>>>> // is equivalent to checking that %arrayref itself is not copied, and >>>>>> // checking each projection of the "copy_on_write" storage property >>>>>> // (%arrayref._storageref) is not copied. Or, if any copies are present, >>>>>> // they must be consumed within this scope. >>>>>> end_exclusive %arrayref : $*Array<T> // End scope #1 >>>>>> >>>>>> // > self._storage.someData = x >>>>>> // An _addr instruction with one load/store use doesn't really need its >>>>>> own scope. >>>>>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage >>>>>> >>>>>> // ARC optimization can promote this to a borrow, replacing >>>>>> strong_release with end_borrow. >>>>>> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow >>>>>> ArrayStorage >>>>>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >>>>>> ArrayStorage >>>>>> >>>>>> // Write some data into the CoW buffer. >>>>>> // (For simplicity, pretend ArrayStorage has a "someData" field). >>>>>> // A single-use _addr instruction, so no scope. >>>>>> %somedata_addr = ref_element_addr %arrayref._storage, #someData >>>>>> // A store with an implicit [exclusive] scope. >>>>>> store [assign] %x to %somedata_addr >>>>>> >>>>>> strong_release %arrayref._storage : $*ArrayStorage<T> >>>>>> >>>>>> // End the isUnique argument's exclusive scope. >>>>>> // The same verification is needed here, but the inner scope would be >>>>>> eliminated. >>>>>> end_exclusive %arrayref : $*Array<T> // End scope #0 >>>>>> >>>>>> In general this approach looks more "user-friendly" than the first >>>>>> alternative. >>>>> >>>>> Well, I can't really tell, because you haven't shown the Swift code that >>>>> generates this SIL. >>>>> >>>>>> But it depends on implementing the general feature to insert the inout >>>>>> scoping instructions. Also, we still have to think through all the >>>>>> details of this approach. >>>>> >>>>> FWIW, I am convinced we will need (and get) a stricter inout model that >>>>> would be conducive to inserting the scoping instructions. >>>>> >>>>> >>>>>> Dependency between a buffer reference to the scope-begin >>>>>> -------------------------------------------------------- >>>>> >>>>> You can only have a dependency between two things, but as phrased “a >>>>> buffer reference to the scope-begin” sounds like one thing. s/to/and/ >>>>> would fix it. >>>>> >>>>>> With both alternatives there is no explicit dependency from a buffer >>>>>> reference >>>>>> to a scope-begin instruction:: >>>>>> >>>>>> %b_cow = load %baddr >>>>>> %b = cow_to_ref %b_cow >>>>>> %x = load %b // No dependency between this... >>>>>> ... >>>>>> begin_exclusive %baddr // ... and this instruction. >>>>>> ... >>>>>> >>>>>> So in theory the optimizer is free to reschedule the instructions:: >>>>>> >>>>>> %b_cow = load %baddr >>>>>> %b = cow_to_ref %b_cow >>>>>> ... >>>>>> begin_exclusive %baddr >>>>>> %x = load %b // Wrong! Buffer could be modified here >>>>>> ... >>>>>> >>>>>> We still have to figure out how to cope with this. >>>>>> >>>>>> - We could add an end-of-lifetime instruction for a COW buffer >>>>>> reference, which >>>>>> the optimizer may not move over a begin-of-scope instruction. >>>>>> >>>>>> - Or we just define the implicit rule for the optimizer that any use of >>>>>> a COW >>>>>> reference may not be moved over a begin-of-scope instruction. >>>>>> >>>>>> Preconditions >>>>>> ============= >>>>>> >>>>>> To benefit from COW optimizations in the stdlib Array, Set and >>>>>> Dictionary data >>>>>> structures we first need eager bridging, meaning getting rid of the >>>>>> bridged >>>>>> buffer. >>>>> >>>>> As you know I'm very much in favor of eager bridging, but I don't see >>>>> why this would be dependent on it. >>>> >>>> We could use copy_on_write with eager bridging, but I don’t think it will >>>> give any benefits to the >>>> optimizer. >>>> For example, the SIL code to get from an Array to a native >>>> ContiguousArrayStorage reference is pretty hard to understand for the >>>> optimizer (involves low level bit operations, etc.). >>> >>> It wouldn't need to do low-level bit operations if our enums were >>> capable/controllable enough. I'm just saying, there's no reason we >>> couldn't give the optimizer something to work with that has higher level >>> semantics than what we currently do. >>> >>>>>> At least for Array this is implemented as low-level bit operations and >>>>>> optimizations cannot reason about it (e.g. finding a reasonable >>>>>> RC-root for the buffer reference). >>>>>> >>>>>> Another thing is that we currently cannot use ``copy_on_write`` for Array >>>>>> because of pinning. Array pins it’s buffer when passing an element >>>>>> address to >>>>>> an inout parameter. This allows the array buffer to be modified even if >>>>>> its >>>>>> reference count is > 1. With ``copy_on_write``, a programmer could break >>>>>> memory >>>>>> safety when violating the inout rule. Example:: >>>>>> >>>>>> var arr = [MyClass()] // a global array >>>>>> >>>>>> foo(&arr[0]) // Pins the buffer of arr during the call >>>>>> >>>>>> func foo(_ x: inout MyClass) -> Int { >>>>>> let b = arr // The ref-count of the buffer is not incremented, >>>>>> because it is pinned! >>>>>> let r = b[0] // optimizer removes the retain of r because it >>>>>> thinks the following code cannot modify b >>>>>> arr.removeAll() // does not copy the array buffer and thus >>>>>> de-allocates r >>>>>> return r.i // use-after-free! >>>>>> } >>>>> >>>>> I only know of one way to resolve inout and pinning: >>>>> >>>>> * Semantically, references are replaced with a trap value when entering >>>>> an inout context so that all inout values are provably unique >>>>> references in the absence of unsafe code. We drop pinning and provide >>>>> explicit operations that provide simultaneous lvalue accesses to >>>>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. >>>>> >>>>> If there are other ideas out there, I'd like to hear them. If not, we >>>>> should probably decide that this is what we're doing so that we can move >>>>> forward without this looming uncertainty. >>>>> >>>>> -- >>>>> -Dave >>>>> >>>>> _______________________________________________ >>>>> swift-dev mailing list >>>>> swift-dev@swift.org >>>>> https://lists.swift.org/mailman/listinfo/swift-dev >>>> >>> >>> -- >>> -Dave >> >> _______________________________________________ >> swift-dev mailing list >> swift-dev@swift.org >> https://lists.swift.org/mailman/listinfo/swift-dev > > -- > -Dave > > _______________________________________________ > swift-dev mailing list > swift-dev@swift.org <mailto:swift-dev@swift.org> > https://lists.swift.org/mailman/listinfo/swift-dev > <https://lists.swift.org/mailman/listinfo/swift-dev>
_______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev