> On Oct 19, 2016, at 6:36 PM, Andrew Trick via swift-dev <swift-dev@swift.org> > wrote: > >> >> On Oct 19, 2016, at 10:13 AM, Dave Abrahams via swift-dev >> <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote: >> >> >> on Tue Oct 18 2016, Erik Eckstein <swift-dev-AT-swift.org >> <http://swift-dev-at-swift.org/>> wrote: >> >>>> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <dabrah...@apple.com >>>> <mailto:dabrah...@apple.com>> wrote: >>>> >>>> >>>> on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com >>>> <http://eeckstein-at-apple.com/>> wrote: >>>> >>> >>>>> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev >>>>> <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote: >>>>> >>>>>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org >>>>>> <http://swift-dev-at-swift.org/> <http://swift-dev-at-swift.org/ >>>>>> <http://swift-dev-at-swift.org/>>> wrote: >>>>>> >>>>>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev >>>>>>>> <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote: >>>>>>>> >>>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>>> depends on how we move forward with SIL ownership. >>>>>>>> <CopyOnWrite.rst> >>>>>>>> If you have any comments, please let me know. >>>>>>> >>>>>>> The SIL-level design seems sensible to me at a glance. At the language >>>>>>> level, I think it would make more sense to treat this as an attribute >>>>>>> on class types rather than on properties in structs using the class. I >>>>>>> don't think many people reuse class definitions as both shared >>>>>>> reference types and as value type payloads, >>>>>> >>>>>> Foundation does, or would if they could. >>>>>> >>>>>>> but beyond that, I think that making it an attribute of classes would >>>>>>> put us into a better position to leverage the borrow model to enforce >>>>>>> the "mutable-only-when-unique" aspect of COW implementations. John >>>>>>> alluded to this in the "SIL address types and borrowing" thread: >>>>>>> >>>>>>>> I wonder if it would make more sense to make copy-on-write buffer >>>>>>>> references a move-only type, so that as long as you were just >>>>>>>> working with the raw reference (as opposed to the CoW aggregate, >>>>>>>> which would remain copyable) it wouldn't get implicitly copied >>>>>>>> anymore. You could have mutable and immutable buffer reference >>>>>>>> types, both move-only, and there could be a consuming checkUnique >>>>>>>> operation on the immutable one that, I dunno, returned an Either of >>>>>>>> the mutable and immutable versions. >>>>>>>> >>>>>>>> For CoW aggregates, you'd need some @copied attribute on the field >>>>>>>> to make sure that the CoW attribute was still copyable. Within the >>>>>>>> implementation of the type, though, you would be projecting out the >>>>>>>> reference immediately, and thereafter you'd be certain that you were >>>>>>>> borrowing / moving it around as appropriate. >>>>>>> >>>>>>> If 'copy-on-write' were a trait on classes, then we could distinguish >>>>>>> unique and nonunique references to the class. A unique reference would >>>>>>> act like a move-only type to prevent accidental loss of uniqueness. >>>>>> >>>>>> +1 >>>>>> >>>>>>> We can also allow a copy-on-write class to have "mutating" methods, >>>>>>> and only allow mutation on unique references. It seems to me like, >>>>>>> exploring this direction, we could also come up with a way for the >>>>>>> high-level value-semantics operations on the struct to statically >>>>>>> indicate which methods are known to leave the value's buffers in a >>>>>>> unique state, or which return values that are uniquely owned, which >>>>>>> would give the optimizer more ability to avoid uniqueness checks >>>>>>> across calls without relying on inlining and IPO. >>>>>> >>>>>> That's pretty cool. However, I think there's nothing to prevent any >>>>>> mutating method from storing a copy of self in a global, so I think we'd >>>>>> need some participation from the programmer (either an agreement not to >>>>>> do that, or an explicit claim of uniqueness on exit) in order to >>>>>> identify operations that create/preserve uniqueness. >>>>> >>>>> If a mutating reference (like self in a mutating method) is move-only >>>>> then you would not be able to “copy” it to a global. >>>> >>>> Yes, a reference to a move-only type would work for this purpose. >>>> >>>> >>>>>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev >>>>>> <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote: >>>>>> >>>>>> >>>>>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org >>>>>> <http://swift-dev-at-swift.org/>> wrote: >>>>>> >>>>>>> This is a proposal for representing copy-on-write buffers in >>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily >>>>>>> depends on how we move forward with SIL ownership. >>>>>>> >>>>>>> :orphan: >>>>>>> >>>>>>> .. highlight:: sil >>>>>>> >>>>>>> =================================== >>>>>>> Copy-On-Write Representation in SIL >>>>>>> =================================== >>>>>>> >>>>>>> .. contents:: >>>>>>> >>>>>>> Overview >>>>>>> ======== >>>>>>> >>>>>>> This document proposes: >>>>>>> >>>>>>> - An ownership attribute to define copy-on-write (COW) buffers in Swift >>>>>>> data >>>>>>> types. >>>>>>> >>>>>>> - A representation of COW buffers in SIL so that optimizations can take >>>>>>> benefit >>>>>>> of it. >>>>>>> >>>>>>> The basic idea is to enable the SIL optimizer to reason about COW data >>>>>>> types >>>>>>> in the same way as a programmer can do. >>>>>>> This means: a COW buffer can only be modified by its owning SIL value, >>>>>>> because >>>>>>> either it's uniquely referenced or the buffer is copied before modified. >>>>>>> >>>>>>> .. note:: >>>>>>> In the following the term "buffer" refers to a Swift heap object. >>>>>>> It can be any heap object, not necessarily a “buffer” with e.g. >>>>>>> tail-allocated elements. >>>>>>> >>>>>>> COW Types >>>>>>> ========= >>>>>>> >>>>>>> The basic structure of COW data types can be simplified as follows:: >>>>>>> >>>>>>> class COWBuffer { >>>>>>> var someData: Int >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> struct COWType { >>>>>>> var b : COWBuffer >>>>>>> >>>>>>> mutating func change_it() { >>>>>>> if (!isUniquelyReferenced(b)) { >>>>>>> b = copy_buffer(b) >>>>>>> } >>>>>>> b.someData = ... >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> Currently the COW behavior of such types is just defined by their >>>>>>> implementation. >>>>>>> But there is no representation of this special behavior in the SIL. >>>>>>> So the SIL optimizer has no clue about it and cannot take advantage of >>>>>>> it. >>>>>>> >>>>>>> For example:: >>>>>>> >>>>>>> func foo(arr : [Int]) { >>>>>>> x = arr[0] >>>>>>> opaque_function() >>>>>>> y = arr[0] // can RLE replace this with y = x? >>>>>>> } >>>>>>> >>>>>>> If opaque_function() wants to change the contents of the array buffer >>>>>>> it first >>>>>>> has to copy it. >>>>>> >>>>>> ...or determine that it's uniquely-referenced. >>>>> >>>>> In this specific example, if opqaue_function holds a reference to arr’s >>>>> buffer, the buffer is not >>>>> uniquely-referenced. >>>> >>>> Right. >>>> >>>>>> >>>>>>> But the optimizer does not know it so it has to conservatively assume >>>>>>> that opaque_function() will write to the location of arr[0]. >>>>>>> >>>>>>> Copy-on-write Ownership Attribute >>>>>>> ================================= >>>>>>> >>>>>>> This section proposes an ownership attribute to define a copy-on-write >>>>>>> buffer. >>>>>>> >>>>>>> Swift Syntax >>>>>>> ------------ >>>>>>> >>>>>>> A COW buffer reference can be defined with a new ownership attribute >>>>>>> for the >>>>>>> buffer variable declaration (similar to “weak” and “unowned”):: >>>>>>> >>>>>>> struct COWType { >>>>>>> copy_on_write var b : COWBuffer >>>>>>> >>>>>>> // ... >>>>>>> } >>>>>>> >>>>>>> The ``copy_on_write`` attribute is purely used for optimization >>>>>>> purposes. >>>>>>> It does not change the semantics of the program. >>>>>> >>>>>> Presumably, it changes what code you can execute on `b` without invoking >>>>>> traps or undefined behavior. Otherwise, the optimizer wouldn't be able >>>>>> to do anything differently to take advantage of the annotation. >>>>> >>>>> That’s true. >>>>> >>>>>> What are the rules for writing code that uses `copy_on_write`? >>>>> >>>>> See below ("The rules for using ``copy_on_write`` and the built-ins are:”) >>>> >>>> Yeah, I got there, eventually. But just saying “doesn't change >>>> semantics” at this point in the proposal leaves a gap, because it does >>>> change semantic *requirements*. You should mention that. >>>> >>>>>>> .. note:: >>>>>>> >>>>>>> “copy_on_write” is a working title. TODO: decide on the name. >>>>>>> Maybe it should be a @-attribute, like @copy_on_write? >>>>>>> Another question is if we should open this attribute for the public or >>>>>>> just >>>>>>> use it internally in the library, because violating the implied rules >>>>>>> (see below) could break memory safety. >>>>>>> >>>>>>> Implementation >>>>>>> -------------- >>>>>>> >>>>>>> The ``copy_on_write`` references can be represented in the AST as a >>>>>>> special >>>>>>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented. >>>>>>> The canonical type of a ``CopyOnWriteStorageType`` would be the >>>>>>> referenced >>>>>>> buffer class type. >>>>>>> >>>>>>> In SIL the buffer reference will have type:: >>>>>>> >>>>>>> $@sil_cow COWBuffer >>>>>>> >>>>>>> where ``COWBuffer`` is the type of the referenced heap object. >>>>>>> >>>>>>> Two conversion instructions are needed to convert from a ``@sil_cow`` >>>>>>> reference >>>>>>> type to a regular reference type:: >>>>>>> >>>>>>> cow_to_ref >>>>>>> ref_to_cow >>>>>>> >>>>>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``. >>>>>>> >>>>>>> For example the SIL code for:: >>>>>>> >>>>>>> var c: COWType >>>>>>> let x = c.b.someData >>>>>>> >>>>>>> would be:: >>>>>>> >>>>>>> %1 = struct_extract %1 : COWType, #COWType.b >>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData >>>>>>> %4 = load %3 : $*Int >>>>>>> >>>>>>> The ``ref_to_cow`` instruction is needed to store a new buffer >>>>>>> reference into a >>>>>>> COW type. >>>>>>> >>>>>>> COW Buffers and the Optimizer >>>>>>> ============================= >>>>>>> >>>>>>> A reference to a COW buffer gives the optimizer additional information: >>>>>>> >>>>>>> *A buffer, referenced by a @sil_cow reference is considered to be >>>>>>> immutable >>>>>>> during the lifetime of the reference.* >>>>>> >>>>>> This seems like much too broad a rule to allow inplace mutations of >>>>>> uniquely referenced buffers. >>>>> >>>>> The point is that all mutations must be guarded by an is_unique, which >>>>> takes the _address_ of the buffer reference as argument. >>>>> And the optimizer considers this instruction as a potential write to the >>>>> buffer reference. >>>>> The effect is that the lifetime of a buffer reference (as a SIL value) >>>>> will not outlive a is_unique - regardless if this is inside a called >>>>> function or inlined. >>>> >>>> I don't see how that allows me to mutate a uniquely referenced buffer held >>>> in a @sil_cow reference, given what you wrote above. >>> >>> You would not be able to get a reference to a mutable buffer by >>> reading the COW type’s @sil_cow field. Instead you would only get >>> such a reference as a result of the is_unique instruction/builtin. Or, >>> of course, by creating a new buffer. >>> >>> I’m not sure if this was the question, though. >> >> I think it just comes down to precise phrasing. AFAICT, what you really >> mean to say is something like >> >> A buffer cannot be directly mutated through a @sil_cow reference; >> instead one must mutate it indirectly via the result of is_unique or >> start_unique.
Exactly, that’s what I wanted to say. >> >> Saying that the buffer is “considered to be immmutable during the >> lifetime of the reference” could be taken to mean that the compiler will >> assume no mutations of the buffer can occur while the reference exists. >> IIUC you are not planning to formally end the reference's lifetime at >> the moment is_unique/start_unique returns. > > To clarify: I proposed an alternate approach in which the @sil_cow reference > is only mutable during the Array’s @inout scope—to be automatically enforced > by the compiler once @inout scopes are enforced. But the text in question is > not referring to that approach, so your comments are on target. After thinking about Joe’s suggestion (having the cow attribute on the class type and make a reference to that type move-only), I’m more inclined to go with the isUnique builtin. If such a reference can only be returned by isUnique, it is really guaranteed that only a uniquely referenced buffer can be mutated. With the inout approach, the programmer is not forced to make the uniqueness check before modifying the buffer. > -Andy > >>> >>> Plus: we will have an explicit conversion instruction (start_unique) to >>> convert an immutable >>> reference to a mutable referece. >>> A SIL optimization can replace an is_unique with this instruction if it can >>> prove that the reference >>> is already unique at that point. >>> >>>> >>>> >>>>>> Unless you mean the reference is >>>>>> immutable, rather than the storage being referred to by it. >>>>>> >>>>>>> This means any address derived from a ``cow_to_ref`` instruction can be >>>>>>> considered to point to immutable memory. >>>>>>> >>>>>>> Some examples of optimizations which will benefit from copy-on-write >>>>>>> representation in SIL: >>>>>>> >>>>>>> - Redundant load elimination >>>>>>> >>>>>>> RLE can assume that opaque code does not modify a COW buffer. >>>>>> >>>>>> How do you distinguish “opaque code” from “code that is meant to >>>>>> modify the buffer and might do so in place if it's uniquely-referenced?” >>>>> >>>>> Again, the is_unique which takes the address of the reference, will >>>>> guarantee that during the lifetime of a buffer there are no >>>>> modifications of the buffer. >>>> >>>> Again, that sounds like it rules out inplace modification of uniquely >>>> referenced buffers. >>>> >>>>> >>>>> >>>>>> >>>>>>> Example:: >>>>>>> >>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>>> %4 = load %3 : $*Int >>>>>>> %5 = apply %foo() // Cannot overwrite memory >>>>>>> location %3 >>>>>>> %6 = load %3 : $*Int // Can be replaced by %4 >>>>>>> >>>>>>> Currently we do some ad-hoc optimizations for array, based on semantics, >>>>>>> like array count propagation. These hacks would not be needed >>>>>>> anymore. >>>>>> >>>>>> W0000000000000000000000t. >>>>>> >>>>>>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a >>>>>>> projected address) escapes. Even if it escapes, it will reference >>>>>>> immutable >>>>>>> memory. >>>>>>> >>>>>>> - CSE, loop hoisting >>>>>>> >>>>>>> Similar to RLE: the optimizer can assume that opaque code cannot modify >>>>>>> a >>>>>>> COW buffer >>>>>> >>>>>> Same question here as above, then. >>>>>>> >>>>>>> - ARC optimization >>>>>>> >>>>>>> Knowing that some opaque code cannot overwrite a reference in the COW >>>>>>> buffer >>>>>>> can remove retain/release pairs across such code:: >>>>>>> >>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someRef >>>>>>> %4 = load_strong %3 : $*MyClass // Can do a load_strong >>>>>>> [guarantee] >>>>>>> %5 = apply %foo() // Cannot overwrite someRef >>>>>>> and dealloc the object >>>>>>> // Use %4 >>>>>>> destroy_value %4 : $MyClass >>>>>>> >>>>>>> Scoping instructions >>>>>>> -------------------- >>>>>>> >>>>>>> To let the optimizer reason about the immutability of the COW buffer, >>>>>>> it is >>>>>>> important to *bind* the lifetime of the buffer content to the lifetime >>>>>>> of the >>>>>>> buffer reference. For example:: >>>>>>> >>>>>>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>>> // load something from %b1 >>>>>>> %a = apply %foo(%baddr : $@sil_cow COWBuffer) >>>>>>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference >>>>>>> again >>>>>>> // load something from %b2 >>>>>>> >>>>>>> The question is: can RLE forward the load of the buffer reference and >>>>>>> replace >>>>>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()`` >>>>>>> modifies the >>>>>>> buffer. >>>>>>> >>>>>>> To enforce this restriction, the scope of any buffer modification must >>>>>>> be >>>>>>> enclosed in a pair of SIL instructions. Those instructions define the >>>>>>> scope >>>>>>> of the mutation. Both instructions take the *address* of the buffer >>>>>>> reference as operand and act as a potential write to the buffer >>>>>>> reference. >>>>>>> >>>>>>> The purpose of the scoping instructions is to strictly separate the >>>>>>> liferanges >>>>>>> of references to an immutable buffer and references to the mutable >>>>>>> buffer. >>>>>> >>>>>> Looks reasonable. >>>>>> >>>>>>> The following example shows why the scoping instructions (specifically >>>>>>> the >>>>>>> end-of-scope instruction) are required to prevent loop-hoisting from >>>>>>> interleaving mutable and immutable liferanges:: >>>>>>> >>>>>>> // there should be a begin-of-scope %baddr >>>>>>> %mut_b = load %baddr >>>>>>> store %x to %mut_b // modification of the buffer >>>>>>> // there should be a end-of-scope %baddr >>>>>>> >>>>>>> loop { >>>>>>> %b = load %baddr >>>>>>> %y = load %b // load from the buffer >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> If there is no end-of-scope instruction, loop hoisting could do:: >>>>>>> >>>>>>> %mut_b = load %baddr >>>>>>> %b = load %baddr // moved out of the loop >>>>>>> store %x to %mut_b >>>>>>> >>>>>>> loop { >>>>>>> %y = load %b >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> Now the optimizer assumes that ``%b`` references an immutable buffer, >>>>>>> so it could >>>>>>> also hoist the load:: >>>>>>> >>>>>>> %mut_b = load %baddr >>>>>>> %b = load %baddr >>>>>>> %y = load %b // Wrong! Will be overwritten by the following >>>>>>> store >>>>>>> store %x to %mut_b >>>>>>> >>>>>>> loop { >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> >>>>>>> The following sections describe two alternatives to implement the >>>>>>> scoping. >>>>>>> >>>>>>> Scoping Alternative 1: Explicit Built-ins >>>>>>> ----------------------------------------- >>>>>>> >>>>>>> SIL instructions >>>>>>> ^^^^^^^^^^^^^^^^ >>>>>>> >>>>>>> The existing ``is_unique`` instruction is changed to a terminator >>>>>>> instruction:: >>>>>>> >>>>>>> bb0: >>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the >>>>>>> address of the COWBuffer reference >>>>>>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique >>>>>>> reference. Physically identical to "load %0” >>>>>>> // usually empty >>>>>>> br bb3(%1 : $COWBuffer) >>>>>>> bb2: // the false-block >>>>>>> // usually contains: >>>>>>> %2 = apply %copy_buffer >>>>>>> %3 = cow_to_ref %2 >>>>>>> store_strong %3 to %0 : $*@sil_cow COWBuffer >>>>>>> br bb3(%2 : $COWBuffer) >>>>>>> bb3(%4 : $COWBuffer): >>>>>>> // Modify the buffer referenced by %4 >>>>>>> // ... >>>>>>> >>>>>>> The end-of-scope instruction is:: >>>>>>> >>>>>>> end_unique_addr %0 : $*COWBuffer >>>>>>> >>>>>>> It is important that the references to the unique buffers (``%1``, >>>>>>> ``%2``) must >>>>>>> not outlive ``end_unique_addr``. In most cases this can be check by the >>>>>>> SIL >>>>>>> verifier. >>>>>>> >>>>>>> The two instructions must be paired properly but not necessarily in the >>>>>>> same function. >>>>>>> >>>>>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to >>>>>>> separate the lifetimes of mutable and immutable accesses to the COW >>>>>>> buffer. >>>>>>> Both instructions take an address to the COW buffer reference and are >>>>>>> considered as potential stores to the reference. >>>>>>> This makes sure that the SIL optimizer cannot mix-up buffer reference >>>>>>> lifetimes >>>>>>> across these instructions. >>>>>>> For example, RLE cannot combine two buffer loads which are interleaved >>>>>>> with >>>>>>> a ``is_unique_addr_br``:: >>>>>>> >>>>>>> %1 = load_strong %0 : $*@sil_cow COWBuffer >>>>>>> // do something with %1 >>>>>>> … >>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer >>>>>>> … >>>>>>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this >>>>>>> with %1 >>>>>>> >>>>>>> Another important thing is that the COW buffer can only be mutated by >>>>>>> using the >>>>>>> reference of the ``is_unique_addr_br`` true-block argument. >>>>>>> The COW buffer cannot be modified by simply loading/extracting the >>>>>>> reference >>>>>>> from the COWType. >>>>>>> Example:: >>>>>>> >>>>>>> %1 = load_strong %0 : $*COWBuffer >>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer >>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData >>>>>>> store %7 : $Int to %3 : $*Int // Violation! >>>>>>> >>>>>>> Most obvious violations to this constraint can be catched by the >>>>>>> SILVerifier. >>>>>>> >>>>>>> The ``_addr`` variants of the instructions also have a non-addr >>>>>>> counterpart:: >>>>>>> >>>>>>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces >>>>>>> the true-block arg as owned >>>>>>> >>>>>>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned >>>>>>> >>>>>>> These instructions are generated by Mem2reg (or a similar optimization) >>>>>>> in case the COW value is stored (in a temporary alloc_stack location) >>>>>>> just for the sake of passing an address to ``is_unique_addr_br`` and >>>>>>> ``end_unique_addr``. >>>>>>> For example in the following code, where the COW data is passed >>>>>>> as-value and >>>>>>> all the mutating functions are inlined:: >>>>>>> >>>>>>> func foo(arr : [Int], x: Int) { >>>>>>> arr[0] = 27 >>>>>>> … >>>>>>> y = arr[x] >>>>>>> … >>>>>>> } >>>>>>> >>>>>>> Finally it’s probably a good idea to add an instruction for converting >>>>>>> an >>>>>>> immutable reference to a mutable reference:: >>>>>>> >>>>>>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 : >>>>>>> $COWBuffer as owned >>>>>>> >>>>>>> which is basically just a simpler representation of the following >>>>>>> pattern:: >>>>>>> >>>>>>> bb0: >>>>>>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2 >>>>>>> bb1(%1 : $COWBuffer): >>>>>>> … // main control flow continues here >>>>>>> bb2: >>>>>>> unreachable >>>>>>> >>>>>>> An optimizations, which eliminate uniqueness checks, would replace a >>>>>>> ``is_unique_br`` by a ``start_unique``. >>>>>>> >>>>>>> Built-ins >>>>>>> ^^^^^^^^^ >>>>>>> >>>>>>> A COW type implementor can generate the new instructions by using a set >>>>>>> of built-ins:: >>>>>>> >>>>>>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType? >>>>>>> func endUnique<BufferType>(_ buffer: inout BufferType) >>>>>>> >>>>>>> For example:: >>>>>>> >>>>>>> struct COWType { >>>>>>> copy_on_write var b : COWBuffer >>>>>>> >>>>>>> mutating func makeMutable() -> COWBuffer { >>>>>>> if let uniqueBuffer = isUnique(&self.b) { >>>>>>> return uniqueBuffer >>>>>>> } >>>>>>> let copiedBuffer = copyBuffer(self.b) >>>>>>> self.b = copiedBuffer >>>>>>> return copiedBuffer >>>>>>> } >>>>>>> >>>>>>> mutating func setSomeData(x: Int) { >>>>>>> let uniqueBuffer = makeMutable() >>>>>>> uniqueBuffer.someData = x >>>>>>> endUnique(&self.b) >>>>>>> } >>>>>>> } >>>>>> >>>>>> This seems reasonable, but it also looks like the compiler could do the >>>>>> `endUnique` dance for us based, e.g., on the mutability of methods. >>>>> >>>>> I agree, that would be ideal, e.g. the compiler could insert the >>>>> endUnique at the end of an inout >>>>> scope. >>>>> >>>>>> >>>>>>> The ``isUnique`` built-in returns an optional unique buffer reference. >>>>>>> Physically this is the COW buffer which is passed as the inout argument. >>>>>>> The result is nil if the buffer is not uniquely referenced. >>>>>>> In this case usually the original buffer is copied and the reference to >>>>>>> the >>>>>>> copy is written back to the original buffer reference location >>>>>>> (``self.b = copiedBuffer``). >>>>>>> Starting at the point of the write-back, the reference to the copy also >>>>>>> becomes >>>>>>> a unique buffer reference. >>>>>>> >>>>>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br`` >>>>>>> pattern which >>>>>>> constructs the Optional in the successor blocks. Using ``isUnique`` in >>>>>>> an >>>>>>> if-let (as shown above) will end up in two consecutive CFG "diamonds". >>>>>>> Simplify-CFG can combine those into a single ``is_unique_addr_br`` >>>>>>> diamond. >>>>>>> >>>>>>> .. note:: >>>>>>> This makes the definition of the unique buffer location lifetime a >>>>>>> little bit >>>>>>> problematic, because the false-branch of ``isUnique`` is not equivalent >>>>>>> to >>>>>>> the false-branch of the ``is_unique_addr_br`` instruction (before >>>>>>> SimplifyCFG >>>>>>> can do its job). >>>>>> >>>>>> I don't know what the implications of these diamonds and the problem >>>>>> described above might be, FWIW. >>>>>> >>>>>>> The rules for using ``copy_on_write`` and the built-ins are: >>>>>>> >>>>>>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily >>>>>>> in the >>>>>>> same function. >>>>>>> >>>>>>> 2. The COW buffer may only be mutated by using the unique buffer >>>>>>> reference. >>>>>>> >>>>>>> 3. The COW buffer must not be mutated outside the ``isUnique`` - >>>>>>> ``endUnique`` >>>>>>> pair. >>>>>>> >>>>>>> 4. During the lifetime of the unique buffer reference, the original COW >>>>>>> buffer >>>>>>> reference must not be used in any way, e.g. for reading from the buffer. >>>>>>> >>>>>>> Note that the lifetime of the unique buffer reference does not include >>>>>>> the >>>>>>> part between the begin of the ``isUnique`` false-branch and the >>>>>>> write-back >>>>>>> of the copy. This means is okay to read from the buffer (using >>>>>>> ``self.b``) >>>>>>> for the purpose of copying. >>>>>>> >>>>>>> Examples:: >>>>>>> >>>>>>> mutating func setSomeData(x: Int) { >>>>>>> let uniqueBuffer = makeMutable() >>>>>>> uniqueBuffer.someData = x >>>>>>> // violates rule 1 >>>>>>> } >>>>>>> >>>>>>> mutating func setSomeData(x: Int) { >>>>>>> makeMutable() >>>>>>> self.b.someData = x // violates rule 2 >>>>>>> endUnique(&self.b) >>>>>>> } >>>>>>> >>>>>>> mutating func setSomeData(x: Int) { >>>>>>> let uniqueBuffer = makeMutable() >>>>>>> uniqueBuffer.someData = x >>>>>>> endUnique(&self.b) >>>>>>> uniqueBuffer.someData = 27 // violates rule 3 >>>>>>> } >>>>>>> >>>>>>> mutating func incrementSomeData() { >>>>>>> let uniqueBuffer = makeMutable() >>>>>>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4 >>>>>>> endUnique(&self.b) >>>>>>> } >>>>>> >>>>>> It would be instructive to write down the *correct* code for these >>>>>> operations. >>>>> >>>>> added to my todo list. >>>>> >>>>>> >>>>>>> The intention of the rules is to ensure that there is no overlap of a >>>>>>> "read-only" life-range with a "mutable" life-range of the buffer >>>>>>> reference. >>>>>>> It’s the responsibility of the implementor to follow the rules. >>>>>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can >>>>>>> statically detect rule violations in obvious cases (with >>>>>>> inter-procedural >>>>>>> analysis maybe even in most cases). >>>>>>> >>>>>>> This approach would require to change some of the internals of our >>>>>>> current COW data structures in the stdlib (Array, Dictionary, etc.). >>>>>>> For example, the Array make_mutable semantic functions currently do not >>>>>>> return >>>>>>> the unique buffer. >>>>>> >>>>>> No big deal. >>>>>> >>>>>>> Scoping Alternative 2: Implicit Inout Scopes >>>>>>> -------------------------------------------- >>>>>>> >>>>>>> There is an idea (proposal?) to change the representation of inout >>>>>>> variables >>>>>>> in SIL. This is independent of this proposal, but can be helpful for the >>>>>>> purpose of defining the scope of a COW mutation. >>>>>>> >>>>>>> The basic idea is that SILGen inserts scoping instructions for *all* >>>>>>> inout >>>>>>> variables. And those scoping instructions can be used to define the >>>>>>> mutating >>>>>>> scope of a COW buffer. >>>>>>> >>>>>>> The scoping instructions which are inserted by SILGen for an inout >>>>>>> scope are:: >>>>>>> >>>>>>> begin_exclusive >>>>>>> end_exclusive >>>>>>> >>>>>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those >>>>>>> instructions take the >>>>>>> address of the inout variable as argument. For the optimizer those >>>>>>> instructions >>>>>>> look like potential writes to the inout variable. >>>>>>> >>>>>>> The implementor of a COW type has to follow the rule that the COW >>>>>>> buffer may >>>>>>> only be modified in mutating functions of the COW type. But this is the >>>>>>> case >>>>>>> anyway because any modification needs a uniqueness check and this can >>>>>>> only be >>>>>>> done in mutating functions. >>>>>>> >>>>>>> Example:: >>>>>>> >>>>>>> // > mutating func setSomeData(x: Int) { >>>>>>> // Accepts a unique reference to the array value (avoiding refcount >>>>>>> operations) >>>>>>> sil @setSomeData : $(Int, @inout Array) -> () { >>>>>>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0 >>>>>>> >>>>>>> // > makeMutable() (inlined) >>>>>>> // Forward the unique reference to the `self` array value, still >>>>>>> avoiding refcount operations. >>>>>>> // Begin the inlined exclusive scope (could be trivially removed). >>>>>>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1 >>>>>>> >>>>>>> // > if !isUnique(&self._storage) { >>>>>>> // Extract a unique inout reference to the class reference to the >>>>>>> array storage. >>>>>>> // This begins the isUnique() argument's exclusive scope. The memory >>>>>>> is already exclusive >>>>>>> // but the scope helps ensure this is the only alias to _storage. >>>>>>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref, >>>>>>> #Array._storage >>>>>>> >>>>>>> // Uniqueness checking requires an inout reference to the class >>>>>>> reference. >>>>>>> // The is_unique instruction does not need to create a new storage >>>>>>> reference. >>>>>>> // It's only purpose is to check the RC count, ensure that the checked >>>>>>> reference >>>>>>> // is inout, and prevent the inout scope from being optimized away. >>>>>>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>>> >>>>>>> // End the isUnique argument's exclusive scope (can also be trivially >>>>>>> removed). >>>>>>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T> >>>>>>> >>>>>>> br %isuniq, bb_continue, bb_slow >>>>>>> >>>>>>> bb_slow: >>>>>>> // > self._storage = copyBuffer(self._storage) >>>>>>> // Produce a new class reference to storage with verifiably unique RC >>>>>>> semantics. >>>>>>> %copied_storage_class = alloc_ref ... >>>>>>> // A begin/end exclusive scope is implicit in store [assign]. >>>>>>> store [assign] %copied_storage_class to %arrayref._storageref >>>>>>> br bb_continue >>>>>>> >>>>>>> bb_continue: >>>>>>> >>>>>>> // This marks the end of makeMutable's inout `self` scope. Because >>>>>>> Array >>>>>>> // contains a "copy_on_write" property, the SIL verifier needs to >>>>>>> // prove that %arrayref.#_storage has not escaped at this point. This >>>>>>> // is equivalent to checking that %arrayref itself is not copied, and >>>>>>> // checking each projection of the "copy_on_write" storage property >>>>>>> // (%arrayref._storageref) is not copied. Or, if any copies are >>>>>>> present, >>>>>>> // they must be consumed within this scope. >>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #1 >>>>>>> >>>>>>> // > self._storage.someData = x >>>>>>> // An _addr instruction with one load/store use doesn't really need >>>>>>> its own scope. >>>>>>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage >>>>>>> >>>>>>> // ARC optimization can promote this to a borrow, replacing >>>>>>> strong_release with end_borrow. >>>>>>> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow >>>>>>> ArrayStorage >>>>>>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow >>>>>>> ArrayStorage >>>>>>> >>>>>>> // Write some data into the CoW buffer. >>>>>>> // (For simplicity, pretend ArrayStorage has a "someData" field). >>>>>>> // A single-use _addr instruction, so no scope. >>>>>>> %somedata_addr = ref_element_addr %arrayref._storage, #someData >>>>>>> // A store with an implicit [exclusive] scope. >>>>>>> store [assign] %x to %somedata_addr >>>>>>> >>>>>>> strong_release %arrayref._storage : $*ArrayStorage<T> >>>>>>> >>>>>>> // End the isUnique argument's exclusive scope. >>>>>>> // The same verification is needed here, but the inner scope would be >>>>>>> eliminated. >>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #0 >>>>>>> >>>>>>> In general this approach looks more "user-friendly" than the first >>>>>>> alternative. >>>>>> >>>>>> Well, I can't really tell, because you haven't shown the Swift code that >>>>>> generates this SIL. >>>>>> >>>>>>> But it depends on implementing the general feature to insert the inout >>>>>>> scoping instructions. Also, we still have to think through all the >>>>>>> details of this approach. >>>>>> >>>>>> FWIW, I am convinced we will need (and get) a stricter inout model that >>>>>> would be conducive to inserting the scoping instructions. >>>>>> >>>>>> >>>>>>> Dependency between a buffer reference to the scope-begin >>>>>>> -------------------------------------------------------- >>>>>> >>>>>> You can only have a dependency between two things, but as phrased “a >>>>>> buffer reference to the scope-begin” sounds like one thing. s/to/and/ >>>>>> would fix it. >>>>>> >>>>>>> With both alternatives there is no explicit dependency from a buffer >>>>>>> reference >>>>>>> to a scope-begin instruction:: >>>>>>> >>>>>>> %b_cow = load %baddr >>>>>>> %b = cow_to_ref %b_cow >>>>>>> %x = load %b // No dependency between this... >>>>>>> ... >>>>>>> begin_exclusive %baddr // ... and this instruction. >>>>>>> ... >>>>>>> >>>>>>> So in theory the optimizer is free to reschedule the instructions:: >>>>>>> >>>>>>> %b_cow = load %baddr >>>>>>> %b = cow_to_ref %b_cow >>>>>>> ... >>>>>>> begin_exclusive %baddr >>>>>>> %x = load %b // Wrong! Buffer could be modified here >>>>>>> ... >>>>>>> >>>>>>> We still have to figure out how to cope with this. >>>>>>> >>>>>>> - We could add an end-of-lifetime instruction for a COW buffer >>>>>>> reference, which >>>>>>> the optimizer may not move over a begin-of-scope instruction. >>>>>>> >>>>>>> - Or we just define the implicit rule for the optimizer that any use of >>>>>>> a COW >>>>>>> reference may not be moved over a begin-of-scope instruction. >>>>>>> >>>>>>> Preconditions >>>>>>> ============= >>>>>>> >>>>>>> To benefit from COW optimizations in the stdlib Array, Set and >>>>>>> Dictionary data >>>>>>> structures we first need eager bridging, meaning getting rid of the >>>>>>> bridged >>>>>>> buffer. >>>>>> >>>>>> As you know I'm very much in favor of eager bridging, but I don't see >>>>>> why this would be dependent on it. >>>>> >>>>> We could use copy_on_write with eager bridging, but I don’t think it will >>>>> give any benefits to the >>>>> optimizer. >>>>> For example, the SIL code to get from an Array to a native >>>>> ContiguousArrayStorage reference is pretty hard to understand for the >>>>> optimizer (involves low level bit operations, etc.). >>>> >>>> It wouldn't need to do low-level bit operations if our enums were >>>> capable/controllable enough. I'm just saying, there's no reason we >>>> couldn't give the optimizer something to work with that has higher level >>>> semantics than what we currently do. >>>> >>>>>>> At least for Array this is implemented as low-level bit operations and >>>>>>> optimizations cannot reason about it (e.g. finding a reasonable >>>>>>> RC-root for the buffer reference). >>>>>>> >>>>>>> Another thing is that we currently cannot use ``copy_on_write`` for >>>>>>> Array >>>>>>> because of pinning. Array pins it’s buffer when passing an element >>>>>>> address to >>>>>>> an inout parameter. This allows the array buffer to be modified even if >>>>>>> its >>>>>>> reference count is > 1. With ``copy_on_write``, a programmer could >>>>>>> break memory >>>>>>> safety when violating the inout rule. Example:: >>>>>>> >>>>>>> var arr = [MyClass()] // a global array >>>>>>> >>>>>>> foo(&arr[0]) // Pins the buffer of arr during the call >>>>>>> >>>>>>> func foo(_ x: inout MyClass) -> Int { >>>>>>> let b = arr // The ref-count of the buffer is not incremented, >>>>>>> because it is pinned! >>>>>>> let r = b[0] // optimizer removes the retain of r because it >>>>>>> thinks the following code cannot modify b >>>>>>> arr.removeAll() // does not copy the array buffer and thus >>>>>>> de-allocates r >>>>>>> return r.i // use-after-free! >>>>>>> } >>>>>> >>>>>> I only know of one way to resolve inout and pinning: >>>>>> >>>>>> * Semantically, references are replaced with a trap value when entering >>>>>> an inout context so that all inout values are provably unique >>>>>> references in the absence of unsafe code. We drop pinning and provide >>>>>> explicit operations that provide simultaneous lvalue accesses to >>>>>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices. >>>>>> >>>>>> If there are other ideas out there, I'd like to hear them. If not, we >>>>>> should probably decide that this is what we're doing so that we can move >>>>>> forward without this looming uncertainty. >>>>>> >>>>>> -- >>>>>> -Dave >>>>>> >>>>>> _______________________________________________ >>>>>> swift-dev mailing list >>>>>> swift-dev@swift.org <mailto:swift-dev@swift.org> >>>>>> https://lists.swift.org/mailman/listinfo/swift-dev >>>>> >>>> >>>> -- >>>> -Dave >>> >>> _______________________________________________ >>> swift-dev mailing list >>> swift-dev@swift.org <mailto:swift-dev@swift.org> >>> https://lists.swift.org/mailman/listinfo/swift-dev >> >> -- >> -Dave >> >> _______________________________________________ >> swift-dev mailing list >> swift-dev@swift.org <mailto:swift-dev@swift.org> >> https://lists.swift.org/mailman/listinfo/swift-dev >> <https://lists.swift.org/mailman/listinfo/swift-dev> > _______________________________________________ > swift-dev mailing list > swift-dev@swift.org <mailto:swift-dev@swift.org> > https://lists.swift.org/mailman/listinfo/swift-dev > <https://lists.swift.org/mailman/listinfo/swift-dev>
_______________________________________________ swift-dev mailing list swift-dev@swift.org https://lists.swift.org/mailman/listinfo/swift-dev