> On Oct 19, 2016, at 6:36 PM, Andrew Trick via swift-dev <[email protected]>
> wrote:
>
>>
>> On Oct 19, 2016, at 10:13 AM, Dave Abrahams via swift-dev
>> <[email protected] <mailto:[email protected]>> wrote:
>>
>>
>> on Tue Oct 18 2016, Erik Eckstein <swift-dev-AT-swift.org
>> <http://swift-dev-at-swift.org/>> wrote:
>>
>>>> On Oct 17, 2016, at 10:21 AM, Dave Abrahams <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>
>>>>
>>>> on Mon Oct 17 2016, Erik Eckstein <eeckstein-AT-apple.com
>>>> <http://eeckstein-at-apple.com/>> wrote:
>>>>
>>>
>>>>> On Oct 16, 2016, at 2:05 PM, Dave Abrahams via swift-dev
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>
>>>>>> on Thu Oct 13 2016, Joe Groff <swift-dev-AT-swift.org
>>>>>> <http://swift-dev-at-swift.org/> <http://swift-dev-at-swift.org/
>>>>>> <http://swift-dev-at-swift.org/>>> wrote:
>>>>>>
>>>>>>>> On Oct 11, 2016, at 4:48 PM, Erik Eckstein via swift-dev
>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>>
>>>>>>>> This is a proposal for representing copy-on-write buffers in
>>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily
>>>>>>>> depends on how we move forward with SIL ownership.
>>>>>>>> <CopyOnWrite.rst>
>>>>>>>> If you have any comments, please let me know.
>>>>>>>
>>>>>>> The SIL-level design seems sensible to me at a glance. At the language
>>>>>>> level, I think it would make more sense to treat this as an attribute
>>>>>>> on class types rather than on properties in structs using the class. I
>>>>>>> don't think many people reuse class definitions as both shared
>>>>>>> reference types and as value type payloads,
>>>>>>
>>>>>> Foundation does, or would if they could.
>>>>>>
>>>>>>> but beyond that, I think that making it an attribute of classes would
>>>>>>> put us into a better position to leverage the borrow model to enforce
>>>>>>> the "mutable-only-when-unique" aspect of COW implementations. John
>>>>>>> alluded to this in the "SIL address types and borrowing" thread:
>>>>>>>
>>>>>>>> I wonder if it would make more sense to make copy-on-write buffer
>>>>>>>> references a move-only type, so that as long as you were just
>>>>>>>> working with the raw reference (as opposed to the CoW aggregate,
>>>>>>>> which would remain copyable) it wouldn't get implicitly copied
>>>>>>>> anymore. You could have mutable and immutable buffer reference
>>>>>>>> types, both move-only, and there could be a consuming checkUnique
>>>>>>>> operation on the immutable one that, I dunno, returned an Either of
>>>>>>>> the mutable and immutable versions.
>>>>>>>>
>>>>>>>> For CoW aggregates, you'd need some @copied attribute on the field
>>>>>>>> to make sure that the CoW attribute was still copyable. Within the
>>>>>>>> implementation of the type, though, you would be projecting out the
>>>>>>>> reference immediately, and thereafter you'd be certain that you were
>>>>>>>> borrowing / moving it around as appropriate.
>>>>>>>
>>>>>>> If 'copy-on-write' were a trait on classes, then we could distinguish
>>>>>>> unique and nonunique references to the class. A unique reference would
>>>>>>> act like a move-only type to prevent accidental loss of uniqueness.
>>>>>>
>>>>>> +1
>>>>>>
>>>>>>> We can also allow a copy-on-write class to have "mutating" methods,
>>>>>>> and only allow mutation on unique references. It seems to me like,
>>>>>>> exploring this direction, we could also come up with a way for the
>>>>>>> high-level value-semantics operations on the struct to statically
>>>>>>> indicate which methods are known to leave the value's buffers in a
>>>>>>> unique state, or which return values that are uniquely owned, which
>>>>>>> would give the optimizer more ability to avoid uniqueness checks
>>>>>>> across calls without relying on inlining and IPO.
>>>>>>
>>>>>> That's pretty cool. However, I think there's nothing to prevent any
>>>>>> mutating method from storing a copy of self in a global, so I think we'd
>>>>>> need some participation from the programmer (either an agreement not to
>>>>>> do that, or an explicit claim of uniqueness on exit) in order to
>>>>>> identify operations that create/preserve uniqueness.
>>>>>
>>>>> If a mutating reference (like self in a mutating method) is move-only
>>>>> then you would not be able to “copy” it to a global.
>>>>
>>>> Yes, a reference to a move-only type would work for this purpose.
>>>>
>>>>
>>>>>> On Oct 16, 2016, at 2:01 PM, Dave Abrahams via swift-dev
>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>
>>>>>>
>>>>>> on Tue Oct 11 2016, Erik Eckstein <swift-dev-AT-swift.org
>>>>>> <http://swift-dev-at-swift.org/>> wrote:
>>>>>>
>>>>>>> This is a proposal for representing copy-on-write buffers in
>>>>>>> SIL. Actually it’s still a draft for a proposal. It also heavily
>>>>>>> depends on how we move forward with SIL ownership.
>>>>>>>
>>>>>>> :orphan:
>>>>>>>
>>>>>>> .. highlight:: sil
>>>>>>>
>>>>>>> ===================================
>>>>>>> Copy-On-Write Representation in SIL
>>>>>>> ===================================
>>>>>>>
>>>>>>> .. contents::
>>>>>>>
>>>>>>> Overview
>>>>>>> ========
>>>>>>>
>>>>>>> This document proposes:
>>>>>>>
>>>>>>> - An ownership attribute to define copy-on-write (COW) buffers in Swift
>>>>>>> data
>>>>>>> types.
>>>>>>>
>>>>>>> - A representation of COW buffers in SIL so that optimizations can take
>>>>>>> benefit
>>>>>>> of it.
>>>>>>>
>>>>>>> The basic idea is to enable the SIL optimizer to reason about COW data
>>>>>>> types
>>>>>>> in the same way as a programmer can do.
>>>>>>> This means: a COW buffer can only be modified by its owning SIL value,
>>>>>>> because
>>>>>>> either it's uniquely referenced or the buffer is copied before modified.
>>>>>>>
>>>>>>> .. note::
>>>>>>> In the following the term "buffer" refers to a Swift heap object.
>>>>>>> It can be any heap object, not necessarily a “buffer” with e.g.
>>>>>>> tail-allocated elements.
>>>>>>>
>>>>>>> COW Types
>>>>>>> =========
>>>>>>>
>>>>>>> The basic structure of COW data types can be simplified as follows::
>>>>>>>
>>>>>>> class COWBuffer {
>>>>>>> var someData: Int
>>>>>>> ...
>>>>>>> }
>>>>>>>
>>>>>>> struct COWType {
>>>>>>> var b : COWBuffer
>>>>>>>
>>>>>>> mutating func change_it() {
>>>>>>> if (!isUniquelyReferenced(b)) {
>>>>>>> b = copy_buffer(b)
>>>>>>> }
>>>>>>> b.someData = ...
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> Currently the COW behavior of such types is just defined by their
>>>>>>> implementation.
>>>>>>> But there is no representation of this special behavior in the SIL.
>>>>>>> So the SIL optimizer has no clue about it and cannot take advantage of
>>>>>>> it.
>>>>>>>
>>>>>>> For example::
>>>>>>>
>>>>>>> func foo(arr : [Int]) {
>>>>>>> x = arr[0]
>>>>>>> opaque_function()
>>>>>>> y = arr[0] // can RLE replace this with y = x?
>>>>>>> }
>>>>>>>
>>>>>>> If opaque_function() wants to change the contents of the array buffer
>>>>>>> it first
>>>>>>> has to copy it.
>>>>>>
>>>>>> ...or determine that it's uniquely-referenced.
>>>>>
>>>>> In this specific example, if opqaue_function holds a reference to arr’s
>>>>> buffer, the buffer is not
>>>>> uniquely-referenced.
>>>>
>>>> Right.
>>>>
>>>>>>
>>>>>>> But the optimizer does not know it so it has to conservatively assume
>>>>>>> that opaque_function() will write to the location of arr[0].
>>>>>>>
>>>>>>> Copy-on-write Ownership Attribute
>>>>>>> =================================
>>>>>>>
>>>>>>> This section proposes an ownership attribute to define a copy-on-write
>>>>>>> buffer.
>>>>>>>
>>>>>>> Swift Syntax
>>>>>>> ------------
>>>>>>>
>>>>>>> A COW buffer reference can be defined with a new ownership attribute
>>>>>>> for the
>>>>>>> buffer variable declaration (similar to “weak” and “unowned”)::
>>>>>>>
>>>>>>> struct COWType {
>>>>>>> copy_on_write var b : COWBuffer
>>>>>>>
>>>>>>> // ...
>>>>>>> }
>>>>>>>
>>>>>>> The ``copy_on_write`` attribute is purely used for optimization
>>>>>>> purposes.
>>>>>>> It does not change the semantics of the program.
>>>>>>
>>>>>> Presumably, it changes what code you can execute on `b` without invoking
>>>>>> traps or undefined behavior. Otherwise, the optimizer wouldn't be able
>>>>>> to do anything differently to take advantage of the annotation.
>>>>>
>>>>> That’s true.
>>>>>
>>>>>> What are the rules for writing code that uses `copy_on_write`?
>>>>>
>>>>> See below ("The rules for using ``copy_on_write`` and the built-ins are:”)
>>>>
>>>> Yeah, I got there, eventually. But just saying “doesn't change
>>>> semantics” at this point in the proposal leaves a gap, because it does
>>>> change semantic *requirements*. You should mention that.
>>>>
>>>>>>> .. note::
>>>>>>>
>>>>>>> “copy_on_write” is a working title. TODO: decide on the name.
>>>>>>> Maybe it should be a @-attribute, like @copy_on_write?
>>>>>>> Another question is if we should open this attribute for the public or
>>>>>>> just
>>>>>>> use it internally in the library, because violating the implied rules
>>>>>>> (see below) could break memory safety.
>>>>>>>
>>>>>>> Implementation
>>>>>>> --------------
>>>>>>>
>>>>>>> The ``copy_on_write`` references can be represented in the AST as a
>>>>>>> special
>>>>>>> ``StorageType``, just like how ``unowned`` and ``weak`` is represented.
>>>>>>> The canonical type of a ``CopyOnWriteStorageType`` would be the
>>>>>>> referenced
>>>>>>> buffer class type.
>>>>>>>
>>>>>>> In SIL the buffer reference will have type::
>>>>>>>
>>>>>>> $@sil_cow COWBuffer
>>>>>>>
>>>>>>> where ``COWBuffer`` is the type of the referenced heap object.
>>>>>>>
>>>>>>> Two conversion instructions are needed to convert from a ``@sil_cow``
>>>>>>> reference
>>>>>>> type to a regular reference type::
>>>>>>>
>>>>>>> cow_to_ref
>>>>>>> ref_to_cow
>>>>>>>
>>>>>>> Again, this is similar to ``ref_to_unowned`` and ``unowned_to_ref``.
>>>>>>>
>>>>>>> For example the SIL code for::
>>>>>>>
>>>>>>> var c: COWType
>>>>>>> let x = c.b.someData
>>>>>>>
>>>>>>> would be::
>>>>>>>
>>>>>>> %1 = struct_extract %1 : COWType, #COWType.b
>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #COWBuffer.someData
>>>>>>> %4 = load %3 : $*Int
>>>>>>>
>>>>>>> The ``ref_to_cow`` instruction is needed to store a new buffer
>>>>>>> reference into a
>>>>>>> COW type.
>>>>>>>
>>>>>>> COW Buffers and the Optimizer
>>>>>>> =============================
>>>>>>>
>>>>>>> A reference to a COW buffer gives the optimizer additional information:
>>>>>>>
>>>>>>> *A buffer, referenced by a @sil_cow reference is considered to be
>>>>>>> immutable
>>>>>>> during the lifetime of the reference.*
>>>>>>
>>>>>> This seems like much too broad a rule to allow inplace mutations of
>>>>>> uniquely referenced buffers.
>>>>>
>>>>> The point is that all mutations must be guarded by an is_unique, which
>>>>> takes the _address_ of the buffer reference as argument.
>>>>> And the optimizer considers this instruction as a potential write to the
>>>>> buffer reference.
>>>>> The effect is that the lifetime of a buffer reference (as a SIL value)
>>>>> will not outlive a is_unique - regardless if this is inside a called
>>>>> function or inlined.
>>>>
>>>> I don't see how that allows me to mutate a uniquely referenced buffer held
>>>> in a @sil_cow reference, given what you wrote above.
>>>
>>> You would not be able to get a reference to a mutable buffer by
>>> reading the COW type’s @sil_cow field. Instead you would only get
>>> such a reference as a result of the is_unique instruction/builtin. Or,
>>> of course, by creating a new buffer.
>>>
>>> I’m not sure if this was the question, though.
>>
>> I think it just comes down to precise phrasing. AFAICT, what you really
>> mean to say is something like
>>
>> A buffer cannot be directly mutated through a @sil_cow reference;
>> instead one must mutate it indirectly via the result of is_unique or
>> start_unique.
Exactly, that’s what I wanted to say.
>>
>> Saying that the buffer is “considered to be immmutable during the
>> lifetime of the reference” could be taken to mean that the compiler will
>> assume no mutations of the buffer can occur while the reference exists.
>> IIUC you are not planning to formally end the reference's lifetime at
>> the moment is_unique/start_unique returns.
>
> To clarify: I proposed an alternate approach in which the @sil_cow reference
> is only mutable during the Array’s @inout scope—to be automatically enforced
> by the compiler once @inout scopes are enforced. But the text in question is
> not referring to that approach, so your comments are on target.
After thinking about Joe’s suggestion (having the cow attribute on the class
type and make a reference to that type move-only), I’m more inclined to go with
the isUnique builtin. If such a reference can only be returned by isUnique, it
is really guaranteed that only a uniquely referenced buffer can be mutated.
With the inout approach, the programmer is not forced to make the uniqueness
check before modifying the buffer.
> -Andy
>
>>>
>>> Plus: we will have an explicit conversion instruction (start_unique) to
>>> convert an immutable
>>> reference to a mutable referece.
>>> A SIL optimization can replace an is_unique with this instruction if it can
>>> prove that the reference
>>> is already unique at that point.
>>>
>>>>
>>>>
>>>>>> Unless you mean the reference is
>>>>>> immutable, rather than the storage being referred to by it.
>>>>>>
>>>>>>> This means any address derived from a ``cow_to_ref`` instruction can be
>>>>>>> considered to point to immutable memory.
>>>>>>>
>>>>>>> Some examples of optimizations which will benefit from copy-on-write
>>>>>>> representation in SIL:
>>>>>>>
>>>>>>> - Redundant load elimination
>>>>>>>
>>>>>>> RLE can assume that opaque code does not modify a COW buffer.
>>>>>>
>>>>>> How do you distinguish “opaque code” from “code that is meant to
>>>>>> modify the buffer and might do so in place if it's uniquely-referenced?”
>>>>>
>>>>> Again, the is_unique which takes the address of the reference, will
>>>>> guarantee that during the lifetime of a buffer there are no
>>>>> modifications of the buffer.
>>>>
>>>> Again, that sounds like it rules out inplace modification of uniquely
>>>> referenced buffers.
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>> Example::
>>>>>>>
>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData
>>>>>>> %4 = load %3 : $*Int
>>>>>>> %5 = apply %foo() // Cannot overwrite memory
>>>>>>> location %3
>>>>>>> %6 = load %3 : $*Int // Can be replaced by %4
>>>>>>>
>>>>>>> Currently we do some ad-hoc optimizations for array, based on semantics,
>>>>>>> like array count propagation. These hacks would not be needed
>>>>>>> anymore.
>>>>>>
>>>>>> W0000000000000000000000t.
>>>>>>
>>>>>>> Note that it’s not required to check if a ``cow_to_ref`` reference (or a
>>>>>>> projected address) escapes. Even if it escapes, it will reference
>>>>>>> immutable
>>>>>>> memory.
>>>>>>>
>>>>>>> - CSE, loop hoisting
>>>>>>>
>>>>>>> Similar to RLE: the optimizer can assume that opaque code cannot modify
>>>>>>> a
>>>>>>> COW buffer
>>>>>>
>>>>>> Same question here as above, then.
>>>>>>>
>>>>>>> - ARC optimization
>>>>>>>
>>>>>>> Knowing that some opaque code cannot overwrite a reference in the COW
>>>>>>> buffer
>>>>>>> can remove retain/release pairs across such code::
>>>>>>>
>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someRef
>>>>>>> %4 = load_strong %3 : $*MyClass // Can do a load_strong
>>>>>>> [guarantee]
>>>>>>> %5 = apply %foo() // Cannot overwrite someRef
>>>>>>> and dealloc the object
>>>>>>> // Use %4
>>>>>>> destroy_value %4 : $MyClass
>>>>>>>
>>>>>>> Scoping instructions
>>>>>>> --------------------
>>>>>>>
>>>>>>> To let the optimizer reason about the immutability of the COW buffer,
>>>>>>> it is
>>>>>>> important to *bind* the lifetime of the buffer content to the lifetime
>>>>>>> of the
>>>>>>> buffer reference. For example::
>>>>>>>
>>>>>>> %b1 = load %baddr : $@sil_cow COWBuffer // load the buffer reference
>>>>>>> // load something from %b1
>>>>>>> %a = apply %foo(%baddr : $@sil_cow COWBuffer)
>>>>>>> %b2 = load %baddr : $@sil_cow COWBuffer // load the buffer reference
>>>>>>> again
>>>>>>> // load something from %b2
>>>>>>>
>>>>>>> The question is: can RLE forward the load of the buffer reference and
>>>>>>> replace
>>>>>>> ``%b2`` with ``%b1``? It must not be able to do so if ``foo()``
>>>>>>> modifies the
>>>>>>> buffer.
>>>>>>>
>>>>>>> To enforce this restriction, the scope of any buffer modification must
>>>>>>> be
>>>>>>> enclosed in a pair of SIL instructions. Those instructions define the
>>>>>>> scope
>>>>>>> of the mutation. Both instructions take the *address* of the buffer
>>>>>>> reference as operand and act as a potential write to the buffer
>>>>>>> reference.
>>>>>>>
>>>>>>> The purpose of the scoping instructions is to strictly separate the
>>>>>>> liferanges
>>>>>>> of references to an immutable buffer and references to the mutable
>>>>>>> buffer.
>>>>>>
>>>>>> Looks reasonable.
>>>>>>
>>>>>>> The following example shows why the scoping instructions (specifically
>>>>>>> the
>>>>>>> end-of-scope instruction) are required to prevent loop-hoisting from
>>>>>>> interleaving mutable and immutable liferanges::
>>>>>>>
>>>>>>> // there should be a begin-of-scope %baddr
>>>>>>> %mut_b = load %baddr
>>>>>>> store %x to %mut_b // modification of the buffer
>>>>>>> // there should be a end-of-scope %baddr
>>>>>>>
>>>>>>> loop {
>>>>>>> %b = load %baddr
>>>>>>> %y = load %b // load from the buffer
>>>>>>> ...
>>>>>>> }
>>>>>>>
>>>>>>> If there is no end-of-scope instruction, loop hoisting could do::
>>>>>>>
>>>>>>> %mut_b = load %baddr
>>>>>>> %b = load %baddr // moved out of the loop
>>>>>>> store %x to %mut_b
>>>>>>>
>>>>>>> loop {
>>>>>>> %y = load %b
>>>>>>> ...
>>>>>>> }
>>>>>>>
>>>>>>> Now the optimizer assumes that ``%b`` references an immutable buffer,
>>>>>>> so it could
>>>>>>> also hoist the load::
>>>>>>>
>>>>>>> %mut_b = load %baddr
>>>>>>> %b = load %baddr
>>>>>>> %y = load %b // Wrong! Will be overwritten by the following
>>>>>>> store
>>>>>>> store %x to %mut_b
>>>>>>>
>>>>>>> loop {
>>>>>>> ...
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> The following sections describe two alternatives to implement the
>>>>>>> scoping.
>>>>>>>
>>>>>>> Scoping Alternative 1: Explicit Built-ins
>>>>>>> -----------------------------------------
>>>>>>>
>>>>>>> SIL instructions
>>>>>>> ^^^^^^^^^^^^^^^^
>>>>>>>
>>>>>>> The existing ``is_unique`` instruction is changed to a terminator
>>>>>>> instruction::
>>>>>>>
>>>>>>> bb0:
>>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer, bb1, bb2 // %0 is the
>>>>>>> address of the COWBuffer reference
>>>>>>> bb1(%1 : $COWBuffer): // the true-block. The payload %1 is the unique
>>>>>>> reference. Physically identical to "load %0”
>>>>>>> // usually empty
>>>>>>> br bb3(%1 : $COWBuffer)
>>>>>>> bb2: // the false-block
>>>>>>> // usually contains:
>>>>>>> %2 = apply %copy_buffer
>>>>>>> %3 = cow_to_ref %2
>>>>>>> store_strong %3 to %0 : $*@sil_cow COWBuffer
>>>>>>> br bb3(%2 : $COWBuffer)
>>>>>>> bb3(%4 : $COWBuffer):
>>>>>>> // Modify the buffer referenced by %4
>>>>>>> // ...
>>>>>>>
>>>>>>> The end-of-scope instruction is::
>>>>>>>
>>>>>>> end_unique_addr %0 : $*COWBuffer
>>>>>>>
>>>>>>> It is important that the references to the unique buffers (``%1``,
>>>>>>> ``%2``) must
>>>>>>> not outlive ``end_unique_addr``. In most cases this can be check by the
>>>>>>> SIL
>>>>>>> verifier.
>>>>>>>
>>>>>>> The two instructions must be paired properly but not necessarily in the
>>>>>>> same function.
>>>>>>>
>>>>>>> The purpose of an ``is_unique_addr_br`` - ``end_unique_addr`` pair is to
>>>>>>> separate the lifetimes of mutable and immutable accesses to the COW
>>>>>>> buffer.
>>>>>>> Both instructions take an address to the COW buffer reference and are
>>>>>>> considered as potential stores to the reference.
>>>>>>> This makes sure that the SIL optimizer cannot mix-up buffer reference
>>>>>>> lifetimes
>>>>>>> across these instructions.
>>>>>>> For example, RLE cannot combine two buffer loads which are interleaved
>>>>>>> with
>>>>>>> a ``is_unique_addr_br``::
>>>>>>>
>>>>>>> %1 = load_strong %0 : $*@sil_cow COWBuffer
>>>>>>> // do something with %1
>>>>>>> …
>>>>>>> is_unique_addr_br %0 : $*@sil_cow COWBuffer
>>>>>>> …
>>>>>>> %2 = load_strong %0 : $*@sil_cow COWBuffer // RLE cannot replace this
>>>>>>> with %1
>>>>>>>
>>>>>>> Another important thing is that the COW buffer can only be mutated by
>>>>>>> using the
>>>>>>> reference of the ``is_unique_addr_br`` true-block argument.
>>>>>>> The COW buffer cannot be modified by simply loading/extracting the
>>>>>>> reference
>>>>>>> from the COWType.
>>>>>>> Example::
>>>>>>>
>>>>>>> %1 = load_strong %0 : $*COWBuffer
>>>>>>> %2 = cow_to_ref %1 : $@sil_cow COWBuffer
>>>>>>> %3 = ref_element_addr %2 : $COWBuffer, #someData
>>>>>>> store %7 : $Int to %3 : $*Int // Violation!
>>>>>>>
>>>>>>> Most obvious violations to this constraint can be catched by the
>>>>>>> SILVerifier.
>>>>>>>
>>>>>>> The ``_addr`` variants of the instructions also have a non-addr
>>>>>>> counterpart::
>>>>>>>
>>>>>>> is_unique_br %0 : $COWBuffer, bb1, bb2. // consumes %0 and produces
>>>>>>> the true-block arg as owned
>>>>>>>
>>>>>>> %1 = end_unique %0 : $COWBuffer // consumes %0 and produces %1 as owned
>>>>>>>
>>>>>>> These instructions are generated by Mem2reg (or a similar optimization)
>>>>>>> in case the COW value is stored (in a temporary alloc_stack location)
>>>>>>> just for the sake of passing an address to ``is_unique_addr_br`` and
>>>>>>> ``end_unique_addr``.
>>>>>>> For example in the following code, where the COW data is passed
>>>>>>> as-value and
>>>>>>> all the mutating functions are inlined::
>>>>>>>
>>>>>>> func foo(arr : [Int], x: Int) {
>>>>>>> arr[0] = 27
>>>>>>> …
>>>>>>> y = arr[x]
>>>>>>> …
>>>>>>> }
>>>>>>>
>>>>>>> Finally it’s probably a good idea to add an instruction for converting
>>>>>>> an
>>>>>>> immutable reference to a mutable reference::
>>>>>>>
>>>>>>> %1 = start_unique %0 : $COWBuffer // consumes %0 and produces %1 :
>>>>>>> $COWBuffer as owned
>>>>>>>
>>>>>>> which is basically just a simpler representation of the following
>>>>>>> pattern::
>>>>>>>
>>>>>>> bb0:
>>>>>>> is_unique_br %0 : $@sil_cow COWBuffer, bb1, bb2
>>>>>>> bb1(%1 : $COWBuffer):
>>>>>>> … // main control flow continues here
>>>>>>> bb2:
>>>>>>> unreachable
>>>>>>>
>>>>>>> An optimizations, which eliminate uniqueness checks, would replace a
>>>>>>> ``is_unique_br`` by a ``start_unique``.
>>>>>>>
>>>>>>> Built-ins
>>>>>>> ^^^^^^^^^
>>>>>>>
>>>>>>> A COW type implementor can generate the new instructions by using a set
>>>>>>> of built-ins::
>>>>>>>
>>>>>>> func isUnique<BufferType>(_ buffer: inout BufferType) -> BufferType?
>>>>>>> func endUnique<BufferType>(_ buffer: inout BufferType)
>>>>>>>
>>>>>>> For example::
>>>>>>>
>>>>>>> struct COWType {
>>>>>>> copy_on_write var b : COWBuffer
>>>>>>>
>>>>>>> mutating func makeMutable() -> COWBuffer {
>>>>>>> if let uniqueBuffer = isUnique(&self.b) {
>>>>>>> return uniqueBuffer
>>>>>>> }
>>>>>>> let copiedBuffer = copyBuffer(self.b)
>>>>>>> self.b = copiedBuffer
>>>>>>> return copiedBuffer
>>>>>>> }
>>>>>>>
>>>>>>> mutating func setSomeData(x: Int) {
>>>>>>> let uniqueBuffer = makeMutable()
>>>>>>> uniqueBuffer.someData = x
>>>>>>> endUnique(&self.b)
>>>>>>> }
>>>>>>> }
>>>>>>
>>>>>> This seems reasonable, but it also looks like the compiler could do the
>>>>>> `endUnique` dance for us based, e.g., on the mutability of methods.
>>>>>
>>>>> I agree, that would be ideal, e.g. the compiler could insert the
>>>>> endUnique at the end of an inout
>>>>> scope.
>>>>>
>>>>>>
>>>>>>> The ``isUnique`` built-in returns an optional unique buffer reference.
>>>>>>> Physically this is the COW buffer which is passed as the inout argument.
>>>>>>> The result is nil if the buffer is not uniquely referenced.
>>>>>>> In this case usually the original buffer is copied and the reference to
>>>>>>> the
>>>>>>> copy is written back to the original buffer reference location
>>>>>>> (``self.b = copiedBuffer``).
>>>>>>> Starting at the point of the write-back, the reference to the copy also
>>>>>>> becomes
>>>>>>> a unique buffer reference.
>>>>>>>
>>>>>>> The ``isUnique`` built-in is lowered to the ``is_unique_addr_br``
>>>>>>> pattern which
>>>>>>> constructs the Optional in the successor blocks. Using ``isUnique`` in
>>>>>>> an
>>>>>>> if-let (as shown above) will end up in two consecutive CFG "diamonds".
>>>>>>> Simplify-CFG can combine those into a single ``is_unique_addr_br``
>>>>>>> diamond.
>>>>>>>
>>>>>>> .. note::
>>>>>>> This makes the definition of the unique buffer location lifetime a
>>>>>>> little bit
>>>>>>> problematic, because the false-branch of ``isUnique`` is not equivalent
>>>>>>> to
>>>>>>> the false-branch of the ``is_unique_addr_br`` instruction (before
>>>>>>> SimplifyCFG
>>>>>>> can do its job).
>>>>>>
>>>>>> I don't know what the implications of these diamonds and the problem
>>>>>> described above might be, FWIW.
>>>>>>
>>>>>>> The rules for using ``copy_on_write`` and the built-ins are:
>>>>>>>
>>>>>>> 1. ``isUnique`` must be paired with ``endUnique``, but not necessarily
>>>>>>> in the
>>>>>>> same function.
>>>>>>>
>>>>>>> 2. The COW buffer may only be mutated by using the unique buffer
>>>>>>> reference.
>>>>>>>
>>>>>>> 3. The COW buffer must not be mutated outside the ``isUnique`` -
>>>>>>> ``endUnique``
>>>>>>> pair.
>>>>>>>
>>>>>>> 4. During the lifetime of the unique buffer reference, the original COW
>>>>>>> buffer
>>>>>>> reference must not be used in any way, e.g. for reading from the buffer.
>>>>>>>
>>>>>>> Note that the lifetime of the unique buffer reference does not include
>>>>>>> the
>>>>>>> part between the begin of the ``isUnique`` false-branch and the
>>>>>>> write-back
>>>>>>> of the copy. This means is okay to read from the buffer (using
>>>>>>> ``self.b``)
>>>>>>> for the purpose of copying.
>>>>>>>
>>>>>>> Examples::
>>>>>>>
>>>>>>> mutating func setSomeData(x: Int) {
>>>>>>> let uniqueBuffer = makeMutable()
>>>>>>> uniqueBuffer.someData = x
>>>>>>> // violates rule 1
>>>>>>> }
>>>>>>>
>>>>>>> mutating func setSomeData(x: Int) {
>>>>>>> makeMutable()
>>>>>>> self.b.someData = x // violates rule 2
>>>>>>> endUnique(&self.b)
>>>>>>> }
>>>>>>>
>>>>>>> mutating func setSomeData(x: Int) {
>>>>>>> let uniqueBuffer = makeMutable()
>>>>>>> uniqueBuffer.someData = x
>>>>>>> endUnique(&self.b)
>>>>>>> uniqueBuffer.someData = 27 // violates rule 3
>>>>>>> }
>>>>>>>
>>>>>>> mutating func incrementSomeData() {
>>>>>>> let uniqueBuffer = makeMutable()
>>>>>>> uniqueBuffer.someData = self.b.someData + 1 // violates rule 4
>>>>>>> endUnique(&self.b)
>>>>>>> }
>>>>>>
>>>>>> It would be instructive to write down the *correct* code for these
>>>>>> operations.
>>>>>
>>>>> added to my todo list.
>>>>>
>>>>>>
>>>>>>> The intention of the rules is to ensure that there is no overlap of a
>>>>>>> "read-only" life-range with a "mutable" life-range of the buffer
>>>>>>> reference.
>>>>>>> It’s the responsibility of the implementor to follow the rules.
>>>>>>> But the compiler (a mandatory diagnostics pass and the SIL verifier) can
>>>>>>> statically detect rule violations in obvious cases (with
>>>>>>> inter-procedural
>>>>>>> analysis maybe even in most cases).
>>>>>>>
>>>>>>> This approach would require to change some of the internals of our
>>>>>>> current COW data structures in the stdlib (Array, Dictionary, etc.).
>>>>>>> For example, the Array make_mutable semantic functions currently do not
>>>>>>> return
>>>>>>> the unique buffer.
>>>>>>
>>>>>> No big deal.
>>>>>>
>>>>>>> Scoping Alternative 2: Implicit Inout Scopes
>>>>>>> --------------------------------------------
>>>>>>>
>>>>>>> There is an idea (proposal?) to change the representation of inout
>>>>>>> variables
>>>>>>> in SIL. This is independent of this proposal, but can be helpful for the
>>>>>>> purpose of defining the scope of a COW mutation.
>>>>>>>
>>>>>>> The basic idea is that SILGen inserts scoping instructions for *all*
>>>>>>> inout
>>>>>>> variables. And those scoping instructions can be used to define the
>>>>>>> mutating
>>>>>>> scope of a COW buffer.
>>>>>>>
>>>>>>> The scoping instructions which are inserted by SILGen for an inout
>>>>>>> scope are::
>>>>>>>
>>>>>>> begin_exclusive
>>>>>>> end_exclusive
>>>>>>>
>>>>>>> Simliar to ``is_unique_addr_br`` and ``end_unique_addr``, those
>>>>>>> instructions take the
>>>>>>> address of the inout variable as argument. For the optimizer those
>>>>>>> instructions
>>>>>>> look like potential writes to the inout variable.
>>>>>>>
>>>>>>> The implementor of a COW type has to follow the rule that the COW
>>>>>>> buffer may
>>>>>>> only be modified in mutating functions of the COW type. But this is the
>>>>>>> case
>>>>>>> anyway because any modification needs a uniqueness check and this can
>>>>>>> only be
>>>>>>> done in mutating functions.
>>>>>>>
>>>>>>> Example::
>>>>>>>
>>>>>>> // > mutating func setSomeData(x: Int) {
>>>>>>> // Accepts a unique reference to the array value (avoiding refcount
>>>>>>> operations)
>>>>>>> sil @setSomeData : $(Int, @inout Array) -> () {
>>>>>>> bb_entry(%x : Int, %arrayref : $*Array<T>) // Begin scope #0
>>>>>>>
>>>>>>> // > makeMutable() (inlined)
>>>>>>> // Forward the unique reference to the `self` array value, still
>>>>>>> avoiding refcount operations.
>>>>>>> // Begin the inlined exclusive scope (could be trivially removed).
>>>>>>> begin_exclusive %arrayref : $*Array<T> // Begin scope #1
>>>>>>>
>>>>>>> // > if !isUnique(&self._storage) {
>>>>>>> // Extract a unique inout reference to the class reference to the
>>>>>>> array storage.
>>>>>>> // This begins the isUnique() argument's exclusive scope. The memory
>>>>>>> is already exclusive
>>>>>>> // but the scope helps ensure this is the only alias to _storage.
>>>>>>> %arrayref._storageref = struct_element_addr [exclusive] %arrayref,
>>>>>>> #Array._storage
>>>>>>>
>>>>>>> // Uniqueness checking requires an inout reference to the class
>>>>>>> reference.
>>>>>>> // The is_unique instruction does not need to create a new storage
>>>>>>> reference.
>>>>>>> // It's only purpose is to check the RC count, ensure that the checked
>>>>>>> reference
>>>>>>> // is inout, and prevent the inout scope from being optimized away.
>>>>>>> %isuniq = is_unique %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>>>>>>>
>>>>>>> // End the isUnique argument's exclusive scope (can also be trivially
>>>>>>> removed).
>>>>>>> end_exclusive %arrayref._storageref : $*@sil_cow ArrayStorage<T>
>>>>>>>
>>>>>>> br %isuniq, bb_continue, bb_slow
>>>>>>>
>>>>>>> bb_slow:
>>>>>>> // > self._storage = copyBuffer(self._storage)
>>>>>>> // Produce a new class reference to storage with verifiably unique RC
>>>>>>> semantics.
>>>>>>> %copied_storage_class = alloc_ref ...
>>>>>>> // A begin/end exclusive scope is implicit in store [assign].
>>>>>>> store [assign] %copied_storage_class to %arrayref._storageref
>>>>>>> br bb_continue
>>>>>>>
>>>>>>> bb_continue:
>>>>>>>
>>>>>>> // This marks the end of makeMutable's inout `self` scope. Because
>>>>>>> Array
>>>>>>> // contains a "copy_on_write" property, the SIL verifier needs to
>>>>>>> // prove that %arrayref.#_storage has not escaped at this point. This
>>>>>>> // is equivalent to checking that %arrayref itself is not copied, and
>>>>>>> // checking each projection of the "copy_on_write" storage property
>>>>>>> // (%arrayref._storageref) is not copied. Or, if any copies are
>>>>>>> present,
>>>>>>> // they must be consumed within this scope.
>>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #1
>>>>>>>
>>>>>>> // > self._storage.someData = x
>>>>>>> // An _addr instruction with one load/store use doesn't really need
>>>>>>> its own scope.
>>>>>>> %arrayref._storageref = struct_element_addr %arrayref, #Array._storage
>>>>>>>
>>>>>>> // ARC optimization can promote this to a borrow, replacing
>>>>>>> strong_release with end_borrow.
>>>>>>> %arrayref.cow_storage = load [copy] %arrayref._storageref : $*@sil_cow
>>>>>>> ArrayStorage
>>>>>>> %arrayref._storage = cow_to_ref %arrayref.cow_storage : $@sil_cow
>>>>>>> ArrayStorage
>>>>>>>
>>>>>>> // Write some data into the CoW buffer.
>>>>>>> // (For simplicity, pretend ArrayStorage has a "someData" field).
>>>>>>> // A single-use _addr instruction, so no scope.
>>>>>>> %somedata_addr = ref_element_addr %arrayref._storage, #someData
>>>>>>> // A store with an implicit [exclusive] scope.
>>>>>>> store [assign] %x to %somedata_addr
>>>>>>>
>>>>>>> strong_release %arrayref._storage : $*ArrayStorage<T>
>>>>>>>
>>>>>>> // End the isUnique argument's exclusive scope.
>>>>>>> // The same verification is needed here, but the inner scope would be
>>>>>>> eliminated.
>>>>>>> end_exclusive %arrayref : $*Array<T> // End scope #0
>>>>>>>
>>>>>>> In general this approach looks more "user-friendly" than the first
>>>>>>> alternative.
>>>>>>
>>>>>> Well, I can't really tell, because you haven't shown the Swift code that
>>>>>> generates this SIL.
>>>>>>
>>>>>>> But it depends on implementing the general feature to insert the inout
>>>>>>> scoping instructions. Also, we still have to think through all the
>>>>>>> details of this approach.
>>>>>>
>>>>>> FWIW, I am convinced we will need (and get) a stricter inout model that
>>>>>> would be conducive to inserting the scoping instructions.
>>>>>>
>>>>>>
>>>>>>> Dependency between a buffer reference to the scope-begin
>>>>>>> --------------------------------------------------------
>>>>>>
>>>>>> You can only have a dependency between two things, but as phrased “a
>>>>>> buffer reference to the scope-begin” sounds like one thing. s/to/and/
>>>>>> would fix it.
>>>>>>
>>>>>>> With both alternatives there is no explicit dependency from a buffer
>>>>>>> reference
>>>>>>> to a scope-begin instruction::
>>>>>>>
>>>>>>> %b_cow = load %baddr
>>>>>>> %b = cow_to_ref %b_cow
>>>>>>> %x = load %b // No dependency between this...
>>>>>>> ...
>>>>>>> begin_exclusive %baddr // ... and this instruction.
>>>>>>> ...
>>>>>>>
>>>>>>> So in theory the optimizer is free to reschedule the instructions::
>>>>>>>
>>>>>>> %b_cow = load %baddr
>>>>>>> %b = cow_to_ref %b_cow
>>>>>>> ...
>>>>>>> begin_exclusive %baddr
>>>>>>> %x = load %b // Wrong! Buffer could be modified here
>>>>>>> ...
>>>>>>>
>>>>>>> We still have to figure out how to cope with this.
>>>>>>>
>>>>>>> - We could add an end-of-lifetime instruction for a COW buffer
>>>>>>> reference, which
>>>>>>> the optimizer may not move over a begin-of-scope instruction.
>>>>>>>
>>>>>>> - Or we just define the implicit rule for the optimizer that any use of
>>>>>>> a COW
>>>>>>> reference may not be moved over a begin-of-scope instruction.
>>>>>>>
>>>>>>> Preconditions
>>>>>>> =============
>>>>>>>
>>>>>>> To benefit from COW optimizations in the stdlib Array, Set and
>>>>>>> Dictionary data
>>>>>>> structures we first need eager bridging, meaning getting rid of the
>>>>>>> bridged
>>>>>>> buffer.
>>>>>>
>>>>>> As you know I'm very much in favor of eager bridging, but I don't see
>>>>>> why this would be dependent on it.
>>>>>
>>>>> We could use copy_on_write with eager bridging, but I don’t think it will
>>>>> give any benefits to the
>>>>> optimizer.
>>>>> For example, the SIL code to get from an Array to a native
>>>>> ContiguousArrayStorage reference is pretty hard to understand for the
>>>>> optimizer (involves low level bit operations, etc.).
>>>>
>>>> It wouldn't need to do low-level bit operations if our enums were
>>>> capable/controllable enough. I'm just saying, there's no reason we
>>>> couldn't give the optimizer something to work with that has higher level
>>>> semantics than what we currently do.
>>>>
>>>>>>> At least for Array this is implemented as low-level bit operations and
>>>>>>> optimizations cannot reason about it (e.g. finding a reasonable
>>>>>>> RC-root for the buffer reference).
>>>>>>>
>>>>>>> Another thing is that we currently cannot use ``copy_on_write`` for
>>>>>>> Array
>>>>>>> because of pinning. Array pins it’s buffer when passing an element
>>>>>>> address to
>>>>>>> an inout parameter. This allows the array buffer to be modified even if
>>>>>>> its
>>>>>>> reference count is > 1. With ``copy_on_write``, a programmer could
>>>>>>> break memory
>>>>>>> safety when violating the inout rule. Example::
>>>>>>>
>>>>>>> var arr = [MyClass()] // a global array
>>>>>>>
>>>>>>> foo(&arr[0]) // Pins the buffer of arr during the call
>>>>>>>
>>>>>>> func foo(_ x: inout MyClass) -> Int {
>>>>>>> let b = arr // The ref-count of the buffer is not incremented,
>>>>>>> because it is pinned!
>>>>>>> let r = b[0] // optimizer removes the retain of r because it
>>>>>>> thinks the following code cannot modify b
>>>>>>> arr.removeAll() // does not copy the array buffer and thus
>>>>>>> de-allocates r
>>>>>>> return r.i // use-after-free!
>>>>>>> }
>>>>>>
>>>>>> I only know of one way to resolve inout and pinning:
>>>>>>
>>>>>> * Semantically, references are replaced with a trap value when entering
>>>>>> an inout context so that all inout values are provably unique
>>>>>> references in the absence of unsafe code. We drop pinning and provide
>>>>>> explicit operations that provide simultaneous lvalue accesses to
>>>>>> distinct regions, e.g. c.swap(i1, i2) where i1 and i2 are indices.
>>>>>>
>>>>>> If there are other ideas out there, I'd like to hear them. If not, we
>>>>>> should probably decide that this is what we're doing so that we can move
>>>>>> forward without this looming uncertainty.
>>>>>>
>>>>>> --
>>>>>> -Dave
>>>>>>
>>>>>> _______________________________________________
>>>>>> swift-dev mailing list
>>>>>> [email protected] <mailto:[email protected]>
>>>>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>>>>
>>>>
>>>> --
>>>> -Dave
>>>
>>> _______________________________________________
>>> swift-dev mailing list
>>> [email protected] <mailto:[email protected]>
>>> https://lists.swift.org/mailman/listinfo/swift-dev
>>
>> --
>> -Dave
>>
>> _______________________________________________
>> swift-dev mailing list
>> [email protected] <mailto:[email protected]>
>> https://lists.swift.org/mailman/listinfo/swift-dev
>> <https://lists.swift.org/mailman/listinfo/swift-dev>
> _______________________________________________
> swift-dev mailing list
> [email protected] <mailto:[email protected]>
> https://lists.swift.org/mailman/listinfo/swift-dev
> <https://lists.swift.org/mailman/listinfo/swift-dev>
_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev