[swift-dev] Continuation-based versus nested-function-based materializeForSet

Joe Groff via swift-dev Mon, 31 Oct 2016 12:23:00 -0700

We currently abstract over mutable property accesses using what I’ll call a 
continuation-based model–the materializeForSet accessor is called before an 
inout access, and returns a continuation callback that must be called when the 
inout access is completed. I think a nested-function-based model, if designed 
correctly, has the potential to be more efficient and easier to implement.


As background for anyone intimately familiar with the implementation, Swift’s 
language model for mutable properties requires a reconciliation step after a 
mutation occurs. For example, in an inout access like this:

foo(&x.y.z)
the y or z property may be computed, requiring a getter call before the call to 
foo and a setter call afterward, or either property may have willSet or didSet 
observers that fire after foo finishes. If the base x is a class, protocol, or 
generic value, the implementation of y can`t be statically known, so we need an 
abstract interface for mutating the property, projecting out the value at the 
start of the access and committing the mutated value back at the end. Ideally, 
the interface should accommodate efficient access to both stored and computed 
properties, avoiding unnecessary copying of stored properties to avoid inducing 
CoW copies or, in the near future, violating the constraints of move-only types 
or the borrow model. There are two broad approaches to doing this:

Continuation-based access

The accessor that begins the formal access can return a continuation closure to 
call when the access is completed. This is the approach Swift currently uses 
with its materializeForSet accessor pattern. A property under this model 
compiles to something like this C code:

struct X {
  // getter for Y
  Y (*getY)(const X *self);
  // setter for Y
  void (*setY)(X *self, Y newValue);
  // materializeForSet for Y
  struct MaterializeForSetReturn {
    // The projected field
    Y *y;
    // The function that finishes the access, or null
    void (*finish)(X *self, void *buffer);
  }; 
  struct MaterializeForSetReturn (*materalizeForSettingY)(X *self, void 
*buffer);
};

void callFooOnXDotYDotZ(X *x) {
  // To compile:
  //   foo(&x.y.z)
  // 
  // - call the materializeForSet accessor to begin the access to y, giving it 
a buffer
  //   to store a computed value or capture state to pass to the end of the 
access
  char alignas(Y) bufferY[sizeof(Y) + 3*sizeof(void*)];
  auto materializedY = x->materializeForSettingY(x, bufferY);
  // - call materializeForSet to begin access to z
  char alignas(Z) bufferZ[sizeof(Z) + 3*sizeof(void*)];
  autom materializedZ = 
materializedY.y->materializeForSettingZ(materializedY.y, bufferZ);
  // - perform the access
  foo(materializedZ.z);
  // - finish the accesses, if we were given a finishing callback
  if (materializedZ.finish)
    materializedZ.finish(materializedY.y, bufferZ);
  if (materializedY.finish)
    materializedY.finish(x, bufferY);
}
A stored property can be exposed through this interface by trivially returning 
the projected address of the subobject, with no continuation:

struct MaterializeForSetReturn
materializeForSettingStoredY(X *self, void *buffer) {
  return (struct MaterializeForSetReturn){&self->y, NULL};
}
whereas a computed property can call the getter, store the computed value into 
the buffer, and return a continuation function that calls the setter:

struct MaterializeForSetReturn
materializeForSettingComputedY(X *self, void *buffer) {
  Y oldValue = getY(self);
  memcpy(buffer, &oldValue, sizeof(Y));
  return (struct MaterializeForSetReturn){(Y *)buffer, finishSettingComputedY};
}
void finishSettingComputedY(X *self, void *buffer) {
  Y newValue;
  memcpy(&newValue, buffer, sizeof(Y));
  setY(self, newValue);
}
A benefit of this approach is that it maintains the integrity of the source 
function in the compiled code. On the other hand, in order for the property 
implementation to transfer state from the start to the completion of the 
access, the caller must preallocate some stack space; if it’s too much, the 
space is wasted, and if it’s not enough, the property implementation must use 
malloc to get more space for itself. (It’s theoretically possible to avoid this 
with some clever stack pointer accounting.) There’s also a code size impact on 
the caller, which needs to bracket the inout access with a call on each side. 
The underlying CPU has to execute three or five branches per segment of the 
access path (calling and returning from materializeForSet, testing for null, 
and potentially calling and returning from the continuation if there is one).

Nested functions

Alternatively, we can implement the formal access pattern as a series of nested 
functions. A property under this model compiles to something like this C code:

struct X {
  // getter for Y
  Y (*getY)(const X *self);
  // setter for Y
  void (*setY)(X *self, Y newValue);
  // mutator for Y
  void *(*mutateY)(X *self, void *context, void *(*innerMutation)(Y *y, void 
*context));
};

void callFooOnXDotYDotZ(X *x) {
  // To compile:
  //   foo(&x.y)
  // - call the mutate accessor for y, with the inner access as a
  //   function whose pointer gets passed in
  x->mutateY(x, NULL, callFooOnXDotY_inner_1);
}
void callFooOnXDotYDotZ_inner_1(Y *y, void *context) {
  // - call the mutate accessor for z
  y->mutateZ(y, context, callFooOnXDotY_inner_2);
}
void callFooOnXDotYDotZ_inner_2(Z *z, void *context) {
  foo(z);
}
A stored property can implement mutate by projecting the physical address of 
the subobject, tail-calling the subsequent inner function:

void *mutateStoredY(X *self, void *context, void *(*innerMutation)(Y *y, void 
*context)) {
  /*tail*/ return innerMutation(&self->y, context);
}
and a computed property can bracket the inner mutation in getter and setter 
calls:

void *mutateComputedY(X *self, void *context, void *(*innerMutation)(Y *y, void 
*context)) {
  Y value = getY(self);
  void *result = innerMutation(&value, context);
  setY(self, value);
  return result;
}
This makes things easier for the property implementation, which can easily save 
as much state as it needs to on the stack instead of relying on a preallocated 
buffer handed down from the caller. There’s one fewer branch necessary per 
segment of the access path compared to the continuation-based model, either 
three if the accessor has no reconciliation necessary and can tail-call the 
inner function, or four if it does a normal call. On the other hand, the caller 
function has to be broken up into many smaller subfunctions, potentially one 
for every segment of an access path.

Letting nested functions walk the access path

We can reduce the number of nested functions that are needed for a nested 
access path by designing a convention that allows each mutator to directly call 
the mutator for the next segment of access. For example, the caller could 
construct the entire access path as an array of pointers to mutation functions:

typedef void *(*MutatorFunction)(void *base, MutatorFunction *accessPath, void 
*context);
void callFooOnXDotYDotZDotW(X *x) {
  // foo(&x.y.z.w)
  MutatorFunction accessPath[] = {
    (MutatorFunction)mutateZ,
    (MutatorFunction)mutateW,
    (MutatorFunction)callFooOnXDotYDotZDotW_inner
  };
  mutateY(x, accessPath, NULL);
}
Each property accessor function then loads the next step of the access out of 
the path, and advances the pointer forward for the nested access:

void *mutateY(X *self, MutatorFunction *accessPath, void *context) {
  // Project out y
  Y *y = self->y;
  // Call the next step of the access path, passing the remainder of the access 
path
  return (*accessPath)(y, accessPath + 1, context);
}
And the inner function at the end of the access path ignores the access path 
parameter and performs the access:

void *callFooOnXDotYDotZDotW_inner(W *w, MutatorFunction *_ignored, void 
*context) {
  foo(w);
}
In this model, only one inner function needs to be broken out of the caller for 
the innermost inout access, at least for simple property projections. There are 
only two branches needed per segment, or one for segments that can tail-call 
the inner access. Parameterized segments such as subscripts can be supported by 
storing the parameters in the access path buffer as well, and having the 
subscript accessor advance past the parameters when it consumes them. For 
example:

void callFooOnXDotYSubIDotZ(X *x, intptr_t i) {
  // foo(&x.y[i].z)
  uintptr_t accessPath[] = {
    (uintptr_t)mutateYSubscript,
    (uintptr_t)i,
    (uintptr_t)mutateZ,
    (uintptr_t)callFooOnXDotYSubIDotZ_inner
  };
  
  mutateY(x, accessPath, NULL);
}

void *mutateYSubscript(Y *y, uintptr_t *accessPath, void *context) {
  // Get the subscript index out of the access path
  intptr_t i = (intptr_t)*accessPath++;
  // Get the next step of the access out of the access path
  MutatorFunction next = (MutatorFunction)*accessPath++;
  
  return next(&y->elements[i], accessPath, context);
}
On the other hand, this does push the inner access functions and subscript 
arguments onto the stack, which is a small amount of overhead. Furthermore, the 
caller function has to push any state it needs to pass from outside the access 
to inside on the stack as well, if it isn’t able to cram it otherwise into a 
single context pointer argument.

Machine-level calling convention tricks

To be able to feed a return value from the inner access back to the outer 
caller, nested mutators would need to preserve all of the return registers 
potentially used by the inner access on the way out of the access. To minimize 
the impact on the outer caller, the convention for mutators could also specify 
that they need to preserve the contents of callee-save registers when they call 
down to nested accesses. This should allow as much state to be kept in 
registers between the outer caller and inner access as if they were naturally 
in the same function, with at most a function call in between.

With these sorts of optimizations, I think the nested function approach has the 
potential to be competitive with, or even more efficient than, our current 
continuation-based materializeForSet design.

-Joe

_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev

[swift-dev] Continuation-based versus nested-function-based materializeForSet

Reply via email to