We currently abstract over mutable property accesses using what I’ll call a
continuation-based model–the materializeForSet accessor is called before an
inout access, and returns a continuation callback that must be called when the
inout access is completed. I think a nested-function-based model, if designed
correctly, has the potential to be more efficient and easier to implement.
As background for anyone intimately familiar with the implementation, Swift’s
language model for mutable properties requires a reconciliation step after a
mutation occurs. For example, in an inout access like this:
foo(&x.y.z)
the y or z property may be computed, requiring a getter call before the call to
foo and a setter call afterward, or either property may have willSet or didSet
observers that fire after foo finishes. If the base x is a class, protocol, or
generic value, the implementation of y can`t be statically known, so we need an
abstract interface for mutating the property, projecting out the value at the
start of the access and committing the mutated value back at the end. Ideally,
the interface should accommodate efficient access to both stored and computed
properties, avoiding unnecessary copying of stored properties to avoid inducing
CoW copies or, in the near future, violating the constraints of move-only types
or the borrow model. There are two broad approaches to doing this:
Continuation-based access
The accessor that begins the formal access can return a continuation closure to
call when the access is completed. This is the approach Swift currently uses
with its materializeForSet accessor pattern. A property under this model
compiles to something like this C code:
struct X {
// getter for Y
Y (*getY)(const X *self);
// setter for Y
void (*setY)(X *self, Y newValue);
// materializeForSet for Y
struct MaterializeForSetReturn {
// The projected field
Y *y;
// The function that finishes the access, or null
void (*finish)(X *self, void *buffer);
};
struct MaterializeForSetReturn (*materalizeForSettingY)(X *self, void
*buffer);
};
void callFooOnXDotYDotZ(X *x) {
// To compile:
// foo(&x.y.z)
//
// - call the materializeForSet accessor to begin the access to y, giving it
a buffer
// to store a computed value or capture state to pass to the end of the
access
char alignas(Y) bufferY[sizeof(Y) + 3*sizeof(void*)];
auto materializedY = x->materializeForSettingY(x, bufferY);
// - call materializeForSet to begin access to z
char alignas(Z) bufferZ[sizeof(Z) + 3*sizeof(void*)];
autom materializedZ =
materializedY.y->materializeForSettingZ(materializedY.y, bufferZ);
// - perform the access
foo(materializedZ.z);
// - finish the accesses, if we were given a finishing callback
if (materializedZ.finish)
materializedZ.finish(materializedY.y, bufferZ);
if (materializedY.finish)
materializedY.finish(x, bufferY);
}
A stored property can be exposed through this interface by trivially returning
the projected address of the subobject, with no continuation:
struct MaterializeForSetReturn
materializeForSettingStoredY(X *self, void *buffer) {
return (struct MaterializeForSetReturn){&self->y, NULL};
}
whereas a computed property can call the getter, store the computed value into
the buffer, and return a continuation function that calls the setter:
struct MaterializeForSetReturn
materializeForSettingComputedY(X *self, void *buffer) {
Y oldValue = getY(self);
memcpy(buffer, &oldValue, sizeof(Y));
return (struct MaterializeForSetReturn){(Y *)buffer, finishSettingComputedY};
}
void finishSettingComputedY(X *self, void *buffer) {
Y newValue;
memcpy(&newValue, buffer, sizeof(Y));
setY(self, newValue);
}
A benefit of this approach is that it maintains the integrity of the source
function in the compiled code. On the other hand, in order for the property
implementation to transfer state from the start to the completion of the
access, the caller must preallocate some stack space; if it’s too much, the
space is wasted, and if it’s not enough, the property implementation must use
malloc to get more space for itself. (It’s theoretically possible to avoid this
with some clever stack pointer accounting.) There’s also a code size impact on
the caller, which needs to bracket the inout access with a call on each side.
The underlying CPU has to execute three or five branches per segment of the
access path (calling and returning from materializeForSet, testing for null,
and potentially calling and returning from the continuation if there is one).
Nested functions
Alternatively, we can implement the formal access pattern as a series of nested
functions. A property under this model compiles to something like this C code:
struct X {
// getter for Y
Y (*getY)(const X *self);
// setter for Y
void (*setY)(X *self, Y newValue);
// mutator for Y
void *(*mutateY)(X *self, void *context, void *(*innerMutation)(Y *y, void
*context));
};
void callFooOnXDotYDotZ(X *x) {
// To compile:
// foo(&x.y)
// - call the mutate accessor for y, with the inner access as a
// function whose pointer gets passed in
x->mutateY(x, NULL, callFooOnXDotY_inner_1);
}
void callFooOnXDotYDotZ_inner_1(Y *y, void *context) {
// - call the mutate accessor for z
y->mutateZ(y, context, callFooOnXDotY_inner_2);
}
void callFooOnXDotYDotZ_inner_2(Z *z, void *context) {
foo(z);
}
A stored property can implement mutate by projecting the physical address of
the subobject, tail-calling the subsequent inner function:
void *mutateStoredY(X *self, void *context, void *(*innerMutation)(Y *y, void
*context)) {
/*tail*/ return innerMutation(&self->y, context);
}
and a computed property can bracket the inner mutation in getter and setter
calls:
void *mutateComputedY(X *self, void *context, void *(*innerMutation)(Y *y, void
*context)) {
Y value = getY(self);
void *result = innerMutation(&value, context);
setY(self, value);
return result;
}
This makes things easier for the property implementation, which can easily save
as much state as it needs to on the stack instead of relying on a preallocated
buffer handed down from the caller. There’s one fewer branch necessary per
segment of the access path compared to the continuation-based model, either
three if the accessor has no reconciliation necessary and can tail-call the
inner function, or four if it does a normal call. On the other hand, the caller
function has to be broken up into many smaller subfunctions, potentially one
for every segment of an access path.
Letting nested functions walk the access path
We can reduce the number of nested functions that are needed for a nested
access path by designing a convention that allows each mutator to directly call
the mutator for the next segment of access. For example, the caller could
construct the entire access path as an array of pointers to mutation functions:
typedef void *(*MutatorFunction)(void *base, MutatorFunction *accessPath, void
*context);
void callFooOnXDotYDotZDotW(X *x) {
// foo(&x.y.z.w)
MutatorFunction accessPath[] = {
(MutatorFunction)mutateZ,
(MutatorFunction)mutateW,
(MutatorFunction)callFooOnXDotYDotZDotW_inner
};
mutateY(x, accessPath, NULL);
}
Each property accessor function then loads the next step of the access out of
the path, and advances the pointer forward for the nested access:
void *mutateY(X *self, MutatorFunction *accessPath, void *context) {
// Project out y
Y *y = self->y;
// Call the next step of the access path, passing the remainder of the access
path
return (*accessPath)(y, accessPath + 1, context);
}
And the inner function at the end of the access path ignores the access path
parameter and performs the access:
void *callFooOnXDotYDotZDotW_inner(W *w, MutatorFunction *_ignored, void
*context) {
foo(w);
}
In this model, only one inner function needs to be broken out of the caller for
the innermost inout access, at least for simple property projections. There are
only two branches needed per segment, or one for segments that can tail-call
the inner access. Parameterized segments such as subscripts can be supported by
storing the parameters in the access path buffer as well, and having the
subscript accessor advance past the parameters when it consumes them. For
example:
void callFooOnXDotYSubIDotZ(X *x, intptr_t i) {
// foo(&x.y[i].z)
uintptr_t accessPath[] = {
(uintptr_t)mutateYSubscript,
(uintptr_t)i,
(uintptr_t)mutateZ,
(uintptr_t)callFooOnXDotYSubIDotZ_inner
};
mutateY(x, accessPath, NULL);
}
void *mutateYSubscript(Y *y, uintptr_t *accessPath, void *context) {
// Get the subscript index out of the access path
intptr_t i = (intptr_t)*accessPath++;
// Get the next step of the access out of the access path
MutatorFunction next = (MutatorFunction)*accessPath++;
return next(&y->elements[i], accessPath, context);
}
On the other hand, this does push the inner access functions and subscript
arguments onto the stack, which is a small amount of overhead. Furthermore, the
caller function has to push any state it needs to pass from outside the access
to inside on the stack as well, if it isn’t able to cram it otherwise into a
single context pointer argument.
Machine-level calling convention tricks
To be able to feed a return value from the inner access back to the outer
caller, nested mutators would need to preserve all of the return registers
potentially used by the inner access on the way out of the access. To minimize
the impact on the outer caller, the convention for mutators could also specify
that they need to preserve the contents of callee-save registers when they call
down to nested accesses. This should allow as much state to be kept in
registers between the outer caller and inner access as if they were naturally
in the same function, with at most a function call in between.
With these sorts of optimizations, I think the nested function approach has the
potential to be competitive with, or even more efficient than, our current
continuation-based materializeForSet design.
-Joe_______________________________________________
swift-dev mailing list
[email protected]
https://lists.swift.org/mailman/listinfo/swift-dev