> On Feb 4, 2017, at 2:35 AM, Andrew Trick via swift-dev <swift-dev@swift.org> 
> wrote:
> 
> 
>> On Feb 3, 2017, at 9:37 PM, John McCall <rjmcc...@apple.com 
>> <mailto:rjmcc...@apple.com>> wrote:
>> 
>>>> IV. The function that performs the lookup:
>>>>  IV1) is parameterized by an isa
>>>>  IV2) is not parameterized by an isa
>>>> IV1 allows the same function to be used for super-dispatch but requires 
>>>> extra work to be inlined at the call site (possibly requiring a chain of 
>>>> resolution function calls).
>>> 
>>> In my first message I was trying to accomplish IV1. But IV2 is simpler
>>> and I can't see a fundamental advantage to IV1.
>> 
>> Well, you can use IV1 to implement super dispatch (+ sibling dispatch, if we 
>> add it)
>> by passing in the isa of either the superclass or the current class.  IV2 
>> means
>> that the dispatch function is always based on the isa from the object, so 
>> those
>> dispatch schemes need something else to implement them.
>> 
>>> Why would it need a lookup chain?
>> 
>> Code size, because you might not want to inline the isa load at every call 
>> site.
>> So, for a normal dispatch, you'd have an IV2 function (defined client-side?)
>> that just loads the isa and calls the IV1 function (defined by the class).
> 
> Right. Looks like I wrote the opposite of what I meant. The important thing 
> to me is that the vtable offset load + check is issued in parallel with the 
> isa load. I was originally pushing IV2 for this reason, but now think that 
> optimization could be entirely lazy via a client-side cache.

Is this client-side cache per-image or per-callsite? 


>> So we'd almost certainly want a client-side resolver function that handled
>> the normal case.  Is that what you mean when you say II1+II2?  So the local
>> resolver would be I2; II1; III2; IV2; V1, which leaves us with a 
>> three-instruction
>> call sequence, which I think is equivalent to Objective-C, and that function
>> would do this sequence:
>> 
>> define @local_resolveMethodAddress(%object, %method_index)
>>   %raw_isa = load %object                        // 1 instruction
>>   %isa_mask = load @swift_isaMask                // 3: 2 to materialize the 
>> address from the GOT (not necessarily within ±1MB), 1 to load from it
>>   %isa = and %raw_isa, %isa_mask                 // 1
>>   %cache_table = @local.A.cache_table            // 2: not necessarily 
>> within ±1MB
>>   %cache = add %cache_table, %method_index * 8   // 1
>>   tailcall @A.resolveMethod(%isa, %method_index, %cache)  // 1
>> 
>> John.
> 
> Yes, exactly, except we haven’t even done any client-side vtable optimization 
> yet.
> 
> To me the point of the local cache is to avoid calling @A.resolveMethod in 
> the common case. So we need another load-compare-and-branch, which makes the 
> local helper 12-13 instructions. Then you have the vtable load itself, so 
> that’s 13-14 instructions. You would be saving on dynamic instructions but 
> paying with 4 extra static instructions per class.
> 
> It would be lame if we can't force @local.A.cache_table to be ±1MB relative 
> to the helper.

You should assume that code and data are far apart from each other. The linker 
will optimize two-instruction far loads to a nop and a near load if they are in 
fact close together, but in full-size apps that is uncommon and in the dyld 
shared cache it never happens. (The shared cache deliberately separates all 
code from all data in order to save VM map entries.)


-- 
Greg Parker     gpar...@apple.com     Runtime Wrangler


_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev
