> On Feb 6, 2017, at 9:02 AM, Greg Parker <gpar...@apple.com> wrote:
> 
>> 
>> On Feb 4, 2017, at 2:35 AM, Andrew Trick via swift-dev <swift-dev@swift.org 
>> <mailto:swift-dev@swift.org>> wrote:
>> 
>> 
>>> On Feb 3, 2017, at 9:37 PM, John McCall <rjmcc...@apple.com 
>>> <mailto:rjmcc...@apple.com>> wrote:
>>> 
>>>>> IV. The function that performs the lookup:
>>>>>  IV1) is parameterized by an isa
>>>>>  IV2) is not parameterized by an isa
>>>>> IV1 allows the same function to be used for super-dispatch but requires 
>>>>> extra work to be inlined at the call site (possibly requiring a chain of 
>>>>> resolution function calls).
>>>> 
>>>> In my first message I was trying to accomplish IV1. But IV2 is simpler
>>>> and I can't see a fundamental advantage to IV1.
>>> 
>>> Well, you can use IV1 to implement super dispatch (+ sibling dispatch, if 
>>> we add it)
>>> by passing in the isa of either the superclass or the current class.  IV2 
>>> means
>>> that the dispatch function is always based on the isa from the object, so 
>>> those
>>> dispatch schemes need something else to implement them.
>>> 
>>>> Why would it need a lookup chain?
>>> 
>>> Code size, because you might not want to inline the isa load at every call 
>>> site.
>>> So, for a normal dispatch, you'd have an IV2 function (defined client-side?)
>>> that just loads the isa and calls the IV1 function (defined by the class).
>> 
>> Right. Looks like I wrote the opposite of what I meant. The important thing 
>> to me is that the vtable offset load + check is issued in parallel with the 
>> isa load. I was originally pushing IV2 for this reason, but now think that 
>> optimization could be entirely lazy via a client-side cache.
> 
> Is this client-side cache per-image or per-callsite? 

Per-image, with up to one cache entry per imported method to hold the vtable 
offset.
-Andy
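
[A per-image cache along these lines could be sketched in C as below. This is only an illustration of the caching scheme, not the real runtime: `method_cache`, `resolve_method_slow`, and the offset arithmetic are all hypothetical stand-ins.]

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-image cache: one 8-byte slot per imported method,
 * holding the resolved vtable offset (0 = not yet resolved). */
#define NUM_IMPORTED_METHODS 64
static uintptr_t method_cache[NUM_IMPORTED_METHODS];

/* Stand-in for the class-side resolver. In the real runtime this would
 * consult the class's vtable layout; here it just fabricates an offset. */
static uintptr_t resolve_method_slow(size_t method_index) {
    return 16 + method_index * 8;  /* illustrative offset computation */
}

/* Client-side lookup: hit the per-image cache first, fall back to the
 * slow resolver on a miss and fill the slot so later calls stay local. */
uintptr_t lookup_vtable_offset(size_t method_index) {
    uintptr_t offset = method_cache[method_index];
    if (offset == 0) {                        /* cache miss */
        offset = resolve_method_slow(method_index);
        method_cache[method_index] = offset;  /* fill once, reuse per-image */
    }
    return offset;
}
```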

>>> So we'd almost certainly want a client-side resolver function that handled
>>> the normal case.  Is that what you mean when you say II1+II2?  So the local
>>> resolver would be I2; II1; III2; IV2; V1, which leaves us with a 
>>> three-instruction
>>> call sequence, which I think is equivalent to Objective-C, and that function
>>> would do this sequence:
>>> 
>>> define @local_resolveMethodAddress(%object, %method_index)
>>>   %raw_isa = load %object                        // 1 instruction
>>>   %isa_mask = load @swift_isaMask                // 3: 2 to materialize 
>>> address from GOT (not necessarily within ±1MB), 1 to load from it
>>>   %isa = and %raw_isa, %isa_mask                 // 1
>>>   %cache_table = @local.A.cache_table            // 2: not necessarily 
>>> within ±1MB
>>>   %cache = add %cache_table, %method_index * 8   // 1
>>>   tailcall @A.resolveMethod(%isa, %method_index, %cache)  // 1
>>> 
>>> John.
>> 
>> Yes, exactly, except we haven’t even done any client-side vtable 
>> optimization yet.
>> 
>> To me the point of the local cache is to avoid calling @A.resolveMethod in 
>> the common case. So we need another load-compare-and-branch, which makes the 
>> local helper 12-13 instructions. Then you have the vtable load itself, so 
>> that’s 13-14 instructions. You would be saving on dynamic instructions but 
>> paying with 4 extra static instructions per class.
>> 
>> It would be lame if we can't force @local.A.cache_table to be within ±1MB 
>> of the helper.
> 
> You should assume that code and data are far apart from each other. The 
> linker will optimize two-instruction far loads to a nop and a near load if 
> they are in fact close together, but in full-size apps that is uncommon and 
> in the dyld shared cache it never happens. (The shared cache deliberately 
> separates all code from all data in order to save VM map entries.)
> 
> 
> -- 
> Greg Parker     gpar...@apple.com <mailto:gpar...@apple.com>     Runtime 
> Wrangler
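
[For reference, John's three-step client-side sequence above — isa load, isa mask, cache-indexed tail call into the class's resolver — might look roughly like the C below. A sketch only: `swift_isaMask`, the cache table, and the `A_resolveMethod` signature are stand-ins for the runtime symbols, and the recording globals exist purely for illustration.]

```c
#include <stdint.h>
#include <stddef.h>

#define NUM_METHODS 32

/* Stand-ins for runtime-provided symbols. */
static uintptr_t swift_isaMask = ~(uintptr_t)7;   /* mask off low tag bits */
static uintptr_t local_cache_table[NUM_METHODS];  /* plays @local.A.cache_table */

typedef void (*impl_t)(void);

/* Recorded arguments, so the sketch is observable without a real resolver. */
static uintptr_t last_isa;
static uintptr_t *last_cache;

/* Stand-in for @A.resolveMethod: given the masked isa, a method index,
 * and the per-image cache slot, it would return the implementation. */
static impl_t A_resolveMethod(uintptr_t isa, size_t method_index,
                              uintptr_t *cache) {
    (void)method_index;
    last_isa = isa;
    last_cache = cache;
    return NULL;  /* a real resolver would return the method pointer */
}

/* Mirrors @local_resolveMethodAddress from the IR in the thread. */
impl_t local_resolveMethodAddress(void *object, size_t method_index) {
    uintptr_t raw_isa = *(uintptr_t *)object;            /* load %object   */
    uintptr_t isa = raw_isa & swift_isaMask;             /* and with mask  */
    uintptr_t *cache = &local_cache_table[method_index]; /* table + idx*8  */
    return A_resolveMethod(isa, method_index, cache);    /* tail call      */
}
```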
