> On Feb 6, 2017, at 9:02 AM, Greg Parker <gpar...@apple.com> wrote:
> 
>> On Feb 4, 2017, at 2:35 AM, Andrew Trick via swift-dev <swift-dev@swift.org> wrote:
>> 
>>> On Feb 3, 2017, at 9:37 PM, John McCall <rjmcc...@apple.com> wrote:
>>> 
>>>>> IV. The function that performs the lookup:
>>>>>   IV1) is parameterized by an isa
>>>>>   IV2) is not parameterized by an isa
>>>>> IV1 allows the same function to be used for super-dispatch but requires extra work to be inlined at the call site (possibly requiring a chain of resolution function calls).
>>>> 
>>>> In my first message I was trying to accomplish IV1. But IV2 is simpler and I can't see a fundamental advantage to IV1.
>>> 
>>> Well, you can use IV1 to implement super dispatch (+ sibling dispatch, if we add it) by passing in the isa of either the superclass or the current class. IV2 means that the dispatch function is always based on the isa from the object, so those dispatch schemes need something else to implement them.
>>> 
>>>> Why would it need a lookup chain?
>>> 
>>> Code size, because you might not want to inline the isa load at every call site. So, for a normal dispatch, you'd have an IV2 function (defined client-side?) that just loads the isa and calls the IV1 function (defined by the class).
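For concreteness, here is a rough C sketch of the two shapes being discussed. The names are invented and it glosses over details like the isa mask; it is only meant to illustrate the IV1/IV2 split, not an actual ABI:

    /* Sketch only: invented names, not the real runtime entry points. */
    typedef struct ClassMetadata ClassMetadata;
    typedef void (*Method)(void *self);

    /* IV1: parameterized by an isa, defined by the class's module.
       Super (and sibling) dispatch can call this directly with the
       superclass's (or the current class's) metadata. */
    extern Method A_resolveMethod(ClassMetadata *isa, unsigned method_index);

    /* IV2: not parameterized by an isa; a thin client-side wrapper that
       loads the isa from the object and chains to the IV1 entry point. */
    static inline Method local_resolveMethod(void *object, unsigned method_index) {
        ClassMetadata *isa = *(ClassMetadata **)object;  /* isa load (mask omitted) */
        return A_resolveMethod(isa, method_index);
    }

Normal dispatch would go through the IV2 wrapper to keep call sites small; super dispatch would call the IV1 function directly.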
>> Right. Looks like I wrote the opposite of what I meant. The important thing to me is that the vtable offset load + check is issued in parallel with the isa load. I was originally pushing IV2 for this reason, but now think that optimization could be entirely lazy via a client-side cache.
> 
> Is this client-side cache per-image or per-callsite?

Per-image, with up to one cache entry per imported method to hold the vtable offset.

-Andy

>>> So we'd almost certainly want a client-side resolver function that handled the normal case. Is that what you mean when you say II1+II2? So the local resolver would be I2; II1; III2; IV2; V1, which leaves us with a three-instruction call sequence, which I think is equivalent to Objective-C, and that function would do this sequence:
>>> 
>>> define @local_resolveMethodAddress(%object, %method_index)
>>>   %raw_isa = load %object                        // 1 instruction
>>>   %isa_mask = load @swift_isaMask                // 3: 2 to materialize address from GOT (not necessarily within ±1MB), 1 to load from it
>>>   %isa = and %raw_isa, %isa_mask                 // 1
>>>   %cache_table = @local.A.cache_table            // 2: not necessarily within ±1MB
>>>   %cache = add %cache_table, %method_index * 8   // 1
>>>   tailcall @A.resolveMethod(%isa, %method_index, %cache)  // 1
>>> 
>>> John.
>> 
>> Yes, exactly, except we haven’t even done any client-side vtable optimization yet.
>> 
>> To me the point of the local cache is to avoid calling @A.resolveMethod in the common case. So we need another load-compare-and-branch, which makes the local helper 12-13 instructions. Then you have the vtable load itself, so that’s 13-14 instructions. You would be saving on dynamic instructions but paying with 4 extra static instructions per class.
>> 
>> It would be lame if we can't force @local.A.cache_table to be ±1MB relative to the helper.
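As a rough sketch of that cached helper (written in C for readability; the names local_resolveMethodCached, local_A_cache_table, and the placeholder table size are invented, and the real thing would be compiler-emitted code, not hand-written C):

    #include <stdint.h>

    typedef struct ClassMetadata ClassMetadata;
    typedef void (*Method)(void *self);

    extern uintptr_t swift_isaMask;                  /* loaded via the GOT */

    /* Slow path defined by A's module; corresponds to
       @A.resolveMethod(%isa, %method_index, %cache) above and also
       fills in *cache. */
    extern Method A_resolveMethod(ClassMetadata *isa, unsigned method_index,
                                  uintptr_t *cache);

    enum { NUM_IMPORTED_METHODS = 16 };              /* placeholder size */

    /* Per-image cache: one slot per imported method, holding the resolved
       vtable offset; 0 means "not yet resolved". */
    static uintptr_t local_A_cache_table[NUM_IMPORTED_METHODS];

    static Method local_resolveMethodCached(void *object, unsigned method_index) {
        uintptr_t raw_isa = *(uintptr_t *)object;
        ClassMetadata *isa = (ClassMetadata *)(raw_isa & swift_isaMask);
        uintptr_t *cache = &local_A_cache_table[method_index];

        uintptr_t vtable_offset = *cache;            /* the extra load-compare-and-branch */
        if (vtable_offset != 0)
            return *(Method *)((char *)isa + vtable_offset);  /* fast path: vtable load */

        return A_resolveMethod(isa, method_index, cache);     /* slow path */
    }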
> You should assume that code and data are far apart from each other. The linker will optimize two-instruction far loads to a nop and a near load if they are in fact close together, but in full-size apps that is uncommon, and in the dyld shared cache it never happens. (The shared cache deliberately separates all code from all data in order to save VM map entries.)
> 
> -- 
> Greg Parker     gpar...@apple.com     Runtime Wrangler