I've been considering whether keeping the shadow TLB data structure around is the right approach. To begin with, Linux does not keep an analogous structure, and it faces basically the same problem: when resuming the guest after a context switch, the TLB will be mostly empty of guest entries, and the guest will suffer a high TLB miss rate until it can fault more back in.

The shadow TLB allows us to skip the repopulation phase, but at the expense of some memory and some overhead on every exit. Also, it's difficult or impossible to implement a shadow TLB on cores where software cannot address TLB entries by index (i.e. the hardware automatically selects the index).

If we did remove the shadow TLB, I think we'd suffer even more TLB misses, so we'd need to really optimize our TLB miss handler. (Arguably we should already be doing that anyway.) Linux's TLB miss handler walks the Linux page tables entirely in assembly, which also removes the need to save/restore the full C ABI set of GPRs.

Logically speaking, our TLB miss handler must do the following:

1. walk guest TLB(s) to find a matching entry
2. if not present, deliver fault to guest
3. if present
   1. calculate guest physical address from TLB entry
   2. check if the guest is allowed to access that address
   3. if yes, write new TLB entry into hardware
   4. if no, deliver fault to userspace

I was hoping to be able to implement 1 through 3.3 in assembly. Doing that would require hoisting some of the logic currently in kvmppc_mmu_map() so that it executes before the page fault occurs, mostly because it's implemented in C. In particular we need kvm_is_visible_gfn(), and most unfortunately gfn_to_pfn().

I started moving the logic around, but gfn_to_pfn() has me stuck, because calling it on *all* gfns at memory registration time would a) pin all guest memory all the time, and b) require us to keep a large data structure to remember all the pfns.

Avi suggested that we could keep a cache of "gotten" pfns (gfn_to_pfn() calls get_user_pages()). That could obviate both C calls, but now I'm not sure it's worth the development complexity. I'll have to keep thinking about those other cores without directly indexable TLBs.

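The handler steps listed above can be sketched in C roughly as follows. This is only an illustration of the control flow, not the actual KVM code: every name here (toy_vcpu, toy_guest_tlbe, toy_handle_tlb_miss, the TOY_* constants) is hypothetical, the visibility check is a plain bounds test standing in for kvm_is_visible_gfn(), the gfn-to-pfn translation is left out entirely, and the "hardware TLB write" just records the values in the struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT     12   /* assumed 4K pages */
#define TOY_GUEST_TLB_SIZE 4    /* tiny guest TLB, for illustration only */

enum toy_miss_result {
    TOY_MAPPED,          /* step 3.3: new entry written to hardware */
    TOY_FAULT_TO_GUEST,  /* step 2: no matching guest TLB entry */
    TOY_FAULT_TO_USER    /* step 3.4: guest may not access that address */
};

struct toy_guest_tlbe {
    bool     valid;
    uint64_t eaddr;   /* guest effective address, page aligned */
    uint64_t gpaddr;  /* guest physical address, page aligned */
};

struct toy_vcpu {
    struct toy_guest_tlbe guest_tlb[TOY_GUEST_TLB_SIZE];
    uint64_t guest_mem_size;  /* guest physical memory is [0, size) */
    uint64_t hw_eaddr;        /* stand-in for the hardware TLB write */
    uint64_t hw_gfn;
};

static enum toy_miss_result
toy_handle_tlb_miss(struct toy_vcpu *vcpu, uint64_t eaddr)
{
    const uint64_t page_mask = ~(((uint64_t)1 << TOY_PAGE_SHIFT) - 1);
    const struct toy_guest_tlbe *match = NULL;

    assert(vcpu != NULL);

    /* Step 1: walk the guest TLB to find a matching entry. */
    for (size_t i = 0; i < TOY_GUEST_TLB_SIZE; i++) {
        const struct toy_guest_tlbe *e = &vcpu->guest_tlb[i];
        if (e->valid && e->eaddr == (eaddr & page_mask)) {
            match = e;
            break;
        }
    }

    /* Step 2: not present, so the fault belongs to the guest. */
    if (match == NULL)
        return TOY_FAULT_TO_GUEST;

    /* Step 3.1: calculate the guest physical address. */
    uint64_t gpaddr = match->gpaddr | (eaddr & ~page_mask);
    uint64_t gfn = gpaddr >> TOY_PAGE_SHIFT;

    /* Steps 3.2 and 3.4: is the guest allowed to access that address? */
    if (gpaddr >= vcpu->guest_mem_size)
        return TOY_FAULT_TO_USER;

    /* Step 3.3: "write" the new entry into hardware. */
    vcpu->hw_eaddr = eaddr & page_mask;
    vcpu->hw_gfn = gfn;
    return TOY_MAPPED;
}
```

In this sketch everything up to the final write is a table walk plus a comparison, which is the part that could plausibly move into assembly. The piece deliberately missing is the one the text identifies as the obstacle: turning the gfn into a host pfn without calling into C.
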
--
Hollis Blanchard
IBM Linux Technology Center