2015-03-24 17:08 GMT-04:00 George Dunlap <george.dun...@eu.citrix.com>:
> On Tue, Mar 24, 2015 at 3:27 PM, Meng Xu <xumengpa...@gmail.com> wrote:
> >> The simplest way to get your prototype working, in that case, would be
> >> to return the idle vcpu for that pcpu if the guest is blocked.
> >
> > Exactly! Thank you so much for pointing this out! I had hardwired it
> > to always return the vcpu that is supposed to be blocked. Now I
> > totally understand what happened. :-)
> >
> > But this leads to another issue in my design:
> > If I return the idle vcpu when the dedicated VCPU is blocked, it will
> > do the context_switch(prev, next); when the dedicated VCPU is
> > unblocked, another context_switch() is triggered.
> > This means that we cannot eliminate the context switch overhead for
> > the dedicated CPU.
> > The ideal performance for the dedicated VCPU on the dedicated CPU
> > should be super-close to the bare-metal CPU. Here we still have the
> > context switch overhead, which is about 1500-2000 cycles.
> >
> > Can we avoid the context switch overhead?
>
> If you look at xen/arch/x86/domain.c:context_switch(), you'll see that
> it's already got clever algorithms for avoiding as much context switch
> work as possible. In particular, __context_switch() (which on x86
> does the actual work of context switching) won't be called when
> switching *into* the idle vcpu; nor will it be called if you're
> switching from the idle vcpu back to the vcpu it switched away from
> (curr_vcpu == next). I'm not familiar with the arm path, but hopefully
> they do something similar.
>
> IOW, a context switch to the idle domain isn't really a context switch. :-)

I see.
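If I understand domain.c correctly, the lazy-switch logic is roughly like
the sketch below (paraphrased and simplified, not the actual code):

    /*
     * Simplified sketch of the lazy-switch logic in
     * xen/arch/x86/domain.c:context_switch(); paraphrased, not the
     * actual code.  per_cpu(curr_vcpu, cpu) remembers which vcpu's
     * state is still loaded on this pcpu.
     */
    void context_switch(struct vcpu *prev, struct vcpu *next)
    {
        unsigned int cpu = smp_processor_id();

        if ( is_idle_vcpu(next) || per_cpu(curr_vcpu, cpu) == next )
        {
            /*
             * Switching *into* the idle vcpu, or from idle back to
             * the vcpu whose state never left this pcpu: skip the
             * expensive register/FPU/page-table work entirely.
             */
        }
        else
        {
            /* Only this path pays the ~1500-2000 cycle cost. */
            __context_switch();
        }

        /* ... resume running 'next' ... */
    }

So the block/unblock pair on a dedicated CPU should mostly hit the cheap
paths.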
> > However, because the credit2 scheduler counts the credit at the
> > domain level, the function that counts the credit burned cannot be
> > avoided.
>
> Actually, that's not true. In credit2, the weight is set at a domain
> level, but that only changes the "burn rate". Individual vcpus are
> assigned and charged their own credits; and the credit of a vcpu in one
> runqueue has no comparison to or direct effect on the credit of a vcpu
> in another runqueue. It wouldn't be at all inconsistent to simply not
> do the credit calculation for a "dedicated" vcpu. The effect on other
> vcpus would be exactly the same as having that vcpu on a runqueue by
> itself.

I see. If the accounting of the budget is at the per-vcpu level, then we
don't need to keep accounting the budget burned for the dedicated VCPU.
We just need to restore/re-enable the accounting mechanism for the
dedicated VCPU when it is changed from dedicated to non-dedicated. But
this is not a key issue for the current design, anyway. I will first do
it for the RTDS scheduler and measure the performance, and if it works
well, I will do it for the credit2/credit schedulers. :-)

> >> But it's not really accurate to say
> >> that you're avoiding the scheduler entirely. At the moment, as far as
> >> I can tell, you're still going through all the normal schedule.c
> >> machinery between wake-up and actually running the vm; and the normal
> >> machinery for interrupt delivery.
> >
> > Yes. :-(
> > Ideally, I want to isolate all such interference from the dedicated
> > CPU so that the dedicated VCPU on it will have performance close to
> > a bare-metal cpu. However, I'm concerned about how complex it will
> > be and how it will affect the existing functions that rely on
> > interrupts.
>
> Right; so there are several bits of overhead you might address:
>
> 1. The overhead of scheduling calculations -- credit, load balancing,
> sorting lists, &c; and regular scheduling interrupts.
>
> 2. The overhead in the generic code of having the flexibility to run
> more than one vcpu. This would (probably) be measured in the number
> of instructions from a waking interrupt to actually running the guest
> OS handler.
>
> 3. The maintenance things that happen in softirq context, like
> periodic clock synchronization, &c.
>
> Addressing #1 is fairly easy. The simplest thing to do would be to
> make a new scheduler and use cpupools; but it shouldn't be terribly
> difficult to build the functionality within existing schedulers.

Right.

> My guess is that #2 would involve basically rewriting a parallel set
> of entry / exit routines which were pared down to an absolute minimum,
> and then having machinery in place to switch a CPU to use those
> routines (with a specific vcpu) rather than the current, more
> fully-functional ones. It might also require cutting back on the
> functionality given to the guest as well in terms of hypercalls --
> making this "minimalist" Xen environment work with all the existing
> hypercalls might be a lot of work.
>
> That sounds like a lot of very complicated work, and before you tried
> it I think you'd want to be very much convinced that it would pay off
> in terms of reduced wake-up latency. Getting from 5000 cycles down to
> 1000 cycles might be worth it; getting from 1400 cycles down to 1000,
> or 5000 cycles down to 4600, maybe not so much. :-)

Exactly! I will do some measurements of the overhead in #2 before I
really try it. Since #1 is fairly easy, I will first implement #1 and
see how much of a gap remains to bare-metal performance.

> I'm not sure exactly what #3 would entail; it might involve basically
> taking the cpu offline from Xen's perspective. (Again, not sure if
> it's possible or worth it.)
>
> You might take a look at this presentation from FOSDEM last year, to
> see if you can get any interesting ideas:
>
> https://archive.fosdem.org/2014/schedule/event/virtiaas13/

Thank you very much for sharing this video! It is very interesting.
In my mind, to really eliminate those softirqs, we would have to
remap/redirect those interrupts to other cores. I'm unsure how difficult
that would be and what benefits it would bring. :-(
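Meanwhile, for #1, here is roughly the kind of change I have in mind for
rt_schedule() in xen/common/sched_rt.c. This is only a sketch; the
per-cpu "dedicated_vcpu" variable is hypothetical and does not exist in
the current code:

    static struct task_slice
    rt_schedule(const struct scheduler *ops, s_time_t now,
                bool_t tasklet_work_scheduled)
    {
        const unsigned int cpu = smp_processor_id();
        /* Hypothetical per-cpu marker for a dedicated pcpu. */
        struct vcpu *dedicated = per_cpu(dedicated_vcpu, cpu);
        struct task_slice ret;

        if ( dedicated != NULL )
        {
            /*
             * Skip budget burning and runqueue scans entirely.  If
             * the dedicated vcpu is blocked, hand the pcpu to the
             * idle vcpu, which (as discussed above) is nearly free.
             */
            ret.task = vcpu_runnable(dedicated) ? dedicated
                                                : idle_vcpu[cpu];
            ret.time = -1;   /* no scheduling timer on this pcpu */
            ret.migrated = 0;
            return ret;
        }

        /* ... normal RTDS path: burn budget, pick from runqueue ... */
    }

When the pcpu is changed back to non-dedicated, the budget accounting
would be re-enabled, as mentioned above.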
Thank you very much!

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania