2015-03-24 17:08 GMT-04:00 George Dunlap <george.dun...@eu.citrix.com>:

> On Tue, Mar 24, 2015 at 3:27 PM, Meng Xu <xumengpa...@gmail.com> wrote:
> >> The simplest way to get your prototype working, in that case, would be
> >> to return the idle vcpu for that pcpu if the guest is blocked.
> >
> >
> > Exactly! Thank you so much for pointing this out! I hardwired it to
> > always return the vcpu that is supposed to be blocked. Now I totally
> > understand what happened. :-)
> >
> > But this leads to another issue in my design:
> > If I return the idle vcpu when the dedicated VCPU is blocked, it will
> > do the context_switch(prev, next); when the dedicated VCPU is
> > unblocked, another context_switch() is triggered.
> > It means that we cannot eliminate the context_switch overhead for the
> > dedicated CPU.
> > The ideal performance for the dedicated VCPU on the dedicated CPU
> > should be super-close to the bare-metal CPU. Here we still have the
> > context_switch overhead, which is about 1500-2000 cycles.
> >
> > Can we avoid the context switch overhead?
>
> If you look at xen/arch/x86/domain.c:context_switch(), you'll see that
> it's already got clever algorithms for avoiding as much context switch
> work as possible.  In particular, __context_switch() (which on x86
> does the actual work of context switching) won't be called when
> switching *into* the idle vcpu; nor will it be called if you're
> switching from the idle vcpu back to the vcpu it switched away from
> (curr_vcpu == next).  Not familiar with the arm path, but hopefully
> they do something similar.
>
> IOW, a context switch to the idle domain isn't really a context switch. :-)
>

I see.
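Just to make sure I read it right: the short-circuit in
xen/arch/x86/domain.c:context_switch() is roughly the following
(paraphrasing from memory, so not the exact code):

    /* Rough sketch of the lazy-switch check in context_switch();
     * details differ in the real code. */
    if ( (per_cpu(curr_vcpu, cpu) == next) ||
         (is_idle_vcpu(next) && cpu_online(cpu)) )
    {
        /* Switching into idle, or back to the vcpu whose state is
         * still loaded on this pcpu: skip __context_switch(). */
        local_irq_enable();
    }
    else
    {
        __context_switch();    /* the expensive state save/restore */
        /* ... rest of the switch path ... */
    }

So blocking and unblocking the dedicated VCPU only pays the cheap path,
as long as nothing else ran on that pcpu in between.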


>
> > However, because the credit2 scheduler counts credit at the domain
> > level, the accounting of burned credit cannot be avoided.
>
> Actually, that's not true.  In credit2, the weight is set at a domain
> level, but that only changes the "burn rate".  Individual vcpus are
> assigned and charged their own credits; and credit of a vcpu in one
> runqueue has no comparison to or direct effect on the credit of a vcpu
> in another runqueue.  It wouldn't be at all inconsistent to simply not
> do the credit calculation for a "dedicated" vcpu.  The effect on other
> vcpus would be exactly the same as having that vcpu on a runqueue by
> itself.
>

I see. If budget accounting is done at the per-vcpu level, then we don't
need to keep accounting for the budget burned by the dedicated VCPU. We
just need to restore/re-enable the accounting mechanism for that VCPU when
it is changed from dedicated back to non-dedicated. But this is not a key
issue for the current design, anyway. I will first do it for the RTDS
scheduler and measure the performance, and if it works well, I will do it
for the credit2/credit scheduler. :-)
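
Just to make the idea concrete, what I have in mind is something like the
sketch below (the "dedicated" flag and the structure are hypothetical, not
the existing sched_credit2.c code):

    /* Hypothetical per-vcpu bookkeeping, only for illustration. */
    struct my_sched_vcpu {
        s_time_t last_sched_time;  /* when this vcpu last started running */
        int      credit;           /* remaining credit */
        bool_t   dedicated;        /* assumed "dedicated vcpu" flag */
    };

    static void my_burn_credits(struct my_sched_vcpu *svc, s_time_t now)
    {
        s_time_t delta;

        /* A dedicated vcpu owns its pcpu, so charging it credit has no
         * effect on any other vcpu; skip the accounting entirely. */
        if ( svc->dedicated )
            return;

        delta = now - svc->last_sched_time;
        if ( delta > 0 )
        {
            svc->credit -= delta;
            svc->last_sched_time = now;
        }
    }

When the vcpu goes back to non-dedicated, resetting last_sched_time at
that point should be enough to restart the accounting cleanly.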


>
> >> But it's not really accurate to say
> >> that you're avoiding the scheduler entirely.  At the moment, as far as
> >> I can tell, you're still going through all the normal schedule.c
> >> machinery between wake-up and actually running the vm; and the normal
> >> machinery for interrupt delivery.
> >
> >
> > Yes. :-(
> > Ideally, I want to isolate all such interference from the dedicated
> > CPU so that the dedicated VCPU on it will have performance that is
> > close to the bare-metal cpu. However, I'm concerned about how complex
> > it will be and how it will affect the existing functions that rely on
> > interrupts.
>
> Right; so there are several bits of overhead you might address:
>
> 1. The overhead of scheduling calculations -- credit, load balancing,
> sorting lists, &c; and regular scheduling interrupts.
>
> 2. The overhead in the generic code of having the flexibility to run
> more than one vcpu.  This would (probably) be measured in the number
> of instructions from a waking interrupt to actually running the guest
> OS handler.
>
> 3. The maintenance things that happen in softirq context, like
> periodic clock synchronization, &c.
>
> Addressing #1 is fairly easy.  The most simple thing to do would be to
> make a new scheduler and use cpupools; but it shouldn't be terribly
> difficult to build the functionality within existing schedulers.
>

Right.
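For the record, my rough plan for #1 is a pared-down do_schedule hook for
a "dedicated" scheduler running in its own cpupool, something like the
sketch below (written against the sched_ops interface as I remember it
from 4.x; the per-pcpu dedicated-vcpu lookup is an assumption):

    /* Hypothetical sketch of a minimal "dedicated" scheduler's
     * do_schedule hook; dedicated_vcpu_of() is an assumed helper. */
    static struct task_slice
    dedicated_schedule(const struct scheduler *ops, s_time_t now,
                       bool_t tasklet_work_scheduled)
    {
        const int cpu = smp_processor_id();
        struct vcpu *v = dedicated_vcpu_of(cpu);
        struct task_slice ret;

        /* Run the pinned vcpu whenever it is runnable; otherwise (or
         * when tasklet work is pending) hand the pcpu to the idle vcpu. */
        if ( v != NULL && vcpu_runnable(v) && !tasklet_work_scheduled )
            ret.task = v;
        else
            ret.task = idle_vcpu[cpu];

        ret.time = -1;     /* no scheduling timer; nothing to preempt for */
        ret.migrated = 0;

        return ret;
    }

No runqueues, no credit, no load balancing -- all of the #1 overhead
simply disappears on the dedicated pcpu.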



>
> My guess is that #2 would involve basically rewriting a parallel set
> of entry / exit routines which were pared down to an absolute minimum,
> and then having machinery in place to switch a CPU to use those
> routines (with a specific vcpu) rather than the current, more
> fully-functional ones.  It might also require cutting back on the
> functionality given to the guest as well in terms of hypercalls --
> making this "minimalist" Xen environment work with all the existing
> hypercalls might be a lot of work.
>
> That sounds like a lot of very complicated work, and before you tried
> it I think you'd want to be very much convinced that it would pay off
> in terms of reduced wake-up latency.  Getting from 5000 cycles down to
> 1000 cycles might be worth it; getting from 1400 cycles down to 1000,
> or 5000 cycles down to 4600, maybe not so much. :-)
>

Exactly! I will measure the overhead in #2 before I really try to tackle
it. Since #1 is fairly easy, I will implement #1 first and see how much of
a gap remains to bare-metal performance.
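
For the measurement itself, I plan to start with plain TSC timestamps
around the path, e.g. something as simple as the generic sketch below
(where exactly to place the two readings, Xen side vs. guest side, is the
open question):

    #include <stdint.h>

    /* Read the TSC; the lfence keeps the read from being reordered
     * with earlier instructions. */
    static inline uint64_t rdtsc_cycles(void)
    {
        uint32_t lo, hi;
        asm volatile ( "lfence; rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    /* Usage sketch:
     *   t0 = rdtsc_cycles();   // e.g. at wakeup/interrupt entry
     *   ... path under test ...
     *   t1 = rdtsc_cycles();   // e.g. just before resuming the guest
     *   record(t1 - t0);
     */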


>
> I'm not sure exactly what #3 would entail; it might involve basically
> taking the cpu offline from Xen's perspective.  (Again, not sure if
> it's possible or worth it.)
>
> You might take a look at this presentation from FOSDEM last year, to
> see if you can get any interesting ideas:
>
> https://archive.fosdem.org/2014/schedule/event/virtiaas13/


Thank you very much for sharing this video! It is very interesting. In my
mind, to really eliminate those softirqs, we would have to remap/redirect
those interrupts to other cores. I'm unsure how difficult that is and how
much benefit it would bring. :-(

Thank you very much!

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
