OK, I will try to explain; correct me if I get anything wrong:

The problem here is not that interrupts are lost, but that they are not delivered in time.

There are basically two paths to inject an interrupt into a VM (or from one vcpu into another):
Path 1, the traditional way:
      1) set the bit in the vlapic IRR field that represents the interrupt, then kick the vcpu
      2) a VCPU_KICK_SOFTIRQ softirq is raised
      3) if the VCPU_KICK_SOFTIRQ bit is not set, set it; otherwise return and do nothing
      4) send an EVENT_CHECK_VECTOR IPI to the target vcpu
      5) the target vcpu takes a VMEXIT due to EXIT_REASON_EXTERNAL_INTERRUPT
      6) the interrupt handler does basically nothing
      7) the interrupt in the IRR is evaluated
      8) VCPU_KICK_SOFTIRQ is cleared in do_softirq
      9) the interrupt is injected into the vcpu at VMENTRY
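
To make the path 1 steps concrete, here is a rough sketch in C. It is a toy model, not the Xen code: every toy_* name, the 32-vector IRR and the bit index are made up for illustration, and the test-and-set helper is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these types and helpers are hypothetical, not Xen code. */
enum { TOY_VCPU_KICK_SOFTIRQ = 0 };   /* bit index chosen for illustration */

struct toy_cpu {
    uint32_t softirq_pending;   /* one bit per softirq */
    uint32_t irr;               /* stand-in for the vlapic IRR, 32 vectors only */
    unsigned event_check_ipis;  /* EVENT_CHECK_VECTOR IPIs sent so far */
    unsigned injected;          /* interrupts injected at VMENTRY so far */
};

/* Non-atomic stand-in for a test-and-set: returns the old bit value. */
static bool toy_test_and_set_bit(unsigned nr, uint32_t *word)
{
    bool old = (*word >> nr) & 1u;
    *word |= UINT32_C(1) << nr;
    return old;
}

/* Steps 1-4: record the interrupt, then kick the target cpu. */
static void toy_path1_inject(struct toy_cpu *c, unsigned vector)
{
    assert(vector < 32);
    c->irr |= UINT32_C(1) << vector;                    /* step 1 */
    if (!toy_test_and_set_bit(TOY_VCPU_KICK_SOFTIRQ,    /* steps 2-3 */
                              &c->softirq_pending))
        c->event_check_ipis++;                          /* step 4 */
}

/*
 * Steps 5-9: the IPI forces a VMEXIT, the softirq bit is cleared and the
 * pending IRR bits are injected on the way back into the guest. Draining
 * every pending vector in one go is a simplification.
 */
static void toy_vmexit_and_reenter(struct toy_cpu *c)
{
    c->softirq_pending &= ~(UINT32_C(1) << TOY_VCPU_KICK_SOFTIRQ); /* step 8 */
    while (c->irr) {                                    /* steps 7 and 9 */
        c->irr &= c->irr - 1;   /* clear the lowest pending vector */
        c->injected++;
    }
}
```

In this model, two back-to-back calls to toy_path1_inject() send only one IPI, but both IRR bits are picked up by the next toy_vmexit_and_reenter(), because the IPI always forces the exit that clears the softirq bit.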

Path 2, the posted-interrupt way (current logic):
      1) set the bit in the posted-interrupt descriptor that represents the interrupt
      2) if the VCPU_KICK_SOFTIRQ bit is not set, set it; otherwise return and do nothing
      3) send a POSTED_INTR_NOTIFICATION_VECTOR IPI to the target vcpu
      4) if the target vcpu is in non-root mode, it receives the interrupt immediately; otherwise the interrupt is injected at VMENTRY
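
The path 2 steps can be sketched the same way. Again this is only a toy model under my own assumptions: the toy_* names, the 32-bit request bitmap and the guest_virr field are hypothetical and do not match the real descriptor layout or the Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; names are hypothetical and the layout is not the real one. */
struct toy_pi_desc {
    uint32_t pir;             /* posted-interrupt requests, 32 vectors only */
};

struct toy_target {
    struct toy_pi_desc pi;
    uint32_t guest_virr;      /* vectors the guest can already see */
    bool kick_pending;        /* stands in for the VCPU_KICK_SOFTIRQ bit */
    bool in_non_root_mode;    /* true while the guest is running */
    unsigned notify_ipis;     /* POSTED_INTR_NOTIFICATION_VECTOR IPIs sent */
};

/* Move all posted requests into the guest-visible state. */
static void toy_sync_pir(struct toy_target *t)
{
    t->guest_virr |= t->pi.pir;
    t->pi.pir = 0;
}

static void toy_post_interrupt(struct toy_target *t, unsigned vector)
{
    assert(vector < 32);
    t->pi.pir |= UINT32_C(1) << vector;   /* step 1 */
    if (t->kick_pending)                  /* step 2: already set, do nothing */
        return;
    t->kick_pending = true;
    t->notify_ipis++;                     /* step 3 */
    if (t->in_non_root_mode)              /* step 4: delivered without VMEXIT */
        toy_sync_pir(t);
}

/* Step 4, other branch: requests posted in root mode are picked up here. */
static void toy_vmentry(struct toy_target *t)
{
    toy_sync_pir(t);
    t->in_non_root_mode = true;
}
```

Note that nothing in toy_post_interrupt() clears kick_pending: in the non-root case the guest gets the interrupt without ever leaving guest mode.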

Since the first operation in both paths is setting the bit that represents the interrupt, no interrupt is lost.

The issue is:
In path 2, the first interrupt sets VCPU_KICK_SOFTIRQ to 1, and unless a
VMEXIT occurs or something calls do_softirq directly, VCPU_KICK_SOFTIRQ
is not cleared. Later interrupt injections are then ignored at step 2),
which delays interrupt handling in the VM.

And because path 2 sets VCPU_KICK_SOFTIRQ to 1, the vcpu kick logic in path 1 also returns at step 3), so this vcpu can only handle the interrupt when something else causes a VMEXIT.
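
The stall described above can be reproduced with a very small sketch. This is a toy model with hypothetical toy_* names, not the Xen code; it only tracks the shared pending bit and counts the IPIs each path would send.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; all names are hypothetical. */
struct toy_pcpu {
    bool kick_softirq_pending;  /* the VCPU_KICK_SOFTIRQ bit for this cpu */
    unsigned posted_ipis;       /* POSTED_INTR_NOTIFICATION_VECTOR IPIs sent */
    unsigned event_check_ipis;  /* EVENT_CHECK_VECTOR IPIs sent */
};

/* Non-atomic test-and-set: returns the old value and leaves the bit set. */
static bool toy_test_and_set(bool *bit)
{
    bool old = *bit;
    *bit = true;
    return old;
}

/* Path 2, steps 2-3. */
static void toy_post_notify(struct toy_pcpu *c)
{
    if (!toy_test_and_set(&c->kick_softirq_pending))
        c->posted_ipis++;
}

/* Path 1, steps 3-4. */
static void toy_vcpu_kick(struct toy_pcpu *c)
{
    if (!toy_test_and_set(&c->kick_softirq_pending))
        c->event_check_ipis++;
}

/* Only reached after a VMEXIT, or when something calls it directly. */
static void toy_do_softirq(struct toy_pcpu *c)
{
    c->kick_softirq_pending = false;
}
```

Calling toy_post_notify() twice and then toy_vcpu_kick() sends exactly one IPI in total, the first posted one. Since that IPI does not force a VMEXIT when the guest is running, nothing reaches toy_do_softirq(), and both later notifications are silently dropped until some unrelated VMEXIT happens.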

On 2015/9/8 10:46, Zhang, Yang Z wrote:
Hanweidong (Randy) wrote on 2015-09-08:
Jan Beulich wrote on 2015-09-07 22:46:
Subject: Re: [Xen-devel] [PATCH] Remove a set operation for
VCPU_KICK_SOFTIRQ when post interrupt to vm.

On 07.09.15 at 16:24, <john.liuqim...@huawei.com> wrote:
I believe this also has something to do with a windows guest boot hang
issue.

It randomly occurred when booting a guest that has the Windows 2008 OS and
pv-driver installed. The boot process hangs while waiting for the xenstored
reply event signal.

It can be reproduced after hundreds of reboots using the xen staging
branch. But after I changed this code, the hang can no longer be reproduced.

The change below (which I don't think was ever posted to xen-devel)
does not make any sense, as it prohibits timely delivery of guest
interrupts. If there is an issue, I think you'd need to start with
clearly

This change won't prohibit timely delivery of guest interrupts;
instead, it helps to deliver guest interrupts in time. Posted interrupt
delivery doesn't kick the cpu, so it should not set the VCPU_KICK_SOFTIRQ
bit, and doesn't care whether VCPU_KICK_SOFTIRQ is set or not. If
VCPU_KICK_SOFTIRQ is set, the next interrupt will not be delivered due to
the test_and_set_bit check. What's more, it also keeps vcpu_kick() from
kicking the cpu (smp_send_event_check_cpu) when VCPU_KICK_SOFTIRQ is set.

The patch seems wrong to me, since interrupts will be lost in some corner
cases with those changes. Can you explain in more detail why the next
interrupt will get lost if the softirq is set here?

Best regards,
Yang

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel