On Mon, 16 Jan 2017, André Przywara wrote:
> On 13/01/17 19:37, Stefano Stabellini wrote:
> > On Thu, 12 Jan 2017, Andre Przywara wrote:
> 
> Hi Stefano,
> 
> ...
> 
> >>>> +    list_for_each_entry(lpi_irq, &v->arch.vgic.pending_lpi_list, entry)
> >>>> +    {
> >>>> +        if ( lpi_irq->pirq.irq == lpi )
> >>>> +            return &lpi_irq->pirq;
> >>>> +
> >>>> +        if ( lpi_irq->pirq.irq == 0 && !empty )
> >>>> +            empty = lpi_irq;
> >>>> +    }
> >>>
> >>> This is another one of those cases where a list is too slow for the hot
> >>> path. The idea of allocating pending_irq struct on demand is good, but
> >>> storing them in a linked list would kill performance. Probably the best
> >>> thing we could do is an hashtable and we should preallocate the initial
> >>> array of elements. I don't know what the size of the initial array
> >>> should be, but we can start around 50, and change it in the future once
> >>> we do tests with real workloads. Of course the other key parameter is
> >>> the hash function, not sure which one is the right one, but ideally we
> >>> would never have to allocate new pending_irq struct for LPIs because the
> >>> preallocated set would suffice.
> >>
> >> As I mentioned in the last post, I expect this number to be really low
> >> (less than 5). 
> > 
> > Are you able to check this assumption in a real scenario? If not you,
> > somebody else?
> > 
> > 
> >> Let's face it: If you have multiple interrupts pending
> >> for a significant amount of time you won't make any actual progress in
> >> the guest, because it's busy with handling interrupts.
> >> So my picture of LPI handling is:
> >> 1) A device triggers an MSI, so the host receives the LPI. Ideally this
> >> will be handled by the pCPU where the right VCPU is running atm, so it
> >> will exit to EL2. Xen will handle the LPI by assigning one struct
> >> pending_irq to it and will inject it into the guest.
> >> 2) The VCPU gets to run again and calls the interrupt handler, because
> >> the (virtual) LPI is pending.
> >> 3) The (Linux) IRQ handler reads the ICC_IAR register to learn the IRQ
> >> number, and will get the virtual LPI number.
> >> => At this point the LPI is done when it comes to the VGIC. The LR state
> >> will be set to 0 (neither pending nor active). This is independent of the
> >> EOI the handler will execute soon (or later).
> >> 4) On the next exit the VGIC code will discover that the IRQ is done
> >> (LR.state == 0) and will discard the struct pending_irq (set the LPI
> >> number to 0 to make it available to the next LPI).
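
Just to check my understanding of step 4, here is a standalone sketch of
what I think happens on exit (not the actual code; the helper, the struct
layout and the mask names below are invented for illustration):

#include <stdint.h>

/* ICH_LR_EL2 state field, bits [63:62]: 0 inactive, 1 pending, 2 active. */
#define LR_STATE_SHIFT  62
#define LR_STATE_MASK   (0x3ULL << LR_STATE_SHIFT)

struct pending_irq {
    uint32_t irq;               /* virtual LPI number, 0 == slot unused */
    /* status bits, priority, list head, ... */
};

/* Called on guest exit for an LR that held an injected LPI. */
static void maybe_recycle_lpi(struct pending_irq *p, uint64_t lr)
{
    /* State 0: neither pending nor active, the guest has read IAR. */
    if ( ((lr & LR_STATE_MASK) >> LR_STATE_SHIFT) == 0 )
        p->irq = 0;             /* free the slot for the next incoming LPI */
}
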
> > 
> > I am following
> > 
> > 
> >> Even if there were multiple LPIs pending at the same time (because
> >> the guest had interrupts disabled, for instance), I believe they can be
> >> all handled without exiting. Upon EOIing (priority-dropping, really) the
> >> first LPI, the next virtual LPI would fire, calling the interrupt
> >> handler again, and so on. Unless the kernel decides to do something that
> >> exits (even accessing the hardware normally wouldn't, I believe), we can
> >> clear all pending LPIs in one go.
> >>
> >> So I have a hard time imagining how we can really have many LPIs
> >> pending and thus struct pending_irqs allocated.
> >> Note that this may differ from SPIs, for instance, because the IRQ life
> >> cycle is more complex there (extending till the EOI).
> >>
> >> Does that make some sense? Or am I missing something here?
> > 
> > In my tests with much smaller platforms than the ones existing today, I
> > could easily have 2-3 interrupts pending at the same time without much
> > load and without any SR-IOV NICs or any other fancy PCIE hardware.
> 
> The difference to LPIs is that SPIs can be level triggered (eventually
> requiring a driver to clear the interrupt condition in the device) and
> also require an explicit deactivation to finish off the IRQ state machine.
> Both of these will cause an IRQ to stay in the LRs much longer than one
> would expect for an always edge-triggered LPI, which lacks an active
> state.
> Also the timer IRQ is a PPI and thus a frequent visitor in the LRs.
> 
> > It would be nice to test on Cavium ThunderX for example.
> 
> Yes, I agree that there is quite some guessing involved, so proving this
> sounds like a worthwhile task.
> 
> > It's also easy to switch to rbtrees.
> 
> On Friday I looked at rbtrees in Xen, which thankfully seem to be the
> same as in Linux. So I converted the its_devices list over.
> 
> But in this case I don't believe that rbtrees are the best data
> structure, since we frequently need to look up entries, but also need to
> find new, empty ones (when an LPI has fired).
> And for allocating new LPIs we don't need a particular slot, any free one
> would do. We probably want to avoid actually malloc-ing pending_irq
> structures for that.
> 
> So a hash table with open addressing sounds like a better fit here:
> - With a clever hash function (taking, for instance, Linux' LPI allocation
> scheme into account) we get very quick lookup times for already assigned
> LPIs.
> - Assigning an LPI would use the same hash function, probably finding an
> unused pending_irq, which we could then easily allocate.
> 
> I will try to write something along those lines.

Thanks Andre, indeed it sounds like a better option.
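
Just so that we are on the same page, below is roughly the kind of thing I
have in mind for the open addressing scheme (only a sketch: the names, the
table size and the hash are invented here, locking and error handling are
left out):

#include <stdbool.h>
#include <stdint.h>

#define NR_LPI_SLOTS   64          /* power of two; tune on real workloads */

struct pending_irq {
    uint32_t irq;                  /* virtual LPI number, 0 == free slot */
    /* status bits, priority, list head, ... */
};

/* One preallocated, fixed-size table per VCPU, so no malloc on the hot path. */
struct lpi_table {
    struct pending_irq slots[NR_LPI_SLOTS];
};

/* Linux hands out LPIs from 8192 upwards, so the low bits spread well. */
static unsigned int lpi_hash(uint32_t lpi)
{
    return lpi & (NR_LPI_SLOTS - 1);
}

/*
 * Open addressing with linear probing: return the entry already assigned
 * to lpi, or (if allocate is set) claim the first free slot seen while
 * probing. Since slots get freed out of order we cannot stop the lookup
 * at the first empty slot, so the probe simply walks the whole (small)
 * table in the worst case.
 */
static struct pending_irq *lpi_to_pending(struct lpi_table *t, uint32_t lpi,
                                          bool allocate)
{
    unsigned int i, idx = lpi_hash(lpi);
    struct pending_irq *empty = NULL;

    for ( i = 0; i < NR_LPI_SLOTS; i++ )
    {
        struct pending_irq *p = &t->slots[(idx + i) % NR_LPI_SLOTS];

        if ( p->irq == lpi )
            return p;
        if ( p->irq == 0 && !empty )
            empty = p;
    }

    if ( allocate && empty )
    {
        empty->irq = lpi;
        return empty;
    }

    return NULL;
}

If the table ever fills up we could still fall back to allocating and
chaining somewhere, but with the handful of pending LPIs we expect that
should be the rare exception rather than the rule.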