On 09/17/2012 12:39 PM, Henri Roosen wrote:
> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
> <gilles.chanteperd...@xenomai.org> wrote:
>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> looking at x86 latencies, I found that what was taking long
>>>>>>>>>> on my atom was masking the fasteoi interrupts at IO-APIC
>>>>>>>>>> level. So, I experimented with an idea: masking at LAPIC
>>>>>>>>>> level instead of IO-APIC, by using the "task priority"
>>>>>>>>>> register. This seems to improve latencies on my atom:
>>>>>>>>>>
>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>
>>>>>>>>>> This implies splitting the LAPIC vectors into high priority
>>>>>>>>>> and low priority sets; the final implementation would use
>>>>>>>>>> ipipe_enable_irqdesc to detect a high priority domain, and
>>>>>>>>>> change the vector at that time.
>>>>>>>>>>
>>>>>>>>>> This also improves the latencies on my old PIII with a VIA
>>>>>>>>>> chipset, but it generates spurious interrupts (I do not know
>>>>>>>>>> whether it really matters, as handling a spurious interrupt
>>>>>>>>>> is still faster than masking an IO-APIC interrupt); the
>>>>>>>>>> spurious interrupts in that case are a documented behaviour
>>>>>>>>>> of the LAPIC.
>>>>>>>>>>
>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with
>>>>>>>>>> slow IO-APIC the exception more than the rule, or does having
>>>>>>>>>> to split the vector space appear too great a restriction?
>>>>>>>>>
>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for
>>>>>>>>> PCI devices - likely what we are primarily interested in here -
>>>>>>>>> due to MSI.
>>>>>>>>
>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>> peripherals integrated into the chipset, such as the USB HCI, or
>>>>>>>> ATA driver (IOW, non PCI devices).
>>>>>>>
>>>>>>> Those are all PCI as well. And modern chipsets include variants
>>>>>>> of them with MSI(-X) support.
>>>>>>>
>>>>>>>>
>>>>>>>> atom login: root
>>>>>>>> # cat /proc/interrupts
>>>>>>>>            CPU0       CPU1
>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>> ERR:          0
>>>>>>>> MIS:          0
>>>>>>>>
>>>>>>>> I do not think peripherals integrated into chipsets can really
>>>>>>>> be considered "legacy". And they tend to be used in the field...
>>>>>>>
>>>>>>> The good news is that, even on your low-end atom, you can avoid
>>>>>>> those latencies by CPU assignment, i.e. isolating the Linux IRQ
>>>>>>> load on one core and the RT on the other. That's getting easier
>>>>>>> and easier due to the inflation of cores.
>>>>>>
>>>>>> What if you want to use RTUSB for instance?
>>>>>
>>>>> Then I will likely not worry about a few micros of additional
>>>>> latency due to IO-APIC accesses.
>>>>
>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking
>>>> it, takes 10us in UP, and 20us in SMP (with the tracer on).
>>>
>>> ...and on more appropriate chipsets? I bet the Atom is (once again)
>>> off here.
>>
>> I do not know, would you care to share your traces with us? I only
>> run Xenomai on atom (which I am not sure does not qualify as "modern",
>> new atoms seem to be produced), geode (ok, this one is definitely
>> dead, but there seem to be people still running xenomai on them), and
>> an old pentium III with an old VIA686 chipset, where masking the
>> IO-APIC is even slower than acking the i8259.
>>
>> Anyway, the IO-APIC register accesses do not look designed for speed:
>> the IO-APIC has an indirect scheme that seems more designed to save
>> space in the processor mapping and to be configured once and for all
>> when enabling/disabling an interrupt, not at each and every interrupt.
>>
>> The point is: people may want to use Xenomai on atoms. We do not
>> really know on what kind of x86 people run xenomai; knowing that
>> would help us direct our efforts.
>
> We are currently investigating whether we can use Atoms for our
> future products. We have to stick to the x86 architecture and our
> products should work without big cooling fans. Currently running tests
> on Atom D2700 (which I know is EOL, but for research purposes should
> give us a good indication).
>
> A 20us latency gain is a lot and would be very welcome in our system!
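
To make the "indirect scheme" remark above concrete, here is a toy model (not kernel code, and it touches no hardware) that only counts register accesses. It assumes the usual IO-APIC layout, with an index register at offset 0x00, a data window at offset 0x10, and the mask bit at bit 16 of the low dword of a redirection entry. The class and method names are made up for illustration, and a real kernel may cache entries or add a read-back, so the counts are indicative only.

```python
# Toy model: counts accesses only, names are hypothetical.
IOREGSEL = 0x00      # IO-APIC index register (MMIO offset)
IOWIN = 0x10         # IO-APIC data window (MMIO offset)
REDTBL_BASE = 0x10   # index of the first redirection-table register
MASK_BIT = 1 << 16   # mask bit in the low dword of a redirection entry


class IoApicModel:
    def __init__(self, pins=24):
        # Low dword of each redirection entry, all unmasked initially.
        self.regs = {REDTBL_BASE + 2 * pin: 0 for pin in range(pins)}
        self.selected = 0
        self.accesses = 0

    def write(self, offset, value):
        self.accesses += 1
        if offset == IOREGSEL:
            self.selected = value
        elif offset == IOWIN:
            self.regs[self.selected] = value

    def read(self, offset):
        self.accesses += 1
        if offset == IOREGSEL:
            return self.selected
        return self.regs[self.selected]

    def mask_pin(self, pin):
        # Indirect scheme: select the register, read it, write it back.
        self.write(IOREGSEL, REDTBL_BASE + 2 * pin)
        low = self.read(IOWIN)
        self.write(IOWIN, low | MASK_BIT)


class LapicModel:
    def __init__(self):
        self.tpr = 0
        self.accesses = 0

    def write_tpr(self, value):
        # The task priority register is reached with a single access.
        self.accesses += 1
        self.tpr = value & 0xFF

    def is_held(self, vector):
        # A vector is held while its priority class (upper 4 bits)
        # is not above the class programmed in the TPR.
        return (vector >> 4) <= (self.tpr >> 4)
```

In this model, masking one pin costs three IO-APIC accesses, while raising the TPR is one LAPIC access that holds every vector in the lower priority classes at once. This is why the vector space has to be split into a low and a high priority set.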
If you enable CONFIG_PCI_MSI, do you still see any IO-APIC-fasteoi
entries in /proc/interrupts?

-- 
Gilles.
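
A quick way to answer that question on a test box is sketched below. It is a minimal helper, assuming the /proc/interrupts layout shown earlier in this thread, where the flow type appears as the literal string "IO-APIC-fasteoi"; other kernel versions may print it differently. The function name and the SAMPLE text are illustrative only.

```python
def fasteoi_irqs(text):
    """Return (irq, handlers) pairs for IO-APIC-fasteoi lines."""
    found = []
    for line in text.splitlines():
        if "IO-APIC-fasteoi" not in line:
            continue
        # "19:  0  0  IO-APIC-fasteoi  ata_piix, uhci_hcd:usb3"
        irq, _, rest = line.partition(":")
        handlers = rest.split("IO-APIC-fasteoi", 1)[1].strip()
        found.append((irq.strip(), handlers))
    return found


# Excerpt in the format quoted earlier in the thread.
SAMPLE = """\
  9:          0          0   IO-APIC-fasteoi   acpi
 14:          0          0   IO-APIC-edge      ata_piix
 19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
 43:       2704          0   PCI-MSI-edge      eth0
"""
```

On a live system, pass the contents of /proc/interrupts instead of SAMPLE; an empty result would mean no device is left on a fasteoi IO-APIC line.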