Re: [Xenomai] IO-APIC latencies

Jan Kiszka Mon, 17 Sep 2012 05:28:05 -0700

On 2012-09-17 14:15, Henri Roosen wrote:
> On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
> <gilles.chanteperd...@xenomai.org> wrote:
>> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>>
>>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>>> <gilles.chanteperd...@xenomai.org> wrote:
>>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my
>>>>>>>>>>>> atom was masking the fasteoi interrupts at IO-APIC level. So, I
>>>>>>>>>>>> experimented with an idea: masking at LAPIC level instead of
>>>>>>>>>>>> IO-APIC, by using the "task priority" register. This seems to
>>>>>>>>>>>> improve latencies on my atom:
>>>>>>>>>>>>
>>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>>
>>>>>>>>>>>> This implies splitting the LAPIC vectors into high-priority and
>>>>>>>>>>>> low-priority sets. The final implementation would use
>>>>>>>>>>>> ipipe_enable_irqdesc to detect a high-priority domain, and change
>>>>>>>>>>>> the vector at that time.
>>>>>>>>>>>>
>>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset,
>>>>>>>>>>>> but it generates spurious interrupts. I do not know if that really
>>>>>>>>>>>> matters, as handling a spurious interrupt is still faster than
>>>>>>>>>>>> masking an IO-APIC interrupt; the spurious interrupts in that case
>>>>>>>>>>>> are a documented behaviour of the LAPIC.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 systems with
>>>>>>>>>>>> a slow IO-APIC the exception more than the rule, or does having to
>>>>>>>>>>>> split the vector space appear too great a restriction?
>>>>>>>>>>>
>>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>>> devices - likely what we are primarily interested in here - due to
>>>>>>>>>>> MSI.
>>>>>>>>>>
>>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>>> peripherals integrated into the chipset, such as the USB HCI, or ATA
>>>>>>>>>> driver (IOW, non-PCI devices).
>>>>>>>>>
>>>>>>>>> Those are all PCI as well. And modern chipsets include variants of
>>>>>>>>> them with MSI(-X) support.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> atom login: root
>>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>>            CPU0       CPU1
>>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>>> ERR:          0
>>>>>>>>>> MIS:          0
>>>>>>>>>>
>>>>>>>>>> I do not think peripherals integrated into chipsets can really be
>>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>>
>>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>>> the inflation of cores.
>>>>>>>>
>>>>>>>> What if you want to use RTUSB for instance?
>>>>>>>
>>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>>> due to IO-APIC accesses.
>>>>>>
>>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>>
>>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>>> here.
>>>>
>>>> I do not know, would you care to share your traces with us? I only run
>>>> Xenomai on atom (which I am not sure does not qualify as "modern", since
>>>> new atoms still seem to be produced), geode (ok, this one is definitely
>>>> dead, but there seem to be people still running xenomai on them), and an
>>>> old pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>>> even slower than acking the i8259.
>>>>
>>>> Anyway, IO-APIC register access does not look designed for speed: it
>>>> uses an indirect scheme that seems designed more to save space in the
>>>> processor mapping and to be configured once and for all when
>>>> enabling/disabling an interrupt, not at each and every interrupt.
>>>>
>>>> The point is: people may want to use Xenomai on atoms. We do not really
>>>> know on what kind of x86 people run xenomai; knowing that would help us
>>>> direct our efforts.
>>>
>>> We are currently investigating whether we can use Atoms for our
>>> future products. We have to stick to the x86 architecture and our
>>> products should work without big cooling fans. We are currently running
>>> tests on an Atom D2700 (which I know is EOL, but for research purposes
>>> it should give us a good indication).
>>>
>>> A 20us latency gain is a lot and would be very welcome in our system!
>>
>>
>> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
>> /proc/interrupts?
>>
> 
> The kernel config has no CONFIG_MSI, but instead:
> CONFIG_ARCH_SUPPORTS_MSI=y
> CONFIG_PCI_MSI=y
> 
> There is still IO-APIC-fasteoi in /proc/interrupts:
> 
> # cat /proc/interrupts
>            CPU0       CPU1
>   0:        250          0   IO-APIC-edge      timer
>   4:         71          0   IO-APIC-edge      serial
>   7:         29          0   IO-APIC-edge
>   8:          0          0   IO-APIC-edge      rtc0
>   9:          0          0   IO-APIC-fasteoi   acpi
>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>  19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>  23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>  40:        940          0   PCI-MSI-edge      eth0
>  41:         21          0   PCI-MSI-edge      xhci_hcd
>  42:          0          0   PCI-MSI-edge      xhci_hcd
>  43:          0          0   PCI-MSI-edge      xhci_hcd
> NMI:          0          0   Non-maskable interrupts
> LOC:      29559      25129   Local timer interrupts
> SPU:          0          0   Spurious interrupts
> PMI:          0          0   Performance monitoring interrupts
> IWI:          0          0   IRQ work interrupts
> RTR:          0          0   APIC ICR read retries
> RES:         20          0   Rescheduling interrupts
> CAL:          0          8   Function call interrupts
> TLB:          9          5   TLB shootdowns
> ERR:         74
> MIS:          0


Unless you are short on CPU resources: isolcpus=1. At least bind all
Linux IRQs to one CPU. That's independent of any potential low-level
optimizations.
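
As an illustration of the "bind all Linux IRQs to one CPU" advice, here is a
minimal sketch in Python, not something posted in the thread. It assumes the
usual Linux interface where /proc/irq/<n>/smp_affinity takes a hexadecimal CPU
bitmask; the helper names (cpu_mask, numbered_irqs, affinity_writes) and the
choice of CPU 0 for Linux are hypothetical. It only computes which value would
be written where, since the actual writes need root and the kernel may refuse
them for some IRQs. isolcpus=1 itself is a kernel boot parameter and is not
covered here.

```python
import re

# One numbered row of /proc/interrupts:
#   "<irq>: <count per CPU>... <chip type> [handler names]"
IRQ_ROW = re.compile(r"\s*(\d+):((?:\s+\d+)+)\s+(\S+)")


def cpu_mask(cpus):
    """Return the hex bitmask string for the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def numbered_irqs(text):
    """Return (irq, chip_type) for each numbered row; NMI, LOC, etc. are skipped."""
    rows = []
    for line in text.splitlines():
        m = IRQ_ROW.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(3)))
    return rows


def affinity_writes(text, linux_cpus=(0,)):
    """Return (path, mask) pairs that would bind every numbered IRQ to linux_cpus."""
    mask = cpu_mask(linux_cpus)
    return [("/proc/irq/%d/smp_affinity" % irq, mask)
            for irq, _ in numbered_irqs(text)]


# Rows copied from the /proc/interrupts listing quoted above.
SAMPLE = """\
           CPU0       CPU1
 19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
 23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 40:        940          0   PCI-MSI-edge      eth0
LOC:      29559      25129   Local timer interrupts
"""

if __name__ == "__main__":
    # Dry run: show the planned writes without touching /proc.
    for path, mask in affinity_writes(SAMPLE):
        print(path, mask)
```

To apply the plan on a real system, one would feed the contents of
/proc/interrupts to affinity_writes() and write each mask to its path as root,
tolerating errors for IRQs that cannot be moved.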

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
