On Mon, Sep 17, 2012 at 1:14 PM, Gilles Chanteperdrix
<gilles.chanteperd...@xenomai.org> wrote:
> On 09/17/2012 12:39 PM, Henri Roosen wrote:
>
>> On Mon, Sep 17, 2012 at 12:00 PM, Gilles Chanteperdrix
>> <gilles.chanteperd...@xenomai.org> wrote:
>>> On 09/17/2012 11:42 AM, Jan Kiszka wrote:
>>>> On 2012-09-17 11:29, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 11:07 AM, Jan Kiszka wrote:
>>>>>> On 2012-09-17 10:32, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 10:18 AM, Jan Kiszka wrote:
>>>>>>>> On 2012-09-17 10:07, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/17/2012 09:43 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2012-09-17 08:30, Gilles Chanteperdrix wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> looking at x86 latencies, I found that what was taking long on my
>>>>>>>>>>> atom was masking the fasteoi interrupts at IO-APIC level. So, I
>>>>>>>>>>> experimented with an idea: masking at LAPIC level instead of IO-APIC,
>>>>>>>>>>> by using the "task priority" register. This seems to improve
>>>>>>>>>>> latencies on my atom:
>>>>>>>>>>>
>>>>>>>>>>> http://sisyphus.hd.free.fr/~gilles/core-3.4-latencies/atom.png
>>>>>>>>>>>
>>>>>>>>>>> This implies splitting the LAPIC vectors into a high priority and a
>>>>>>>>>>> low priority set; the final implementation would use
>>>>>>>>>>> ipipe_enable_irqdesc to detect a high priority domain, and change
>>>>>>>>>>> the vector at that time.
>>>>>>>>>>>
>>>>>>>>>>> This also improves the latencies on my old PIII with a VIA chipset,
>>>>>>>>>>> but it generates spurious interrupts (I do not know if it really
>>>>>>>>>>> matters, as handling a spurious interrupt is still faster than
>>>>>>>>>>> masking an IO-APIC interrupt); the spurious interrupts in that case
>>>>>>>>>>> are a documented behaviour of the LAPIC.
>>>>>>>>>>>
>>>>>>>>>>> Is there any interest in pursuing this idea, or are x86 with slow
>>>>>>>>>>> IO-APIC the exception more than the rule, or does having to split
>>>>>>>>>>> the vector space appear too great a restriction?
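
A rough model of the priority-class arithmetic behind the proposal above, offered only as a sketch and not as the actual I-pipe change. On the x86 LAPIC, bits 7:4 of a vector form its priority class, and fixed interrupts whose class is at or below the class held in the task priority register (TPR) are kept pending. The function names and the example vector values are hypothetical, and the model ignores the in-service register (ISR) contribution to the processor priority.

```python
def priority_class(value: int) -> int:
    """Return bits 7:4, the LAPIC priority class of a vector or TPR value."""
    return (value >> 4) & 0xF


def is_deliverable(vector: int, tpr: int) -> bool:
    """True if a fixed interrupt on `vector` passes the TPR filter.

    Simplified: only the TPR is considered, not interrupts already in service.
    """
    return priority_class(vector) > priority_class(tpr)


# Hypothetical split of the vector space: low-priority (Linux) IRQs below
# 0x80, high-priority (real-time) IRQs at 0x80 and above.
LOW_PRIO_VECTOR = 0x41
HIGH_PRIO_VECTOR = 0x81
TPR_MASK_LOW = 0x70  # class 7: holds back every vector up to 0x7f
```

With these assumed values, raising the TPR to 0x70 holds back the low-priority vector while the high-priority one is still delivered, which is the masking effect described, without touching the IO-APIC.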
>>>>>>>>>>
>>>>>>>>>> Line-based interrupts are legacy, of decreasing relevance for PCI
>>>>>>>>>> devices - likely what we are primarily interested in here - due to
>>>>>>>>>> MSI.
>>>>>>>>>
>>>>>>>>> Even if I enable MSI, the kernel still uses these irqs for the
>>>>>>>>> peripherals integrated into the chipset, such as the USB HCI, or ATA
>>>>>>>>> driver (IOW, non-PCI devices).
>>>>>>>>
>>>>>>>> Those are all PCI as well. And modern chipsets include variants of them
>>>>>>>> with MSI(-X) support.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> atom login: root
>>>>>>>>> # cat /proc/interrupts
>>>>>>>>>            CPU0       CPU1
>>>>>>>>>   0:         41          0   IO-APIC-edge      timer
>>>>>>>>>   4:         39          0   IO-APIC-edge      serial
>>>>>>>>>   9:          0          0   IO-APIC-fasteoi   acpi
>>>>>>>>>  14:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>  15:          0          0   IO-APIC-edge      ata_piix
>>>>>>>>>  16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
>>>>>>>>>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
>>>>>>>>>  19:          0          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
>>>>>>>>>  23:       6598          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
>>>>>>>>>  43:       2704          0   PCI-MSI-edge      eth0
>>>>>>>>>  44:        249          0   PCI-MSI-edge      snd_hda_intel
>>>>>>>>> NMI:          0          0   Non-maskable interrupts
>>>>>>>>> LOC:        661        644   Local timer interrupts
>>>>>>>>> SPU:          0          0   Spurious interrupts
>>>>>>>>> PMI:          0          0   Performance monitoring interrupts
>>>>>>>>> IWI:          0          0   IRQ work interrupts
>>>>>>>>> RTR:          0          0   APIC ICR read retries
>>>>>>>>> RES:       1582       2225   Rescheduling interrupts
>>>>>>>>> CAL:         26         48   Function call interrupts
>>>>>>>>> TLB:         10         19   TLB shootdowns
>>>>>>>>> ERR:          0
>>>>>>>>> MIS:          0
>>>>>>>>>
>>>>>>>>> I do not think peripherals integrated to chipsets can really be
>>>>>>>>> considered "legacy". And they tend to be used in the field...
>>>>>>>>
>>>>>>>> The good news is that, even on your low-end atom, you can avoid those
>>>>>>>> latencies by CPU assignment, i.e. isolating the Linux IRQ load on one
>>>>>>>> core and the RT on the other. That's getting easier and easier due to
>>>>>>>> the inflation of cores.
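
For the Linux side of the CPU assignment suggested above, a minimal sketch of how an IRQ can be steered to a chosen core through the standard /proc/irq/<n>/smp_affinity interface, which takes a hexadecimal CPU bitmask. The helper names are hypothetical, writing the file needs root, and this says nothing about how the real-time domain's own interrupts are bound.

```python
def cpu_mask(cpus):
    """Build the hex bitmask string expected by smp_affinity, e.g. [0, 1] -> "3"."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def set_irq_affinity(irq, cpus):
    """Restrict Linux IRQ `irq` to the given CPUs (requires root)."""
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask(cpus))
```

For example, set_irq_affinity(23, [0]) would ask the kernel to keep IRQ 23 on CPU0 only; some interrupts cannot be moved, in which case the write fails.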
>>>>>>>
>>>>>>> What if you want to use RTUSB for instance?
>>>>>>
>>>>>> Then I will likely not worry about a few micros of additional latency
>>>>>> due to IO-APIC accesses.
>>>>>
>>>>> On my atom, taking an IO-APIC fasteoi interrupt, acking and masking it,
>>>>> takes 10us in UP, and 20us in SMP (with the tracer on).
>>>>
>>>> ...and on more appropriate chipsets? I bet the Atom is (once again) off
>>>> here.
>>>
>>> I do not know; would you care to share your traces with us? I only run
>>> Xenomai on atom (which I am not sure does not qualify as "modern", since
>>> new atoms still seem to be produced), geode (ok, this one is definitely
>>> dead, but there seem to be people still running xenomai on them), and an
>>> old pentium III with an old VIA686 chipset, where masking the IO-APIC is
>>> even slower than acking the i8259.
>>>
>>> Anyway, the IO-APIC register accesses do not look designed for speed:
>>> they use an indirect scheme that seems more designed to save space in the
>>> processor mapping and to be configured once and for all when
>>> enabling/disabling an interrupt, not at each and every interrupt.
>>>
>>> The point is: people may want to use Xenomai on atoms. We do not really
>>> know on what kind of x86 people run xenomai; knowing that would help us
>>> direct our efforts.
>>
>> We are currently investigating whether we can use Atoms for our
>> future products. We have to stick to the x86 architecture and our
>> products should work without big cooling fans. We are currently running
>> tests on an Atom D2700 (which I know is EOL, but for research purposes
>> should give us a good indication).
>>
>> A 20us latency gain is a lot and would be very welcome in our system!
>
>
> If you enable CONFIG_MSI, do you still see some IO-APIC-fasteoi in
> /proc/interrupts?
>

The kernel config has no CONFIG_MSI, but instead:
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
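
For anyone wanting to repeat this check, a small sketch that lists the MSI-related options in a kernel config. The function names are made up for illustration, and the two config locations are assumptions: /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC, and /boot/config-<release> is a distribution convention.

```python
import gzip
import os
import platform


def load_kernel_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    candidates = ["/proc/config.gz", f"/boot/config-{platform.release()}"]
    for path in candidates:
        if os.path.exists(path):
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as f:
                return f.read()
    return None


def msi_options(config_text):
    """Collect enabled CONFIG_* options whose name contains "MSI".

    This is a plain substring match, so unrelated options with MSI in
    their name would be listed as well.
    """
    opts = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if "MSI" in name:
                opts[name] = value
    return opts
```

Calling msi_options(load_kernel_config()) on a machine where the config is available should show the same two options as above if the kernel is configured the same way.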

There is still IO-APIC-fasteoi in /proc/interrupts:

# cat /proc/interrupts
           CPU0       CPU1
  0:        250          0   IO-APIC-edge      timer
  4:         71          0   IO-APIC-edge      serial
  7:         29          0   IO-APIC-edge
  8:          0          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 16:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 19:         41          0   IO-APIC-fasteoi   ata_piix, uhci_hcd:usb3
 23:       5440          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb2
 40:        940          0   PCI-MSI-edge      eth0
 41:         21          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
NMI:          0          0   Non-maskable interrupts
LOC:      29559      25129   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:         20          0   Rescheduling interrupts
CAL:          0          8   Function call interrupts
TLB:          9          5   TLB shootdowns
ERR:         74
MIS:          0
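
To pull just the fasteoi lines out of a listing like the one above, something along these lines could be used. It is only a sketch: the function name is hypothetical, and it assumes the /proc/interrupts layout shown in this thread (one count column per CPU, then the chip name such as IO-APIC-fasteoi), which other kernel versions may format differently.

```python
def fasteoi_irqs(text):
    """Return (irq, total_count, devices) for each IO-APIC-fasteoi line."""
    lines = text.splitlines()
    if not lines:
        return []
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = []
    for line in lines[1:]:
        # Split at the first colon only; device names may contain colons.
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and other summary rows
        fields = rest.split()
        if len(fields) <= ncpus:
            continue  # no chip column on this row
        counts = [int(c) for c in fields[:ncpus]]
        if fields[ncpus] != "IO-APIC-fasteoi":
            continue
        devices = " ".join(fields[ncpus + 1:])
        result.append((int(label), sum(counts), devices))
    return result
```

Fed the contents of /proc/interrupts, it returns one tuple per fasteoi IRQ, which makes it easy to see which devices would still go through IO-APIC masking.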

> --
>                                                                 Gilles.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
http://www.xenomai.org/mailman/listinfo/xenomai
