On 09/18/2012 11:06 AM, Gilles Chanteperdrix wrote:
> On 09/18/2012 10:48 AM, Jan Kiszka wrote:
>> On 2012-09-17 23:50, Gilles Chanteperdrix wrote:
>>> On 09/17/2012 08:54 PM, Jan Kiszka wrote:
>>>
>>>> On 2012-09-17 20:37, Gilles Chanteperdrix wrote:
>>>>> On 09/17/2012 08:29 PM, Jan Kiszka wrote:
>>>>>
>>>>>> On 2012-09-17 20:18, Gilles Chanteperdrix wrote:
>>>>>>> On 09/17/2012 08:15 PM, Jan Kiszka wrote:
>>>>>>>
>>>>>>>> On 2012-09-17 20:12, Jan Kiszka wrote:
>>>>>>>>> On 2012-09-17 20:08, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/17/2012 08:05 PM, Jan Kiszka wrote:
>>>>>>>>>>
>>>>>>>>>>> On 2012-09-17 19:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> ipipe_end is a nop when called from the primary domain,
>>>>>>>>>>>> yes, but this is not very different from edge irqs. Also,
>>>>>>>>>>>> fasteoi irqs become a bit like MSIs: in the same way as we
>>>>>>>>>>>> cannot mask MSIs from the primary domain, we should not
>>>>>>>>>>>> mask IO-APIC fasteoi irqs, because the cost is
>>>>>>>>>>>> prohibitive. If we can live with MSIs without masking them
>>>>>>>>>>>> in primary mode, I guess we can do the same with fasteoi
>>>>>>>>>>>> irqs.
>>>>>>>>>>>
>>>>>>>>>>> MSIs are edge-triggered, fasteoi IRQs are still
>>>>>>>>>>> level-based. They require masking at the point you defer
>>>>>>>>>>> them - which is what we do, and Linux may even extend the
>>>>>>>>>>> masking beyond that point. If you mask them by raising the
>>>>>>>>>>> task priority, you have to keep it raised until Linux has
>>>>>>>>>>> finally handled the IRQ.
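>>>>>>>>>>>
>>>>>>>>>>> Concretely, deferring a level fasteoi IRQ has to look
>>>>>>>>>>> roughly like this (a sketch; the chip callbacks are the
>>>>>>>>>>> real ones, the pending-log helper is made up):
>>>>>>>>>>>
>>>>>>>>>>> static void defer_fasteoi_irq(struct irq_desc *desc)
>>>>>>>>>>> {
>>>>>>>>>>>         /* Mask at IO-APIC level, so the level line cannot
>>>>>>>>>>>          * re-fire before Linux has handled it... */
>>>>>>>>>>>         desc->irq_data.chip->irq_mask(&desc->irq_data);
>>>>>>>>>>>         /* ...then EOI the local APIC, so that other
>>>>>>>>>>>          * vectors can get through in the meantime. */
>>>>>>>>>>>         desc->irq_data.chip->irq_eoi(&desc->irq_data);
>>>>>>>>>>>         /* Log for the root (Linux) domain; the line gets
>>>>>>>>>>>          * unmasked once Linux has run its handler. */
>>>>>>>>>>>         mark_pending_for_root_domain(desc);
>>>>>>>>>>> }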
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> Or you
>>>>>>>>>>> decide to mask it at IO-APIC level again.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We do not want that.
>>>>>>>>>>
>>>>>>>>>>> If you keep the TPR raised,
>>>>>>>>>>> you will block more than what Linux wants to block.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The point is that if the TPR stays raised, it means that the
>>>>>>>>>> primary domain has preempted Linux, so we want it to stay
>>>>>>>>>> that way. Otherwise, the TPR gets lowered when Linux has
>>>>>>>>>> handled the interrupt.
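>>>>>>>>>>
>>>>>>>>>> For reference, raising and lowering boil down to a single
>>>>>>>>>> register write (a sketch using the standard apic accessors;
>>>>>>>>>> the class value is a made-up example, not what I tested):
>>>>>>>>>>
>>>>>>>>>> #include <asm/apic.h>
>>>>>>>>>>
>>>>>>>>>> /* Hold back all vectors whose priority class is <= 5, i.e.
>>>>>>>>>>  * 0x00..0x5f; RT vectors would live above that. The value 5
>>>>>>>>>>  * is only an example. */
>>>>>>>>>> #define RT_TPR_CLASS   5
>>>>>>>>>>
>>>>>>>>>> static inline void tpr_raise(void)
>>>>>>>>>> {
>>>>>>>>>>         apic_write(APIC_TASKPRI, RT_TPR_CLASS << 4);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> static inline void tpr_lower(void)
>>>>>>>>>> {
>>>>>>>>>>         apic_write(APIC_TASKPRI, 0);
>>>>>>>>>> }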
>>>>>>>>>>
>>>>>>>>>> A weekend of testing made me sure of one thing: it works. I
>>>>>>>>>> assure you.
>>>>>>>>>
>>>>>>>>> Probably - in the absence of IRQF_ONESHOT Linux interrupts.
>>>>>>>>> No longer once you face threaded IRQs - I assure you.
>>>>>>>>
>>>>>>>> Well, it may work (if the mask/unmask callbacks work as they
>>>>>>>> do natively), but the benefit is gone: masking at IO-APIC
>>>>>>>> level will be done again. Given that threaded IRQs are
>>>>>>>> becoming increasingly popular, it will also be hard to avoid
>>>>>>>> them in common setups.
>>>>>>>
>>>>>>>
>>>>>>> The thing is, if we no longer use the IO-APIC spinlock from the
>>>>>>> primary domain, we may not have to turn it into an
>>>>>>> ipipe_spinlock, and may be able to make the IO-APIC masking
>>>>>>> preemptible.
>>>>>>
>>>>>> That might be true - but is the latency related to the lock or
>>>>>> to the hardware access? In the latter case, you will still stall
>>>>>> the CPU on it and have to isolate the load on a non-RT CPU
>>>>>> again.
>>>>>>
>>>>>> BTW, the task priority for the RT domain is quite an important
>>>>>> parameter. If you set it too low, Linux can run out of vectors.
>>>>>> If you set it too high, the same may happen to Xenomai - on
>>>>>> bigger boxes.
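>>>>>>
>>>>>> (To make the arithmetic concrete - the priority class of a
>>>>>> vector is its upper nibble, so every class handed to one domain
>>>>>> is 16 vectors taken from the other:)
>>>>>>
>>>>>> /* x86 APIC priority arithmetic, for reference. */
>>>>>> #define VECTOR_CLASS(vec)  ((vec) >> 4)  /* 16 vectors/class */
>>>>>>
>>>>>> /* A fixed interrupt is delivered only if its class is strictly
>>>>>>  * above TPR[7:4]; e.g. TPR = 0x50 holds back vectors 0x00..0x5f
>>>>>>  * and lets 0x60..0xff through. */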
>>>>>
>>>>>
>>>>> Yes, and there are only 16 levels. But Xenomai does not need that
>>>>> many levels.
>>>>
>>>> Who is telling you this? It's part of the system setup, and that
>>>> may lean toward RT or toward non-RT. This level should be adjusted
>>>> according to the current allocation of Linux and the RT domain for
>>>> a particular CPU, not hard-coded or compile-time defined.
>>>
>>>
>>> In theory, I agree; in practice, let's be crazy and assume someone
>>> would want an RT serial driver with 4 irqs, an RT USB driver with 2
>>> irqs, an RT CAN driver, and, say, 4 RTnet boards. That is still
>>> less than the 16 vectors that a single level provides, so we can
>>> probably get by with 2 levels. Or we can use a kernel parameter, as
>>> sketched below.
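>>>
>>> Something like this, say (a sketch; the parameter name
>>> "xeno_tpr_levels" is made up for illustration):
>>>
>>> #include <linux/init.h>
>>> #include <linux/kernel.h>
>>>
>>> /* Number of APIC priority levels reserved for the RT domain. */
>>> static int xeno_tpr_levels = 2;
>>>
>>> static int __init xeno_tpr_levels_setup(char *str)
>>> {
>>>         get_option(&str, &xeno_tpr_levels);
>>>         return 1;
>>> }
>>> __setup("xeno_tpr_levels=", xeno_tpr_levels_setup);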
>>
>> Linux - and so should we - allocates separate levels first, as that
>> provides better performance for external interrupts (I need to look
>> up the precise reason; it should be documented in the x86 code).
>> Only when the levels are used up will interrupts share them.
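>>
>> (From memory, the spreading step in __assign_irq_vector() looks
>> roughly like this - a reconstruction, not verbatim kernel code:)
>>
>> /* The local APIC handles multiple pending interrupts within the
>>  * same priority class poorly, hence the stride of 8: consecutive
>>  * allocations land in different classes until the range wraps. */
>> static int next_vector(int vector, int *offset)
>> {
>>         vector += 8;
>>         if (vector >= first_system_vector) {
>>                 *offset = (*offset + 1) % 8;
>>                 vector = FIRST_EXTERNAL_VECTOR + *offset;
>>         }
>>         return vector;
>> }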
> 
> I have seen this code, and I wondered whether it was, in fact, only
> useful when the irq flow handlers were re-enabling irqs (that is,
> before the removal of IRQF_DISABLED), but I am really not sure.
> 
> Also, some additional results on my Atom: the IO-APIC is in the I/O
> controller hub, which is... an ICH4, if I read lspci and the
> datasheets correctly. What is more, its registers are accessed
> through the (slow) LPC bus, the ISA bus replacement. That is probably
> why accessing it is so slow.
> 
> And last but not least, it is not really a multi-core processor; it
> has hyper-threading. Booting the processor in UP mode yields a much
> more reasonable latency of 23us (still using the TPR), whereas the
> usual latency was around 30us (running the test now, will have
> results at noon), so the real gain of using the TPR is in fact much
> lower than originally announced. Basically, it seems that with
> hyper-threading, everything is doubled.
> 
> http://sisyphus.hd.free.fr/core-3.4-latencies/atom.png

The results are actually ready now: the net gain is 6.5us out of 30us,
that is, around 20%.

-- 
                                            Gilles.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
http://www.xenomai.org/mailman/listinfo/xenomai
