On 12.11.2010 14:57, Philippe Gerum wrote:
> On Fri, 2010-11-12 at 10:14 +0100, Jan Kiszka wrote:
>> On 12.11.2010 09:48, Philippe Gerum wrote:
>>> On Tue, 2010-11-09 at 14:12 +0100, Jan Kiszka wrote:
>>>> On 09.11.2010 10:36, Philippe Gerum wrote:
>>>>> On Tue, 2010-11-09 at 09:39 +0100, Jan Kiszka wrote:
>>>>>> On 09.11.2010 09:26, Philippe Gerum wrote:
>>>>>>> On Tue, 2010-11-09 at 09:01 +0100, Jan Kiszka wrote:
>>>>>>>> On 07.11.2010 17:22, Jan Kiszka wrote:
>>>>>>>>> On 07.11.2010 16:15, Philippe Gerum wrote:
>>>>>>>>>> The following patches implement the teardown approach. The basic
>>>>>>>>>> idea is:
>>>>>>>>>> - neither break nor improve old setups with legacy I-pipe patches not
>>>>>>>>>> providing the revised ipipe_control_irq call.
>>>>>>>>>> - fix the SMP race when detaching interrupts.
>>>>>>>>>
>>>>>>>>> Looks good.
>>>>>>>>
>>>>>>>> This actually causes one regression: I've just learned that people are
>>>>>>>> already happily using MSIs with Xenomai in the field. This is perfectly
>>>>>>>> fine as long as you don't fiddle with rtdm_irq_disable/enable in
>>>>>>>> non-root contexts or while hard IRQs are disabled. The latter
>>>>>>>> requirement would be violated by this fix now.
>>>>>>>
>>>>>>> What we could do is handle this corner-case in the ipipe directly, going
>>>>>>> for a nop when IRQs are off on a per-arch basis only to please those
>>>>>>> users,
>>>>>>
>>>>>> Don't we disable hard IRQs also when the root domain is the only
>>>>>> registered one? I'm worried about pushing regressions around, this time
>>>>>> to plain Linux use-cases of MSI (which are not broken in any way -
>>>>>> except for powerpc).
>>>>>
>>>>> The idea is to provide an ad hoc ipipe service for this, to be used by
>>>>> the HAL. A service that would check the controller for the target IRQ,
>>>>> and handle MSI ones conditionally. For sure, we just can't put such
>>>>> conditionals bluntly into the chip mask handler and expect the kernel
>>>>> to be happy.
>>>>>
>>>>> In fact, we already have __ipipe_enable/disable_irq from the internal
>>>>> Adeos interface available, but they are mostly wrappers for now. We could
>>>>> make them a bit smarter, and handle the MSI issue as well. We would
>>>>> then tell the HAL to switch to using those arch-agnostic helpers
>>>>> generally, instead of peeking directly into the chip controller structs
>>>>> like today.
>>>>
>>>> This belongs to I-pipe, like we already have ipipe_end, just properly
>>>> wrapped to avoid descriptor access. That's specifically important if we
>>>> want to emulate MSI masking in software. I have the generic I-pipe
>>>> infrastructure ready, but the backend, so far consisting of x86 MSI
>>>> hardening, unfortunately needs to be rewritten.
>>>>
>>>>>
>>>>> If that ipipe "feature" is not detected by the HAL, then we would
>>>>> refrain from disabling the IRQ in xnintr_detach. In effect, this would
>>>>> leave the SMP race window open, but since we need recent ipipes to get
>>>>> it plugged already anyway (for the revised ipipe_control_irq), we would
>>>>> still remain in the current situation:
>>>>> - old patches? no SMP race fix, no regression
>>>>> - new patches? SMP race fix avail, no regression
>>>>
>>>> Sounds good.
>>>
>>> Now that I slept on it, I find the approach of working around pipeline
>>> limitations this way to be incorrect.
>>>
>>> Basically, the issue is that we still don't have 100% reliable handling
>>> of MSI interrupts (actually, we only have partial handling, and solely
>>> for x86), but this is no reason to introduce code in the pipeline
>>> interface which would perpetuate this fact. I see this as an "all or
>>> nothing" issue: either MSI is fully handled and there shall be no
>>> restriction on applying common operations such as masking/unmasking on
>>> the related IRQs, or it is not, and we should not export "conditionally
>>> working" APIs.
>>>
>>> In the latter case, the responsibility to rely on MSI support belongs to
>>> the user, who then should know about the pending restrictions, and
>>> decide for himself whether to use MSI. So I'm heading to this solution
>>> instead:
>>>
>>> - when detaching the last handler for a given IRQ, instead of forcibly
>>> disabling the IRQ line, the nucleus would just make sure that such IRQ
>>> is already in a disabled state, and bail out on error if not (probably
>>> with a kernel warning to make the issue obvious).
>>
>> Fiddling with the IRQ "line" state is a workaround for the missing
>> synchronize_irq service in Xenomai/I-pipe. If we had this, all this
>> disabling would become unneeded.
> 
> Actually, no. The synchronize irq service in the pipeline could have
> been introduced weeks ago in a snap, provided Xenomai did not call
> ipipe_virtualize_irq holding a lock with irqs off - this is in essence
> what my first patch to Pavel did, but could not work for the reason
> mentioned. That is what the patch we are discussing now is all about:
> clearing issues in Xenomai first.
> 
> Additionally, having an IRQ disabled is a requirement to properly
> synchronize that IRQ later, it is not an option. It is part of the sync
> protocol actually. If you don't do this, it lacks a basic guarantee, it
> can't work.

The required pattern from the driver POV is
 - disable IRQ at device level
 - flush pending IRQs
 - deregister handler

Linux does the last two steps in reverse order as it synchronizes
descriptor changes against descriptor usage internally. I-pipe requires
the above ordering (unless my patch is used to harden
ipipe_virtualize_irq against NULL handlers). Still, the driver should
find steps 2 and 3 as part of rtdm_irq_free. That's all.

> 
>>
>>>
>>> - track the IRQ line state from the xnintr_enable/xnintr_disable routines,
>>> so that xnintr_detach can determine whether the call is legit. Of
>>> course, this also means that any sideways attempt to enable/disable
>>> nucleus-managed interrupts at PIC level would break that logic, but
>>> doing so would be the root bug anyway.
>>>
>>> The advantage of doing so would be three-fold:
>>>
>>> - no pipeline code to acknowledge (or even perpetuate) the fact that MSI
>>> support is half working, half broken. We need to fix it properly, so
>>> that we can use it 100% reliably, from whatever context commonly allowed
>>> for enabling/disabling IRQs (and not "from root domain with IRQs on"
>>> only). Typically, I fail to see how one would cope with such limitation,
>>> if a real-time handler detects that some device is going wild and really
>>> needs to shut it down before the whole system crashes.
>>
>> MSIs are edge-triggered. Only broken hardware continuously sending bogus
>> messages can theoretically cause trouble. In practice (i.e. in the absence
>> of broken HW), we see a single spurious IRQ at worst.
>>
> 
> The driver should be able to work the same way whether it deals with MSI
> support, or is given pinned (level) IRQs.

Exactly my point, thus rtdm_irq_disable in driver code is a no-go.

> 
>>>
>>> - we enforce the API usage requirement to disable an interrupt line with
>>> rtdm_irq_disable(), before eventually detaching the last IRQ handler for
>>> it, which is common sense anyway.
>>
>> That's an easy-to-get-wrong API. It would apply to non-shared IRQs only
>> (aka MSIs). No-go IMHO.
> 
> - the software does not want to share MSIs, or something is quite wrong
> - xnintr_detach() is broken by design in its current implementation for
> shared handlers, because it leaves the decision to mask the IRQ line to
> the user. This could not work for multiple non-cooperating drivers
> sharing an IRQ, at least not in a raceless way.

It works (minus the missing synchronize_irq) if you disable the IRQ at
device level as drivers normally do.

> 
> So the issue is not that much about MSI vs pinned-type IRQs, it is
> rather about shared vs non-shared interrupts. What we want is a sane API
> and behavior for both types, which happens to fit the limited support we
> have now for MSI, not the other way around.
> 
> Therefore, until the MSI issue is fixed in the pipeline:
> 
> - non-shared mode users should call xnintr_disable(irq) then
> xnintr_detach(irq) explicitly. This is what everybody should do today
> already (via rtdm_irq_disable() or whatever), since xnintr_detach() does
> not disable the IRQ in the current implementation.

No one should call rtdm_irq_disable today for whatever IRQ except in a
very few exceptional cases. I think I should extend the documentation of
rtdm_irq_enable/disable ASAP to clarify their exceptional nature.

> 
> - shared mode users should only call xnintr_detach(irq), which would
> then decide racelessly whether to disable the line. In essence, this
> could never affect MSIs.
> 
> When the MSI support is generally available with no limitation on
> masking/unmasking contexts, then xnintr_detach() could be updated to
> forcibly disable the detached IRQ in the non-shared case as well. At
> worst, the legacy code would end up calling xnintr_disable() uselessly
> when the pipeline fix is merged.
> 
> What we can do in the meantime, is 1) fixing xnintr_detach() in the
> shared case, 2) introducing an error check to prevent detaching
> non-shared IRQs that are still enabled. That won't have any impact on
> the partial MSI support we have today.
> 
>>
>>>
>>> - absolutely no change for people who currently rely on partial MSI
>>> support, provided they duly disable IRQ lines before detaching their
>>> last handler via the appropriate RTDM interface.
>>>
>>> Can we deal on this?
>>>
>>
>> Nope, don't think so. The only option I see (besides using my original
>> proposal of a dummy handler for deregistering - still much simpler than
>> the current patches) is to emulate MSI masking in the same run, thus
>> providing solutions for both issues.
> 
> The dummy handler solution is incomplete because it does not address the
> IRQ disabling issue in xnintr_detach() for shared IRQs. This is the gist
> of the patch I sent, actually.

It is complete in this regard as it does not touch the line state. It
actually implements the Linux pattern.

> 
> Besides, I'm still scratching my head: how would introducing a dummy
> handler help dealing with that sequence?
> 
> CPU0                                  CPU1
> 
> xnintr_disable => mask irq            <irq>

No disable here; masking must be done at device level.

> xnintr_detach                         real-time handler
>       <install dummy handler>                 <touch device>
>                                       ipipe_end_irq() => unmask irq

xnarch_synchronize_irq

> 
> Then, on whichever CPU, we could receive a device IRQ as we touched the
> hardware before leaving the handler, passing that IRQ to the root domain
> which has no proper handler for it:
> 
> - lucky scenario: the device is in a state that won't cause it to kick
> yet another interrupt even if the last one was not serviced, and we just
> end up with a nifty spurious IRQ message in the kernel log.
> 
> - out of luck: we touched the device on CPU1 in a way that calls for a
> specific action by the software, which won't be able to perform it
> because it is now branching to the dummy handler. Whether we have some
> code to silence the device (and not only the IRQ line) before or after
> we detach the IRQ on CPU0 does not fix the issue, since we have no
> synchronization between the CPUs whatsoever.
> 

The driver is responsible for synchronizing its shutdown sequence
between the cleanup code and any of its in-flight IRQs. If that handler
reenables the IRQ that the shutdown code just disabled (all at device
level, of course), something is seriously broken - but not on Xenomai's
side. As I explained before: that potential in-flight IRQ becomes
spurious at the point the driver starts the shutdown, and thus can
safely be ignored.

Jan

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help