On Fri, 2010-10-29 at 14:46 +0200, Philippe Gerum wrote: > On Fri, 2010-10-29 at 14:09 +0200, Jan Kiszka wrote: > > Am 29.10.2010 14:00, Philippe Gerum wrote: > > > On Fri, 2010-10-29 at 11:05 +0200, Jan Kiszka wrote: > > >> Am 29.10.2010 10:27, Philippe Gerum wrote: > > >>> On Fri, 2010-10-29 at 09:00 +0200, Jan Kiszka wrote: > > >>>> Am 28.10.2010 21:34, Philippe Gerum wrote: > > >>>>> On Thu, 2010-10-28 at 21:15 +0200, Gilles Chanteperdrix wrote: > > >>>>>> Jan Kiszka wrote: > > >>>>>>> Gilles, > > >>>>>>> > > >>>>>>> I happened to come across rthal_mark_irq_disabled/enabled on arm. On > > >>>>>>> first glance, it looks like these helpers manipulate > > >>>>>>> irq_desc::status > > >>>>>>> non-atomically, i.e. without holding irq_desc::lock. Isn't this > > >>>>>>> fragile? > > >>>>>> > > >>>>>> I have no idea. How do the other architectures do? As far as I know, > > >>>>>> this code has been copied from there. > > >>>>> > > >>>>> Other archs do the same, simply because once an irq is managed by the > > >>>>> hal, it may not be shared in any way with the regular kernel. So > > >>>>> locking > > >>>>> is pointless. > > >>>> > > >>>> Indeed, I missed that all the other archs have this uninlined in hal.c. > > >>>> > > >>>> However, this leaves at least a race between xnintr_disable/enable and > > >>>> XN_ISR_PROPAGATE (ie. the related Linux path) behind. > > >>> > > >>> I can't see why XN_ISR_PROPAGATE would be involved here. This service > > >>> pends an interrupt in the pipeline log. > > >> > > >> And this finally lets Linux code run that fiddles with irq_desc::status > > >> as well - potentially in parallel to an unsyncrhonized > > >> xnintr_irq_disable in a different context. That's the problem. > > > > > > Propagation happens in primary domain. When is this supposed to conflict > > > on the same CPU with linux? > > > > The propagation triggers the delivery of this IRQ to the Linux domain, > > thus at some point there will be Linux accessing the descriptor while > > there might be xnintr_irq_enable/disable running on some other CPU (or > > it was preempted at the wrong point on the very same CPU). > > The point is that XN_ISR_PROPAGATE, as a mean to force sharing of an IRQ > between both domains is plain wrong. Remove this, and no conflict > remains; this is what needs to be addressed. The potential issue between > xnintr_enable/disable and the hal routines does not exist, if those > callers handle locking properly. >
In any case, I don't think we could accept that sharing, so flipping the bits in the hal is in fact pointless. To match the linux locking, we should iron the irq_desc::lock, which we won't since this would cause massive jittery. We should stick to the basic logic: no sharing, therefore no tracking need for the irqflags. I'll kill XN_ISR_PROPAGATE in forge at some point, for sure. > > > > > > > >> > > >>> > > >>>> Not sure if it > > >>>> matters practically - but risking silent breakage for this micro > > >>>> optimization? > > >>> > > >>> It was not meant as an optimization; we may not grab the linux > > >>> descriptor lock in this context because we may enter it in primary mode. > > >> > > >> Oh, that lock isn't harden as I somehow assumed. This of course > > >> complicates things. > > >> > > >>> > > >>>> Is disabling/enabling really in the highly > > >>>> latency-critical anywhere? Otherwise, I would suggest to just plug this > > >>>> by adding the intended lock for this field. > > >>> > > >>> The caller is expected to manage locking; AFAICS the only one who does > > >>> not is the RTAI skin, which is obsolete and removed in 2.6.x, so no big > > >>> deal. > > >> > > >> The problem is that IRQ forwarding to Linux may let this manipulation > > >> race with plain Linux code, thus has to synchronize with it. It is a > > >> corner case (no one is supposed to pass IRQs down blindly anyway - if at > > >> all), but it should at least be documented ("Don't use disable/enable > > >> together with IRQ forwarding unless you acquire the descriptor lock > > >> properly!"). > > >> > > >> BTW, do we need to track the descriptor state in primary mode at all? > > >> > > > > > > That is the real issue. I don't see the point of doing this with the > > > current kernel code. > > > > Do we need to keep the status in synch with the hardware state for the > > case Linux may take over the descriptor again? Or will Linux test the > > state when processing a forwarded IRQ? These are the two potential > > scenarios that come to my mind. For former could be deferred, but the > > latter would be critical again. > > > > Jan > > > > -- > Philippe. > > > > _______________________________________________ > Xenomai-core mailing list > Xenomai-core@gna.org > https://mail.gna.org/listinfo/xenomai-core -- Philippe. _______________________________________________ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core