On 2011-07-16 10:52, Philippe Gerum wrote:
On Sat, 2011-07-16 at 10:13 +0200, Jan Kiszka wrote:
On 2011-07-15 15:10, Jan Kiszka wrote:
But... right now it looks like we found our primary regression:
"nucleus/shadow: shorten the uninterruptible path to secondary mode".
It opens a short window during relax where the migrated task may be
active under both schedulers. We are currently evaluating a revert
(looks good so far), and I need to work out my theory in more
detail.

Looks like this commit just made a long-standing flaw in Xenomai's
interrupt handling more visible: We reschedule over the interrupt stack
in the Xenomai interrupt handler tails, at least on x86-64. Not sure if
other archs have interrupt stacks, the point is Xenomai's design wrongly
assumes there are no such things.

Fortunately, no, this is not a design issue, no such assumption was ever
made, but the Xenomai core expects this to be handled on a per-arch
basis with the interrupt pipeline.

And that's already the problem: If Linux uses interrupt stacks, relying on ipipe to disable this during Xenomai interrupt handler execution is at best a workaround. A fragile one, unless you increase the per-thread stack size by the size of the interrupt stack. Lacking support for a generic rescheduling hook became a problem by the time Linux introduced interrupt threads.

As you pointed out, there is no way
to handle this via some generic Xenomai-only support.

ppc64 now has separate interrupt stacks, which is why I disabled
IRQSTACKS which became the builtin default at some point. Blackfin goes
through a Xenomai-defined irq tail handler as well, because it may not
reschedule over nested interrupt stacks.

How does this arch prevent xnpod_schedule in the generic interrupt handler tail from doing its normal work?

Fact is that this pending
problem with x86_64 has been overlooked since day #1 by /me.

We were lucky so far that the values
saved on this shared stack were apparently "compatible", meaning we were
overwriting them with identical or harmless values. But that's no longer
true when interrupts are hitting us in the xnpod_suspend_thread path of
a relaxing shadow.


Makes sense. It would be better to find a solution that does not make
the relax path uninterruptible again for a significant amount of time.
On low end platforms we support (i.e. non-x86* mainly), this causes
obvious latency spots.

I agree. Conceptually, the interruptible relaxation should be safe now after recent fixes.


Likely the only possible fix is establishing a reschedule hook for
Xenomai in the interrupt exit path after the original stack is restored
- just like Linux works. Requires changes to both ipipe and Xenomai
unfortunately.

__ipipe_run_irqtail() is in the I-pipe core for such purpose. If
instantiated properly for x86_64, and paired with xnarch_escalate() for
that arch as well, it could be an option for running the rescheduling
procedure when safe.

Nope, that doesn't work. The stack is switched later in the return path in entry_64.S. We need a hook there, ideally a conditional one, controlled by some per-cpu variable that is set by Xenomai on return from its interrupt handlers to signal the rescheduling need.
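
The conditional hook described above can be modelled in plain user-space C. This is only a rough sketch of the idea, not I-pipe or Xenomai code: `cpu_state`, `rt_irq_tail()`, `handle_irq()` and `rt_schedule()` are hypothetical names invented for illustration, the "stack switch" is reduced to a boolean, and nested interrupts are ignored. It shows the handler tail merely recording the rescheduling need in a per-CPU flag, with the flag consumed only after the original stack is back in place.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Hypothetical per-CPU state; these names do not exist in I-pipe or Xenomai. */
struct cpu_state {
    bool on_irq_stack;    /* true while "running" on the interrupt stack */
    bool resched_pending; /* set by the handler tail, consumed at IRQ exit */
    int resched_count;    /* how many times the scheduler stand-in ran */
};

static struct cpu_state cpus[NR_CPUS];

/* Stand-in for the real-time scheduler entry point. */
static void rt_schedule(int cpu)
{
    /* The whole point of the hook: never reschedule on the IRQ stack. */
    assert(!cpus[cpu].on_irq_stack);
    cpus[cpu].resched_count++;
}

/* Handler tail: only record the need, do not reschedule here. */
static void rt_irq_tail(int cpu, bool need_resched)
{
    if (need_resched)
        cpus[cpu].resched_pending = true;
}

/* Models the interrupt entry/exit path of one (non-nested) interrupt. */
static void handle_irq(int cpu, bool need_resched)
{
    cpus[cpu].on_irq_stack = true;  /* switch to the interrupt stack */
    rt_irq_tail(cpu, need_resched); /* real-time handler and its tail */
    cpus[cpu].on_irq_stack = false; /* original stack restored */

    /* Conditional hook, placed after the stack has been restored. */
    if (cpus[cpu].resched_pending) {
        cpus[cpu].resched_pending = false;
        rt_schedule(cpu);
    }
}
```

In this model, calling `handle_irq(0, true)` runs the scheduler stand-in exactly once, and only after `on_irq_stack` has been cleared; `handle_irq(0, false)` leaves it untouched. In the real return path the check would have to sit in the arch entry code, after the stack switch, which is why both ipipe and Xenomai would need changes.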

Jan
