Philippe Gerum wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>
>>> Gilles Chanteperdrix wrote:
>>>
>>>> Jeroen Van den Keybus wrote:
>>>> > Hello,
>>>> >
>>>> > I'm currently not at a level to participate in your discussion.
>>>> > Although I'm willing to supply you with stresstests, I would
>>>> > nevertheless like to learn more about task migration as this
>>>> > debugging session proceeds. In order to do so, please confirm the
>>>> > following statements or indicate where I went wrong. I hope others
>>>> > may learn from this as well.
>>>> >
>>>> > xnshadow_harden(): This is called whenever a Xenomai thread
>>>> > performs a Linux (root domain) system call (notified by Adeos?).
>>>> xnshadow_harden() is called whenever a thread running in secondary
>>>> mode (that is, running as a regular Linux thread, handled by the
>>>> Linux scheduler) switches to primary mode (where it will run as a
>>>> Xenomai thread, handled by the Xenomai scheduler). Migrations occur
>>>> for some system calls. More precisely, each Xenomai skin's system
>>>> call table associates a few flags with each system call, and some of
>>>> these flags cause migration of the caller when it issues the system
>>>> call.
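
To make that flag mechanism concrete for anyone following along, here is
a minimal sketch; the type, flag and handler names below are invented for
illustration and are not the actual nucleus API:

    /* Illustrative only -- invented names, not the real Xenomai nucleus
     * API.  Each syscall entry carries flags telling the dispatcher
     * which mode the caller must be in before the handler runs. */

    #define EXEC_ANY       0x0   /* run in whatever mode the caller is in */
    #define EXEC_PRIMARY   0x1   /* harden (migrate) the caller first     */
    #define EXEC_SECONDARY 0x2   /* relax the caller first                */

    struct syscall_entry {
            int (*handler)(void *args);
            unsigned int flags;
    };

    static int do_task_sleep(void *args) { return 0; }  /* placeholder */
    static int do_task_bind(void *args)  { return 0; }  /* placeholder */

    static const struct syscall_entry systab[] = {
            { do_task_sleep, EXEC_PRIMARY },   /* must run as a Xenomai thread     */
            { do_task_bind,  EXEC_SECONDARY }, /* must run as a plain Linux thread */
    };

    /* Dispatcher (pseudo): if an entry requires EXEC_PRIMARY and the
     * caller is currently relaxed, xnshadow_harden() is invoked before
     * the handler runs; EXEC_SECONDARY triggers the opposite migration. */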
>>>>
>>>> Each Xenomai user-space thread has two contexts: a regular Linux
>>>> thread context, and a Xenomai thread context called the "shadow"
>>>> thread. Both contexts share the same stack and program counter, so
>>>> that at any time at least one of the two contexts is seen as
>>>> suspended by the scheduler which handles it.
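
A way to picture the pairing (purely illustrative; the field layout is
invented, not the real nucleus structures):

    /* Illustrative pairing only -- not the actual nucleus layout. */
    struct shadow_pair {
            struct task_struct *linux_task;  /* the Linux-side context   */
            struct xnthread    *shadow;      /* the Xenomai-side context */
            /* Both sides execute on the same stack and program counter,
             * so at most one of them is runnable for its scheduler at
             * any time; while relaxed, the shadow carries XNRELAX. */
    };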
>>>>
>>>> Before xnshadow_harden is called, the Linux thread is running, and
>>>> its shadow is seen by the Xenomai scheduler as suspended with the
>>>> XNRELAX bit set. After xnshadow_harden, the Linux context is seen by
>>>> the Linux scheduler as suspended in INTERRUPTIBLE state, and the
>>>> shadow is seen as running by the Xenomai scheduler.
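
In code terms, the hand-off described here boils down to something like
the following; a simplified sketch of the idea only (gk_waitq and
gk_target stand in for the per-CPU gatekeeper data), not the literal
nucleus/shadow.c code:

    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(gk_waitq);  /* placeholder: gatekeeper wait queue   */
    static struct task_struct *gk_target;      /* placeholder: whom the gk must harden */

    static void harden_sketch(void)
    {
            set_current_state(TASK_INTERRUPTIBLE); /* prepare to leave the Linux runqueue */
            gk_target = current;                   /* tell the gatekeeper whom to promote */
            wake_up_interruptible_sync(&gk_waitq); /* make the gatekeeper runnable        */
            schedule();                            /* current sleeps; the gatekeeper then
                                                      resumes the shadow, which wakes up
                                                      here in primary mode, XNRELAX off.  */
    }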
>>>>
>>>> > The migrating thread (nRT) is marked INTERRUPTIBLE and run by the
>>>> > Linux kernel wake_up_interruptible_sync() call. Is this thread
>>>> > actually run, or does it merely put the thread in some Linux
>>>> > to-do list (I assumed the first case)?
>>>>
>>>> Here, I am not sure, but it seems that when calling
>>>> wake_up_interruptible_sync the woken-up task is put in the current
>>>> CPU runqueue, and this task (i.e. the gatekeeper) will not run until
>>>> the current thread (i.e. the thread running xnshadow_harden) marks
>>>> itself as suspended and calls schedule(). Maybe, marking the running
>>>> thread as
>>>
>>> Depends on CONFIG_PREEMPT. If set, we get a preempt_schedule already
>>> here - and a switch if the prio of the woken-up task is higher.
>>>
>>> BTW, an easy way to trigger the current trouble is to remove the
>>> "_sync" from wake_up_interruptible_sync(). As I understand it, this
>>> _sync is just an optimisation hint for Linux to avoid needless
>>> scheduler runs.
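
For reference, the only difference between the two calls is the "sync"
hint passed down to try_to_wake_up(); roughly (gk_waitq being a
placeholder wait queue as above):

    /* Plain wake-up: with CONFIG_PREEMPT, the woken task may preempt
     * the waker right away if its priority is higher. */
    wake_up_interruptible(&gk_waitq);

    /* Synchronous variant: hints that the waker is about to block, so
     * the scheduler should not preempt it in favour of the woken task
     * now.  Dropping the "_sync" therefore widens the race window. */
    wake_up_interruptible_sync(&gk_waitq);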
>>>
>>
>> Doing so would not guarantee the following execution sequence either,
>> i.e.
>>
>> 1- current wakes up the gatekeeper
>> 2- current goes sleeping to exit the Linux runqueue in schedule()
>> 3- the gatekeeper resumes the shadow-side of the old current
>>
>> The point is all about making 100% sure that current is going to be
>> unlinked from the Linux runqueue before the gatekeeper processes the
>> resumption request, whatever event the kernel is processing
>> asynchronously in the meantime. This is the reason why, as you already
>> noticed, preempt_schedule_irq() nicely breaks our toy by stealing the
>> CPU from the hardening thread whilst keeping it linked to the
>> runqueue: upon return from such preemption, the gatekeeper might have
>> run already, hence the newly hardened thread ends up being seen as
>> runnable by both the Linux and Xenomai schedulers. Rainy day indeed.
>>
>> We could rely on giving "current" the highest SCHED_FIFO priority in
>> xnshadow_harden() before waking up the gk, until the gk eventually
>> promotes it to the Xenomai scheduling mode and downgrades this
>> priority back to normal, but we would pay additional latencies induced
>> by each aborted rescheduling attempt that may occur during the atomic
>> path we want to enforce.
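
For the record, that boost would look roughly like this; a sketch of the
idea only, not a proposed patch:

    #include <linux/sched.h>

    /* Sketch of the priority-boost alternative.  Holding the top
     * SCHED_FIFO priority, current cannot be preempted by any other
     * Linux task between waking the gatekeeper and leaving the runqueue
     * in schedule(); the gatekeeper would downgrade the priority again
     * once the thread has been promoted to primary mode. */
    static void boost_before_harden(void)
    {
            struct sched_param boost = { .sched_priority = MAX_RT_PRIO - 1 };

            sched_setscheduler(current, SCHED_FIFO, &boost);
            /* ...then wake the gatekeeper and call schedule() as usual. */
    }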
>>
>> The other way is to make sure that no in-kernel preemption of the
>> hardening task could occur after step 1) and until step 2) is
>> performed, given that we cannot currently call schedule() with
>> interrupts or preemption off. I'm on it.
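
Just to spell out why the straightforward version of that is not possible
on a stock kernel, and hence why Adeos-side support is needed: schedule()
must not be entered with preemption disabled, so the window cannot simply
be closed like this:

    /* What one would like to write -- but cannot: calling schedule()
     * with preemption disabled trips the "scheduling while atomic"
     * check.  Hence the need for the patches below. */
    preempt_disable();
    set_current_state(TASK_INTERRUPTIBLE);
    wake_up_interruptible_sync(&gk_waitq);
    schedule();                            /* BUG: scheduling while atomic */
    preempt_enable();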
>>
> 
> Could anyone interested in this issue test the following couple of patches?
> 
> atomic-switch-state.patch is to be applied against Adeos-1.1-03/x86 for
> 2.6.15
> atomic-wakeup-and-schedule.patch is to be applied against Xeno 2.1-rc2
> 
> Both patches are needed to fix the issue.
> 
> TIA,
> 

Looks good. I tried Jeroen's test-case and I was not able to reproduce
the crash anymore. I think it's time for a new ipipe-release. ;)

While we are at it: any comments on the panic-freeze extension for the
tracer? I need to rework the Xenomai patch, but the ipipe side should be
ready for merge.

Jan
