Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> the watchdog is currently broken in trunk ("zombie [...] would not
>>>> die..."). In fact, it should also be broken in older versions, but only
>>>> recent thread termination rework made this visible.
>>>>
>>>> When a Xenomai CPU hog is caught by the watchdog,
>>>> xnpod_delete_thread is
>>>> invoked, causing the current thread to be set in zombie state and
>>>> scheduled out. But as its Linux mate still exist, hell breaks loose
>>>> once
>>>> Linux tries to get rid of it (the Xenomai zombie is scheduled in
>>>> again).
>>>> In short: calling xnpod_delete_thread(<self>) for a shadow thread is
>>>> not
>>>> working, probably never worked cleanly.
>>> Nak, it is a regression introduced by the scheduler changes in 2.5.x.
>>> We should detect _any_ shadow thread that schedules out in primary
>>> mode then regains control in secondary mode like we do in the 2.4.x
>>> series, not only _relaxing_ shadow threads. It is perfectly valid to
>>> have the Linux task orphaned from the deletion of its shadow TCB
>>> until Xenomai notices the issue and reaps it; problem was that such
>>> regression prevented the nucleus to get the memo.
>>>
>>> The following patch should fix the issue:
>>>
>>> Index: include/asm-generic/system.h
>>> ===================================================================
>>> --- include/asm-generic/system.h (revision 4676)
>>> +++ include/asm-generic/system.h (working copy)
>>> @@ -311,6 +311,11 @@
>>> return !!s;
>>> }
>>>
>>> +static inline int xnarch_root_domain_p(void)
>>> +{
>>> + return rthal_current_domain == rthal_root_domain;
>>> +}
>>> +
>>> #ifdef CONFIG_SMP
>>>
>>> #define xnlock_get(lock) __xnlock_get(lock XNLOCK_DBG_CONTEXT)
>>> Index: ksrc/nucleus/pod.c
>>> ===================================================================
>>> --- ksrc/nucleus/pod.c (revision 4676)
>>> +++ ksrc/nucleus/pod.c (working copy)
>>> @@ -2137,7 +2137,7 @@
>>> void __xnpod_schedule(struct xnsched *sched)
>>> {
>>> struct xnthread *prev, *next, *curr = sched->curr;
>>> - int zombie, switched = 0, need_resched, relaxing;
>>> + int zombie, switched = 0, need_resched, shadow;
>>> spl_t s;
>>>
>>> if (xnarch_escalate())
>>> @@ -2174,9 +2174,9 @@
>>> next, xnthread_name(next));
>>>
>>> #ifdef CONFIG_XENO_OPT_PERVASIVE
>>> - relaxing = xnthread_test_state(prev, XNRELAX);
>>> + shadow = xnthread_test_state(prev, XNSHADOW);
>>> #else
>>> - (void)relaxing;
>>> + (void)shadow;
>>> #endif /* CONFIG_XENO_OPT_PERVASIVE */
>>>
>>> if (xnthread_test_state(next, XNROOT)) {
>>> @@ -2204,12 +2204,18 @@
>>>
>>> #ifdef CONFIG_XENO_OPT_PERVASIVE
>>> /*
>>> - * Test whether we are relaxing a thread. In such a case, we
>>> - * are here the epilogue of Linux' schedule, and should skip
>>> - * xnpod_schedule epilogue.
>>> + * Test whether we transitioned from primary mode to secondary
>>> + * over a shadow thread. This may happen in two cases:
>>> + *
>>> + * 1) the shadow thread just relaxed.
>>> + * 2) the shadow TCB has just been deleted, in which case
>>> + * we have to reap the mated Linux side as well.
>>> + *
>>> + * In both cases, we are running over the epilogue of Linux's
>>> + * schedule, and should skip our epilogue code.
>>> */
>>> - if (relaxing)
>>> - goto relax_epilogue;
>>> + if (shadow && xnarch_root_domain_p())
>>> + goto shadow_epilogue;
>>> #endif /* CONFIG_XENO_OPT_PERVASIVE */
>>>
>>> switched = 1;
>>> @@ -2252,7 +2258,7 @@
>>> return;
>>>
>>> #ifdef CONFIG_XENO_OPT_PERVASIVE
>>> - relax_epilogue:
>>> + shadow_epilogue:
>>> {
>>> spl_t ignored;
>>
>> Finally makes sense and works (but your posting was corrupted). Great.
>>
>>>> There are basically two approaches to fix it: The first one is to
>>>> find a
>>>> different way to kill (or only suspend?)
>>> Suspending the hog won't work, particularly when GDB is involved,
>>> because a pending non-lethal Linux signal may cause the suspended
>>> shadow to resume immediately for processing the signal, therefore
>>> defeating the purpose of the watchdog, leading to an infinite loop.
>>> This is why we moved from suspension to deletion upon watchdog
>>> trigger in 2.3 (2.2 used to suspend only).
>>
>> Yes, that became clear to me in the meantime, too.
>>
>>> the current shadow thread when
>>>> the watchdog strikes. The second one brought me to another issue: Raise
>>>> SIGKILL for the current thread and make sure that it can be
>>>> processed by
>>>> Linux (e.g. via xnpod_suspend_thread(<cpu-hog>). Unfortunately,
>>>> there is
>>>> no way to force a shadow thread into secondary mode to handle pending
>>>> Linux signals unless that thread issues a syscall once in a while. And
>>>> that raises the question if we shouldn't improve this as well while we
>>>> are on it.
>>>>
>>>> Granted, non-broken Xenomai user space threads always issue frequent
>>>> syscalls, otherwise the system would starve (and the watchdog would
>>>> come
>>>> around). On the other hand, delaying signals till syscall prologues is
>>>> different from plain Linux behaviour...
>>>>
>>>> Comments, ideas?
>>>>
>>> We probably need a two-stage approach: first record the thread was
>>> bumped out and suspend it from the watchdog handler to give Linux a
>>> chance to run again, then finish the work, killing it for good, next
>>> time the root thread is scheduled in on the same CPU.
>>
>> That confuses me again: The watchdog issue is solved now, no? We are
>> only left with the scenario of breaking out of a user space loop of some
>> Xenomai thread via a Linux signal (which implies SMP - otherwise there
>> is no chance to raise the signal...).
>>
>
> If you first suspend the hog, then send it a lethal signal, you solve
> both issues: first Linux is allowed to run eventually, then your task
> won't be able to resume running the faulty code, but solely to process
> SIGKILL, which can be made pending early enough because the nucleus
> decides when Linux resumes.
I'm not interested in SIGKILL here, rather in SIGSTOP to do debugging.
That is currently impossible.
>
>> Meanwhile I played with some light-weight approach to relax a thread
>> that received a signal (according to do_sigwake_event). Worked, but only
>> once due to a limitation (if not bug) of I-pipe x86: in __ipipe_run_isr,
>> it does not handle the case that a non-root handler may alter the
>> current domain, causing corruptions to the IPIPE_SYNC_FLAG states of the
>> involved domains.
>
> It is not a bug, this is wanted. ISR must neither change the current
> domain nor migrate CPU; allowing this would open Pandora's box.
OK, then please elaborate on this a bit more in the adeos-main thread
and explain why __ipipe_sync_stage currently reloads the domain.
Jan
--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
_______________________________________________
Xenomai-core mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-core