On 02/17/2014 05:29 PM, Jan Kiszka wrote:
On 2014-02-13 16:26, Philippe Gerum wrote:
On 02/10/2014 05:40 PM, Jan Kiszka wrote:
On 2014-02-08 17:00, Philippe Gerum wrote:
On 02/08/2014 03:44 PM, Gilles Chanteperdrix wrote:
On 02/08/2014 10:57 AM, Philippe Gerum wrote:
there should be no point in instantiating scheduler slots for
non-RT CPUs anymore, I agree.

Are you sure this will not break xnintr_core_clock_handler? On some
architectures, the tick handler is called on all CPUs and simply
forwards the host tick when a CPU is not supported; but in order to do
this, it seems to use the xnsched structure.


Yes, the change is incomplete. Either we initialize the ->cpu member in
all slots, including the non-RT ones, or we will need something along
these lines:

diff --git a/kernel/cobalt/intr.c b/kernel/cobalt/intr.c
index b162d22..4758c6b 100644
--- a/kernel/cobalt/intr.c
+++ b/kernel/cobalt/intr.c
@@ -94,10 +94,10 @@ void xnintr_host_tick(struct xnsched *sched) /* Interrupts off. */
  */
 void xnintr_core_clock_handler(void)
 {
-	struct xnsched *sched = xnsched_current();
-	int cpu  __maybe_unused = xnsched_cpu(sched);
+	int cpu = ipipe_processor_id();
 	struct xnirqstat *statp;
 	xnstat_exectime_t *prev;
+	struct xnsched *sched;
 
 	if (!xnsched_supported_cpu(cpu)) {
 #ifdef XNARCH_HOST_TICK_IRQ
@@ -106,6 +106,7 @@ void xnintr_core_clock_handler(void)
 		return;
 	}
 
+	sched = xnsched_struct(cpu);
 	statp = __this_cpu_ptr(nktimer.stats);
 	prev = xnstat_exectime_switch(sched, &statp->account);
 	xnstat_counter_inc(&statp->hits);
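
To spell out why the lookup order matters: xnsched_current() dereferences
the calling CPU's scheduler slot unconditionally, which is what blows up
on a CPU owning no slot, while resolving the slot by explicit CPU id
after the supported-CPU test never touches it on non-RT CPUs. Roughly,
assuming the usual per-CPU layout (the actual in-tree definitions may
differ):

	/* One scheduler slot per CPU; the 'nksched' naming is assumed. */
	DECLARE_PER_CPU(struct xnsched, nksched);

	/* Slot lookup for an explicit CPU id. */
	#define xnsched_struct(cpu)	(&per_cpu(nksched, (cpu)))
	/* Slot of the *calling* CPU - unsafe on non-RT CPUs. */
	#define xnsched_current()	__this_cpu_ptr(&nksched)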


There is more:

[    1.963540] BUG: unable to handle kernel NULL pointer dereference at 0000000000000480
[    1.966360] IP: [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.967482] PGD 0
[    1.970784] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[    1.970784] Modules linked in:
[    1.970784] CPU: 3 PID: 53 Comm: init Not tainted 3.10.28+ #102
[    1.970784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    1.970784] task: ffff88003b106300 ti: ffff88003ace8000 task.ti: ffff88003ace8000
[    1.970784] RIP: 0010:[<ffffffff81123bdf>]  [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.970784] RSP: 0018:ffff88003acebb78  EFLAGS: 00010246
[    1.970784] RAX: 0000000000000000 RBX: ffff88003acd78c0 RCX: ffffffff81671250
[    1.970784] RDX: 0000000000000000 RSI: ffffffff818504f8 RDI: 0000000000000000
[    1.970784] RBP: ffff88003acebb78 R08: 0000000000000000 R09: 00000000002a1220
[    1.970784] R10: 0000000000000003 R11: ffff88003e000000 R12: ffff88003acebfd8
[    1.970784] R13: ffff88003e00da00 R14: 0000000000000000 R15: 0000000000000005
[    1.970784] FS:  0000000000000000(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
[    1.970784] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    1.970784] CR2: 0000000000000480 CR3: 000000003ac54000 CR4: 00000000000006e0
[    1.970784] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.970784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    1.970784] I-pipe domain Linux
[    1.970784] Stack:
[    1.970784]  ffff88003acebc08 ffffffff811275cf ffff88003acebba8 ffffffff81374edf
[    1.970784]  ffff88003acd78c0 0000000000000079 ffff88003acebc58 ffffffff8119ac39
[    1.970784]  0000000000000000 0000000000000082 0000000000000246 0000000000016a90
[    1.970784] Call Trace:
[    1.970784]  [<ffffffff811275cf>] ipipe_kevent_hook+0x72f/0x10b0
[    1.970784]  [<ffffffff81374edf>] ? __percpu_counter_add+0x5f/0x80
[    1.970784]  [<ffffffff8119ac39>] ? exit_mmap+0x139/0x170
[    1.970784]  [<ffffffff810d8d9c>] __ipipe_notify_kevent+0x9c/0x130
[    1.970784]  [<ffffffff8103f4ff>] mmput+0x6f/0x120
[    1.970784]  [<ffffffff811cd20c>] flush_old_exec+0x32c/0x770
[    1.970784]  [<ffffffff8121982f>] load_elf_binary+0x31f/0x19f0
[    1.970784]  [<ffffffff81218d38>] ? load_script+0x18/0x260
[    1.970784]  [<ffffffff81177ce9>] ? put_page+0x9/0x50
[    1.970784]  [<ffffffff811cc052>] search_binary_handler+0x142/0x3a0
[    1.970784]  [<ffffffff81219510>] ? elf_map+0x120/0x120
[    1.970784]  [<ffffffff811cdfec>] do_execve_common+0x4dc/0x590
[    1.970784]  [<ffffffff811ce0d7>] do_execve+0x37/0x40
[    1.970784]  [<ffffffff811ce35d>] SyS_execve+0x3d/0x60
[    1.970784]  [<ffffffff8165b7a9>] stub_execve+0x69/0xa0
[    1.970784] Code: eb d6 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8 77 71 53 00 48 c7 c0 80 12 2a 00 65 48 03 04 25 30 cd 00 00 48 8b 50 10 31 c0 <48> f7 82 80 04 00 00 00 00 05 00 75 04 c9 c3 66 90 e8 6b ff ff
[    1.970784] RIP  [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.970784]  RSP <ffff88003acebb78>
[    1.970784] CR2: 0000000000000480
[    2.042762] ---[ end trace 5020365fdf0eba4b ]---

Taken with the current forge next branch. It looks like we try to obtain
xnsched_current_thread() on an unsupported CPU.

So, which path should we take? Partially initialize the sched slots, or
change the places that try to use them on unsupported CPUs?


Due to the not-so-few exception cases which exist, I eventually came to
the conclusion that we need a valid scheduler slot for each online CPU,
but that we should detect and act upon inconsistent user behavior, like
deliberately moving a thread to a non-RT processor.

This way, we can still handle situations arising from user error
gracefully (kernel-wise, I mean), but we also complain loudly to the
offender when that happens (well, actually we kick it out now).
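
To illustrate what "complain and kick" boils down to, here is a minimal
sketch of the detection path when a shadow thread is caught running on a
CPU outside the RT set; the helper name, signal plumbing and SIGDEBUG
cause below are placeholders, not the actual tree code:

	/* Sketch: a thread resumed on a non-RT CPU gets evicted. */
	static void check_bogus_affinity(struct xnthread *thread)
	{
		int cpu = ipipe_processor_id();

		if (!xnsched_supported_cpu(cpu))
			/* Complain loudly: signal the offender, which
			   unwinds it from the Xenomai domain. */
			xnshadow_send_sig(thread, SIGDEBUG,
					  SIGDEBUG_MIGRATE_BOGUS);
	}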

Incidentally, this uncovered a bug in the pipeline core which prevented
Xenomai from excluding CPU0 from the realtime processor set, causing the
kernel to hang at boot when attempting to do so. The bug affects any
Xenomai release using the newest pipeline core architecture (i.e. any
release which defines CONFIG_IPIPE_CORE):

http://git.xenomai.org/ipipe.git/commit/?h=ipipe-3.10&id=ce5bfa8a3282c8f97d5ec4b36802333b29e8c241
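
(For the record, reproducing it is just a matter of booting with an RT
CPU mask that excludes CPU0, e.g. something like supported_cpus=0xe on a
quad-core box; the exact parameter prefix depends on the tree at hand.)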


There are other fixes in the -next branch of the 3.x tree. I still have
some testing ahead with respect to restricting the CPU set in -forge,
but the situation already looks better.

Seems to work now - except for the bogus corner case of
supported_cpus=0: Xenomai gets half-initialized and then locks up.

This case is detected, so we likely have a broken error path.
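
The detection itself boils down to an early mask sanity check along
these lines; the mask and symbol names below are assumptions on my side,
only the cpumask calls are the stock kernel API:

	/* Refuse an empty RT CPU set instead of half-initializing. */
	cpumask_and(&xnsched_realtime_cpus, &configured_cpus, cpu_online_mask);
	if (cpumask_empty(&xnsched_realtime_cpus)) {
		printk(KERN_WARNING "Xenomai: no CPU in supported_cpus mask\n");
		return -EINVAL;	/* the error path which apparently breaks */
	}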


I'm debugging "do_IRQ: 3.52 No irq handler for vector (irq -1)" while
offlining excluded CPUs and was trying to disable Xenomai this way. Now
I found "disable", and setting it to 1 triggers the same issue:

[    1.917634] [Xenomai] stuck on lock ffffffff81ead890
[    1.917634]            waiter = /data/linux-ipipe/kernel/xenomai/heap.c:596 (xnheap_alloc(), CPU #2)
[    1.917634]            owner  = (null):0 ((null)(), CPU #0)
[    1.929991]  0000000000000254 ffff88003fa57d88 ffffffff8113bc36 0000000000000000
[    1.929995]  0000000000000000 0000000000000000 ffffffff00000000 ffffffff81671b35
[    1.929998]  0000000000000254 0000000000000002 ffffffff81ead890 0000000000000000
[    1.930000] Call Trace:
[    1.930326]  [<ffffffff8113bc36>] xnlock_dbg_spinning+0x86/0x90
[    1.930332]  [<ffffffff81114bb9>] __xnlock_spin+0x69/0x90
[    1.930336]  [<ffffffff81111576>] xnheap_alloc+0x3c6/0x450
[    1.930340]  [<ffffffff8113b2c0>] xnmap_create+0x60/0xd0
[    1.930345]  [<ffffffff81535207>] iddp_init+0x17/0x40
[    1.930351]  [<ffffffff81cdac1d>] __rtipc_init+0x22/0x40
[    1.930354]  [<ffffffff81cdabfb>] ? __switchtest_init+0x5c/0x5c
[    1.930359]  [<ffffffff81000272>] do_one_initcall+0x42/0x1a0
[    1.930364]  [<ffffffff81c92f84>] kernel_init_freeable+0x142/0x1d7
[    1.930367]  [<ffffffff81c92780>] ? loglevel+0x31/0x31
[    1.930372]  [<ffffffff8163c250>] ? rest_init+0xa0/0xa0
[    1.930375]  [<ffffffff8163c25e>] kernel_init+0xe/0xf0
[    1.930380]  [<ffffffff8165b5ed>] ret_from_fork+0x7d/0xb0
[    1.930383]  [<ffffffff8163c250>] ? rest_init+0xa0/0xa0

Built-in RT-IPC... We seem to lack some "is available" check for xnheap
or even something more general here.
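
Something like the guard below in each affected initcall would probably
do; the predicate name is made up here, whatever the core actually
exposes for that test:

	static int __init iddp_init(void)
	{
		/* Register nothing when Xenomai was disabled at boot,
		   so we never touch xnheap/xnmap. */
		if (!realtime_core_enabled())	/* assumed predicate */
			return 0;

		return __iddp_init();		/* hypothetical existing path */
	}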


Yes, I don't think the disabled state is handled in any of the new
initcalls that may have popped up recently. Ok, will check.

--
Philippe.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
http://www.xenomai.org/mailman/listinfo/xenomai
