On 02/11/2014 03:43 PM, Jan Kiszka wrote:
On 2014-02-11 15:30, Philippe Gerum wrote:
On 02/10/2014 05:40 PM, Jan Kiszka wrote:
On 2014-02-08 17:00, Philippe Gerum wrote:
On 02/08/2014 03:44 PM, Gilles Chanteperdrix wrote:
On 02/08/2014 10:57 AM, Philippe Gerum wrote:
there should be no point in instantiating scheduler slots for
non-RT CPUs anymore, I agree.

Are you sure this will not break xnintr_core_clock_handler? On some
architectures, the tick handler is called on all CPUs and simply
forwards the host tick when a CPU is not supported; but in order to do
this, it seems to use the xnsched structure.


Yes, the change is incomplete. Either we initialize the ->cpu member in
all slots, including the non-RT ones, or we will need something along
these lines:

diff --git a/kernel/cobalt/intr.c b/kernel/cobalt/intr.c
index b162d22..4758c6b 100644
--- a/kernel/cobalt/intr.c
+++ b/kernel/cobalt/intr.c
@@ -94,10 +94,10 @@ void xnintr_host_tick(struct xnsched *sched) /* Interrupts off. */
  */
 void xnintr_core_clock_handler(void)
 {
-	struct xnsched *sched = xnsched_current();
-	int cpu  __maybe_unused = xnsched_cpu(sched);
+	int cpu = ipipe_processor_id();
 	struct xnirqstat *statp;
 	xnstat_exectime_t *prev;
+	struct xnsched *sched;
 
 	if (!xnsched_supported_cpu(cpu)) {
 #ifdef XNARCH_HOST_TICK_IRQ
@@ -106,6 +106,7 @@ void xnintr_core_clock_handler(void)
 		return;
 	}
 
+	sched = xnsched_struct(cpu);
 	statp = __this_cpu_ptr(nktimer.stats);
 	prev = xnstat_exectime_switch(sched, &statp->account);
 	xnstat_counter_inc(&statp->hits);
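
For comparison, the first option — initializing the ->cpu member in
all slots — would come down to something like the sketch below at
scheduler setup time. This is only a sketch under assumptions: the
surrounding init function and the exact shape of the per-CPU loop in
kernel/cobalt/sched.c may differ.

	/* Sketch (assumed init path): keep xnsched_cpu() valid on
	 * every CPU, including non-RT ones, by setting ->cpu in all
	 * slots up front, while still skipping the full scheduler
	 * bring-up on unsupported CPUs. */
	struct xnsched *sched;
	int cpu;

	for_each_online_cpu(cpu) {
		sched = xnsched_struct(cpu);
		sched->cpu = cpu;	/* valid even on non-RT CPUs */
		if (!xnsched_supported_cpu(cpu))
			continue;	/* no full init on non-RT CPUs */
		/* ... regular per-CPU scheduler initialization ... */
	}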


There is more:

[    1.963540] BUG: unable to handle kernel NULL pointer dereference at 0000000000000480
[    1.966360] IP: [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.967482] PGD 0
[    1.970784] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[    1.970784] Modules linked in:
[    1.970784] CPU: 3 PID: 53 Comm: init Not tainted 3.10.28+ #102
[    1.970784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    1.970784] task: ffff88003b106300 ti: ffff88003ace8000 task.ti: ffff88003ace8000
[    1.970784] RIP: 0010:[<ffffffff81123bdf>]  [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.970784] RSP: 0018:ffff88003acebb78  EFLAGS: 00010246
[    1.970784] RAX: 0000000000000000 RBX: ffff88003acd78c0 RCX: ffffffff81671250
[    1.970784] RDX: 0000000000000000 RSI: ffffffff818504f8 RDI: 0000000000000000
[    1.970784] RBP: ffff88003acebb78 R08: 0000000000000000 R09: 00000000002a1220
[    1.970784] R10: 0000000000000003 R11: ffff88003e000000 R12: ffff88003acebfd8
[    1.970784] R13: ffff88003e00da00 R14: 0000000000000000 R15: 0000000000000005
[    1.970784] FS:  0000000000000000(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
[    1.970784] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    1.970784] CR2: 0000000000000480 CR3: 000000003ac54000 CR4: 00000000000006e0
[    1.970784] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.970784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    1.970784] I-pipe domain Linux
[    1.970784] Stack:
[    1.970784]  ffff88003acebc08 ffffffff811275cf ffff88003acebba8 ffffffff81374edf
[    1.970784]  ffff88003acd78c0 0000000000000079 ffff88003acebc58 ffffffff8119ac39
[    1.970784]  0000000000000000 0000000000000082 0000000000000246 0000000000016a90
[    1.970784] Call Trace:
[    1.970784]  [<ffffffff811275cf>] ipipe_kevent_hook+0x72f/0x10b0
[    1.970784]  [<ffffffff81374edf>] ? __percpu_counter_add+0x5f/0x80
[    1.970784]  [<ffffffff8119ac39>] ? exit_mmap+0x139/0x170
[    1.970784]  [<ffffffff810d8d9c>] __ipipe_notify_kevent+0x9c/0x130
[    1.970784]  [<ffffffff8103f4ff>] mmput+0x6f/0x120
[    1.970784]  [<ffffffff811cd20c>] flush_old_exec+0x32c/0x770
[    1.970784]  [<ffffffff8121982f>] load_elf_binary+0x31f/0x19f0
[    1.970784]  [<ffffffff81218d38>] ? load_script+0x18/0x260
[    1.970784]  [<ffffffff81177ce9>] ? put_page+0x9/0x50
[    1.970784]  [<ffffffff811cc052>] search_binary_handler+0x142/0x3a0
[    1.970784]  [<ffffffff81219510>] ? elf_map+0x120/0x120
[    1.970784]  [<ffffffff811cdfec>] do_execve_common+0x4dc/0x590
[    1.970784]  [<ffffffff811ce0d7>] do_execve+0x37/0x40
[    1.970784]  [<ffffffff811ce35d>] SyS_execve+0x3d/0x60
[    1.970784]  [<ffffffff8165b7a9>] stub_execve+0x69/0xa0
[    1.970784] Code: eb d6 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8 77 71 53 00 48 c7 c0 80 12 2a 00 65 48 03 04 25 30 cd 00 00 48 8b 50 10 31 c0 <48> f7 82 80 04 00 00 00 00 05 00 75 04 c9 c3 66 90 e8 6b ff ff
[    1.970784] RIP  [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.970784]  RSP <ffff88003acebb78>
[    1.970784] CR2: 0000000000000480
[    2.042762] ---[ end trace 5020365fdf0eba4b ]---

Taken with the current forge next tree. It looks like we try to obtain
xnsched_current_thread() on an unsupported CPU.

So, which path should we take? Partially initialize sched, or change
the places that try to use it on unsupported CPUs?


I'll be digging into this soon. This uncovers a general issue with the
restricted RT CPU set: a relaxed thread which has moved to an
unsupported CPU may well trigger pipelined events on that CPU (faults,
Linux syscalls, etc.).

That must not crash us but should remain an exceptional case: shadowed
threads should have a Linux affinity mask that excludes unsupported CPUs.
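
One way to enforce that would be to clamp the Linux affinity mask when
a thread gets shadowed, along these lines. This is a sketch only:
xnsched_realtime_cpus as the name of the RT CPU mask is an assumption
about the cobalt code, not a confirmed identifier.

	/* Sketch: restrict a to-be-shadowed thread to supported CPUs.
	 * The RT mask name is an assumption. */
	cpumask_t mask;

	cpumask_and(&mask, &current->cpus_allowed, &xnsched_realtime_cpus);
	if (cpumask_empty(&mask))
		return -EINVAL;	/* no supported CPU left to run on */
	set_cpus_allowed_ptr(current, &mask);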

However we resolve it, one thing should be kept in mind: unsupported
CPUs should not take Xenomai locks shared with the RT CPUs. That's the
key point about this mask.
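
In code terms, every pipelined event handler would then need an early
bail-out of this kind before touching shared nucleus state — a sketch,
reusing the helpers from the patch above:

	/* Sketch: never enter lock-taking nucleus code from a CPU
	 * outside the RT set; hand the event back to Linux instead. */
	if (!xnsched_supported_cpu(ipipe_processor_id()))
		return;	/* unsupported CPU: leave nucleus state alone */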


The more I think of it, the more I'm convinced that 72ab52f is wrong. Logically speaking, we do have a scheduling state for _every_ CPU, including non-RT ones, even if that state amounts to no more than keeping the root thread forever running on those CPUs.
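
Concretely, that would mean bringing up a slot for every online CPU
again, roughly as sketched below — assuming an xnsched_init() helper
with this shape, which is a guess at the init path rather than the
exact cobalt code:

	/* Sketch: one scheduler slot per online CPU; on non-RT CPUs
	 * the root thread simply runs forever, so no further state
	 * is ever needed there. */
	int cpu;

	for_each_online_cpu(cpu)
		xnsched_init(xnsched_struct(cpu), cpu);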

--
Philippe.

_______________________________________________
Xenomai mailing list
Xenomai@xenomai.org
http://www.xenomai.org/mailman/listinfo/xenomai
