On 02/10/2014 05:40 PM, Jan Kiszka wrote:
On 2014-02-08 17:00, Philippe Gerum wrote:
On 02/08/2014 03:44 PM, Gilles Chanteperdrix wrote:
On 02/08/2014 10:57 AM, Philippe Gerum wrote:
there should be no point in instantiating scheduler slots for
non-RT CPUs anymore, I agree.

Are you sure this will not break xnintr_core_clock_handler? On some
architectures, the tick handler is called on all CPUs and simply
forwards the host tick when a CPU is not supported; but in order to do
this, it seems to use the xnsched structure.


Yes, the change is incomplete. Either we initialize the ->cpu member in
all slots, including the non-RT ones, or we will need something along
these lines:

diff --git a/kernel/cobalt/intr.c b/kernel/cobalt/intr.c
index b162d22..4758c6b 100644
--- a/kernel/cobalt/intr.c
+++ b/kernel/cobalt/intr.c
@@ -94,10 +94,10 @@ void xnintr_host_tick(struct xnsched *sched) /* Interrupts off. */
   */
  void xnintr_core_clock_handler(void)
  {
-    struct xnsched *sched = xnsched_current();
-    int cpu  __maybe_unused = xnsched_cpu(sched);
+    int cpu = ipipe_processor_id();
      struct xnirqstat *statp;
      xnstat_exectime_t *prev;
+    struct xnsched *sched;

      if (!xnsched_supported_cpu(cpu)) {
  #ifdef XNARCH_HOST_TICK_IRQ
@@ -106,6 +106,7 @@ void xnintr_core_clock_handler(void)
          return;
      }

+    sched = xnsched_struct(cpu);
      statp = __this_cpu_ptr(nktimer.stats);
      prev = xnstat_exectime_switch(sched, &statp->account);
      xnstat_counter_inc(&statp->hits);

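(For completeness, the first alternative mentioned above would amount to
something like the following at init time. This is an illustrative sketch
only; xnsched_struct() is the per-CPU slot accessor used in the patch, but
the loop placement is made up:)

	/*
	 * Sketch: initialize the ->cpu member of every online CPU's
	 * scheduler slot, so xnsched_cpu() remains valid even on
	 * non-RT processors.
	 */
	int cpu;

	for_each_online_cpu(cpu)
		xnsched_struct(cpu)->cpu = cpu;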

There is more:

[    1.963540] BUG: unable to handle kernel NULL pointer dereference at 0000000000000480
[    1.966360] IP: [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.967482] PGD 0
[    1.970784] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[    1.970784] Modules linked in:
[    1.970784] CPU: 3 PID: 53 Comm: init Not tainted 3.10.28+ #102
[    1.970784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[    1.970784] task: ffff88003b106300 ti: ffff88003ace8000 task.ti: ffff88003ace8000
[    1.970784] RIP: 0010:[<ffffffff81123bdf>]  [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.970784] RSP: 0018:ffff88003acebb78  EFLAGS: 00010246
[    1.970784] RAX: 0000000000000000 RBX: ffff88003acd78c0 RCX: ffffffff81671250
[    1.970784] RDX: 0000000000000000 RSI: ffffffff818504f8 RDI: 0000000000000000
[    1.970784] RBP: ffff88003acebb78 R08: 0000000000000000 R09: 00000000002a1220
[    1.970784] R10: 0000000000000003 R11: ffff88003e000000 R12: ffff88003acebfd8
[    1.970784] R13: ffff88003e00da00 R14: 0000000000000000 R15: 0000000000000005
[    1.970784] FS:  0000000000000000(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
[    1.970784] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    1.970784] CR2: 0000000000000480 CR3: 000000003ac54000 CR4: 00000000000006e0
[    1.970784] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    1.970784] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    1.970784] I-pipe domain Linux
[    1.970784] Stack:
[    1.970784]  ffff88003acebc08 ffffffff811275cf ffff88003acebba8 ffffffff81374edf
[    1.970784]  ffff88003acd78c0 0000000000000079 ffff88003acebc58 ffffffff8119ac39
[    1.970784]  0000000000000000 0000000000000082 0000000000000246 0000000000016a90
[    1.970784] Call Trace:
[    1.970784]  [<ffffffff811275cf>] ipipe_kevent_hook+0x72f/0x10b0
[    1.970784]  [<ffffffff81374edf>] ? __percpu_counter_add+0x5f/0x80
[    1.970784]  [<ffffffff8119ac39>] ? exit_mmap+0x139/0x170
[    1.970784]  [<ffffffff810d8d9c>] __ipipe_notify_kevent+0x9c/0x130
[    1.970784]  [<ffffffff8103f4ff>] mmput+0x6f/0x120
[    1.970784]  [<ffffffff811cd20c>] flush_old_exec+0x32c/0x770
[    1.970784]  [<ffffffff8121982f>] load_elf_binary+0x31f/0x19f0
[    1.970784]  [<ffffffff81218d38>] ? load_script+0x18/0x260
[    1.970784]  [<ffffffff81177ce9>] ? put_page+0x9/0x50
[    1.970784]  [<ffffffff811cc052>] search_binary_handler+0x142/0x3a0
[    1.970784]  [<ffffffff81219510>] ? elf_map+0x120/0x120
[    1.970784]  [<ffffffff811cdfec>] do_execve_common+0x4dc/0x590
[    1.970784]  [<ffffffff811ce0d7>] do_execve+0x37/0x40
[    1.970784]  [<ffffffff811ce35d>] SyS_execve+0x3d/0x60
[    1.970784]  [<ffffffff8165b7a9>] stub_execve+0x69/0xa0
[    1.970784] Code: eb d6 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e8 77 71 53 00 48 c7 c0 80 12 2a 00 65 48 03 04 25 30 cd 00 00 48 8b 50 10 31 c0 <48> f7 82 80 04 00 00 00 00 05 00 75 04 c9 c3 66 90 e8 6b ff ff
[    1.970784] RIP  [<ffffffff81123bdf>] xnshadow_private_get+0x1f/0x40
[    1.970784]  RSP <ffff88003acebb78>
[    1.970784] CR2: 0000000000000480
[    2.042762] ---[ end trace 5020365fdf0eba4b ]---

Taken with the current -forge next branch. It seems like we try to obtain
xnsched_current_thread() on an unsupported CPU.

So, which path to take? Partially initialize sched or change the places
that try to use it over unsupported CPUs?


Given the not-so-few exception cases that exist, I eventually came to the conclusion that we need a valid scheduler slot for each online CPU, but that we should detect and act upon inconsistent user behavior, such as deliberately moving a thread to a non-RT processor.

This way, we can still handle situations that may arise from user error gracefully (kernel-wise, I mean), while also complaining loudly to the offender when that happens (well, actually we now kick it out).
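
As an illustration only, the detection logic could boil down to a check like
the one below. This is a hypothetical sketch, not the actual -forge code:
only xnsched_supported_cpu() is taken from the patch above; the helper name,
xnthread_name() usage and the message wording are made up:

	/*
	 * Sketch: keep a valid xnsched slot on every online CPU, but
	 * reject an explicit request to pin a Xenomai thread onto a
	 * CPU outside the RT set.
	 */
	static int xnthread_check_affinity(struct xnthread *thread, int cpu)
	{
		if (!xnsched_supported_cpu(cpu)) {
			/* Complain loudly, then kick the offender out. */
			printk(KERN_WARNING
			       "Xenomai: thread %s moved to non-RT CPU%d\n",
			       xnthread_name(thread), cpu);
			return -EINVAL;
		}

		return 0;
	}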

Incidentally, this uncovered a bug in the pipeline core which prevented Xenomai from excluding CPU0 from the real-time processor set: the kernel would hang at boot when attempting to do so. This bug affects any Xenomai release based on the newest pipeline core architecture (i.e. any release which defines CONFIG_IPIPE_CORE).

http://git.xenomai.org/ipipe.git/commit/?h=ipipe-3.10&id=ce5bfa8a3282c8f97d5ec4b36802333b29e8c241
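
For instance, assuming the Cobalt supported_cpus boot parameter (a
hexadecimal mask of RT-capable CPUs), passing xenomai.supported_cpus=0xe on
the kernel command line would keep CPU0 out of the RT set on a 4-CPU box,
which is exactly the case the fix above restores.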

Other fixes are in the -next branch of the 3.x tree. I still have some testing ahead wrt restricting the CPU set in -forge, but the situation already looks better.

PS: note that switchtest won't work with -forge when restricting the CPU set, since this code assumes that all CPUs are available for business. Maybe we should fix this by adapting this test's processor enumeration to /proc/xenomai/affinity, or just make sure that the latter enables all CPUs so that the current switchtest implementation fits.
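
A minimal userland sketch of the first option, assuming /proc/xenomai/affinity
exposes a hexadecimal CPU mask (the exact format should be double-checked;
the helper name and fallback policy are made up):

	#include <stdio.h>

	/* Read the RT CPU mask instead of assuming all CPUs are usable. */
	static unsigned long get_rt_cpu_mask(void)
	{
		FILE *fp = fopen("/proc/xenomai/affinity", "r");
		unsigned long mask = ~0UL;	/* fall back to "all CPUs" */

		if (fp == NULL)
			return mask;

		if (fscanf(fp, "%lx", &mask) != 1)
			mask = ~0UL;

		fclose(fp);

		return mask;
	}

switchtest would then spawn its threads only on CPUs for which
mask & (1UL << cpu) is non-zero.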

--
Philippe.
