On Tue, 2010-08-17 at 10:03 -0500, Steve Deiters wrote:
> > -----Original Message-----
> > From: Gilles Chanteperdrix [mailto:gilles.chanteperd...@xenomai.org]
> > Sent: Saturday, August 14, 2010 8:20 AM
> > To: Steve Deiters
> > Cc: xenomai-help@gna.org
> > Subject: Re: [Xenomai-help] Page fault in real time task causes lockup
> >
> > Gilles Chanteperdrix wrote:
> > > Steve Deiters wrote:
> > >>> -----Original Message-----
> > >>> From: xenomai-help-boun...@gna.org
> > >>> [mailto:xenomai-help-boun...@gna.org] On Behalf Of Steve Deiters
> > >>> Sent: Friday, August 13, 2010 5:15 PM
> > >>> To: xenomai-help@gna.org
> > >>> Subject: [Xenomai-help] Page fault in real time task causes lockup
> > >>>
> > >>> I'm trying to track down a problem where it seems that a page fault
> > >>> is causing a lockup on my machine. I am running on a PowerPC with
> > >>> Linux version 2.6.33.5 and Xenomai 2.5.4, but also saw the same
> > >>> thing with Xenomai 2.5.3.
> > >>>
> > >>> What I am doing is mmaping a FPGA on the parallel bus in my task
> > >>> initialization. Later on I have a interrupt loop which uses
> > >>> rt_intr_wait to service some FPGA stuff. On access to some of my
> > >>> FPGA mapped registers I get a page fault which causes a lockup. I'm
> > >>> guessing there is some interaction going on with the rt_intr_wait
> > >>> and the fault exception. If I prefault the map by reading some of
> > >>> the registers before the loop it is ok. If I change the
> > >>> rt_intr_wait to a timed loop using rt_wait_period and don't prefault
> > >>> the registers it is ok.
> > >>>
> > >>> If I enable T_WARNSW I get a SIGXCPU when it tries to access the
> > >>> mapped registers. I don't necessarily care that it faults there so
> > >>> I don't want to have to prefault like I am doing.
> > >>>
> > >>> If I enable some of the debugging options I end up with the
> > >>> following exception dump:
> > >>>
> > >>> -----------
> > >>>
> > >>> [ 23.623184] Xenomai: Switching to secondary mode after exception #769 from user-space at 0xff187ac (pid 586)
> > >>> [ 23.634273] Xenomai: Switching to secondary mode after exception #769 from user-space at 0xff187ac (pid 587)
> > >>> [ 23.653414] Xenomai: Switching to secondary mode after exception #769 from user-space at 0xff187ac (pid 592)
> > >>> [ 23.675243] Xenomai: Switching dsp_task to secondary mode after exception #769 from user-space at 0x10016634 (pid 595)
> > >>> [ 24.456360] Xenomai: Switching dsp_task to secondary mode after exception #769 from user-space at 0x10002d28 (pid 595)
> > >>> [ 24.467285] I-pipe: Detected illicit call from domain 'Xenomai'
> > >>> [ 24.467300] <3> into a service reserved for domain 'Linux' and below.
> > >>> [ 24.480199] Xenomai: Switching dsp_task to secondary mode after exception #1792 in kernel-space at 0xc0062f48 (pid 595)
> > >>> [ 24.491109] Oops: Exception in kernel mode, sig: 5 [#1]
> > >>> [ 24.496258] PREEMPT MPC5121 BE
> > >>> [ 24.499300] Modules linked in: lpcmem axe immmem
> > >>> [ 24.503912] NIP: c0062f48 LR: c0025b0c CTR: c01be5b0
> > >>> [ 24.508870] REGS: c7bc3c60 TRAP: 0700 Not tainted (2.6.33.5)
> > >>> [ 24.514775] MSR: 00021032 <ME,CE,IR,DR> CR: 24000422 XER: 20000000
> > >>> [ 24.521127] TASK = c7b30550[595] 'dsp_task' THREAD: c7bc2000
> > >>> [ 24.526600] GPR00: 00000001 c7bc3d10 c7b30550 c03ac1c0 00002a39 ffffffff c0360000 c03ac1c0
> > >>> [ 24.534946] GPR08: 00000000 000028ff 00002900 c0360000 82000442 1003c7b8 00000001 c0360000
> > >>> [ 24.543292] GPR16: c03b0000 c7bc3f50 00008000 c0300000 c03b0000 c0360000 00000003 c0360000
> > >>> [ 24.551638] GPR24: c0360000 c7bc3d3c 0000009c c7bc2000 0000000f c7bc3d4b c039d918 00000001
> > >>> [ 24.560180] NIP [c0062f48] __ipipe_unstall_root+0x34/0x80
> > >>> [ 24.565564] LR [c0025b0c] vprintk+0x340/0x444
> > >>> [ 24.569895] Call Trace:
> > >>> [ 24.572336] [c7bc3d10] [c7bc3d4b] 0xc7bc3d4b (unreliable)
> > >>> [ 24.577729] [c7bc3d20] [c0025b0c] vprintk+0x340/0x444
> > >>> [ 24.582770] [c7bc3db0] [c0026304] printk+0xb8/0x1f8
> > >>> [ 24.587640] [c7bc3e00] [c006256c] ipipe_check_context+0xc4/0xcc
> > >>> [ 24.593555] [c7bc3e10] [c0299538] __down_interruptible+0xb4/0x148
> > >>> [ 24.599643] [c7bc3e40] [c004799c] down_interruptible+0xcc/0xdc
> > >>> [ 24.605470] [c7bc3e60] [c0075acc] xnshadow_harden+0x64/0x248
> > >>> [ 24.611114] [c7bc3e80] [c0075d4c] losyscall_event+0x9c/0x374
> > >>> [ 24.616766] [c7bc3ed0] [c0063bc0] __ipipe_dispatch_event+0x98/0x1f0
> > >>> [ 24.623025] [c7bc3f20] [c000bcf0] __ipipe_syscall_root+0x60/0x170
> > >>> [ 24.629108] [c7bc3f40] [c00133e4] DoSyscall+0x20/0x5c
> > >>> [ 24.634151] --- Exception: c01 at 0xff19c94
> > >>> [ 24.634158] LR = 0xff19c08
> > >>> [ 24.641360] Instruction dump:
> > >>> [ 24.644318] 7c0802a6 90010014 7c0000a6 5400045e 7c000124 3d60c036 3d20c03b 814b2858
> > >>> [ 24.652055] 3929c1c0 7d4a4a78 312affff 7c095110 <0f000000> 3d60c036 38600000 392b14f8
> > >>> [ 24.660058] ------------[ cut here ]------------
> > >>> [ 24.664600] kernel BUG at kernel/ipipe/core.c:311!
> > >>> [ 24.669413] ---[ end trace ca02c1a54b14d664 ]---
> > >>> [ 24.674021] note: dsp_task[595] exited with preempt_count 1
> > >>>
> > >>
> > >> If this gives any more clues, if I comment out the section in
> > >> __rt_intr_wait in native/syscall.c where it raises the priority to
> > >> XNSCHED_IRQ_PRIO it does not lock up.
> > >
> > > This is strange, it looks like the thread wants to move from secondary
> > > mode to primary mode while it is already running in primary mode.
> > >
> > The most probable reason being that the previous call to
> > xnshadow_relax went in fact wrong.
> > The thing that could go
> > wrong would be xnpod_suspend_thread in xnshadow_relax not
> > suspending the thread.
>
> It turns out my problem was caused by an interrupt storm. I had set up
> the interrupt to propagate to the Linux domain. When my rt task
> transferred to the Linux domain from the page fault it wasn't able to
> clear the device interrupt flag. The interrupt was reenabled at the PIC
> level after Linux was done with it, and as soon as that happened it got
> interrupted again.

Which caused a stack overflow, and now explains the weird behavior in
harden/relax, with the ipipe assertion triggering for no apparent reason.
This is collateral damage from trashing the kernel memory this way
(observed at least once here as well).

> My fix was to disable the interrupt at the device level as soon as
> rt_intr_wait returns, and reenable it before calling rt_intr_wait. I'm
> still not sure why I was getting that exception.

Likely because there is no page table entry available in the MMU hash
table for your mmaped pages until you fault them in. The e300 core
requires software assistance to handle TLB misses. (I'm referring to the
0x300 exceptions here, not to the program check one (0x700), which is
clearly unexpected.)

-- 
Philippe.

_______________________________________________
Xenomai-help mailing list
Xenomai-help@gna.org
https://mail.gna.org/listinfo/xenomai-help
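
For readers of the archive, the ordering described in the fix above (mask
the interrupt at the device as soon as the wait returns, unmask it just
before waiting again) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code from the original
task: struct fake_dev, wait_for_irq(), service_device() and irq_loop() are
hypothetical names, the registers are plain variables rather than an
mmap()ed FPGA window, and wait_for_irq() merely stands in for the place
where a real task would block in rt_intr_wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register model, for illustration only. A real device
 * would expose its own enable/status bits through the mapped window. */
struct fake_dev {
    volatile uint32_t irq_enable;  /* 1 = device may assert its IRQ line */
    volatile uint32_t irq_status;  /* 1 = interrupt pending in the device */
    unsigned pending_events;       /* events the simulation still delivers */
    unsigned serviced;             /* events handled by the loop */
};

/* The line reaches the interrupt controller only if the device has a
 * pending interrupt AND the device-level enable bit is set. */
static bool dev_line_asserted(const struct fake_dev *d)
{
    return d->irq_enable && d->irq_status;
}

/* Stand-in for the blocking wait (rt_intr_wait in the thread above).
 * Returns 1 when an interrupt was delivered, 0 when the simulation has
 * nothing left to deliver. */
static int wait_for_irq(struct fake_dev *d)
{
    if (d->pending_events == 0)
        return 0;
    d->pending_events--;
    d->irq_status = 1;
    return 1;
}

/* Register accesses happen here. In the scenario from the thread this is
 * where the task could take a page fault and relax to the Linux domain,
 * so the device must not be able to assert its line at this point. */
static void service_device(struct fake_dev *d)
{
    assert(!dev_line_asserted(d));
    d->irq_status = 0;             /* acknowledge at the device */
    d->serviced++;
}

static unsigned irq_loop(struct fake_dev *d)
{
    for (;;) {
        d->irq_enable = 1;         /* unmask just before waiting */
        if (wait_for_irq(d) <= 0)
            break;
        d->irq_enable = 0;         /* mask as soon as the wait returns */
        service_device(d);
    }
    d->irq_enable = 0;             /* leave the device masked on exit */
    return d->serviced;
}
```

Calling irq_loop() on a struct fake_dev with pending_events set to some
count services that many simulated interrupts, and the assertion in
service_device() documents the invariant the fix relies on: while the
task touches the registers, the device cannot raise its line again even
if the PIC-level interrupt has been reenabled behind its back.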