I could reproduce the kernel crashes when compiling ruby1.9 on hppa,
even with kernel 2.6.28-rc8.

My analysis so far shows that the "miniruby" process triggers a
"memory protection ID trap" when the kernel tries to copy the results of
the poll() syscall back to userspace. This means that (sometimes) when
the kernel calls __put_user() in fs/select.c:do_sys_poll(), it accesses
memory that is invalid in some way and raises a data memory protection
ID fault.

My assumption is that the linuxthreads implementation on hppa is still
buggy (or that this is another, still hidden, kernel bug).

Nevertheless, the patch below at least fixes the kernel crashes.
This means that you can compile ruby1.9 and run miniruby as often as
you want. The patch makes it possible to kill the miniruby process
and to debug the problem further.
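
For readers who don't know the exception-table mechanism the patch
relies on, here is a rough user-space sketch of the idea. It is not
kernel code: struct ex_entry, struct fake_regs, FAKE_PSW_B, find_fixup()
and try_fixup() are hypothetical stand-ins that I made up for
illustration, and the lookup is a plain linear search rather than
whatever search_exception_tables() really does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an exception table entry. */
struct ex_entry {
	uintptr_t insn;		/* address of an instruction that may fault */
	uintptr_t fixup;	/* address to resume at if it does fault */
};

/* Hypothetical stand-in for the few pt_regs fields the patch touches. */
struct fake_regs {
	uintptr_t iaoq[2];	/* instruction address queue: front, back */
	uintptr_t psw;		/* saved PSW (the patch uses gr[0] for this) */
};

#define FAKE_PSW_B 0x1u		/* placeholder bit, NOT the real PSW_B value */

/* Linear lookup of the faulting address in the table (sketch only). */
static const struct ex_entry *
find_fixup(const struct ex_entry *tbl, size_t n, uintptr_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == addr)
			return &tbl[i];
	return NULL;
}

/*
 * If the faulting instruction has a fixup entry, redirect execution to
 * it and return 1. Otherwise leave regs untouched and return 0, so the
 * caller can fall through to its normal "die" path.
 */
static int
try_fixup(const struct ex_entry *tbl, size_t n, struct fake_regs *regs)
{
	const struct ex_entry *fix;

	assert(regs != NULL);
	fix = find_fixup(tbl, n, regs->iaoq[0]);
	if (!fix)
		return 0;

	/* Resume at the fixup address with the low two bits masked off. */
	regs->iaoq[0] = fix->fixup & ~(uintptr_t)3;
	/* Don't follow a pending branch: continue sequentially instead. */
	regs->iaoq[1] = regs->iaoq[0] + 4;
	regs->psw &= ~(uintptr_t)FAKE_PSW_B;
	return 1;
}
```

The point is only that a kernel-mode fault at a known address gets
redirected to recovery code instead of taking the machine down.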

The patch applies to kernel 2.6.28-rc8, but should apply similarly to
older kernel versions (e.g. 2.6.26) as well.

I'm not sure yet whether this will be the final version of the patch,
and I'll of course keep trying to find the real cause of the problem...

Any feedback and testing results are very much welcome.

Helge

Patch is
Signed-off-by: Helge Deller <del...@gmx.de>
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index 4c771cd..70eabfe 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -43,6 +43,8 @@
 
 #include "../math-emu/math-emu.h"	/* for handle_fpe() */
 
+DECLARE_PER_CPU(struct exception_data, exception_data);
+
 #define PRINT_USER_FAULTS /* (turn this on if you want user faults to be */
 			  /*  dumped to the console via printk)          */
 
@@ -745,6 +747,41 @@ void handle_interruption(int code, struct pt_regs *regs)
 		/* Fall Through */
 	case 27: 
 		/* Data memory protection ID trap */
+		if (code == 27 && !user_mode(regs)) {
+			const struct exception_table_entry *fix;
+
+			/* mostly copied from:
+ 			   arch/parisc/mm/fault.c:do_page_fault()
+			 */
+			fix = search_exception_tables(regs->iaoq[0]);
+			printk(KERN_CRIT "BUG: Kernel Data memory protection ID"
+				" trap at %p (%pS), fix=%p\n",
+				(void*)regs->iaoq[0], (void*)regs->iaoq[0], fix);
+			if (fix) {
+				struct exception_data *d;
+
+				d = &__get_cpu_var(exception_data);
+				d->fault_ip = regs->iaoq[0];
+				d->fault_space = regs->isr;
+				d->fault_addr = regs->ior;
+
+				regs->iaoq[0] = ((fix->fixup) & ~3);
+
+				/*
+				 * NOTE: In some cases the faulting instruction
+				 * may be in the delay slot of a branch. We
+				 * don't want to take the branch, so we don't
+				 * increment iaoq[1], instead we set it to be
+				 * iaoq[0]+4, and clear the B bit in the PSW
+				 */
+
+				regs->iaoq[1] = regs->iaoq[0] + 4;
+				regs->gr[0] &= ~PSW_B; /* IPSW in gr[0] */
+
+				return;
+			}
+		}
+
 		die_if_kernel("Protection id trap", regs, code);
 		si.si_code = SEGV_MAPERR;
 		si.si_signo = SIGSEGV;
