I've run into an issue with gdb on 5.1, and ktrace leads me to think it's likely a kernel issue (hence this list). It wouldn't surprise me too much if I were wrong, though; feel free to point me elsewhere if appropriate.
The surface manifestation is straightforward:

% cat gdbtest.c
int main(void);
int main(void)
{
 return(0);
}
% cc -o gdbtest gdbtest.c -g
% gdb gdbtest
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386--netbsdelf"...
(gdb) run
Starting program: /home/mouse/gdbtest

at which point nothing I've tried will wake it up, except for SIGKILLing gdb from another shell, which produces a "sorry, pid %d was killed: orphaned traced process" message from the kernel and a "Killed" from my shell, neither of which is surprising.

ps shows the gdb process, a copy of my shell, and a dead zombie, as in

10467 ttyp7 ZW+ 0:00.00 (linktarget)
10702 ttyp7 I   0:00.01 gdb gdbtest
24466 ttyp7 IX+ 0:00.00 -local/bin/mcsh -c exec /home/mouse/gdbtest

My shell does run linktarget as part of its startup script, so its presence is not that surprising; its presence as a zombie for more than the barest moment is what's surprising.
Running gdb under ktrace -i makes me think the SIGCHLD the shell is waiting for is getting lost:

25022 1 linktarget CALL write(1,0xbb902000,7)
25022 1 linktarget GIO fd 1 wrote 7 bytes
      "/local\n"
25022 1 linktarget RET write 7
25022 1 linktarget CALL exit(0)
24312 1 mcsh GIO fd 4 read 7 bytes
      "/local\n"
24312 1 mcsh RET read 7
24312 1 mcsh CALL read(4,0xbfbf1c90,0x4000)
24312 1 mcsh GIO fd 4 read 0 bytes
      ""
24312 1 mcsh RET read 0
24312 1 mcsh CALL close(4)
24312 1 mcsh RET close 0
24312 1 mcsh CALL __sigprocmask14(1,0xbfbf1c50,0xbfbf1c40)
24312 1 mcsh RET __sigprocmask14 0
24312 1 mcsh CALL __sigprocmask14(3,0xbfbf1c40,0)
24312 1 mcsh RET __sigprocmask14 0
24312 1 mcsh CALL __sigprocmask14(1,0xbfbf1bf4,0xbfbf1be4)
24312 1 mcsh RET __sigprocmask14 0
24312 1 mcsh CALL __sigprocmask14(1,0xbfbf1bf4,0)
24312 1 mcsh RET __sigprocmask14 0
24312 1 mcsh CALL __sigsuspend14(0xbfbf1bf4)
10674 1 gdb RET wait4 24312/0x5ef8
10674 1 gdb CALL ptrace(PT_GETREGS,0x5ef8,0xbfbfe19c,0)
10674 1 gdb RET ptrace 0
10674 1 gdb CALL ptrace(PT_CONTINUE,0x5ef8,1,0x14)
10674 1 gdb RET ptrace 0
24312 1 mcsh RET __sigsuspend14 -1 errno 4 Interrupted system call
24312 1 mcsh CALL __sigprocmask14(1,0xbfbf1bf4,0)
24312 1 mcsh RET __sigprocmask14 0
24312 1 mcsh CALL __sigsuspend14(0xbfbf1bf4)
10674 1 gdb CALL wait4(0xffffffff,0xbfbfe408,0,0)
(I SIGKILL gdb at this point)
10674 1 gdb RET wait4 RESTART
10674 1 gdb PSIG SIGKILL SIG_DFL: code=SI_USER sent by pid=14918, uid=101)
24312 1 mcsh RET __sigsuspend14 -1 errno 4 Interrupted system call
24312 1 mcsh PSIG SIGKILL SIG_DFL: code=SI_NOINFO

The PT_CONTINUE call (whose last argument, 0x14, is signal 20, SIGCHLD) does make it look as though gdb is doing the right thing here but signal delivery isn't happening: the shell's __sigsuspend14 returns EINTR, but no PSIG SIGCHLD record follows, so the handler never runs and the shell goes back to sleep.
Running that mcsh -c exec command under control of ktrace _without_ gdb being involved produces

25339 1 linktarget CALL write(1,0xbb902000,7)
25339 1 linktarget GIO fd 1 wrote 7 bytes
      "/local\n"
25339 1 linktarget RET write 7
25339 1 linktarget CALL exit(0)
25061 1 mcsh GIO fd 4 read 7 bytes
      "/local\n"
25061 1 mcsh RET read 7
25061 1 mcsh CALL read(4,0xbfbf1cb0,0x4000)
25061 1 mcsh GIO fd 4 read 0 bytes
      ""
25061 1 mcsh RET read 0
25061 1 mcsh CALL close(4)
25061 1 mcsh RET close 0
25061 1 mcsh CALL __sigprocmask14(1,0xbfbf1c70,0xbfbf1c60)
25061 1 mcsh RET __sigprocmask14 0
25061 1 mcsh CALL __sigprocmask14(3,0xbfbf1c60,0)
25061 1 mcsh RET __sigprocmask14 0
25061 1 mcsh CALL __sigprocmask14(1,0xbfbf1c14,0xbfbf1c04)
25061 1 mcsh RET __sigprocmask14 0
25061 1 mcsh CALL __sigprocmask14(1,0xbfbf1c14,0)
25061 1 mcsh RET __sigprocmask14 0
25061 1 mcsh CALL __sigsuspend14(0xbfbf1c14)
25061 1 mcsh RET __sigsuspend14 -1 errno 4 Interrupted system call
25061 1 mcsh PSIG SIGCHLD caught handler=0x806a110 mask=(2,20): code=CLD_EXITED child pid=25339, uid=101, status=0, utime=0, stime=0)
25061 1 mcsh CALL wait4(0xffffffff,0xbfbf1818,3,0xbfbf17d0)
25061 1 mcsh RET wait4 25339/0x62fb

and everything carries on correctly.

So it looks to me as though something's busted somewhere around PT_CONTINUE and signal delivery, at least in the case of SIGCHLD. Any thoughts?

I have a workaround - "env SHELL=/bin/sh gdb ..." - that presumably works because I have no startup script for /bin/sh, so it doesn't need SIGCHLD to work. (In passing, is there an equivalent setting from within gdb? I haven't found one, but gdb's documentation is remarkably difficult to use. The most I've found is a variable that says whether to use a shell, not what shell to use. I tried gdb's environment setting but that didn't help.)

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B