Hi - itrace is a utrace engine included in the systemtap runtime that activates UTRACE_SINGLESTEP / BLOCKSTEP on targeted userspace processes. It seems to work OK on ppc64, but crashes lots on recent fedora (inc. rawhide) level utrace on x86/x86-64. (Lately I've been unable to boot a utrace/master branch version of the kernel to try it against.)
The systemtap code is at git://sourceware.org/git/systemtap.git, master branch for the near-original code, pr10653-itrace-crashes branch for a partial rewrite. The key file is runtime/itrace.c. tapsets-itrace.cxx generates code that calls into this for new or dying threads. The crash is now intermittent (an "improvement" over a near-100% crash on rawhide x86 machines), and is triggered with a systemtap testsuite run thusly: % configure ... ; make; make install # build systemtap % sudo make installcheck RUNTESTFLAGS=itrace.exp While hacking up my near-rewritten form of runtime/itrace.c, I came across several cases where utrace made calls on our engine seemingly long after a UTRACE_DETACH was to take effect. This included cases where a signal or other reporting callback had already returned UTRACE_DETACH, or after a debugging task seemingly completed a utrace_control(DETACH). So I added hack after hack, using engine->data as an indication that someone else has already requested a DETACH, and to keep echoing that back to utrace until it "takes". (Maybe it's a bug in x86 single-stepping logic that didn't appear on ppc. Dunno. We've also had to disable itrace on ia64 for suspicious crashes that may also be the same problem. sourceware.org/PR10171 and /PR10653.) Could you guys take a look at old/new runtime/itrace.c, to see what the code's trying to do with utrace? Do you have a utrace/master-level kernel handy to try it on? I don't understand why any of these hacks are supposed to be necessary. A typical kernel crash log follows (3-cpu virtual machine). Sometimes this triggers somewhat *after* the systemtap session has completed, up to and including unloading of the runtime/itrace.c-containing module. [ 634.676957] stap_198c84f334a07f5b8157be1fd1be6bec_1461: systemtap: 1.0/0.143, base: fa97b000, memory: 16305+20172+1344+13600 data+text+ctx+net, probes: 5 [ 634.926718] [ 634.926719] ============================================= [ 634.927113] [ INFO: possible recursive locking detected ] [ 634.927113] 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 634.927113] --------------------------------------------- [ 634.927113] ls_5066/5705 is trying to acquire lock: [ 634.927113] (&sighand->siglock){......}, at: [<c044e6e0>] get_signal_to_deliver+0x5c/0x344 [ 634.927113] [ 634.927113] but task is already holding lock: [ 634.927113] (&sighand->siglock){......}, at: [<c048407d>] utrace_get_signal+0x50b/0x53d [ 634.927113] [ 634.927113] other info that might help us debug this: [ 634.927113] 1 lock held by ls_5066/5705: [ 634.927113] #0: (&sighand->siglock){......}, at: [<c048407d>] utrace_get_signal+0x50b/0x53d [ 634.927113] [ 634.927113] stack backtrace: [ 634.927113] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 634.927113] Call Trace: [ 634.927113] [<c07b3339>] ? printk+0x14/0x1b [ 634.927113] [<c04690df>] __lock_acquire+0x833/0xb6f [ 634.927113] [<c048407d>] ? utrace_get_signal+0x50b/0x53d [ 634.927113] [<c04694ae>] lock_acquire+0x93/0xb1 [ 634.927113] [<c044e6e0>] ? get_signal_to_deliver+0x5c/0x344 [ 634.927113] [<c07b578b>] _spin_lock_irq+0x2e/0x5e [ 634.927113] [<c044e6e0>] ? get_signal_to_deliver+0x5c/0x344 [ 634.927113] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344 [ 634.927113] [<c04028dc>] do_signal+0x5f/0x703 [ 634.927113] [<c044e0b3>] ? force_sig_info+0xa2/0xac [ 634.927113] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c [ 634.927113] [<c040acee>] ? send_sigtrap+0x6d/0x74 [ 634.927113] [<c0402fa7>] do_notify_resume+0x27/0x77 [ 634.927113] [<c0403718>] work_notifysig+0x13/0x1b [ 639.001030] BUG: spinlock lockup on CPU#3, ls_5066/5705, ec094704 (Not tainted) [ 818.430111] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 818.433097] Call Trace: [ 818.435096] [<c07b3339>] ? printk+0x14/0x1b [ 818.437097] [<c05d451a>] _raw_spin_lock+0x102/0x128 [ 818.439096] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e [ 818.441097] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344 [ 818.541097] [<c04028dc>] do_signal+0x5f/0x703 [ 818.543096] [<c044e0b3>] ? force_sig_info+0xa2/0xac [ 818.545097] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c [ 818.547096] [<c040acee>] ? send_sigtrap+0x6d/0x74 [ 818.549097] [<c0402fa7>] do_notify_resume+0x27/0x77 [ 818.551312] [<c0403718>] work_notifysig+0x13/0x1b [ 818.553097] sending NMI to all CPUs: [ 818.555096] NMI backtrace for cpu 3 [ 818.555096] [ 818.555096] Pid: 5705, comm: ls_5066 Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) [ 818.555096] EIP: 0060:[<c0418cb1>] EFLAGS: 00210046 CPU: 3 [ 818.555096] EIP is at __default_send_IPI_dest_field+0x5f/0x64 [ 818.555096] EAX: 00000c00 EBX: 0f000000 ECX: 00000800 EDX: fffff000 [ 818.555096] ESI: 00000002 EDI: 00000c00 EBP: eb601e08 ESP: eb601dfc [ 818.555096] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 818.555096] CR0: 8005003b CR2: 00ac3eac CR3: 2a44d000 CR4: 000006d0 [ 818.555096] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 818.555096] DR6: ffff4ff0 DR7: 00000400 [ 818.555096] Call Trace: [ 818.555096] [<c0418d36>] default_send_IPI_mask_logical+0x80/0xac [ 818.555096] [<c0418ad3>] default_send_IPI_all+0x27/0x68 [ 818.555096] [<c0419050>] arch_trigger_all_cpu_backtrace+0x32/0x54 [ 818.555096] [<c05d451f>] _raw_spin_lock+0x107/0x128 [ 818.555096] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e [ 818.555096] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344 [ 818.555096] [<c04028dc>] do_signal+0x5f/0x703 [ 818.555096] [<c044e0b3>] ? force_sig_info+0xa2/0xac [ 818.555096] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c [ 818.555096] [<c040acee>] ? send_sigtrap+0x6d/0x74 [ 818.555096] [<c0402fa7>] do_notify_resume+0x27/0x77 [ 818.555096] [<c0403718>] work_notifysig+0x13/0x1b [ 818.555096] Pid: 5705, comm: ls_5066 Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 818.555096] Call Trace: [ 818.555096] [<c0402656>] ? show_regs+0x24/0x2a [ 818.555096] [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171 [ 818.555096] [<c07b619f>] do_nmi+0xa8/0x289 [ 818.555096] [<c07b5d5c>] nmi_stack_correct+0x2f/0x34 [ 818.555096] [<c041007b>] ? do_machine_check+0x683/0x6ec [ 818.555096] [<c04100e0>] ? do_machine_check+0x6e8/0x6ec [ 818.555096] [<c0418cb1>] ? __default_send_IPI_dest_field+0x5f/0x64 [ 818.555096] [<c0418d36>] default_send_IPI_mask_logical+0x80/0xac [ 818.555096] [<c0418ad3>] default_send_IPI_all+0x27/0x68 [ 818.555096] [<c0419050>] arch_trigger_all_cpu_backtrace+0x32/0x54 [ 818.555096] [<c05d451f>] _raw_spin_lock+0x107/0x128 [ 818.555096] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e [ 818.555096] [<c044e6e0>] get_signal_to_deliver+0x5c/0x344 [ 818.555096] [<c04028dc>] do_signal+0x5f/0x703 [ 818.555096] [<c044e0b3>] ? force_sig_info+0xa2/0xac [ 818.555096] [<c07b6fda>] ? kprobe_exceptions_notify+0xe/0x43c [ 818.555096] [<c040acee>] ? send_sigtrap+0x6d/0x74 [ 818.555096] [<c0402fa7>] do_notify_resume+0x27/0x77 [ 818.555096] [<c0403718>] work_notifysig+0x13/0x1b [ 818.556078] NMI backtrace for cpu 2 [ 818.556078] [ 818.556078] Pid: 0, comm: swapper Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) [ 818.556078] EIP: 0060:[<c04201dc>] EFLAGS: 00000246 CPU: 2 [ 818.556078] EIP is at native_safe_halt+0xa/0xc [ 818.556078] EAX: c0409645 EBX: c0a3301c ECX: f6d02ba0 EDX: 00000000 [ 818.556078] ESI: 00000002 EDI: 00000000 EBP: f6cfff78 ESP: f6cfff78 [ 818.556078] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 818.556078] CR0: 8005003b CR2: b76be000 CR3: 27f2e000 CR4: 000006d0 [ 818.556078] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 818.556078] DR6: ffff0ff0 DR7: 00000400 [ 818.556078] Call Trace: [ 818.556078] [<c040964a>] default_idle+0x44/0x6e [ 818.556078] [<c04026f7>] cpu_idle+0x9b/0xb4 [ 818.556078] [<c07afcd0>] start_secondary+0x204/0x242 [ 818.556078] Pid: 0, comm: swapper Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 818.556078] Call Trace: [ 818.556078] [<c0402656>] ? show_regs+0x24/0x2a [ 818.556078] [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171 [ 818.556078] [<c07b619f>] do_nmi+0xa8/0x289 [ 818.556078] [<c07b5d5c>] nmi_stack_correct+0x2f/0x34 [ 818.556078] [<c0409645>] ? default_idle+0x3f/0x6e [ 818.556078] [<c0409645>] ? default_idle+0x3f/0x6e [ 818.556078] [<c04201dc>] ? native_safe_halt+0xa/0xc [ 818.556078] [<c040964a>] default_idle+0x44/0x6e [ 818.556078] [<c04026f7>] cpu_idle+0x9b/0xb4 [ 818.556078] [<c07afcd0>] start_secondary+0x204/0x242 [ 634.927053] NMI backtrace for cpu 1 [ 634.927053] [ 634.927053] Pid: 5698, comm: stapio Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) [ 634.927053] EIP: 0060:[<c05d0ad5>] EFLAGS: 00200096 CPU: 1 [ 634.927053] EIP is at delay_tsc+0x1f/0x66 [ 634.927053] EAX: b833d718 EBX: 00000001 ECX: 00000001 EDX: 00033e0e [ 634.927053] ESI: 00000001 EDI: 00000000 EBP: ea53fe6c ESP: ea53fe5c [ 634.927053] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 634.927053] CR0: 8005003b CR2: 08f862b8 CR3: 27f2e000 CR4: 000006d0 [ 634.927053] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 634.927053] DR6: ffff0ff0 DR7: 00000400 [ 634.927053] Call Trace: [ 634.927053] [<c05d0a58>] __delay+0xe/0x10 [ 634.927053] [<c05d44d3>] _raw_spin_lock+0xbb/0x128 [ 634.927053] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e [ 634.927053] [<c0484239>] utrace_do_stop+0x1e/0x75 [ 634.927053] [<c0484359>] utrace_control+0xc9/0x1b8 [ 634.927053] [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<c04e667e>] ? rcu_read_unlock+0x0/0x1e [ 634.927053] [<c059b0ed>] ? selinux_file_permission+0x4f/0x96 [ 634.927053] [<c059242f>] ? security_file_permission+0x14/0x16 [ 634.927053] [<c04e56a5>] ? rw_verify_area+0x9d/0xc0 [ 634.927053] [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<c04e5a74>] vfs_write+0x85/0xe4 [ 634.927053] [<c04e5b71>] sys_write+0x40/0x62 [ 634.927053] [<c0403610>] syscall_call+0x7/0xb [ 634.927053] Pid: 5698, comm: stapio Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 634.927053] EAX: b833d718 EBX: 00000001 ECX: 00000001 EDX: 00033e0e [ 634.927053] ESI: 00000001 EDI: 00000000 EBP: ea53fe6c ESP: ea53fe5c [ 634.927053] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 634.927053] CR0: 8005003b CR2: 08f862b8 CR3: 27f2e000 CR4: 000006d0 [ 634.927053] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 634.927053] DR6: ffff0ff0 DR7: 00000400 [ 634.927053] Call Trace: [ 634.927053] [<c05d0a58>] __delay+0xe/0x10 [ 634.927053] [<c05d44d3>] _raw_spin_lock+0xbb/0x128 [ 634.927053] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e [ 634.927053] [<c0484239>] utrace_do_stop+0x1e/0x75 [ 634.927053] [<c0484359>] utrace_control+0xc9/0x1b8 [ 634.927053] [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<c04e667e>] ? rcu_read_unlock+0x0/0x1e [ 634.927053] [<c059b0ed>] ? selinux_file_permission+0x4f/0x96 [ 634.927053] [<c059242f>] ? security_file_permission+0x14/0x16 [ 634.927053] [<c04e56a5>] ? rw_verify_area+0x9d/0xc0 [ 634.927053] [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<c04e5a74>] vfs_write+0x85/0xe4 [ 634.927053] [<c04e5b71>] sys_write+0x40/0x62 [ 634.927053] [<c0403610>] syscall_call+0x7/0xb [ 634.927053] Pid: 5698, comm: stapio Not tainted 2.6.32-0.48.rc7.git1.fc13.i686 #1 [ 634.927053] Call Trace: [ 634.927053] [<c0402656>] ? show_regs+0x24/0x2a [ 634.927053] [<c07b6b26>] nmi_watchdog_tick+0x9d/0x171 [ 634.927053] [<c07b619f>] do_nmi+0xa8/0x289 [ 634.927053] [<c07b5d5c>] nmi_stack_correct+0x2f/0x34 [ 634.927053] [<c046007b>] ? ktime_get_ts+0x33/0xa2 [ 634.927053] [<c05d0ad5>] ? delay_tsc+0x1f/0x66 [ 634.927053] [<c05d0a58>] __delay+0xe/0x10 [ 634.927053] [<c05d44d3>] _raw_spin_lock+0xbb/0x128 [ 634.927053] [<c07b57aa>] _spin_lock_irq+0x4d/0x5e [ 634.927053] [<c0484239>] utrace_do_stop+0x1e/0x75 [ 634.927053] [<c0484359>] utrace_control+0xc9/0x1b8 [ 634.927053] [<fa97cc8b>] stap_utrace_detach+0x92/0xef [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97cd12>] stap_utrace_detach_ops+0x2a/0x81 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97d361>] stap_stop_task_finder+0x5e/0xf2 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97d7f8>] _stp_cleanup_and_exit+0x6b/0x1b3 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<fa97fad9>] _stp_ctl_write_cmd+0x790/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<c04e667e>] ? rcu_read_unlock+0x0/0x1e [ 634.927053] [<c059b0ed>] ? selinux_file_permission+0x4f/0x96 [ 634.927053] [<c059242f>] ? security_file_permission+0x14/0x16 [ 634.927053] [<c04e56a5>] ? rw_verify_area+0x9d/0xc0 [ 634.927053] [<fa97f349>] ? _stp_ctl_write_cmd+0x0/0x909 [stap_198c84f334a07f5b8157be1fd1be6bec_1461] [ 634.927053] [<c04e5a74>] vfs_write+0x85/0xe4 [ 634.927053] [<c04e5b71>] sys_write+0x40/0x62 [ 634.927053] [<c0403610>] syscall_call+0x7/0xb [ 819.261347] NMI backtrace for cpu 0 [ 819.261347] [ 819.261347] Pid: 0, comm: swapper Not tainted (2.6.32-0.48.rc7.git1.fc13.i686 #1) [ 819.261347] EIP: 0060:[<c04201dc>] EFLAGS: 00000246 CPU: 0 [ 819.261347] EIP is at native_safe_hal