Hi, guys - I'm running into a recurrent problem on my gdbstub+uprobes prototype. (git://web.elastic.org/~fche/utrace-ext.git utrace-gdbstub-uprobes)
The gist is that running a unregister_uprobe upon a stopped process (single-threaded, stoped via a UTRACE_STOP event) appears to hang indefinitely. There appear to be no contended locks in this case, just this loop in uprobes_core.c:unregister_uprobe: ppt->state = UPROBE_REMOVING; list_add_tail(&ppt->pd_node, &uproc->pending_uprobes); (void) quiesce_all_threads(uproc, &cur_utask_quiescing); up_write(&uproc->rwsem); if (cur_utask_quiescing) /* Current task is probing its own process. */ (void) utask_fake_quiesce(cur_utask_quiescing); else wait_event(ppt->waitq, ppt->state != UPROBE_REMOVING); >From a systemtap script that probes the goings-on nearby: .... 15505 gdb(7435): unregister_uprobe u=0xffff8800220c1660 15520 gdb(7435): uprobe_get_tg_leader p=0x546 15528 gdb(7435): uprobe_get_tg_leader return=0xffff880078e8d680 15536 gdb(7435): uprobe_find_utask tsk=0xffff8800375c48e0 15543 gdb(7435): uprobe_find_utask return=0x0 15716 gdb(7435): uprobe_find_process tg_leader=0xffff880078e8d680 15726 gdb(7435): uprobe_find_process return=0xffff880075bc77e0 15736 gdb(7435): uprobe_free_kimg uk=0xffff8800220c1088 15762 gdb(7435): uprobe_free_kimg 15770 gdb(7435): list_add_tail new=0xffff8800377a42b8 head=0xffff880075bc7960 15775 gdb(7435): list_add_tail 15782 gdb(7435): quiesce_all_threads uproc=0xffff880075bc77e0 cur_utask_quiescing=0xffff88004c5f1dc0 15790 gdb(7435): check_uproc_quiesced uproc=0xffff880075bc77e0 tsk=0xffff880037d0a470 15795 gdb(7435): check_uproc_quiesced return=0x0 15800 gdb(7435): quiesce_all_threads return=0x1 .... hang here .... The target process side is also hung: zsh D 0000000000000002 3568 1288 1259 0x00000080 ffff880078c57b98 0000000000000046 0000000000000000 0000000000011d4a ffff88006d4b8000 0000000000000000 ffff880078c57f50 00007fff138b0858 ffff88006d4b83b8 000000000000f968 ffff88006d4b83b8 00000000001d5740 Call Trace: [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 [<ffffffff814225af>] __mutex_lock_common+0x207/0x38a [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 [<ffffffff814227f0>] mutex_lock_nested+0x3c/0x41 [<ffffffff81097f6a>] gdb_uprobe_handler+0x79/0xc2 [<ffffffff810c8d09>] uprobe_report_signal+0x4e5/0x959 [<ffffffff81077e53>] ? lock_release_holdtime+0x2c/0x11d [<ffffffff814239a0>] ? _spin_unlock+0x2b/0x30 [<ffffffff81097293>] utrace_get_signal+0x33d/0x5bd [<ffffffff81061304>] ? get_signal_to_deliver+0x4c/0x3b6 [<ffffffff810613ae>] get_signal_to_deliver+0xf6/0x3b6 [<ffffffff8102e1f2>] ? kvm_clock_read+0x21/0x23 [<ffffffff81010ffa>] do_notify_resume+0x89/0x746 [<ffffffff812088de>] ? __up_read+0x76/0x7f [<ffffffff8142454a>] ? paranoid_userspace+0x39/0x5a [<ffffffff8107963a>] ? trace_hardirqs_on_caller+0x126/0x14a [<ffffffff8142455c>] paranoid_userspace+0x4b/0x5a and the corresponding function call graph was: 3854 zsh(1350): uprobe_report_signal action=0x0 engine=0xffff88003ac36078 tsk=0xffff880037d0a470 regs=0xffff880059097f58 info 3867 zsh(1350): utrace_attach_pid pid=0xffff880078e8d680 flags=0x1 ops=0xffffffff81443be0 data=0x0 3881 zsh(1350): utrace_attach_task target=0xffff880037d0a470 flags=0x1 ops=0xffffffff81443be0 data=0x0 3890 zsh(1350): matching_engine utrace=0xffff880037d0aad8 flags=0x1 ops=0xffffffff81443be0 data=0x0 3901 zsh(1350): matching_engine return=0xffff880078d190f0 3907 zsh(1350): utrace_attach_task return=0xffff880078d190f0 3914 zsh(1350): put_task_struct t=0xffff880037d0a470 3918 zsh(1350): put_task_struct 3923 zsh(1350): utrace_attach_pid return=0xffff880078d190f0 3931 zsh(1350): uretprobe_set_trampoline uproc=0xffff880075bc77e0 tsk=0xffff880037d0a470 3947 zsh(1350): kzalloc").call size=0x40 flags=? 3955 zsh(1350): kzalloc").return return=0xffff88001ec27ee0 3986 zsh(1350): uretprobe_set_trampoline 3999 zsh(1350): clear_utrace_quiesce utask=0xffff880078f53290 resume=0x0 4007 zsh(1350): utask_adjust_flags utask=0xffff880078f53290 set=0x0 flags=0x1 4012 zsh(1350): utask_adjust_flags 4017 zsh(1350): clear_utrace_quiesce 4028 zsh(1350): uprobe_find_probept uproc=0xffff880075bc77e0 vaddr=0x419dd0 4034 zsh(1350): uprobe_find_probept return=0xffff8800377a4248 4041 zsh(1350): gdb_uprobe_handler up=0xffff8800220c1660 regs=0xffff880059097f58 4060 zsh(1350): gdb_uprobe_handler 4073 zsh(1350): utask_adjust_flags utask=0xffff880078f53290 set=0x1 flags=0x1 4080 zsh(1350): utrace_set_events_pid pid=0xffff880078e8d680 engine=0xffff88003ac36078 eventmask=0x131d 4088 zsh(1350): utrace_set_events target=0xffff880037d0a470 engine=0xffff88003ac36078 events=0x131d 4095 zsh(1350): get_utrace_lock target=0xffff880037d0a470 engine=0xffff88003ac36078 attached=0x1 4108 zsh(1350): get_utrace_lock return=0xffff880037d0aad8 4210 zsh(1350): utrace_set_events return=0x0 4215 zsh(1350): utrace_set_events_pid return=0x0 4220 zsh(1350): utask_adjust_flags 4227 zsh(1350): uprobe_put_process uproc=0xffff880075bc77e0 in_callback=0x1 4232 zsh(1350): uprobe_put_process return=0x0 4237 zsh(1350): uprobe_report_signal return=0x13 4242 zsh(1350): finish_callback utrace=0xffff880037d0aad8 report=0xffff880059097d98 engine=0xffff88003ac36078 ret=0x13 4248 zsh(1350): finish_callback return=0x0 4254 zsh(1350): finish_resume_report report=0xffff880059097d98 task=0xffff880037d0a470 utrace=0xffff880037d0aad8 4261 zsh(1350): utrace_reset task=0xffff880037d0a470 utrace=0xffff880037d0aad8 action=0xffff880059097d98 4268 zsh(1350): put_detached_list list=0xffff880059097cc8 4273 zsh(1350): put_detached_list 4278 zsh(1350): utrace_reset 4284 zsh(1350): utrace_stop task=0xffff880037d0a470 utrace=0xffff880037d0aad8 report=0x0 So it's as if there may be some sort of conflict between send_sig() and unregister_uprobes(), but I'm not sure. Do you have any initial advice about what's supposed to work / not work, or what else to instrument? - FChE