On Fri, 2009-07-31 at 18:08 -0400, Frank Ch. Eigler wrote: > Hi, guys - > > I'm running into a recurrent problem on my gdbstub+uprobes prototype. > (git://web.elastic.org/~fche/utrace-ext.git utrace-gdbstub-uprobes) > > The gist is that running a unregister_uprobe upon a stopped process > (single-threaded, stoped via a UTRACE_STOP event) appears to hang > indefinitely. There appear to be no contended locks in this case, > just this loop in uprobes_core.c:unregister_uprobe: > > ppt->state = UPROBE_REMOVING; > list_add_tail(&ppt->pd_node, &uproc->pending_uprobes); > > (void) quiesce_all_threads(uproc, &cur_utask_quiescing); > up_write(&uproc->rwsem); > if (cur_utask_quiescing) > /* Current task is probing its own process. */ > (void) utask_fake_quiesce(cur_utask_quiescing); > else > wait_event(ppt->waitq, ppt->state != UPROBE_REMOVING); > > >From a systemtap script that probes the goings-on nearby: > > .... > 15505 gdb(7435): unregister_uprobe u=0xffff8800220c1660 > 15520 gdb(7435): uprobe_get_tg_leader p=0x546 > 15528 gdb(7435): uprobe_get_tg_leader return=0xffff880078e8d680 > 15536 gdb(7435): uprobe_find_utask tsk=0xffff8800375c48e0 > 15543 gdb(7435): uprobe_find_utask return=0x0 > 15716 gdb(7435): uprobe_find_process tg_leader=0xffff880078e8d680 > 15726 gdb(7435): uprobe_find_process return=0xffff880075bc77e0 > 15736 gdb(7435): uprobe_free_kimg uk=0xffff8800220c1088 > 15762 gdb(7435): uprobe_free_kimg > 15770 gdb(7435): list_add_tail new=0xffff8800377a42b8 > head=0xffff880075bc7960 > 15775 gdb(7435): list_add_tail > 15782 gdb(7435): quiesce_all_threads uproc=0xffff880075bc77e0 > cur_utask_quiescing=0xffff88004c5f1dc0 > 15790 gdb(7435): check_uproc_quiesced uproc=0xffff880075bc77e0 > tsk=0xffff880037d0a470 > 15795 gdb(7435): check_uproc_quiesced return=0x0 > 15800 gdb(7435): quiesce_all_threads return=0x1 > .... hang here ....
check_uproc_quiesced() returning 0 indicates that uprobes thinks that the probed thread is not yet quiesced. So unregister_uprobe() is waiting for that to happen. But you say the probed thread is already stopped via a UTRACE_STOP event. So I'd think that uprobe_stop_thread() (which should be called by quiesce_all_threads()) should notice this and mark the thread as quiesced. Does uprobe_stop_thread() actually get called? If so, what does utrace_control_pid() return to it? > > The target process side is also hung: > > zsh D 0000000000000002 3568 1288 1259 0x00000080 > ffff880078c57b98 0000000000000046 0000000000000000 0000000000011d4a > ffff88006d4b8000 0000000000000000 ffff880078c57f50 00007fff138b0858 > ffff88006d4b83b8 000000000000f968 ffff88006d4b83b8 00000000001d5740 > Call Trace: > [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 > [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 > [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 > [<ffffffff814225af>] __mutex_lock_common+0x207/0x38a > [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2 > [<ffffffff814227f0>] mutex_lock_nested+0x3c/0x41 > [<ffffffff81097f6a>] gdb_uprobe_handler+0x79/0xc2 > [<ffffffff810c8d09>] uprobe_report_signal+0x4e5/0x959 > [<ffffffff81077e53>] ? lock_release_holdtime+0x2c/0x11d > [<ffffffff814239a0>] ? _spin_unlock+0x2b/0x30 > [<ffffffff81097293>] utrace_get_signal+0x33d/0x5bd > [<ffffffff81061304>] ? get_signal_to_deliver+0x4c/0x3b6 > [<ffffffff810613ae>] get_signal_to_deliver+0xf6/0x3b6 > [<ffffffff8102e1f2>] ? kvm_clock_read+0x21/0x23 > [<ffffffff81010ffa>] do_notify_resume+0x89/0x746 > [<ffffffff812088de>] ? __up_read+0x76/0x7f > [<ffffffff8142454a>] ? paranoid_userspace+0x39/0x5a > [<ffffffff8107963a>] ? trace_hardirqs_on_caller+0x126/0x14a > [<ffffffff8142455c>] paranoid_userspace+0x4b/0x5a > > and the corresponding function call graph was: > > 3854 zsh(1350): uprobe_report_signal action=0x0 engine=0xffff88003ac36078 > tsk=0xffff880037d0a470 regs=0xffff880059097f58 info > 3867 zsh(1350): utrace_attach_pid pid=0xffff880078e8d680 flags=0x1 > ops=0xffffffff81443be0 data=0x0 > 3881 zsh(1350): utrace_attach_task target=0xffff880037d0a470 flags=0x1 > ops=0xffffffff81443be0 data=0x0 > 3890 zsh(1350): matching_engine utrace=0xffff880037d0aad8 flags=0x1 > ops=0xffffffff81443be0 data=0x0 > 3901 zsh(1350): matching_engine return=0xffff880078d190f0 > 3907 zsh(1350): utrace_attach_task return=0xffff880078d190f0 > 3914 zsh(1350): put_task_struct t=0xffff880037d0a470 > 3918 zsh(1350): put_task_struct > 3923 zsh(1350): utrace_attach_pid return=0xffff880078d190f0 I can't think of why utrace_attach_pid() would get called here. > 3931 zsh(1350): uretprobe_set_trampoline uproc=0xffff880075bc77e0 > tsk=0xffff880037d0a470 > 3947 zsh(1350): kzalloc").call size=0x40 flags=? > 3955 zsh(1350): kzalloc").return return=0xffff88001ec27ee0 > 3986 zsh(1350): uretprobe_set_trampoline > 3999 zsh(1350): clear_utrace_quiesce utask=0xffff880078f53290 resume=0x0 > 4007 zsh(1350): utask_adjust_flags utask=0xffff880078f53290 set=0x0 > flags=0x1 > 4012 zsh(1350): utask_adjust_flags > 4017 zsh(1350): clear_utrace_quiesce > 4028 zsh(1350): uprobe_find_probept uproc=0xffff880075bc77e0 > vaddr=0x419dd0 > 4034 zsh(1350): uprobe_find_probept return=0xffff8800377a4248 > 4041 zsh(1350): gdb_uprobe_handler up=0xffff8800220c1660 > regs=0xffff880059097f58 > 4060 zsh(1350): gdb_uprobe_handler > 4073 zsh(1350): utask_adjust_flags utask=0xffff880078f53290 set=0x1 > flags=0x1 > 4080 zsh(1350): utrace_set_events_pid pid=0xffff880078e8d680 > engine=0xffff88003ac36078 eventmask=0x131d > 4088 zsh(1350): utrace_set_events target=0xffff880037d0a470 > engine=0xffff88003ac36078 events=0x131d > 4095 zsh(1350): get_utrace_lock target=0xffff880037d0a470 > engine=0xffff88003ac36078 attached=0x1 > 4108 zsh(1350): get_utrace_lock return=0xffff880037d0aad8 > 4210 zsh(1350): utrace_set_events return=0x0 > 4215 zsh(1350): utrace_set_events_pid return=0x0 > 4220 zsh(1350): utask_adjust_flags > 4227 zsh(1350): uprobe_put_process uproc=0xffff880075bc77e0 > in_callback=0x1 > 4232 zsh(1350): uprobe_put_process return=0x0 > 4237 zsh(1350): uprobe_report_signal return=0x13 > 4242 zsh(1350): finish_callback utrace=0xffff880037d0aad8 > report=0xffff880059097d98 engine=0xffff88003ac36078 ret=0x13 > 4248 zsh(1350): finish_callback return=0x0 > 4254 zsh(1350): finish_resume_report report=0xffff880059097d98 > task=0xffff880037d0a470 utrace=0xffff880037d0aad8 > 4261 zsh(1350): utrace_reset task=0xffff880037d0a470 > utrace=0xffff880037d0aad8 action=0xffff880059097d98 > 4268 zsh(1350): put_detached_list list=0xffff880059097cc8 > 4273 zsh(1350): put_detached_list > 4278 zsh(1350): utrace_reset > 4284 zsh(1350): utrace_stop task=0xffff880037d0a470 > utrace=0xffff880037d0aad8 report=0x0 I'm confused here. Is the traced task getting hung in gdb_uprobe_handler() -- per your call trace -- or utrace_stop() (or elsewhere) -- per your call graph? > > So it's as if there may be some sort of conflict between send_sig() > and unregister_uprobes(), but I'm not sure. Do you have any initial > advice about what's supposed to work / not work, or what else to > instrument? > > > - FChE >