On Fri, 2009-07-31 at 18:08 -0400, Frank Ch. Eigler wrote:
> Hi, guys -
> 
> I'm running into a recurrent problem on my gdbstub+uprobes prototype.
> (git://web.elastic.org/~fche/utrace-ext.git utrace-gdbstub-uprobes)
> 
> The gist is that running a unregister_uprobe upon a stopped process
> (single-threaded, stoped via a UTRACE_STOP event) appears to hang
> indefinitely.  There appear to be no contended locks in this case,
> just this loop in uprobes_core.c:unregister_uprobe:
> 
>         ppt->state = UPROBE_REMOVING;
>         list_add_tail(&ppt->pd_node, &uproc->pending_uprobes);
> 
>         (void) quiesce_all_threads(uproc, &cur_utask_quiescing);
>         up_write(&uproc->rwsem);
>         if (cur_utask_quiescing)
>                 /* Current task is probing its own process. */
>                 (void) utask_fake_quiesce(cur_utask_quiescing);
>         else
>                 wait_event(ppt->waitq, ppt->state != UPROBE_REMOVING);
> 
> >From a systemtap script that probes the goings-on nearby:
> 
> ....
>  15505 gdb(7435): unregister_uprobe u=0xffff8800220c1660 
>  15520 gdb(7435):  uprobe_get_tg_leader p=0x546 
>  15528 gdb(7435):  uprobe_get_tg_leader return=0xffff880078e8d680 
>  15536 gdb(7435):  uprobe_find_utask tsk=0xffff8800375c48e0 
>  15543 gdb(7435):  uprobe_find_utask return=0x0 
>  15716 gdb(7435):  uprobe_find_process tg_leader=0xffff880078e8d680 
>  15726 gdb(7435):  uprobe_find_process return=0xffff880075bc77e0 
>  15736 gdb(7435):  uprobe_free_kimg uk=0xffff8800220c1088 
>  15762 gdb(7435):  uprobe_free_kimg 
>  15770 gdb(7435):  list_add_tail new=0xffff8800377a42b8 
> head=0xffff880075bc7960 
>  15775 gdb(7435):  list_add_tail 
>  15782 gdb(7435):  quiesce_all_threads uproc=0xffff880075bc77e0 
> cur_utask_quiescing=0xffff88004c5f1dc0 
>  15790 gdb(7435):   check_uproc_quiesced uproc=0xffff880075bc77e0 
> tsk=0xffff880037d0a470 
>  15795 gdb(7435):   check_uproc_quiesced return=0x0 
>  15800 gdb(7435):  quiesce_all_threads return=0x1 
> .... hang here ....

check_uproc_quiesced() returning 0 indicates that uprobes thinks that
the probed thread is not yet quiesced.  So unregister_uprobe() is
waiting for that to happen.

But you say the probed thread is already stopped via a UTRACE_STOP
event.  So I'd think that uprobe_stop_thread() (which should be called
by quiesce_all_threads()) should notice this and mark the thread as
quiesced.  Does uprobe_stop_thread() actually get called?  If so, what
does utrace_control_pid() return to it?

> 
> The target process side is also hung:
> 
>  zsh           D 0000000000000002  3568  1288   1259 0x00000080
>   ffff880078c57b98 0000000000000046 0000000000000000 0000000000011d4a
>   ffff88006d4b8000 0000000000000000 ffff880078c57f50 00007fff138b0858
>   ffff88006d4b83b8 000000000000f968 ffff88006d4b83b8 00000000001d5740
>  Call Trace:
>   [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2
>   [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2
>   [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2
>   [<ffffffff814225af>] __mutex_lock_common+0x207/0x38a
>   [<ffffffff81097f6a>] ? gdb_uprobe_handler+0x79/0xc2
>   [<ffffffff814227f0>] mutex_lock_nested+0x3c/0x41
>   [<ffffffff81097f6a>] gdb_uprobe_handler+0x79/0xc2
>   [<ffffffff810c8d09>] uprobe_report_signal+0x4e5/0x959
>   [<ffffffff81077e53>] ? lock_release_holdtime+0x2c/0x11d
>   [<ffffffff814239a0>] ? _spin_unlock+0x2b/0x30
>   [<ffffffff81097293>] utrace_get_signal+0x33d/0x5bd
>   [<ffffffff81061304>] ? get_signal_to_deliver+0x4c/0x3b6
>   [<ffffffff810613ae>] get_signal_to_deliver+0xf6/0x3b6
>   [<ffffffff8102e1f2>] ? kvm_clock_read+0x21/0x23
>   [<ffffffff81010ffa>] do_notify_resume+0x89/0x746
>   [<ffffffff812088de>] ? __up_read+0x76/0x7f
>   [<ffffffff8142454a>] ? paranoid_userspace+0x39/0x5a
>   [<ffffffff8107963a>] ? trace_hardirqs_on_caller+0x126/0x14a
>   [<ffffffff8142455c>] paranoid_userspace+0x4b/0x5a
> 
> and the corresponding function call graph was:
> 
>   3854 zsh(1350): uprobe_report_signal action=0x0 engine=0xffff88003ac36078 
> tsk=0xffff880037d0a470 regs=0xffff880059097f58 info
>   3867 zsh(1350):  utrace_attach_pid pid=0xffff880078e8d680 flags=0x1 
> ops=0xffffffff81443be0 data=0x0 
>   3881 zsh(1350):   utrace_attach_task target=0xffff880037d0a470 flags=0x1 
> ops=0xffffffff81443be0 data=0x0 
>   3890 zsh(1350):    matching_engine utrace=0xffff880037d0aad8 flags=0x1 
> ops=0xffffffff81443be0 data=0x0 
>   3901 zsh(1350):    matching_engine return=0xffff880078d190f0 
>   3907 zsh(1350):   utrace_attach_task return=0xffff880078d190f0 
>   3914 zsh(1350):   put_task_struct t=0xffff880037d0a470 
>   3918 zsh(1350):   put_task_struct 
>   3923 zsh(1350):  utrace_attach_pid return=0xffff880078d190f0 

I can't think of why utrace_attach_pid() would get called here.

>   3931 zsh(1350):  uretprobe_set_trampoline uproc=0xffff880075bc77e0 
> tsk=0xffff880037d0a470 
>   3947 zsh(1350):   kzalloc").call size=0x40 flags=? 
>   3955 zsh(1350):   kzalloc").return return=0xffff88001ec27ee0 
>   3986 zsh(1350):  uretprobe_set_trampoline 
>   3999 zsh(1350):  clear_utrace_quiesce utask=0xffff880078f53290 resume=0x0 
>   4007 zsh(1350):   utask_adjust_flags utask=0xffff880078f53290 set=0x0 
> flags=0x1 
>   4012 zsh(1350):   utask_adjust_flags 
>   4017 zsh(1350):  clear_utrace_quiesce
>   4028 zsh(1350):  uprobe_find_probept uproc=0xffff880075bc77e0 
> vaddr=0x419dd0 
>   4034 zsh(1350):  uprobe_find_probept return=0xffff8800377a4248 
>   4041 zsh(1350):  gdb_uprobe_handler up=0xffff8800220c1660 
> regs=0xffff880059097f58 
>   4060 zsh(1350):  gdb_uprobe_handler 
>   4073 zsh(1350):  utask_adjust_flags utask=0xffff880078f53290 set=0x1 
> flags=0x1 
>   4080 zsh(1350):   utrace_set_events_pid pid=0xffff880078e8d680 
> engine=0xffff88003ac36078 eventmask=0x131d 
>   4088 zsh(1350):    utrace_set_events target=0xffff880037d0a470 
> engine=0xffff88003ac36078 events=0x131d 
>   4095 zsh(1350):     get_utrace_lock target=0xffff880037d0a470 
> engine=0xffff88003ac36078 attached=0x1 
>   4108 zsh(1350):     get_utrace_lock return=0xffff880037d0aad8 
>   4210 zsh(1350):    utrace_set_events return=0x0 
>   4215 zsh(1350):   utrace_set_events_pid return=0x0 
>   4220 zsh(1350):  utask_adjust_flags 
>   4227 zsh(1350):  uprobe_put_process uproc=0xffff880075bc77e0 
> in_callback=0x1 
>   4232 zsh(1350):  uprobe_put_process return=0x0 
>   4237 zsh(1350): uprobe_report_signal return=0x13 
>   4242 zsh(1350): finish_callback utrace=0xffff880037d0aad8 
> report=0xffff880059097d98 engine=0xffff88003ac36078 ret=0x13 
>   4248 zsh(1350): finish_callback return=0x0 
>   4254 zsh(1350): finish_resume_report report=0xffff880059097d98 
> task=0xffff880037d0a470 utrace=0xffff880037d0aad8 
>   4261 zsh(1350):  utrace_reset task=0xffff880037d0a470 
> utrace=0xffff880037d0aad8 action=0xffff880059097d98 
>   4268 zsh(1350):   put_detached_list list=0xffff880059097cc8 
>   4273 zsh(1350):   put_detached_list 
>   4278 zsh(1350):  utrace_reset 
>   4284 zsh(1350):  utrace_stop task=0xffff880037d0a470 
> utrace=0xffff880037d0aad8 report=0x0 

I'm confused here.  Is the traced task getting hung in
gdb_uprobe_handler() -- per your call trace -- or utrace_stop() (or
elsewhere) -- per your call graph?

> 
> So it's as if there may be some sort of conflict between send_sig()
> and unregister_uprobes(), but I'm not sure.  Do you have any initial
> advice about what's supposed to work / not work, or what else to
> instrument?
> 
> 
> - FChE
> 

Reply via email to