Hi Jini,

2018年12月13日(木) 12:14 Jini George <jini.geo...@oracle.com>:
>
> Thank you very much for looking into this, Yasumasa!
>
> The 'pid' used in process_doesnt_exist() is actually the lwpid of the
> thread to be attached to.
>
>  From Pgrab(), we call ptrace_attach() first for the pid at line 448. In
> which case, we end up calling process_doesnt_exist() through
> ptrace_attach() with the pid. In this case, we completely error out if
> the 'pid' doesn't exist. Then we go on to discover the lwpids of this
> process through libpthread_db or by going through the
> /proc/<pid>/task/<lwpid> in case of the process running in a container,
> and we then  invoke ptrace_attach() again on all these newly discovered
> lwpids at line 503. (we have already attached to the main thread (where
> the pid and the lwpid are the same). This time when
> process_doesnt_exist() gets called inside ptrace_attach(), we are
> dealing with the lwpids. And we would not error out if the thread is
> missing or is in an 'exiting' state when we try to attach.

Ok, it seems good.

I think zombie thread(s) will not exist in thread_info list at this point
because they will be removed at thread_db_callback().


>  From the proc man page, /proc/<pid>/task/<lwpid>/* and
> /proc/<lwpid-treated-as-a-pid>/* files would have the same content for
> the same lwpid.
>
> ============= < Man Page Snippet > ================================
>         /proc/[pid]/task (since Linux 2.6.0-test6)
>                This is a directory that contains one subdirectory for
> each thread in the process.  The name of each subdirectory is the
> numerical thread ID ([tid]) of the thread (see gettid(2)). Within each
> of these subdirectories, there is a set of files with the same names and
> contents as under the /proc/[pid] directories.
> ============= < Man Page Snippet End> =============================
>
> Let me know if you are not Ok with this.
>
> Going forward, we should remove the libpthread_db dependency for the
> threads discovery and only depend on /proc/<pid>/task/<lwpid>s for this
> (https://bugs.openjdk.java.net/browse/JDK-8181313).

It's great news!
I will help you :-)


Thanks,

Yasumasa


> Thank you,
> Jini.
>
> On 12/13/2018 6:15 AM, Yasumasa Suenaga wrote:
> > Hi Jini,
> >
> > I have a comment for your webrev.02 .
> >
> >
> > You added process_doesnt_exist() to check process / thread liveness from
> > /proc/<PID>, but it is not enough.
> > Information of threads (LWP) will be stored in /proc/<PID>/task/<LWPID>.
> > So you should check /proc/<PID>/task/status for threads.
> >
> >
> > Thanks,
> >
> > Yasumasa
> >
> >
> > On 2018/12/12 21:15, Jini George wrote:
> >> Thank you very much for looking into this, JC!
> >>
> >> I have a revised webrev addressing your comments at:
> >>
> >> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.02/index.html
> >>
> >> Requesting one more review for this. My comments inline:
> >>
> >> On 12/12/2018 2:53 AM, JC Beyler wrote:
> >>> Hi Jini,
> >>>
> >>> I saw a few nits:
> >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.h.udiff.html
> >>>
> >>>   ? -> The comments are in the third person normally it seems so it
> >>> would
> >>> become (I also removed the s from threads):
> >>>
> >>> +// deletes a thread from the thread list
> >> Done.
> >>>
> >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c.udiff.html
> >>>
> >>>   ? -> You added two empty lines it seems that could be removed
> >> Done.
> >>>
> >>>
> >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/src/jdk.hotspot.agent/linux/native/libsaproc/ps_proc.c.udiff.html
> >>>
> >>>   ? -> Is there a real reason to have both enums? We could have a single
> >>> enum it seems and not lose too much
> >>
> >> You are right. I have done away with the WAITPID* enum.
> >>
> >>>   ? -> you have a switch "
> >>>   ? ? ? ?switch (errno) {"
> >>>   ? ? ? ? -> Where really you could simplify the reading by moving the
> >>> EINTR case outside with its continue
> >>>   ? ? ? ? -> The switch could then remain as it was (though you move
> >>> print_debug to print_error)
> >>>   ? ? ? ? -> and just return in each cases
> >> I have changed this to:
> >>
> >> 206     } else {
> >> 207       switch (errno) {
> >> 208         case EINTR:
> >> 209           continue;
> >> 210           break;
> >> 211         case ECHILD:
> >> 212           print_debug("waitpid() failed. Child process pid (%d) does
> >> not exist \n", pid);
> >> 213           return ATTACH_THREAD_DEAD;
> >> 214         case EINVAL:
> >> 215           print_error("waitpid() failed. Invalid options
> >> argument.\n");
> >> 216           return ATTACH_FAIL;
> >> 217         default:
> >> 218           print_error("waitpid() failed. Unexpected error %d\n",
> >> errno);
> >> 219           return ATTACH_FAIL;
> >> 220       }
> >> 221     } // else
> >>
> >>
> >>>
> >>>   ? ?->?if (strncmp (buf, "State:", 6) == 0) {
> >>>   ? ? ? -> You use sizeof("State:") right below; perhaps you could just
> >>> use "? const char const state[] = "State:";" and use sizeof(state) and
> >>> for the string, it seems less error prone
> >>>
> >>>   ? -> A minor "bug" is here:
> >>> +? ? ? state = buf + sizeof ("State:");
> >>>   ? ? ? ? -> You did a strncmp above but that only assures the start of
> >>> the string is "State:", technically the character after the ':' is the
> >>> but it could only be that; sizeof("State:") is 7 and not 6. So you miss
> >>> one character when you are skipping spaces
> >>>   ? ? ? ? -> It was probably ok because you always had at least one
> >>> space, ie "State: "
> >>
> >> Thanks! I have made some changes here to use a const char string and a
> >> variable to store the calculated length using strlen(). And I am using
> >> isspace() now to skip spaces since tabs could also be used as a
> >> delimiter.
> >>
> >>>   ? -> Extra space here before the '(': "sizeof (buf)"
> >> Done.
> >>>
> >>> Finally your return sequence for that method could be simplified to:
> >>>
> >>> +? if (!found_state) {
> >>> +? ? print_error(" Could not find the State: string in the status file
> >>> for pid %d\n", pid);
> >>> +? }
> >>> +? fclose (fp);
> >>> +? return !found_state;
> >>
> >> I have modified this to:
> >>
> >> 257   if (!found_state) {
> >> 258     // Assuming the thread exists.
> >> 259     print_error("Could not find the 'State:' string in the
> >> /proc/%d/status file\n", pid);
> >> 260   }
> >> 261   fclose (fp);
> >> 262   return false;
> >>
> >> Thank you,
> >> Jini.
> >>
> >>>
> >>> Thanks!
> >>> Jc
> >>>
> >>> On Tue, Dec 11, 2018 at 9:30 AM Jini George <jini.george at oracle.com
> >>> <mailto:jini.george at oracle.com>> wrote:
> >>>
> >>>      Hello !
> >>>
> >>>      Requesting reviews for:
> >>>
> >>>      https://bugs.openjdk.java.net/browse/JDK-8202884
> >>>      Webrev:
> >>> http://cr.openjdk.java.net/~jgeorge/8202884/webrev.00/index.html
> >>>
> >>>      Details:
> >>>      For attaching to the threads in a process, we first go ahead and
> >>> do a
> >>>      ptrace attach to the main thread. Later, we use the libthread_db
> >>>      library
> >>>      (or, in the case of being within a container, iterate through the
> >>>      /proc/<pid>/task files) to discover the threads of the process,
> >>> and add
> >>>      them to the threads list (within SA) for this process. Once, we
> >>> have
> >>>      discovered all the threads and added these to the list of
> >>> threads, we
> >>>      then invoke ptrace attach individually on all these threads to
> >>>      attach to
> >>>      these. When we deal with an application where the threads are
> >>> exiting
> >>>      continuously, some of these threads might not exist by the time
> >>> we try
> >>>      to ptrace attach to these threads. The proposed fix includes the
> >>>      following modifications to solve this.
> >>>       ? 1. Check the state of the threads in the thread_db callback
> >>> routine,
> >>>      and skip if the state of the thread is TD_THR_UNKNOWN or
> >>> TD_THR_ZOMBIE.
> >>>      SA does not try to ptrace attach to these threads and does not
> >>> include
> >>>      these threads in the threads list.
> >>>       ? 2. While ptrace attaching to the thread, if
> >>> ptrace(PTRACE_ATTACH,
> >>>      ...)
> >>>      fails with either ESCRH or EPERM, check the state of the thread by
> >>>      checking if the /proc/<pid>/status file corresponding to that
> >>> thread
> >>>      exists and if so, reading in the 'State:' line of that file. Skip
> >>>      attaching to this thread and delete this thread from the SA list of
> >>>      threads, if the thread is dead (State: X) or is a zombie (State:
> >>> Z).
> >>>       ?From the /proc man page, "Current state of the process. One of "R
> >>>      (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T
> >>>      (tracing
> >>>      stop)", "Z (zombie)", or "X (dead)"."
> >>>       ? 3. If waitpid() on the thread is a failure, again skip this
> >>> thread
> >>>      (delete this from SA's list of threads) instead of bailing out
> >>> if the
> >>>      thread has exited or terminated.
> >>>
> >>>      Testing:
> >>>      1. Tested by attaching and detaching multiple times to a test
> >>> program
> >>>      spawning numerous short lived threads.
> >>>      2. The SA tests (under test/hotspot/jtreg/serviceability/sa) passed
> >>>      with
> >>>      100 repeats on Mach5.
> >>>      3. No new failures and no occurrences of JDK-8202884 seen with
> >>> testing
> >>>      the SA tests (tiers 1 to 5) on Mach5.
> >>>
> >>>      More details in the bug comments section.
> >>>
> >>>      Thank you,
> >>>      Jini.
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Thanks,
> >>> Jc
> >>

Reply via email to