> Date: Wed, 7 Oct 2015 11:53:32 +0100
> From: Stuart Henderson
>
> On 2015/10/07 11:52, Stuart Henderson wrote:
> > monitoring-plugins has a program that checks available space on partitions.
> > Before doing this it does a stat() to check that the requested directory
> > exists and is accessible. In their devel tree they have moved to doing
> > this stat() in a thread - commit log was "don't let check_disk hang on
> > hanging file systems". However this code doesn't work for us.
> >
> > I've attached a stripped-down test program based on their code that works
> > as expected on Linux but fails on OpenBSD. I can always patch to use the
> > non-pthread code, but I wondered if anyone has an idea what's up and
> > whether the bug is theirs or ours - is pthread_kill(thread, 0) working
> > as expected?
> >
> > $ make thread LDFLAGS=-lpthread
> > cc -O2 -pipe -lpthread -o thread thread.c
> >
> > $ ./thread
> > 4
> > child
> > 3
> > 2
> > 1
> > 0
> > child thread did not return within 5s
My reading of the POSIX standard is that our implementation of
pthread_kill(3) is correct and that the program's expectations are
wrong.
POSIX says the following in the "General Information" section on threads:
  The lifetime of a thread ID ends after the thread terminates if it
  was created with the detachstate attribute set to
  PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has
  been called for that thread.
At the point where the program calls pthread_kill(), pthread_join()
has not been called yet. So the thread ID is still "alive".
The pthread_kill() page says:
  As in kill(), if sig is zero, error checking shall be performed but
  no signal shall actually be sent.
The only "shall fail" error is EINVAL, for passing a bogus signal number.
The "informative" RATIONALE section mentions an additional error
condition:
  If an implementation detects use of a thread ID after the end of its
  lifetime, it is recommended that the function should fail and report
  an [ESRCH] error.
But since the thread ID is still "alive", that doesn't apply.
Also note that POSIX explicitly mentions kill() here. And for kill()
POSIX says (again in the RATIONALE section):
  Existing implementations vary on the result of a kill() with pid
  indicating an inactive process (a terminated process that has not
  been waited for by its parent). Some indicate success on such a call
  (subject to permission checking), while others give an error of
  [ESRCH]. Since the definition of process lifetime in this volume of
  POSIX.1-2008 covers inactive processes, the [ESRCH] error as
  described is inappropriate in this case. In particular, this means
  that an application cannot have a parent process check for
  termination of a particular child with kill(). (Usually this is done
  with the null signal; this can be done reliably with waitpid().)
This strongly suggests that calling pthread_kill(..., 0) on a
non-detached, unjoined thread should not return ESRCH, even after the
thread has terminated.
> #include <sys/stat.h>
> #include <errno.h>
> #include <pthread.h>
> #include <signal.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> void do_something(void);
> void *child(void *);
>
> int
> main(int argc, char **argv)
> {
> 	do_something();
> 	return 0;
> }
>
> void
> do_something(void)
> {
> 	pthread_t stat_thread;
> 	int done = 0;
> 	int timer = 5;
> 	struct timespec req, rem;
>
> 	req.tv_sec = 0;
> 	pthread_create(&stat_thread, NULL, child, NULL);
> 	while (timer-- > 0) {
> 		printf("%d\n", timer);
> 		req.tv_nsec = 10000000;
> 		nanosleep(&req, &rem);
> 		if (pthread_kill(stat_thread, 0)) {
> 			done = 1;
> 			break;
> 		} else {
> 			printf("e %d\n", errno);
> 			req.tv_nsec = 990000000;
> 			nanosleep(&req, &rem);
> 		}
> 	}
> 	if (done == 1) {
> 		pthread_join(stat_thread, NULL);
> 	} else {
> 		pthread_detach(stat_thread);
> 		printf("child thread did not return within 5s\n");
> 	}
> }
>
> void *
> child(void *in)
> {
> 	printf("child\n");
> 	return NULL;
> }