hi all,

the following is on openbsd 4.5, on i386.  i recently had a problem with
a program that seemed to be hanging.  top showed many procs in
the "fdlock" state, which i hadn't seen before.  the program that was
blocking was inferno.  inferno uses rfork (inferno is closely related
to plan 9) to create multiple procs to handle blocking i/o, with shared
file descriptor groups.  after inspection with ddb(4), it seemed the proc
that was holding the fdlock was busy closing a socket with SO_LINGER set,
sleeping (with fd_lock held) in /sys/kern/uipc_socket.c:/^soclose.
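
for reference, SO_LINGER is set per socket with setsockopt(2) and a struct linger.  below is a minimal sketch, assuming a POSIX system; enable_linger() is a hypothetical helper name, and the timeout is whatever the caller passes.  with lingering on, close() on a connected socket that still has unsent data may block for up to l_linger seconds, which is presumably where the sleep in soclose() comes from.

```c
#include <sys/socket.h>

/* hypothetical helper: enable SO_LINGER with a timeout of `secs`
   seconds.  a later close() on a connected socket with unsent data
   may then block (in the kernel, inside soclose()) for up to that
   long before returning. */
int enable_linger(int fd, int secs)
{
    struct linger l;

    l.l_onoff = 1;       /* turn lingering on */
    l.l_linger = secs;   /* upper bound on how long close() may wait */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &l, sizeof l);
}
```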

i'm not very familiar with openbsd code, so my analysis was a bit
fuzzy, but this is what i came to:

/sys/kern/kern_descrip.c:/^sys_close does the following:

        fdplock(fdp);
        error = fdrelease(p, fd);
        fdpunlock(fdp);

with fdplock:
        /usr/include/sys/filedesc.h:#define fdplock(fdp) rw_enter_write(&(fdp)->fd_lock)

/sys/kern/kern_descrip.c:/^fdrelease calls /sys/kern/kern_descrip.c:/^closef.
closef() calls the struct file's f_ops' close() method,
which is /sys/kern/sys_socket.c:/^soo_close,
which calls /sys/kern/uipc_socket.c:/^soclose,
which seems to be able to sleep when the SO_LINGER option is set,
all while fd_lock is still held.

other procs in the same fd group will now block when they try to
lock the fd_lock, e.g. in sys_open (or any other fd slot operation).

i had a quick look at the code of other bsds and linux; they seem to
do the actual closing separately from the fd slot operations.

i no longer use SO_LINGER, so this particular problem is not
bothering me any more.

if the above is really what is happening, perhaps rthreads will run
into the same problem sooner or later?  perhaps other close() methods can
also sleep?  or other operations that can sleep while holding fd_lock?

best regards,
mjl
