Re: [Shorewall-users] locking processes left behind

Brian J. Murrell Thu, 26 Jul 2018 07:03:01 -0700

On Thu, 2018-07-26 at 05:48 +0200, Matt Darfeuille wrote:
> 
> As illustrated by this lingering thread, issues that are only present on
> one platform made me move away from OpenWRT/LEDE.


The platform is not the problem.  The platform is just providing the
tools.

Or are you suggesting that the "lock" tool on OpenWRT/LEDE is actually
buggy?  Given that it's just a wrapper around flock(), that seems
unlikely.  But I'm happy to be proven wrong if you can provide a
reproducer for the bug that I can submit upstream.  In all the testing
I have done with the "lock" tool, it operates as expected when used as
expected.

Given the evidence, it seems like the file being locked is getting
removed before the lock is released.

A reboot of my router this morning has reproduced the situation and
this is what I see:

# ps -ef | grep lock
root      2700  2666  0 07:13 ?        00:00:00 lock /etc/shorewall-lite/state/lock
root      3234     1  0 07:13 ?        00:00:00 lock /etc/shorewall-lite/state/lock

# lsof -n -p 3234
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
lock    3234 root  cwd    DIR   0,15      656   258 /
lock    3234 root  rtd    DIR   0,15      656   258 /
lock    3234 root  txt    REG  254,0   308533  1786 /bin/busybox
lock    3234 root  mem    REG  254,0    77040   213 /lib/libgcc_s.so.1
lock    3234 root  mem    REG  254,0   601968   402 /lib/libc.so
lock    3234 root    0u   CHR    1,3      0t0   317 /dev/null
lock    3234 root    1u   CHR    1,3      0t0   317 /dev/null
lock    3234 root    2u   CHR    1,3      0t0   317 /dev/null
lock    3234 root    3u   REG   0,14        5 61617 /etc/shorewall-lite/state/lock (deleted)
lock    3234 root   13w  FIFO    0,8      0t0  1732 pipe

# cat /proc/2700/fd/3
3234

# strace -f -p 3234
strace: Process 3234 attached
restart_syscall(<... resuming interrupted syscall_516 ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
nanosleep({tv_sec=1, tv_nsec=0}, ^Cstrace: Process 3234 detached
 <detached ...>

# strace -f -p 2700
strace: Process 2700 attached
flock(3, LOCK_EX^Cstrace: Process 2700 detached
 <detached ...>

Hrm.  Given:

            g_havemutex="lock -u ${lockf} && rm -f ${lockf}"

Observe this particular set of operations:

tty1# lock /tmp/mylockfile
tty1# [has the lock and returns]
tty2# lock /tmp/mylockfile
[blocks waiting for tty1 to release the lock, as we can see:]
# lsof | grep /tmp/mylockfile 
lock       1249    root    3u      REG       0,13        5     352778 /tmp/mylockfile
lock       1250    root    3u      REG       0,13        5     352778 /tmp/mylockfile
tty1# lock -u /tmp/mylockfile && rm -f /tmp/mylockfile
tty1# [returns, releasing the lock to tty2]
tty2# [returns from blocked state, now holds the lock]
# lsof | grep /tmp/mylockfile 
lock       1404    root    3u      REG       0,13        5     352778 /tmp/mylockfile (deleted)
tty3# lock /tmp/mylockfile 
tty3# [wait, what?  it returns even though tty2 has the lock!]
# lsof | grep /tmp/mylockfile 
lock       1404    root    3u      REG       0,13        5     352778 /tmp/mylockfile (deleted)
lock       1439    root    3u      REG       0,13        5     362181 /tmp/mylockfile

So at this point both tty2 and tty3 believe they have the lock and have
returned, allowing them to do their work on top of each other.
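
The same effect can be reproduced without the "lock" tool at all.  What
follows is only a minimal sketch in Python, assuming Linux flock()
semantics; the function name and the temporary path are made up for
illustration and have nothing to do with Shorewall's own code.  It holds
an exclusive lock on a file, unlinks the path, and then shows that a
fresh open of the same path gets a different inode and can be locked
immediately.

```python
import fcntl
import os
import tempfile


def demonstrate_unlink_race():
    """Return (inode_a, inode_b, both_locked) for the unlink-while-locked case.

    Hypothetical helper for illustration only.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "mylockfile")

    # Plays the role of tty2: has the file open and holds the exclusive lock.
    fd_a = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    fcntl.flock(fd_a, fcntl.LOCK_EX)

    # Plays the role of tty1's "rm -f": the name goes away, but the inode
    # behind fd_a stays alive (lsof would show it as "(deleted)").
    os.unlink(path)

    # Plays the role of tty3: the path no longer exists, so this creates a
    # brand-new file with a new inode that nobody has locked yet.
    fd_b = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
        both_locked = True
    except BlockingIOError:
        both_locked = False

    inode_a = os.fstat(fd_a).st_ino
    inode_b = os.fstat(fd_b).st_ino

    # Clean up: closing the descriptors also drops both locks.
    os.close(fd_a)
    os.close(fd_b)
    os.unlink(path)
    os.rmdir(tmpdir)
    return inode_a, inode_b, both_locked


if __name__ == "__main__":
    print(demonstrate_unlink_race())
```

The two inode numbers differ in the same way that 352778 and 362181 do
in the lsof output above, and the second lock is granted because it is
taken on a different file.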

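I don't think a process can simply remove the lock file just because it
has released its lock on it.  Removing it only unlinks the name: a
waiter that already has the old file open goes on to lock the deleted
inode, while the next caller creates and locks a brand-new file.  The
file can only be removed safely if there are no more outstanding locks
on it.  Or just don't remove it.  lock seems to function perfectly fine
with the file pre-existing.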
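
If removing the file really is wanted, one commonly used pattern is to
unlink it while still holding the lock, and to have every locker check
after acquiring that the path still refers to the file it locked.  The
sketch below is in Python and is only an illustration under that
assumption; acquire_lock and release_and_remove are hypothetical names,
and this is not what the busybox "lock" tool or Shorewall does today.

```python
import fcntl
import os


def acquire_lock(path):
    """Block until we hold an exclusive flock on the file currently at path.

    Hypothetical helper: retries if the file we locked was unlinked or
    replaced while we were waiting.
    """
    while True:
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            ours = os.fstat(fd)
            current = os.stat(path)
            same_file = (ours.st_dev, ours.st_ino) == (current.st_dev, current.st_ino)
        except FileNotFoundError:
            # The path was removed after we opened it.
            same_file = False
        if same_file:
            return fd
        # We locked a stale (deleted) inode; drop it and try again.
        os.close(fd)


def release_and_remove(fd, path):
    """Unlink the lock file while still holding the lock, then release it."""
    os.unlink(path)
    os.close(fd)  # closing the descriptor releases the flock
```

With this ordering a waiter that wakes up holding the deleted inode
notices the mismatch and retries against the new file instead of
proceeding.  Simply never removing the file avoids the need for the
check altogether.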

I'm not sure I can draw a line from this problem to the stale locks
problem, but it's probably a good thing to fix before continuing to try
to debug the stale locks problem.

Cheers,
b.
