On 07/26/2018 07:01 AM, Brian J. Murrell wrote:
> On Thu, 2018-07-26 at 05:48 +0200, Matt Darfeuille wrote:
>>
>> As illustrated by this lingering thread, issues that are only present on
>> one platform made me move away from OpenWRT/LEDE.
>
> The platform is not the problem. The platform is just providing the
> tools.
>
> Or are you suggesting that the "lock" tool on OpenWRT/LEDE is actually
> buggy? Given that it's just a wrapper around flock() that seems
> unlikely. But I'm happy to be proven wrong if you can provide a
> reproducer for the bug that I can submit upstream. As much testing as
> I have done with the "lock" tool it operates as expected when used as
> expected.
>
> Given the evidence, it seems like the file being locked is getting
> removed before the lock is released.
>
> A reboot of my router this morning has reproduced the situation and
> this is what I see:
>
> # ps -ef | grep lock
> root 2700 2666 0 07:13 ? 00:00:00 lock /etc/shorewall-lite/state/lock
> root 3234 1 0 07:13 ? 00:00:00 lock /etc/shorewall-lite/state/lock
>
> # lsof -n -p 3234
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> lock 3234 root cwd DIR 0,15 656 258 /
> lock 3234 root rtd DIR 0,15 656 258 /
> lock 3234 root txt REG 254,0 308533 1786 /bin/busybox
> lock 3234 root mem REG 254,0 77040 213 /lib/libgcc_s.so.1
> lock 3234 root mem REG 254,0 601968 402 /lib/libc.so
> lock 3234 root 0u CHR 1,3 0t0 317 /dev/null
> lock 3234 root 1u CHR 1,3 0t0 317 /dev/null
> lock 3234 root 2u CHR 1,3 0t0 317 /dev/null
> lock 3234 root 3u REG 0,14 5 61617 /etc/shorewall-lite/state/lock (deleted)
> lock 3234 root 13w FIFO 0,8 0t0 1732 pipe
>
> # cat /proc/2700/fd/3
> 3234
>
> # strace -f -p 3234
> strace: Process 3234 attached
> restart_syscall(<... resuming interrupted syscall_516 ...>) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, ^Cstrace: Process 3234 detached
> <detached ...>
>
> # strace -f -p 2700
> strace: Process 2700 attached
> flock(3, LOCK_EX^Cstrace: Process 2700 detached
> <detached ...>
>
> Hrm. Given:
>
> g_havemutex="lock -u ${lockf} && rm -f ${lockf}"
>
> Observe this particular set of operations:
>
> tty1# lock /tmp/mylockfile
> tty1# [has the lock and returns]
> tty2# lock /tmp/mylockfile
> [blocks waiting for locker1 to release the lock as we can see:]
> # lsof | grep /tmp/mylockfile
> lock 1249 root 3u REG 0,13 5 352778 /tmp/mylockfile
> lock 1250 root 3u REG 0,13 5 352778 /tmp/mylockfile
> tty1# lock -u /tmp/mylockfile && rm -f /tmp/mylockfile
> tty1# [returns, releasing the lock to tty2]
> tty2# [returns from blocked state, now holds the lock]
> # lsof | grep /tmp/mylockfile
> lock 1404 root 3u REG 0,13 5 352778 /tmp/mylockfile (deleted)
> tty3# lock /tmp/mylockfile
> tty3# [wait, what? it returns even though tty2 has the lock!]
> # lsof | grep /tmp/mylockfile
> lock 1404 root 3u REG 0,13 5 352778 /tmp/mylockfile (deleted)
> lock 1439 root 3u REG 0,13 5 362181 /tmp/mylockfile
>
> So at this point both tty2 and tty3 believe they have the lock and have
> returned, allowing them to do their work on top of each other.
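
The sequence above should be reproducible without OpenWrt's "lock" tool. The following is only a minimal sketch, assuming the flock(1) utility (util-linux or the busybox applet) is available. The lock file path is a hypothetical one made up for the demo, and file descriptors 8 and 9 stand in for tty2 and tty3:

```shell
#!/bin/sh
# Sketch: unlinking a lock file lets a second locker succeed on a new inode.
lockf="${TMPDIR:-/tmp}/mylockfile.$$"   # hypothetical path, demo only

exec 8>"$lockf"                 # "tty2": open the lock file
flock -n 8 && first=acquired    # exclusive lock on the original inode

rm -f "$lockf"                  # the path is unlinked while fd 8 still holds its lock

exec 9>"$lockf"                 # "tty3": same path, but a freshly created inode
if flock -n 9; then second=acquired; else second=denied; fi

echo "first=$first second=$second"

exec 8>&- 9>&-                  # close both descriptors, which releases the locks
rm -f "$lockf"
```

On a local filesystem both attempts should report "acquired", because the two descriptors refer to different inodes and flock() locks attach to the open file, not to the path name.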
>
> I don't think a process can simply remove the lock file just because it
> has released its lock on it. It can only be removed if there are no
> more outstanding locks on it. Or just don't remove it. lock seems to
> function perfectly fine with the file pre-existing.
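
The "just don't remove it" option can be checked the same way. This is a sketch under the same assumptions as any flock(1)-based test (util-linux or busybox flock, a hypothetical demo path), not a statement about how the OpenWrt "lock" tool is implemented. For Shorewall it would presumably amount to dropping the "&& rm -f ${lockf}" part of g_havemutex, which is an untested assumption:

```shell
#!/bin/sh
# Sketch: with the lock file left in place, every opener shares one inode,
# so a second locker is refused until the first one unlocks.
lockf="${TMPDIR:-/tmp}/mylockfile.keep.$$"   # hypothetical path, demo only

exec 8>"$lockf"                 # first locker opens the file
flock -n 8 && first=acquired    # and takes the exclusive lock

exec 9>"$lockf"                 # second locker: same path, same inode
if flock -n 9; then second=acquired; else second=denied; fi

flock -u 8                      # release the lock WITHOUT removing the file
if flock -n 9; then third=acquired; else third=denied; fi

echo "first=$first second=$second third=$third"

exec 8>&- 9>&-                  # close both descriptors
rm -f "$lockf"                  # demo cleanup only, once nobody holds a lock
```

Here the second attempt should be denied while the first lock is held, and should succeed only after the explicit unlock.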
>
> I'm not sure I can draw a line from this problem to the stale locks
> problem, but it's probably a good thing to fix before continuing to try
> to debug the stale locks problem.

Brian,

Can you point me to online documentation that describes how this 'lock'
utility is supposed to work?

Thanks,
-Tom
-- 
Tom Eastep        \ Q: What do you get when you cross a mobster with
Shoreline,         \ an international standard?
Washington, USA     \ A: Someone who makes you an offer you can't
http://shorewall.org \ understand
                      \_______________________________________________
