On 07/26/2018 07:01 AM, Brian J. Murrell wrote:
> On Thu, 2018-07-26 at 05:48 +0200, Matt Darfeuille wrote:
>>
>> As illustrated by this lingering thread, issues that are only present on
>> one platform made me move away from OpenWRT/LEDE.
> 
> The platform is not the problem.  The platform is just providing the
> tools.
> 
> Or are you suggesting that the "lock" tool on OpenWRT/LEDE is actually
> buggy?  Given that it's just a wrapper around flock(), that seems
> unlikely.  But I'm happy to be proven wrong if you can provide a
> reproducer for the bug that I can submit upstream.  In all the testing
> I have done with the "lock" tool, it operates as expected when used as
> expected.
> 
> Given the evidence, it seems like the file being locked is getting
> removed before the lock is released.
> 
> A reboot of my router this morning has reproduced the situation and
> this is what I see:
> 
> # ps -ef | grep lock
> root      2700  2666  0 07:13 ?        00:00:00 lock /etc/shorewall-lite/state/lock
> root      3234     1  0 07:13 ?        00:00:00 lock /etc/shorewall-lite/state/lock
> 
> # lsof -n -p 3234
> COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
> lock    3234 root  cwd    DIR   0,15      656   258 /
> lock    3234 root  rtd    DIR   0,15      656   258 /
> lock    3234 root  txt    REG  254,0   308533  1786 /bin/busybox
> lock    3234 root  mem    REG  254,0    77040   213 /lib/libgcc_s.so.1
> lock    3234 root  mem    REG  254,0   601968   402 /lib/libc.so
> lock    3234 root    0u   CHR    1,3      0t0   317 /dev/null
> lock    3234 root    1u   CHR    1,3      0t0   317 /dev/null
> lock    3234 root    2u   CHR    1,3      0t0   317 /dev/null
> lock    3234 root    3u   REG   0,14        5 61617 /etc/shorewall-lite/state/lock (deleted)
> lock    3234 root   13w  FIFO    0,8      0t0  1732 pipe
> 
> # cat /proc/2700/fd/3
> 3234
> 
> # strace -f -p 3234
> strace: Process 3234 attached
> restart_syscall(<... resuming interrupted syscall_516 ...>) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcd900) = 0
> nanosleep({tv_sec=1, tv_nsec=0}, ^Cstrace: Process 3234 detached
>  <detached ...>
> 
> # strace -f -p 2700
> strace: Process 2700 attached
> flock(3, LOCK_EX^Cstrace: Process 2700 detached
>  <detached ...>
> 
> Hrm.  Given:
> 
>           g_havemutex="lock -u ${lockf} && rm -f ${lockf}"
> 
> Observe this particular set of operations:
> 
> tty1# lock /tmp/mylockfile
> tty1# [has the lock and returns]
> tty2# lock /tmp/mylockfile
> [blocks waiting for locker1 to release the lock as we can see:]
> # lsof | grep /tmp/mylockfile 
> lock       1249    root    3u      REG       0,13        5     352778 /tmp/mylockfile
> lock       1250    root    3u      REG       0,13        5     352778 /tmp/mylockfile
> tty1# lock -u /tmp/mylockfile && rm -f /tmp/mylockfile
> tty1# [returns, releasing the lock to tty2]
> tty2# [returns from blocked state, now holds the lock]
> # lsof | grep /tmp/mylockfile 
> lock       1404    root    3u      REG       0,13        5     352778 /tmp/mylockfile (deleted)
> tty3# lock /tmp/mylockfile 
> tty3# [wait, what?  it returns even though tty2 has the lock!]
> # lsof | grep /tmp/mylockfile 
> lock       1404    root    3u      REG       0,13        5     352778 /tmp/mylockfile (deleted)
> lock       1439    root    3u      REG       0,13        5     362181 /tmp/mylockfile
> 
> So at this point both tty2 and tty3 believe they have the lock and have
> returned, allowing them to do their work on top of each other.
> 
> I don't think a process can simply remove the lock file just because it
> has released its lock on it.  It can only be removed if there are no
> more outstanding locks on it.  Or just don't remove it.  lock seems to
> function perfectly fine with the file pre-existing.
> 
> I'm not sure I can draw a line from this problem to the stale locks
> problem, but it's probably a good thing to fix before continuing to try
> to debug the stale locks problem.
> 
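For anyone who wants to reproduce the tty1/tty2/tty3 sequence above without an OpenWRT box, here is a minimal sketch that calls flock() directly from Python on Linux. It assumes only what was said above, that "lock" is a wrapper around flock(); it does not use the "lock" utility itself. The names try_lock and demo and the temporary path are made up for illustration.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path (creating it if needed) and try a non-blocking exclusive flock.

    Returns the open fd on success, or None if another open file
    description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo(remove):
    """Replay the three-locker sequence; return True if two lockers
    end up believing they hold the lock at the same time."""
    with tempfile.TemporaryDirectory() as d:
        lockf = os.path.join(d, "mylockfile")  # hypothetical lock path

        fd1 = try_lock(lockf)            # "tty1" takes the lock
        fd2 = os.open(lockf, os.O_RDWR)  # "tty2" has the file open and waits

        # tty1 releases; with remove=True this mirrors
        # "lock -u ${lockf} && rm -f ${lockf}"
        fcntl.flock(fd1, fcntl.LOCK_UN)
        if remove:
            os.unlink(lockf)
        os.close(fd1)

        # tty2 wakes up and locks the inode it already had open,
        # which may by now be deleted
        fcntl.flock(fd2, fcntl.LOCK_EX | fcntl.LOCK_NB)

        # "tty3" arrives; if the path was removed, O_CREAT makes a new inode
        fd3 = try_lock(lockf)
        both_hold = fd3 is not None

        for fd in (fd2, fd3):
            if fd is not None:
                os.close(fd)
        return both_hold


if __name__ == "__main__":
    print("remove after unlock:", demo(remove=True))
    print("leave file in place:", demo(remove=False))
```

With remove=True the third locker gets a fresh file and its own lock, which matches the two different inode numbers in the lsof output above. With remove=False the third locker is refused while the second still holds the lock, which is the "just don't remove it" case.
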

Brian,

Can you point me to online documentation that describes how this 'lock'
utility is supposed to work?

Thanks,
-Tom

-- 
Tom Eastep        \   Q: What do you get when you cross a mobster with
Shoreline,         \     an international standard?
Washington, USA     \ A: Someone who makes you an offer you can't
http://shorewall.org \   understand
                      \_______________________________________________

_______________________________________________
Shorewall-users mailing list
Shorewall-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shorewall-users
