Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-08-01 Thread Mouse
>> It's worth remembering that -o tcp is an option.
> Not for NFS through a (stateful) filtering router, no.
True. But then, not over network hops that drop port 2049, either. Break the assumptions underlying the 'net and you have to expect breakage from stuff built atop it.
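For reference, a TCP NFS mount on NetBSD looks roughly like the sketch below (server, export and mount point are placeholders, not taken from the thread); with TCP the RPC stream is segmented by TCP itself, so large replies never depend on IP fragment reassembly:

    # mount_nfs over TCP; adjust server/export/mountpoint as needed
    mount -t nfs -o tcp server:/export /mnt
    # equivalent /etc/fstab entry
    server:/export  /mnt  nfs  rw,tcp  0 0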

Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-08-01 Thread Hauke Fath
At 10:45 +0200 31.07.2019, Edgar Fuß wrote:
> Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS
> mount combined with a misconfigured IPFilter that blocked all but the
> first fragment of a fragmented NFS reply (e.g., readdir) combined with a
> NetBSD design error (or so

Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread maya
On Wed, Jul 31, 2019 at 07:11:54AM -0700, Jason Thorpe wrote:
> > On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
> >
> > NetBSD design error (or so Taylor says) that a vnode lock may be held
> > across I/O
>
> 100%
>
> NetBSD's VFS locking protocol needs a serious overhaul. At least one

Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread maya
On Wed, Jul 31, 2019 at 11:42:26AM -0500, Don Lee wrote:
> If you go back a few years, you can find a thread where I reported tstile
> lockups on PPC. I don’t remember the details, but it was back in 6.1 as I
> recall. This is not a new problem, and not limited to NFS. I still have a
> similar

Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread Don Lee
If you go back a few years, you can find a thread where I reported tstile lockups on PPC. I don’t remember the details, but it was back in 6.1 as I recall. This is not a new problem, and not limited to NFS. I still have a similar problem with my 7.2 system, usually triggered when I do backups

Re: NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread Jason Thorpe
> On Jul 31, 2019, at 1:45 AM, Edgar Fuß wrote:
>
> NetBSD design error (or so Taylor says) that a vnode lock may be held
> across I/O

100%

NetBSD's VFS locking protocol needs a serious overhaul. At least one other BSD-family VFS (the one in XNU) completely eliminated locking of vnodes

NFS lockup after UDP fragments getting lost (was: 8.1 tstile lockup after nfs send error 51)

2019-07-31 Thread Edgar Fuß
Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS mount combined with a misconfigured IPFilter that blocked all but the first fragment of a fragmented NFS reply (e.g., readdir) combined with a NetBSD design error (or so Taylor says) that a vnode lock may be held across
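A note on the IPFilter side: only the first fragment of a large UDP datagram carries the UDP header, so a rule that matches on port 2049 passes that first fragment and drops the rest unless fragment tracking is enabled. A minimal sketch of rules that let fragmented NFS/UDP traffic through (interface name and directions are illustrative, not taken from this thread):

    # /etc/ipf.conf sketch: keep fragment state for NFS over UDP
    pass in quick on wm0 proto udp from any to any port = 2049 keep state keep frags
    pass in quick on wm0 proto udp from any port = 2049 to any keep state keep frags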

Re: 8.1 tstile lockup after nfs send error 51

2019-07-29 Thread Edgar Fuß
Here are stack traces of all the frozen processes (with a few newlines inserted manually):
Crash version 8.1_STABLE, image version 8.1_STABLE.
Output from a running system is unreliable.
crash> trace/t 0t16306
trace: pid 16306 lid 1 at 0x8001578daa20
sleepq_block() at sleepq_block+0x97

8.1 tstile lockup after nfs send error 51

2019-07-29 Thread Edgar Fuß
I experienced an "nfs send error 51" on an NFS-imported file system, and after that, any process accessing that FS seems to be frozen in tstile. Any way out short of rebooting? Anything to analyze beforehand?
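A quick way to confirm the symptom on the running system is to look at the wait channel column of ps(1); blocked processes show "tstile" there (a sketch, not from the thread):

    # list processes blocked on a turnstile
    ps -axl | grep tstile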

Re: tstile lockup

2012-11-27 Thread Lars Heidieker
On 11/23/2012 05:06 PM, Edgar Fuß wrote:
> > Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get
> > more information on lock congestion.
> Ouch! I overlooked this post of yours until a colleague asked me about it.
> Elapsed time: 18.33 seconds.
> -- RW lock sleep (reader)

Re: tstile lockup

2012-11-24 Thread haad
Can you use addr2line with that wapbl address to find out what line it is?
On Nov 23, 2012 5:06 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
> > Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get
> > more information on lock congestion.
> Ouch! I overlooked this post of yours
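The suggestion boils down to feeding the return address to addr2line(1) against a kernel image that still carries debug information (assuming the matching netbsd.gdb from the kernel build is at hand); schematically, with a placeholder address rather than the one from the trace:

    # -e: kernel with debug info, -f: also print the enclosing function
    addr2line -f -e netbsd.gdb 0xffffffff80xxxxxx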

Re: tstile lockup

2012-11-23 Thread Edgar Fuß
> Try running `svn ...' as `lockstat -T rwlock svn ...'. By chance we get
> more information on lock congestion.
Ouch! I overlooked this post of yours until a colleague asked me about it.
Elapsed time: 18.33 seconds.
-- RW lock sleep (reader)
Total%  Count  Time/ms  Lock
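For readers unfamiliar with it, the lockstat(8) invocation quoted above wraps the workload and records rwlock sleep events only while that command runs; schematically (the svn arguments are placeholders):

    # record rwlock contention for the duration of the wrapped command
    lockstat -T rwlock svn update /path/to/working-copy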

Re: tstile lockup

2012-11-19 Thread Edgar Fuß
On Wed, Oct 31, 2012 at 05:42:12PM +0100, Edgar Fuß wrote:
> > Invoke crash(8), then just perform ps and t/a address on each LWP which
> > seems to be stuck (on tstile or elsewhere).
> So it seems I can sort of lock up the machine for minutes with a simple
> dd if=/dev/zero of=/dev/dk14 bs=64k

Re: tstile lockup

2012-11-19 Thread Martin Husemann
On Mon, Nov 19, 2012 at 12:31:47PM +0100, Edgar Fuß wrote:
> The problem is that this lock-up, artificial as the dd to the block device
> may seem, appears to happen real-world during an svn update command: the
> other nfsd threads get stuck to the point where other clients get nfs server not

Re: tstile lockup

2012-11-19 Thread Edgar Fuß
> Why do you think both lockups are related?
Because the real-world problem also involves large amounts of metadata being written and also results in nfsds stuck in tstile. Should I try to get crash(8) outputs of the real-world situation?

Re: tstile lockup

2012-11-19 Thread Martin Husemann
On Mon, Nov 19, 2012 at 12:59:13PM +0100, Edgar Fuß wrote:
> Should I try to get crash(8) outputs of the real-world situation?
I guess that would be good - even if only to verify whether this is related or not.
Martin

Re: tstile lockup

2012-11-19 Thread Edgar Fuß
OK, this is the svn process (directly running on the file server, not operating via NFS) tstile-ing:
crash ps | grep \(vnode\|tstile\)
25051  1  3  0  0  fe82ec17d200  svn  tstile
crash t/a fe82ec17d200
trace: pid 25051 lid 1 at 0xfe811e901700
sleepq_block() at

Re: tstile lockup

2012-11-19 Thread J. Hannken-Illjes
On Nov 19, 2012, at 4:53 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
> OK, this is the svn process (directly running on the file server, not operating via NFS) tstile-ing:
> crash ps | grep \(vnode\|tstile\)
> 25051  1  3  0  0  fe82ec17d200  svn  tstile
> crash t/a

Re: tstile lockup

2012-11-19 Thread Edgar Fuß
> Do you get a deadlock
No.
> will the system come back to work after some time?
Yes. At least for appropriate values of "some time". This may take minutes (at least in the dd case; I haven't seen this in the svn case).

Re: tstile lockup

2012-11-19 Thread J. Hannken-Illjes
On Nov 19, 2012, at 6:40 PM, Edgar Fuß e...@math.uni-bonn.de wrote:
> > Do you get a deadlock
> No.
> > will the system come back to work after some time?
> Yes. At least for appropriate values of "some time". This may take minutes (at least in the dd case; I haven't seen this in the svn case).
Try

RAIDframe level 5 write performance(was: tstile lockup)

2012-11-02 Thread Edgar Fuß
There seems to be a fundamental problem with writing to a level 5 RAIDframe set, at least to the block device. I've created five small wedges in the spared-out region of my 3TB SAS discs. In case it matters, they are connected to an mpt(4) controller. Then I configured a 5-component, 32-SpSU,
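For context, a raidctl(8) configuration file for such a 5-component, 32-sectors-per-stripe-unit RAID 5 set would look roughly like this sketch (component names are placeholders, not the actual wedges used in the test):

    # raid2.conf (sketch)
    START array
    # numRow numCol numSpare
    1 5 0

    START disks
    /dev/dk20
    /dev/dk21
    /dev/dk22
    /dev/dk23
    /dev/dk24

    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
    32 1 1 5

    START queue
    fifo 100

followed by something like `raidctl -C raid2.conf raid2' and `raidctl -iv raid2' to configure the set and initialise parity.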

Re: RAIDframe level 5 write performance(was: tstile lockup)

2012-11-02 Thread Thor Lancelot Simon
On Fri, Nov 02, 2012 at 06:02:01PM +0100, Edgar Fuß wrote:
> Writing to that RAID's block device (raid2d) in 64k blocks gives me a
> dazzling throughput of 2.4MB/s and a dd mostly waiting in vnode.
Writing to the block device from userspace is not a good idea. How is performance through the
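The comparison being hinted at is between the buffered block device and the raw device, which bypasses the buffer cache; along the lines of the thread's test (destructive: this overwrites the start of the RAID set):

    # through the buffer cache (block device) - the slow case reported above
    dd if=/dev/zero of=/dev/raid2d bs=64k count=1000
    # through the raw (character) device, bypassing the buffer cache
    dd if=/dev/zero of=/dev/rraid2d bs=64k count=1000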

tstile lockup (was: Serious WAPL performance problems)

2012-10-31 Thread Edgar Fuß
> Invoke crash(8), then just perform ps and t/a address on each LWP which
> seems to be stuck (on tstile or elsewhere).
So it seems I can sort of lock up the machine for minutes with a simple
dd if=/dev/zero of=/dev/dk14 bs=64k count=1000
(In case it matters, dk14 is on a RAID5 on 4+1 mpt(4)
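The crash(8) procedure referred to above is, schematically (the pid and LWP address are the ones from the crash output quoted elsewhere in this thread):

    crash> ps                  # look for LWPs whose wait channel is "tstile"
    crash> trace/t 0t25051     # stack trace by decimal pid
    crash> t/a fe82ec17d200    # or by LWP address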