On Oct 23, 6:51pm, Edgar Fuß wrote:
} Subject: Serious WAPBL performance problems

} We are facing some very serious file system performance problems on 6.0 which
} we attribute to WAPBL. Comparable 4.0.1 machines with softdep are performing
} much, much better. Having essentially skipped 5, I cannot easily compare log
} to softdep on identical hardware.
}
} The most prominent way to trigger the problem is running an svn update command
} on a certain repository (having a large number of files) with the working copy
} mounted over NFS. This will stall the file server's discs to the point where
} you get "NFS server not responding, still trying" messages.
} Tracing that svn update (both ktrace and tcpdump) reveals the unusual thing it
} does is creating some 2,500 .lock files scattered around the directory tree
} only to unlink all of them just seconds later.
} If you run that command with the working copy on a local (WAPBL) file system,
} it finishes in under 2 seconds, but running iostat shows that some seconds
} later, the disc (actually a RAID) the fs holding the wc is on is 100% busy for
} 18 seconds.
} If you access the same working copy over NFS, the update takes 20 to 30
} seconds. During that period, the discs are initially silent for 5-10 seconds,
} then 100% busy for 8-15 seconds, then silent for 5-7 seconds, busy for 5-10s,
} silent for 7-9s, busy for 17s. In case you didn't add the times: that too
} extends to after the command has finished.
} Running the same command on a 4.0.1 system with the wc on a (local, I didn't
} try NFS) fs with softdeps, it also takes under 2 seconds, but after that, the
} discs are completely silent save a two-second period some ten seconds later.
} There are similar issues (again, on 6 but not on 4) with svn checkout or a
} rm -rf of the wc.
}
} How to debug/analyze/tune this? While we can move our svn working copies from
} NFS to local storage, this sounds like a problem that can hit other users, too.
}
} Btw, PenguinOS's logging seems also not to have this issue: Having the wc on an
} ext3 fs also makes the disc busy for just a second or two.
>-- End of excerpt from Edgar Fuß
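
For reproducing this without svn or the repository in question, the pattern
described above (some 2,500 .lock files created across a directory tree and
unlinked shortly afterwards) could perhaps be approximated by a small script.
This is only a sketch: the function name lock_churn, the directory layout and
the default counts are made up for illustration and are not what svn actually
does internally.

```python
import os
import time


def lock_churn(root, n_files=2500, n_dirs=250, pause=0.0):
    """Create n_files empty .lock files spread over n_dirs directories
    below root, optionally wait, then unlink all of them.

    Returns (files_created, create_seconds, unlink_seconds).
    The layout is hypothetical; it only mimics "many small lock files
    scattered around a tree".
    """
    dirs = []
    for i in range(n_dirs):
        # Two directory levels so the files are not all in one directory.
        d = os.path.join(root, "d%03d" % (i // 16), "s%04d" % i)
        os.makedirs(d, exist_ok=True)
        dirs.append(d)

    paths = [os.path.join(dirs[i % n_dirs], "f%05d.lock" % i)
             for i in range(n_files)]

    t0 = time.monotonic()
    for p in paths:
        # O_EXCL: fail rather than reuse a leftover file from an earlier run.
        fd = os.open(p, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    create_seconds = time.monotonic() - t0

    time.sleep(pause)

    t1 = time.monotonic()
    for p in paths:
        os.unlink(p)
    unlink_seconds = time.monotonic() - t1

    return len(paths), create_seconds, unlink_seconds


# Example (path is a placeholder for a scratch directory on the fs under test):
#   print(lock_churn("/path/to/scratch", pause=2.0))
```

Running it once against a directory on the local file system and once against
the NFS mount, with iostat running on the server, should show whether the
create/unlink burst alone is enough to trigger the delayed disc activity.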
Hello. If possible, I suggest trying the latest 5.1 sources. They contain the
namei fixes David Hollan put into NetBSD-6, and they would let you compare
WAPBL and softdep performance directly.

Having said that, is it possible for you to get the output of ps -lax on the
NFS server during the 18-20 second window when the discs are completely busy?
Perhaps that will tell us why NFS processing ceases while all of the logs are
being played and written to disk.

-thanks
-Brian
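
Since the busy window only lasts some seconds and starts after the command has
finished, catching it by hand may be awkward. A minimal sketch for collecting
the ps -lax output repeatedly on the NFS server follows. The helper names
(sample, write_samples), the one-second interval and the output file name are
assumptions for illustration; only the ps -lax command itself comes from the
request above.

```python
import subprocess
import time


def sample(cmd, count=30, interval=1.0):
    """Run cmd `count` times, `interval` seconds apart.

    Returns a list of (timestamp, output) pairs, where output is the
    command's combined stdout and stderr as text.
    """
    samples = []
    for i in range(count):
        stamp = time.strftime("%H:%M:%S")
        result = subprocess.run(cmd,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
        samples.append((stamp, result.stdout))
        if i + 1 < count:
            time.sleep(interval)
    return samples


def write_samples(samples, path):
    """Write the samples to one file, each preceded by its timestamp."""
    with open(path, "w") as out:
        for stamp, text in samples:
            out.write("=== %s ===\n%s\n" % (stamp, text))


# Example: start this on the server just before running svn update on the
# client, then look at the samples taken while iostat showed 100% busy.
#   write_samples(sample(["ps", "-lax"], count=40), "ps-lax.log")
```

The timestamps make it possible to line the samples up with the iostat output
from the same period.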