Serious WAPL performance problems

Edgar Fuß Tue, 23 Oct 2012 09:51:32 -0700

We are facing very serious file system performance problems on 6.0, which 
we attribute to WAPL. Comparable 4.0.1 machines with softdep are performing 
much, much better. Having essentially skipped 5, I cannot easily compare 
logging to softdep on identical hardware.


The most prominent way to trigger the problem is running an svn update command 
on a certain repository (having a large number of files) with the working copy 
mounted over NFS. This will stall the file server's discs to the point where 
you get "NFS server not responding, still trying" messages.
Tracing that svn update (with both ktrace and tcpdump) reveals that the one 
unusual thing it does is create some 2,500 .lock files scattered around the 
directory tree, only to unlink all of them just seconds later.
If you run that command with the working copy on a local (WAPL) file system, 
it finishes in under 2 seconds, but running iostat shows that some seconds 
later, the disc (actually a RAID) holding that file system is 100% busy for 
18 seconds.
If you access the same working copy over NFS, the update takes 20 to 30 
seconds. During that period, the discs are initially silent for 5-10 seconds, 
then 100% busy for 8-15 seconds, then silent for 5-7 seconds, busy for 5-10s, 
silent for 7-9s, busy for 17s. In case you didn't add up the times: that is 
47 to 68 seconds in total, so here too the disc activity extends to well 
after the command has finished.
The same command on a 4.0.1 system, with the wc on a local fs with softdeps 
(I didn't try NFS), also takes under 2 seconds, but after that, the discs are 
completely silent save for a two-second period some ten seconds later.
There are similar issues (again, on 6 but not on 4) with svn checkout or a 
rm -rf of the wc.
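
To take svn out of the picture, the create-then-unlink pattern described above 
could be reproduced with something like the following Python sketch. It is not 
what svn actually does internally: the directory layout, the file names and 
the function name lock_churn are all made up for illustration, and the 
defaults (50 directories with 50 lock files each) are merely chosen to give 
the roughly 2,500 files seen in the trace.

```python
import os
import time


def lock_churn(root, n_dirs=50, locks_per_dir=50, pause=0.0):
    """Create n_dirs * locks_per_dir empty .lock files under root,
    optionally wait, then unlink them all.

    Hypothetical stand-in for the svn workload; the layout is an assumption.
    Returns (files_created, create_seconds, unlink_seconds).
    """
    paths = []
    t0 = time.monotonic()
    for d in range(n_dirs):
        # Spread the files over many directories so that many different
        # directory blocks get modified, as in a large working copy.
        dirpath = os.path.join(root, f"dir{d:04d}", "admin")
        os.makedirs(dirpath, exist_ok=True)
        for i in range(locks_per_dir):
            path = os.path.join(dirpath, f"{i:04d}.lock")
            # O_EXCL gives lock-file semantics: fail if the file exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.close(fd)
            paths.append(path)
    t1 = time.monotonic()

    # The trace showed the unlinks following "just seconds later".
    time.sleep(pause)

    t2 = time.monotonic()
    for path in paths:
        os.unlink(path)
    t3 = time.monotonic()
    return len(paths), t1 - t0, t3 - t2
```

Calling lock_churn() on a directory inside the file system under test (local 
or NFS-mounted), with iostat running on the server, should show whether the 
delayed burst of disc activity appears without svn being involved. The 
directories it creates are left in place and have to be removed by hand.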

How to debug/analyze/tune this? While we can move our svn working copies from 
NFS to local storage, this sounds like a problem that can hit other users, too.

Btw, PenguinOS's logging does not seem to have this issue either: with the wc 
on an ext3 fs, the disc is busy for just a second or two.
