On Sat, 12 May 2012, Edgar Fuß wrote:

> > In general it won't access just one filesystem block.
> > It will try to readahead 64KB
> Oh, so this declustering seems to make matters even more
> complicated^Winteresting.
>
> Staying with my example of a 16K fsbsize FFS on a 4+1 disc Level 5
> RAIDframe with a stripe size of 4*16k=64k:
>
> Suppose a process does something that could immediately be satisfied
> by reading one fs block (probably it matters whether that's a small file,
> a small portion of a large file, a small directory, a portion of a large
> directory, inodes, free list or whatever?). Now, if that, as I understand,
> always causes FFS to in fact issue a 64k request to RAIDframe, this would
> need to read a full stripe and so need all but one disc. So it can't be
> parallelised with another process' request, can it? Does this mean I'm better
> off with a stripe size of 4*64k if I'm after low latency for concurrent
> access?
The problem here is NFS, which requires writes to be persistent before returning status to the caller. Under normal operation, ufs will attempt to use the buffer cache in the most efficient manner, doing readahead and delaying writes as much as possible to maximize the number of clustered operations.

Now, if NFS does not do similar clustering on writes (I don't know NFS that well, especially V3 and V4, which allegedly have write optimizations), then you get the situation where the underlying ufs will try to cluster reads (satisfying reads out of the buffer cache is much faster than hitting the platters) but write out only single filesystem blocks (to satisfy the NFS consistency requirements).

My understanding is that later versions of NFS (v3+) have a mechanism for the client side to request writes without the consistency guarantee, plus a separate explicit sync operation. But using those is the responsibility of the NFS client machine.

Of course, if all the files are on the order of one filesystem block, clustering won't happen at all.

I think you should attempt to characterize your workload to determine the size of the I/O operations the clients are requesting, so you can decide whether clustering is a benefit to you, and if not, turn it off. (I think it can be tweaked with tunefs(8).)

Eduardo