I'm exploring the possibility of using CTDB to synchronize mandatory file lock states across a 9-server Samba cluster that re-exports a set of about 20 (large-ish) NFS v3 mount points. Yes, I know the Samba team highly discourages re-exporting anything, including NFS mounts, but they also keep republishing old NFS information found in old Samba guides in the current Samba 3.5.0 guides. But I digress.
I ran the ping_pong.c test against NFS and got much lower file locking performance than I expected. Specifically, I got almost 4000 locks/sec from a single client over gigabit ethernet. This seems really slow compared to 1.4M locks/sec from a local filesystem and 400k locks/sec if mounted using the NFS client-side-only nolock option. When I add a second ping_pong locking client, I get between 0 and 40 locks/sec at best, but usually both clients hang indefinitely (or at least until one of them is killed).

Before anyone suggests it: yes, I know I can get better Samba throughput, latency and locking performance by putting Samba on the file server itself, but I don't, for the following reasons:

- I would rather reboot Samba separately when it decides to blow chunks in memory,
- I can handle a 30 second reconnect timeout interval for a few users much easier than a full 15 minute fsck repair for a lot of users,
- Samba updates are easier to test and less disruptive this way,
- Samba can't handle nearly the number of active, concurrent users NFS can,
- Getting the resulting active users/server ratio down is easily accomplished by adding more Samba servers to my Samba cluster (as opposed to splitting off yet another file server and disk space just because one server can't manage all the active, chatty SMB sessions alone),
- The existing throughput is 4 times faster than what most of the Windows clients have available, and about 40 times faster than any of them actually use at peak demand. Increasing that to 5 times faster isn't likely to help them when Samba appears to be a processor- and memory-bound application.
- This way, I don't have to run DFS to have a unified file hierarchy. I still suffer from last-write-wins problems, as DFS with NT-FRS/DFS-R does (and which CTDB might resolve anyway), but I can still scale out without replication delays and issues.
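For anyone who wants to sanity-check locks/sec figures like the ones above on their own mounts, here is a rough Python sketch of the kind of measurement involved. It is not ping_pong.c itself, only an approximation of its lock-next-byte, unlock-current-byte cycle using POSIX byte-range locks through fcntl.lockf(). The name lock_rate, its parameters, and the default of 3 lock bytes are my own placeholders, and the rate it reports from a single process says nothing about the multi-client hang.

```python
import fcntl
import os
import sys
import tempfile
import time


def lock_rate(path, num_locks=3, duration=1.0):
    """Return approximate lock/unlock cycles per second on `path`.

    Hypothetical helper: walks a window of exclusive one-byte locks
    around the first `num_locks` bytes of the file for `duration` seconds.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, num_locks)
        # Start by holding byte 0.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        i = 0
        count = 0
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            nxt = (i + 1) % num_locks
            # Take the next byte before releasing the current one.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, nxt, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, i, os.SEEK_SET)
            i = nxt
            count += 1
        elapsed = time.monotonic() - start
        return count / elapsed
    finally:
        # Closing the descriptor drops any lock still held.
        os.close(fd)


if __name__ == "__main__":
    # Pass a path on the mount under test; defaults to a local temp file.
    if len(sys.argv) > 1:
        target = sys.argv[1]
    else:
        target = os.path.join(tempfile.mkdtemp(), "lock_rate.dat")
    print("%.0f locks/sec on %s" % (lock_rate(target), target))
```

Running it once against a local path and once against a file on the NFS mount (with and without nolock) should show the same kind of gap, though the absolute numbers will not match ping_pong's because of interpreter overhead.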
I think improving the back-end NFS locking performance will have the greatest overall benefit for the Windows clients. So has anyone had to improve NFS's lockd performance? Is it possible, or is lockd the major driving reason for clustered filesystems and/or parallel-NFS development? And does anyone have any experience with CTDB, good or bad?

Grazie,
Daniel Fussell

--------------------
BYU Unix Users Group
http://uug.byu.edu/
