Chad Leigh -- Shire.Net LLC wrote:
I have set up a Solaris 10 U2 06/06 system that has basic patches to the latest -19 kernel patch and latest zfs genesis etc. as recommended. I have set up a basic pool (local) and a bunch of sub-pools (local/mail, local/mail/shire.net, local/mail/shire.net/o, local/jailextras/shire.net/irsfl, etc). I am exporting these with [EMAIL PROTECTED],[EMAIL PROTECTED] and then mounting a few of these pools on a FreeBSD system using NFSv3.

The FreeBSD system has about 4 of my 10 or so sub-pools mounted. 2 are email IMAP account tests, 1 is generic storage, and one is a FreeBSD jail root. FreeBSD mounts them using TCP:

/sbin/mount_nfs -s -i -3 -T foo-i1:/local/mail/shire.net/o/obar /local/2/hobbiton/local/mail/shire.net/o/obar

The systems are both directly connected to a gigabit switch using 1000btx-fdx and both have an MTU set at 9000. The Solaris side is an e1000g port (the system has 2 bge and 2 e1000g ports, all configured) and the FreeBSD side is a bge port.

I have heard that there are some ZFS/NFS sync performance problems that will be fixed in U3 or are fixed in OpenSolaris. I do not think my issue is related to that. I have also seen some of that, with sometimes piss-poor write performance.

I have experienced the following issue several times since I started experimenting with this a few days ago. I periodically get "NFS server not responding" errors on the FreeBSD machine for one of the mounted pools. It lasts 4-8 minutes or so, then the mount comes alive again and is fine for many hours. When this happens, access to the other mounted pools still works fine, and when logged directly in to the Solaris machine I am able to access the file systems (pools) just fine.

Example error messages:

Sep 24 03:09:44 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:10:15 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:12:19 freebsdclient last message repeated 4 times
Sep 24 03:14:54 freebsdclient last message repeated 5 times

I would be interested in getting feedback on what might be the problem and also ways to track this down. Is this a known issue? Have others seen an NFS server sharing ZFS time out (but not for all pools)?
Offline, someone suggested I put some snoop files up so people can look at them.
The following is a snoop file created on the Solaris ZFS server. The incident happens at the very end.
http://www.shire.net/snoop-solzfs.txt.gz

The following is a tcpdump on the BSD NFS client. The system clocks are pretty much in sync, so it should be easy to find the matching location in this file. I am not sure about the various checksum issues tcpdump shows, but I think they are a tcpdump artifact, as they appear throughout the file and are not associated with the issue I am having.

http://www.shire.net/bsd-tcpdump.txt.gz

The system log messages that show the issue are below (to help you find the location in the snoop file). This was a short-lived one, or I caught it in the middle of an episode when I did a "df -h" on the client.

Sep 25 21:11:59 bywater kernel: nfs server bagend-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 25 21:12:02 bywater kernel: nfs server bagend-i1:/local/jailextras/shire.net/irsfl: is alive again
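
To line up the client log entries with the snoop and tcpdump captures, the outage windows can be pulled out of the FreeBSD syslog with a short script. The following is only a sketch, not part of the original setup: the function names `parse_ts` and `outage_windows` are hypothetical, the year has to be supplied by the caller because syslog timestamps omit it, and it assumes each log entry is on a single line with the export path unwrapped (no space after "/local/").

```python
import re
from datetime import datetime

# Matches client kernel messages of the form shown above, e.g.
#   Sep 25 21:11:59 bywater kernel: nfs server host:/path: not responding
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"nfs server (?P<mount>\S+): (?P<state>not responding|is alive again)$"
)


def parse_ts(ts, year):
    # Syslog timestamps carry no year, so the caller must supply one (assumption).
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def outage_windows(lines, year):
    """Return (mount, start, end) tuples; end is None if no recovery was logged."""
    open_since = {}  # mount -> time of the first "not responding" message
    windows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips unrelated lines and "last message repeated" lines
        when = parse_ts(m.group("ts"), year)
        mount = m.group("mount")
        if m.group("state") == "not responding":
            # Keep the earliest timestamp of the current outage.
            open_since.setdefault(mount, when)
        elif mount in open_since:
            windows.append((mount, open_since.pop(mount), when))
    # Outages with no "is alive again" message are reported with end=None.
    windows.extend((mount, start, None) for mount, start in open_since.items())
    return windows
```

Feeding it the two log lines above, with whatever year the logs are from, should yield a single window for bagend-i1:/local/jailextras/shire.net/irsfl whose start and end can then be searched for in the snoop and tcpdump output.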
---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss