On Sep 25, 2006, at 12:18 PM, eric kustarz wrote:

Chad Leigh wrote:

I have set up a Solaris 10 U2 06/06 system that has basic patches to the latest -19 kernel patch and latest zfs genesis etc as recommended. I have set up a basic pool (local) and a bunch of sub-pools (local/mail, local/mail/shire.net, local/mail/shire.net/o, local/jailextras/shire.net/irsfl, etc). I am exporting these with [EMAIL PROTECTED],[EMAIL PROTECTED] and then mounting a few of these pools on a FreeBSD system using NFSv3.
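
(The share options above got mangled by the list software; they were of the usual rw=.../root=... form. A minimal sketch of that kind of setup, where "freebsdclient" is just a placeholder hostname and not my real config:

# zfs set sharenfs=rw=freebsdclient,root=freebsdclient local/mail/shire.net/o
# zfs get -r sharenfs local      # confirm where the property is set/inherited
# share                          # list what is actually being exported
)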

The FreeBSD system has about 4 of my 10 or so sub-pools mounted: 2 are email IMAP account tests, 1 is generic storage, and 1 is a FreeBSD jail root. FreeBSD mounts them using TCP:

/sbin/mount_nfs -s -i -3 -T foo-i1:/local/mail/shire.net/o/obar /local/2/hobbiton/local/mail/shire.net/o/obar

The systems are both directly connected to a gigabit switch using 1000btx-fdx, and both have their MTU set to 9000. The Solaris side is an e1000g port (the system has 2 bge and 2 e1000g ports, all configured) and the FreeBSD side is a bge port.
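
(If the jumbo-frame path is ever in doubt, one rough end-to-end check -- assuming FreeBSD's ping supports -D for don't-fragment, which I believe it does -- is to send a large unfragmentable packet from the client:

freebsd# ping -D -s 8000 solzfs-i1

If that fails while a default-size ping works, the 9000-byte MTU is not actually clean all the way through the switch.)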

etc.

I have heard that there are some ZFS/NFS sync performance problems that will be fixed in U3 or are already fixed in OpenSolaris. I do not think my issue is related to that, though I have also occasionally seen some of that in the form of very poor write performance.

I have experienced the following issue several times since I started experimenting with this a few days ago. I will periodically get "NFS server not responding" errors on the FreeBSD machine for one of the mounted pools; it lasts 4-8 minutes or so, then the mount comes alive again and is fine for many hours. When this happens, access to the other mounted pools still works fine, and logged in directly on the Solaris machine I am able to access the file systems (pools) just fine.

Example error message:

Sep 24 03:09:44 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:10:15 freebsdclient kernel: nfs server solzfs-i1:/local/jailextras/shire.net/irsfl: not responding
Sep 24 03:12:19 freebsdclient last message repeated 4 times
Sep 24 03:14:54 freebsdclient last message repeated 5 times

I would be interested in feedback on what the problem might be and also on ways to track this down. Is this a known issue? Have others seen the NFS server sharing ZFS time out (but not for all pools)?


Could be lots of things - network partition, bad hardware, overloaded server, bad routers, etc.

What's the server's load like (vmstat, prstat)? If you're banging on the server too hard and using up the server's resources then nfsd may not be able to respond to your client's requests.

The server is not doing anything except this ZFS/NFS serving, and only 1 client is attached to it (the one with the problems). prstat shows a load of 0.00 continually, and vmstat is typically like:

# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap    free   re mf pi po fr de sr s1 s2 -- --   in  sy  cs us sy id
 0 0 0 10640580 691412   0  1  0  0  0  0  2  0 11  0  0  421  85 120  0  0 100
#
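
(A single vmstat snapshot like that mostly reflects averages since boot, so if it would help I can leave interval samples running and grab the numbers around an episode -- just the stock tools with a 5-second interval:

# vmstat 5 > /var/tmp/vmstat.out &
# prstat -mL 5 > /var/tmp/prstat.out &
)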



You can also grab a snoop trace to see which packets are not being responded to.

If I can catch it happening. Most of the time I am not around and I just see it in the logs. Sometimes it happens when I do a "df -h" on the client for example.
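
(One way to catch it without being around, assuming the standard capture tools on each side and that the instance-0 interface names are the right ones for my setup, is to leave a capture running restricted to NFS traffic:

Solaris side:
# snoop -d e1000g0 -o /var/tmp/nfs.snoop host freebsdclient and port 2049

FreeBSD side (tcpdump can rotate the file by size so it does not fill the disk):
# tcpdump -i bge0 -C 100 -w /var/tmp/nfs.pcap host solzfs-i1 and port 2049

"freebsdclient" is a placeholder for the client's actual hostname.)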


What are clients and local apps doing to the machine?

Almost nothing. No local apps are running on the server. It is basically just doing ZFS and NFS.

The client has 4 mounts from ZFS, all of them very low usage. Storage for 2 email accounts (IMAP maildir) is mounted for testing; each receives 10-100 messages a day. 1 extra storage space is mounted, and once a day in the middle of the night rsync copies 2 files to it -- one around 70 MB and one 7 MB. The other is being used as the root for a FreeBSD jail which is not being used for anything; it is just a proof of concept. No processes running in the jail are doing much of anything to the NFS-mounted file system -- occasional log writes.


What is your server hardware (# processors, memory) - is it underprovisioned for what you're doing to it?

A Tyan 2892 motherboard with a single dual-core Opteron at 2.0 GHz and 2 GB of memory.

A single Areca 1130 RAID card with 1 GB of RAM cache. It works very well with ZFS without the NFS component (it has a 9-disk RAID 6 array on it). I have done lots of testing with this card and Solaris, with and without ZFS, and it has held up very well without any sort of I/O issues (except that it does not get a cache flush when the system powers down with init 5). The ZFS pools are currently on this single "disk" (to be augmented later this year when more funding comes through to buy more stuff).

A dual-port Intel e1000g server card over PCIe is the Solaris side of the network.


How is the FreeBSD NFS client code - robust?

I have not had issues with it over the last 10 years when mounting from other FreeBSD boxes. It seems to be robust.
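
(If it is the client giving up, the client-side RPC counters should climb during an episode. Assuming the stock FreeBSD nfsstat:

freebsd# nfsstat -c

Comparing the timeout/retry numbers before and after an episode would at least show whether requests are being retried and ignored, or never making it out at all.)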


Are there any disk errors on the server (iostat -E, check /var/adm/messages, zpool status -x)?

Nothing in /var/adm/messages.

zpool iostat shows nothing

iostat -E shows one illegal-request error (I have had 3+ NFS episodes per day over the last 3 or 4 days):

# iostat -E
sd1       Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: Areca    Product: ARC-1130-VOL#00  Revision: R001 Serial No:
Size: 22.00GB <21999648256 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
sd2       Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: Areca    Product: ARC-1130-VOL#01  Revision: R001 Serial No:
Size: 1898.00GB <1897998581248 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
#
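
(A couple of other places that might show quiet disk/driver complaints, assuming the stock Solaris 10 tools:

# fmdump -eV         # FMA error-report log
# zpool status -xv   # prints only pools that have problems
)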


Is the network being flaky?

Nothing else is running on it, but when I run tests across it, it seems fine and I don't get any errors. The FreeBSD box is directly connected to the gigabit switch (which supports jumbo frames), and the Solaris box is directly connected as well.

While an episode is happening, I have no problems at all accessing the other nfs shares on the FreeBSD box from the same ZFS pool.

Nor do I have problems on the Solaris side with direct access to the affected ZFS pool (i.e., I can log in to the Solaris box, cd into the space, and do things with no issues).
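
(To rule out a flaky link specifically, the interface error counters on both ends should stay flat across an episode -- assuming the stock tools:

Solaris side:
# netstat -i
# kstat -p e1000g | grep -i err

FreeBSD side:
freebsd# netstat -i -I bge0

Growing Ierrs/Oerrs or error kstats would point at the network rather than at NFS/ZFS.)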

Thanks
Chad



eric

Is there any functional difference between setting up the ZFS pools as legacy mounts and using a traditional share command to share them over NFS?
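
(For reference, my understanding of what the legacy route would look like -- the mountpoint, paths, and hostname here are illustrative, not my actual config:

# zfs set mountpoint=legacy local/mail/shire.net/o

/etc/vfstab entry:
local/mail/shire.net/o  -  /export/mail/o  zfs  -  yes  -

/etc/dfs/dfstab entry:
share -F nfs -o rw=freebsdclient,root=freebsdclient /export/mail/o

versus just letting the sharenfs property do it. I don't know whether NFS behaves any differently between the two, hence the question.)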

I am mostly a Solaris noob and am happy to learn and can try anything people want me to test.

Thanks in advance for any comments or help.
thanks
Chad





---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net




_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
