[zfs-discuss] strange performance drop of solaris 10/zfs
Hi,

We have been using a Solaris 10 system (Sun-Fire-V245) for a while as our primary file server. It is based on Solaris 10 06/06, plus patches up to approximately May 2007. It is a production machine and, until about a week ago, has had few problems.

Attached to the V245 is a SCSI RAID array which presents one LUN to the OS. On this LUN is a zpool (tank), and within that 300+ ZFS file systems (one per user, for automounted home directories). The system is connected to our LAN via gigabit Ethernet; most of our NFS clients have only 100FD network connections.

In recent days the performance of the file server seems to have gone off a cliff, and I am not sure how to troubleshoot what might be wrong. Typical "zpool iostat 120" output is shown below. If I run "truss -D df" I see that each call to statvfs64("/tank/bla") takes 2-3 seconds. The RAID itself is healthy and all disks are reporting as OK. I have tried to establish via nfslogd whether some client or clients are thrashing the server, but without seeing anything obvious. Is there some kind of per-ZFS-filesystem iostat?

End users are reporting that just saving small files can take 5-30 seconds. prstat/top shows no process using significant CPU. The system has 8GB of RAM, and vmstat shows nothing interesting. I have another V245 with the same SCSI/RAID/ZFS setup and a similar (though somewhat smaller) load of data and users, and the problem is NOT apparent there.

Suggestions?

Kevin

Thu Jan 29 11:32:29 CET 2009

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        2.09T   640G     10     66   825K  1.89M
tank        2.09T   640G     39      5  4.80M   126K
tank        2.09T   640G     38      8  4.73M   191K
tank        2.09T   640G     40      5  4.79M   126K
tank        2.09T   640G     39      5  4.73M   170K
tank        2.09T   640G     40      3  4.88M  43.8K
tank        2.09T   640G     40      3  4.87M  54.7K
tank        2.09T   640G     39      4  4.81M   111K
tank        2.09T   640G     39      9  4.78M   134K
tank        2.09T   640G     37      5  4.61M   313K
tank        2.09T   640G     39      3  4.89M  32.8K
tank        2.09T   640G     35      7  4.31M   629K
tank        2.09T   640G     28     13  3.47M  1.43M
tank        2.09T   640G      5     51   433K  4.27M
tank        2.09T   640G      6     51   450K  4.23M
tank        2.09T   639G      5     52   543K  4.23M
tank        2.09T   640G     26     57  3.00M  1.15M
tank        2.09T   640G     39      6  4.82M   107K
tank        2.09T   640G     39      3  4.80M   119K
tank        2.09T   640G     38      8  4.64M   295K
tank        2.09T   640G     40      7  4.82M   102K
tank        2.09T   640G     43      5  4.79M   103K
tank        2.09T   640G     39      4  4.73M   193K
tank        2.09T   640G     39      5  4.87M  62.1K
tank        2.09T   640G     40      3  4.88M  49.3K
tank        2.09T   640G     40      3  4.80M   122K
tank        2.09T   640G     42      4  4.83M  82.0K
tank        2.09T   640G     40      3  4.89M  42.0K
...
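[Editorial note: zpool iostat itself only reports per pool, or per vdev with -v, so there is no built-in per-dataset iostat on this release. A minimal sketch of how the load could be narrowed down with bundled tools is below. It assumes fsstat(1M) is present at this patch level (it arrived in a later Solaris 10 update, so it may not be), and the /tank/... paths are placeholders for a few of the busier home directories, not names from the post.]

    # Per-fstype VOP counters, then the same broken out per mount point
    # (paths here are examples only), sampled every 10 seconds.
    fsstat zfs 10
    fsstat /tank/alice /tank/bob /tank/carol 10

    # Device-level view of the LUN behind the pool; a large asvc_t with
    # only ~40 reads/sec would point at the array rather than the clients.
    iostat -xn 10

    # Server-side NFS operation counters, to see which operations dominate.
    nfsstat -s

DTrace could narrow things down further, but the commands above need nothing beyond the tools shipped with the OS.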
[zfs-discuss] zpool lost
Hi,

I'm using fairly stock S10, but this is really just a zfs/zpool question.

# uname -a
SunOS peach 5.10 Generic_118833-36 sun4u sparc SUNW,Sun-Blade-100

Misremembering the option to file for handling special files (-s), I executed the following:

# file -m /dev/dsk/c*s2

My shell would have expanded this as follows:

# ls /dev/dsk/c*s2
/dev/dsk/c0t0d0s2  /dev/dsk/c0t1d0s2  /dev/dsk/c0t2d0s2

This is an SB100 with two disks: c0t1d0 has the OS and other partitions, and c0t2d0 is the CDROM. The other disk was one large, single-disk zpool. It's not mountable now.

# zpool status -v
  pool: tank
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL      0     0     0  insufficient replicas
          c0t0d0    UNAVAIL      0     0     0  cannot open

Not knowing what the "file -m ..." command might have done, I wonder whether this situation is recoverable. Looking at the fault manager logs I see that my pool (tank) is at least recognized a bit:

# fmdump -eV
Feb 09 2007 16:41:46.082264521 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
        class = ereport.fs.zfs.vdev.open_failed
        ena = 0x10163875741
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x347b6d721b99340d
                vdev = 0xa7f43f56535e14a8
        (end detector)
        pool = tank
        pool_guid = 0x347b6d721b99340d
        pool_context = 1
        vdev_guid = 0xa7f43f56535e14a8
        vdev_type = disk
        vdev_path = /dev/dsk/c0t0d0s0
        vdev_devid = id1,[EMAIL PROTECTED]/a
        parent_guid = 0x347b6d721b99340d
        parent_type = root
        prev_state = 0x1
        __ttl = 0x1
        __tod = 0x45cc963a 0x4e741c9

Feb 09 2007 16:41:46.082265280 ereport.fs.zfs.zpool
nvlist version: 0
        class = ereport.fs.zfs.zpool
        ena = 0x10163875741
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x347b6d721b99340d
        (end detector)
        pool = tank
        pool_guid = 0x347b6d721b99340d
        pool_context = 1
        __ttl = 0x1
        __tod = 0x45cc963a 0x4e744c0

Any ideas/suggestions?

Kevin
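[Editorial note: a minimal first-look sketch, not an answer from the thread. It assumes the disk itself is still readable; zdb -l dumps the four ZFS labels from a device, and if they look intact, the 'zpool online' suggested by the status output (or an export/import cycle) may let ZFS re-open the vdev. The pool and device names are taken from the status and fmdump output above.]

    # Dump the four ZFS labels from the slice the pool lived on
    # (path taken from the fmdump output above).
    zdb -l /dev/dsk/c0t0d0s0

    # If the labels are present, try what the status output suggests:
    zpool online tank c0t0d0

    # Failing that, an export followed by an import scan is worth a try
    # (export may need -f on a faulted pool):
    zpool export tank
    zpool import
    zpool import tank

If zdb shows no readable labels at all, the picture changes considerably and recovery becomes much harder.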
[zfs-discuss] commercial backup software and zfs
Hi,

Is the following an accurate statement of the current status with (for me) the three main commercial backup software solutions out there?

(1) Legato NetWorker - support coming in the 7.3.2 patch, due soon. Until then, backups of ZFS filesystems won't work at all, as the acl(2) calls will fail. They will work if you use "legacy" mounting of the ZFS filesystems (but even then won't include any ACLs), or if you back up via NFS mounts.

(2) IBM's TSM - no current or official support from IBM. Will it work or not?

(3) Veritas NetBackup - client v5.1+ works out of the box, but without full/any ACL support?

Kevin

PS: I'm particularly interested in (2)
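[Editorial note: for reference, the "legacy" mounting mentioned under (1) looks roughly like the sketch below. The dataset and mount point names are made-up examples, and as stated above the ACLs would still not be captured this way.]

    # Take the dataset out of ZFS's automatic mount management
    # (dataset name is an example only).
    zfs set mountpoint=legacy tank/home/fred

    # Mount it the traditional way so the backup client sees an
    # ordinary mounted filesystem; an equivalent entry can go in
    # /etc/vfstab instead for mounting at boot.
    mkdir -p /export/home/fred
    mount -F zfs tank/home/fred /export/home/fred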