Re: [zfs-discuss] ZFS extremely slow performance
Hello again,

I swapped out the PSU and replaced the cables, and I've been running scrubs almost every day (after hours) with no reported faults. I also upgraded to snv_130 thanks to Brock, and changed the cables and PSU on Richard's suggestion. I owe you both beers!

We thought our troubles were resolved, but I'm noticing a lot of messages like the ones below in my /var/adm/messages and I'm starting to worry. I tailed the log whilst we streamed some MPEG2 captures (about 12 GB) and the log went crazy:

Jan  9 23:52:04 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() failed: repository server unavailable
Jan  9 23:52:04 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: failed to refresh SMF instance svc:/network/smb/server:default
Jan  9 23:52:04 razor last message repeated 11 times
Jan  9 23:52:04 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() failed: repository server unavailable
Jan  9 23:52:04 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: failed to refresh SMF instance svc:/network/smb/server:default
Jan  9 23:52:05 razor last message repeated 4 times
Jan  9 23:52:05 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() failed: repository server unavailable
Jan  9 23:52:05 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: failed to refresh SMF instance svc:/network/smb/server:default

Any ideas why this may be happening? I'm really starting to worry. Is it a ZFS issue or SMB again?

Cheers,
Emily

On Dec 31, 2009, at 11:38 PM, Emily Grettel wrote:

Hi Richard,

This is my zpool status -v:

  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 5h15m with 0 errors on Fri Jan  1 17:39:57 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     2  51.5K repaired
            c7t4d0  ONLINE       0     0     2  52K repaired
            c0t1d0  ONLINE       0     0     3  77.5K repaired
            c7t5d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     1  26K repaired
            c7t2d0  ONLINE       0     0     0

errors: No known data errors

I might swap the SATA cables for some better-quality shielded ones (ACRyan has some) and see if it's that.

Cheers,
Em

From: richard.ell...@gmail.com
To: emilygrettelis...@hotmail.com
Subject: Re: [zfs-discuss] ZFS extremely slow performance
Date: Thu, 31 Dec 2009 19:58:24 -0800

hmmm... might be something other than the disk, like cables or vibration. Let's see what happens after the scrub completes.
 -- richard

On Dec 31, 2009, at 5:34 PM, Emily Grettel wrote:

Hello!

  This could be a broken disk, or it could be some other
  hardware/software/firmware issue. Check the errors on the device with
  iostat -En

Here's the output:

c7t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00L Revision: 1A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t4d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t5d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD740GD-00FL Revision: 8F33 Serial No:
Size: 74.36GB <74355769344 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 6 Predictive Failure Analysis: 0
c0t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB
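As an aside, eyeballing nine devices' worth of `iostat -En` counters is error-prone. A minimal sketch (not from the thread) that scans such output and flags any device whose error counters are non-zero; the two sample lines, field positions ($4/$7/$10), and the variable names are assumptions for illustration:

```shell
# Sketch: flag devices whose `iostat -En` error counters are non-zero.
# A two-line sample (hypothetical counts) stands in for live output.
sample='c7t1d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
c7t2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0'
flagged=$(printf '%s\n' "$sample" | awk '
  /Soft Errors/ {
    # $1 = device, $4 = soft, $7 = hard, $10 = transport
    if ($4 + $7 + $10 > 0) print $1
  }')
echo "$flagged"
```

On a live system you would pipe `iostat -En` straight into the awk filter instead of using the sample variable.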
Re: [zfs-discuss] ZFS extremely slow performance
On Dec 31, 2009, at 2:49 AM, Robert Milkowski wrote:

  judging by a *very* quick glance it looks like you have an issue with
  the c3t0d0 device, which is responding very slowly.

Yes, there is an I/O stuck on the device which is not getting serviced. See below...

 -- Robert Milkowski
    http://milek.blogspot.com

On 31/12/2009 09:10, Emily Grettel wrote:

Hi,

I'm using OpenSolaris 127, from my previous posts to address CIFS problems. I have a few zpools, but lately (with an uptime of 32 days) we've started to get CIFS issues and really bad I/O performance. I've been running scrubs on a nightly basis. I'm not sure why it's happening either - I'm new to OpenSolaris.

I ran fsstat whilst trying to unrar an 8.4 GB file with an ISO inside it:

fsstat zfs 1
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.29K   466   367  633K 1.50K  1.66M 8.66K  964K 15.6G  314K 9.38G zfs
    0     0     0     4     0      8     0   135 5.13M    93 5.02M zfs
    0     0     0     7     0     18     0   205 5.63M   137 5.64M zfs
    0     0     0     4     0      8     0    90 3.92K    49 14.6K zfs
    0     0     0     4     0      8     0   115 16.4K    65 27.5K zfs
    0     0     0     8     0     13     0   153 8.36M   113 8.38M zfs
    0     0     0     4     0      8     0    94 3.96K    53 19.1K zfs
    0     0     0     7     0     18     0    80   800    42 1.13K zfs
    0     0     0     4     0      8     0    90 3.92K    48 7.62K zfs
    0     0     0     4     0      8     0    99  132K    53 7.14K zfs
    0     0     0     4     0      8     0   188 5.99K    96 5.62K zfs
    0     0     0     4     0      8     0    95  664K    52  420K zfs
    0     0     0     9     0     22     0   164 7.97K    92 12.2K zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0     4     0      8     0   111 2.63M    70 2.63M zfs
    0     0     0     4     0      8     0   262 6.63M   153 6.63M zfs
    0     0     0     4     0      8     0    80   800    44 1.70K zfs
    0     0     0     4     0      8     0   337 18.1M   247 18.1M zfs
    0     0     0     7     0     18     0   127 5.75M    89 5.63M zfs
    0     0     0     4     0      8     0    80   800    50 25.6K zfs

My iostat appears below this message (it's quite long, to give you an idea). I'm really not sure why the performance has dropped all of a sudden, or how to diagnose it. CIFS shares occasionally drop out too. It's a bit of a downer to be experiencing on the 31st of December.

I hope everyone has a safe and happy New Year :-)

I'm unable to upgrade to the latest release because of an issue with python:

pfexec pkg image-update
Creating Plan /
pkg: Cannot remove 'pkg://opensolaris.org/sunwipkg-gui-l...@0.5.11,5.11-0.127:2009T075414Z' due to the following packages that depend on it:
  pkg://opensolaris.org/SUNWipkg-g...@0.5.11,5.11-0.127:2009T075333Z

So I'm stuck on 127 until I can rebuild this machine :(

Cheers,
Em

                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.0    1.0    0.0    0.5  2.8  1.0 2815.9 1000.0 100 100 c7t3d0
                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.0    0.0    0.0    0.0  2.0  1.0    0.0    0.0 100 100 c7t3d0
                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.0    6.0    0.0   81.5  1.2  1.0  198.6  166.6  60 100 c7t3d0
                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t3d0
                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.2   0   0 c7t1d0
  0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.2   0   0 c7t2d0
  0.0    9.0    0.0   31.5  0.0  0.4    0.0   41.7   0  38 c7t3d0
  0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.3   0   0 c7t4d0
  0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.2   0   0 c7t5d0
  0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.1   0   0 c0t1d0
                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
                    extended device statistics
  r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 56.0  122.0 5972.3 8592.2  0.0  0.5    0.0    2.9   0  25 c7t1d0
 55.0  136.0 5998.3 8590.2  0.0  0.7    0.0    3.8   0  29 c7t2d0
  0.0  111.0    0.0 4342.9  0.0  2.2    0.0   20.2   0  57 c7t3d0
103.0  153.0 5868.3 8590.7  0.0  0.4    0.0    1.7   0  21 c7t4d0
 96.0  130.0 5946.8 8591.2  0.0  0.7    0.0    3.2   0
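The `iostat` blocks above make the pattern visible by eye (c7t3d0's asvc_t hits 1000 ms while its siblings sit below 1 ms), but it can also be filtered mechanically. A minimal sketch, assuming `iostat -xn`-style data lines without the header, a hypothetical 100 ms threshold, and sample values taken from the numbers in this thread:

```shell
# Sketch: from iostat extended-statistics data lines, print devices whose
# asvc_t (field 8) exceeds 100 ms. Sample lines stand in for live output.
sample='0.0 6.0 0.0 81.5 1.2 1.0 198.6 166.6 60 100 c7t3d0
0.0 6.0 0.0 4.0 0.0 0.0 0.0 0.2 0 0 c7t1d0'
slow=$(printf '%s\n' "$sample" | awk '$8 > 100 { print $11, $8 }')
echo "$slow"
```

A single slow member of a raidz vdev drags down the whole group's write latency, which matches the CIFS stalls described above.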
Re: [zfs-discuss] ZFS extremely slow performance
On Thu, 31 Dec 2009, Emily Grettel wrote:

  I'm using OpenSolaris 127, from my previous posts to address CIFS
  problems. I have a few zpools, but lately (with an uptime of 32 days)
  we've started to get CIFS issues and really bad I/O performance. I've
  been running scrubs on a nightly basis. I'm not sure why it's happening
  either - I'm new to OpenSolaris.

Without knowing anything about your pool, your c7t3d0 device seems possibly suspect. Notice that it often posts a very high asvc_t. What is the output from 'zpool status' for this pool?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
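Bob's request can be narrowed down the same way: in `zpool status` output, the per-device CKSUM column is what points at a flaky disk or cable. A minimal sketch, assuming device lines have been isolated from the config section; the sample counts mirror the pool later in this thread:

```shell
# Sketch: from the device lines of a `zpool status` config section,
# list members with a non-zero CKSUM count (field 5).
sample='c7t1d0  ONLINE       0     0     2
c7t5d0  ONLINE       0     0     0
c7t3d0  ONLINE       0     0     1'
cksum=$(printf '%s\n' "$sample" | awk '$5 + 0 > 0 { print $1, $5 }')
echo "$cksum"
```

Checksum errors spread across several devices, as here, tend to implicate shared hardware (controller, cabling, power) rather than any single disk.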
Re: [zfs-discuss] ZFS extremely slow performance
Hello!

  This could be a broken disk, or it could be some other
  hardware/software/firmware issue. Check the errors on the device with
  iostat -En

Here's the output:

c7t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00L Revision: 1A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t4d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t5d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD740GD-00FL Revision: 8F33 Serial No:
Size: 74.36GB <74355769344 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 6 Predictive Failure Analysis: 0
c0t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0
c3t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0
c3t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0

  You should also check the fma logs:
  fmadm faulty

Empty.

  fmdump -eV

This turned out to be huge, but the entries are mostly something like this:

Nov 13 2009 10:15:41.883716494 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7cfde552fd100401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xda1d003c03abad23
                vdev = 0x4389ee65271b9187
        (end detector)
        pool = tank
        pool_guid = 0xda1d003c03abad23
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x4389ee65271b9187
        vdev_type = replacing
        parent_guid = 0x79c2f2cf0b81ae5a
        parent_type = raidz
        zio_err = 0
        zio_offset = 0xae9b3fa00
        zio_size = 0x6600
        zio_objset = 0x24
        zio_object = 0x1b2
        zio_level = 0
        zio_blkid = 0x635
        __ttl = 0x1
        __tod = 0x4afc971d 0x34ac718e

Thanks for helping and telling me about those commands :-)

The scrub I started last night is still running; it usually takes about 8 hours. Will post the results.

- Em

From: richard.ell...@gmail.com
To: mi...@task.gda.pl
Date: Thu, 31 Dec 2009 08:37:03 -0800
CC: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS extremely slow performance

On Dec 31, 2009, at 2:49 AM, Robert Milkowski wrote:

  judging by a *very* quick glance it looks like you have an issue with
  the c3t0d0 device, which is responding very slowly.

Yes, there is an I/O stuck on the device which is not getting serviced. See below...
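When `fmdump -eV` output is "huge" like this, a first pass is usually to tally the one-line `fmdump -e` listing by ereport class to see which error types dominate. A minimal sketch, assuming the class name is the last field of each line; the sample lines are invented stand-ins for real fmdump output:

```shell
# Sketch: tally fmdump -e output by ereport class (last field).
# Sample lines (hypothetical timestamps) stand in for live output.
sample='Nov 13 10:15:41.8837 ereport.fs.zfs.checksum
Nov 13 10:15:42.1034 ereport.fs.zfs.checksum
Nov 13 10:16:03.2201 ereport.fs.zfs.io'
summary=$(printf '%s\n' "$sample" | awk '{ n[$NF]++ }
  END { for (c in n) print n[c], c }' | sort -rn)
echo "$summary"
```

A pile of `ereport.fs.zfs.checksum` entries with `fmadm faulty` still empty means ZFS has been repairing corruption on the fly without yet crossing a diagnosis threshold, which is consistent with the cable/PSU suspicions earlier in the thread.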
 -- Robert Milkowski
    http://milek.blogspot.com