Re: [zfs-discuss] ZFS extremely slow performance

2010-01-09 Thread Emily Grettel


Hello again,
 
I swapped out the PSU, replaced the cables, and ran scrubs almost every day 
(after hours) with no reported faults. I also upgraded to snv_130 thanks to 
Brock, and changed the cables and PSU on Richard's suggestion. I owe you 
both beers!
 
We thought our troubles were resolved, but I'm noticing a lot of messages 
like the ones below in /var/adm/messages and I'm starting to worry. I tailed 
the log whilst we streamed some MPEG2 captures (about 12GB) and the log went 
crazy!

Jan  9 23:52:04 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() 
failed: repository server unavailable
Jan  9 23:52:04 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: 
failed to refresh SMF instance svc:/network/smb/server:default
Jan  9 23:52:04 razor last message repeated 11 times
Jan  9 23:52:04 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() 
failed: repository server unavailable
Jan  9 23:52:04 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: 
failed to refresh SMF instance svc:/network/smb/server:default
Jan  9 23:52:05 razor last message repeated 4 times
Jan  9 23:52:05 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() 
failed: repository server unavailable
Jan  9 23:52:05 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: 
failed to refresh SMF instance svc:/network/smb/server:default

Any ideas why this may be happening? I'm really starting to worry. Is it a ZFS 
issue or SMB again?
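Those scf_handle_destroy() failures suggest the SMF configuration repository (svc.configd) is unreachable rather than ZFS itself. As a rough way to see how noisy the errors are, the repeats can be tallied per daemon; this is just a sketch against a saved log excerpt (the /tmp path and sample file are illustrative, with rows copied from the log above):

```shell
# Tally error lines per daemon from a saved excerpt of /var/adm/messages.
# "last message repeated N times" lines are not expanded into the tally.
cat > /tmp/messages.sample <<'EOF'
Jan  9 23:52:04 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() failed: repository server unavailable
Jan  9 23:52:04 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: failed to refresh SMF instance svc:/network/smb/server:default
Jan  9 23:52:04 razor last message repeated 11 times
Jan  9 23:52:05 razor nwamd[34]: [ID 244715 daemon.error] scf_handle_destroy() failed: repository server unavailable
Jan  9 23:52:05 razor smbd[13585]: [ID 354691 daemon.error] smb_nicmon_daemon: failed to refresh SMF instance svc:/network/smb/server:default
EOF

# Field 5 is "daemon[pid]:" and field 6 starts the "[ID ..." tag,
# so only real syslog error lines are counted.
awk '$6 ~ /^\[ID/ { split($5, d, "["); count[d[1]]++ }
     END { for (n in count) print count[n], n }' /tmp/messages.sample | sort
```

If the counts keep climbing, the thing to investigate is the repository service itself, not the daemons that fail to reach it.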
 
Cheers,
Emily


 On Dec 31, 2009, at 11:38 PM, Emily Grettel wrote:
 
  Hi Richard,
 
  This is my 'zpool status -v' output:
 
    pool: tank
   state: ONLINE
  status: One or more devices has experienced an unrecoverable error.  An
          attempt was made to correct the error.  Applications are unaffected.
  action: Determine if the device needs to be replaced, and clear the errors
          using 'zpool clear' or replace the device with 'zpool replace'.
     see: http://www.sun.com/msg/ZFS-8000-9P
   scrub: scrub completed after 5h15m with 0 errors on Fri Jan  1 17:39:57 2010
  config:

          NAME          STATE     READ WRITE CKSUM
          tank          ONLINE       0     0     0
            raidz1-0    ONLINE       0     0     0
              c7t1d0    ONLINE       0     0     2  51.5K repaired
              c7t4d0    ONLINE       0     0     2  52K repaired
              c0t1d0    ONLINE       0     0     3  77.5K repaired
              c7t5d0    ONLINE       0     0     0
              c7t3d0    ONLINE       0     0     1  26K repaired
              c7t2d0    ONLINE       0     0     0

  errors: No known data errors
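On a pool with many vdevs, the devices with nonzero CKSUM counts can be pulled out of a saved 'zpool status' mechanically; a minimal awk sketch (the /tmp path and sample file are illustrative, with rows copied from the status above):

```shell
# List devices with nonzero CKSUM errors from a saved 'zpool status -v'.
cat > /tmp/zpool.sample <<'EOF'
NAME        STATE     READ WRITE CKSUM
tank        ONLINE       0     0     0
  raidz1-0  ONLINE       0     0     0
    c7t1d0  ONLINE       0     0     2  51.5K repaired
    c7t4d0  ONLINE       0     0     2  52K repaired
    c0t1d0  ONLINE       0     0     3  77.5K repaired
    c7t5d0  ONLINE       0     0     0
    c7t3d0  ONLINE       0     0     1  26K repaired
    c7t2d0  ONLINE       0     0     0
EOF

# Device rows start with a cXtYdZ name; field 5 is the CKSUM column.
awk '$1 ~ /^c[0-9]/ && $5 > 0 { print $1, $5 }' /tmp/zpool.sample
```

Checksum errors spread across four of six disks, as here, tend to implicate something shared (cabling, power, controller) rather than one failing drive.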
 
  I might swap the SATA cables for some better-quality shielded ones 
  (ACRyan has some) and see if it's that.
 
  Cheers,
  Em
 
   From: richard.ell...@gmail.com
   To: emilygrettelis...@hotmail.com
   Subject: Re: [zfs-discuss] ZFS extremely slow performance
   Date: Thu, 31 Dec 2009 19:58:24 -0800
  
   hmmm... might be something other than the disk, like cables or
   vibration.
   Let's see what happens after the scrub completes.
   -- richard
  
   On Dec 31, 2009, at 5:34 PM, Emily Grettel wrote:
  
    [earlier quoted message trimmed; the full text and iostat -En output 
    appear later in this digest]

Re: [zfs-discuss] ZFS extremely slow performance

2009-12-31 Thread Richard Elling

On Dec 31, 2009, at 2:49 AM, Robert Milkowski wrote:



judging by a *very* quick glance it looks like you have an issue  
with c3t0d0 device which is responding very slowly.


Yes, there is an I/O stuck on the device which is not getting serviced.
See below...



--
Robert Milkowski
http://milek.blogspot.com



On 31/12/2009 09:10, Emily Grettel wrote:


Hi,

I'm using OpenSolaris 127 from my previous posts to address CIFS  
problems. I have a few zpools but lately (with an uptime of 32  
days) we've started to get CIFS issues and really bad IO  
performance. I've been running scrubs on a nightly basis.


I'm not sure why it's happening either - I'm new to OpenSolaris.

I ran fsstat whilst trying to unrar an 8.4GB file with an ISO inside  
it:


fsstat zfs 1
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.29K   466   367  633K 1.50K  1.66M 8.66K  964K 15.6G  314K 9.38G zfs
    0     0     0     4     0      8     0   135 5.13M    93 5.02M zfs
    0     0     0     7     0     18     0   205 5.63M   137 5.64M zfs
    0     0     0     4     0      8     0    90 3.92K    49 14.6K zfs
    0     0     0     4     0      8     0   115 16.4K    65 27.5K zfs
    0     0     0     8     0     13     0   153 8.36M   113 8.38M zfs
    0     0     0     4     0      8     0    94 3.96K    53 19.1K zfs
    0     0     0     7     0     18     0    80   800    42 1.13K zfs
    0     0     0     4     0      8     0    90 3.92K    48 7.62K zfs
    0     0     0     4     0      8     0    99  132K    53 7.14K zfs
    0     0     0     4     0      8     0   188 5.99K    96 5.62K zfs
    0     0     0     4     0      8     0    95  664K    52  420K zfs
    0     0     0     9     0     22     0   164 7.97K    92 12.2K zfs

 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0     4     0      8     0   111 2.63M    70 2.63M zfs
    0     0     0     4     0      8     0   262 6.63M   153 6.63M zfs
    0     0     0     4     0      8     0    80   800    44 1.70K zfs
    0     0     0     4     0      8     0   337 18.1M   247 18.1M zfs
    0     0     0     7     0     18     0   127 5.75M    89 5.63M zfs
    0     0     0     4     0      8     0    80   800    50 25.6K zfs
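To turn samples like these into a single throughput figure, the human-readable byte columns can be expanded and averaged; a rough sketch assuming 1-second intervals (the /tmp path is illustrative, and the sample rows are a subset copied from the output above):

```shell
# Average read/write throughput from fsstat samples: field 9 is read bytes,
# field 11 is write bytes, both with K/M/G suffixes.
cat > /tmp/fsstat.sample <<'EOF'
0 0 0 4 0 8 0 135 5.13M 93 5.02M zfs
0 0 0 7 0 18 0 205 5.63M 137 5.64M zfs
0 0 0 8 0 13 0 153 8.36M 113 8.38M zfs
0 0 0 4 0 8 0 337 18.1M 247 18.1M zfs
EOF

awk '
function bytes(s,  n) {            # expand K/M/G suffixes to raw bytes
    n = s + 0
    if (s ~ /K$/) n *= 1024
    if (s ~ /M$/) n *= 1024 * 1024
    if (s ~ /G$/) n *= 1024 * 1024 * 1024
    return n
}
{ r += bytes($9); w += bytes($11) }
END { printf "avg read %.2f MB/s, avg write %.2f MB/s\n",
      r / NR / 1048576, w / NR / 1048576 }' /tmp/fsstat.sample
```

Sub-10 MB/s sustained for a six-disk raidz streaming large files is very low, which matches the "extremely slow" symptom in the subject line.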


My iostat output appears below this message (it's quite long, to give you  
an idea). I'm really not sure why the performance has dropped all of a  
sudden or how to diagnose it. CIFS shares occasionally drop out too.


It's a bit of a downer to be experiencing this on the 31st of December. I  
hope everyone has a Safe & Happy New Year :-)


I'm unable to upgrade to the latest release because of an issue  
with python:


pfexec pkg image-update
Creating Plan /
pkg: Cannot remove 'pkg://opensolaris.org/sunwipkg-gui-l...@0.5.11,5.11-0.127:2009T075414Z'
due to the following packages that depend on it:
  pkg://opensolaris.org/SUNWipkg-g...@0.5.11,5.11-0.127:2009T075333Z


So I'm stuck on 127 until I can rebuild this machine :(

Cheers,
Em

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    1.0    0.0    0.5  2.8  1.0 2815.9 1000.0 100 100 c7t3d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  2.0  1.0    0.0    0.0 100 100 c7t3d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    6.0    0.0   81.5  1.2  1.0  198.6  166.6  60 100 c7t3d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  1.0    0.0    0.0   0 100 c7t3d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.2   0   0 c7t1d0
    0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.2   0   0 c7t2d0
    0.0    9.0    0.0   31.5  0.0  0.4    0.0   41.7   0  38 c7t3d0
    0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.3   0   0 c7t4d0
    0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.2   0   0 c7t5d0
    0.0    6.0    0.0    4.0  0.0  0.0    0.0    0.1   0   0 c0t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   56.0  122.0 5972.3 8592.2  0.0  0.5    0.0    2.9   0  25 c7t1d0
   55.0  136.0 5998.3 8590.2  0.0  0.7    0.0    3.8   0  29 c7t2d0
    0.0  111.0    0.0 4342.9  0.0  2.2    0.0   20.2   0  57 c7t3d0
  103.0  153.0 5868.3 8590.7  0.0  0.4    0.0    1.7   0  21 c7t4d0
   96.0  130.0 5946.8 8591.2  0.0  0.7    0.0    3.2   0
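A quick way to spot the slow device in output like this is to filter on the asvc_t (average service time) column; a sketch over a few of the rows above (the 15 ms threshold and /tmp path are illustrative choices, not part of the original thread):

```shell
# Flag devices with high average service time from saved 'iostat -xn'
# data rows. Field 8 is asvc_t (ms), field 11 is the device name.
cat > /tmp/iostat.sample <<'EOF'
0.0 1.0 0.0 0.5 2.8 1.0 2815.9 1000.0 100 100 c7t3d0
0.0 6.0 0.0 4.0 0.0 0.0 0.0 0.2 0 0 c7t1d0
0.0 9.0 0.0 31.5 0.0 0.4 0.0 41.7 0 38 c7t3d0
0.0 111.0 0.0 4342.9 0.0 2.2 0.0 20.2 0 57 c7t3d0
0.0 6.0 0.0 4.0 0.0 0.0 0.0 0.1 0 0 c0t1d0
EOF

# Anything sustained above ~15 ms on a lightly loaded SATA disk is
# worth a second look.
awk '$8 > 15 { print $11, $8 "ms" }' /tmp/iostat.sample
```

In these samples only c7t3d0 ever trips the filter, which is the same device the replies single out.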

Re: [zfs-discuss] ZFS extremely slow performance

2009-12-31 Thread Bob Friesenhahn

On Thu, 31 Dec 2009, Emily Grettel wrote:

 
I'm using OpenSolaris 127 from my previous posts to address CIFS problems. I 
have a few zpools but
lately (with an uptime of 32 days) we've started to get CIFS issues and really 
bad IO performance.
I've been running scrubs on a nightly basis.
 
I'm not sure why it's happening either - I'm new to OpenSolaris.


Without knowing anything about your pool, your c7t3d0 device seems 
possibly suspect.  Notice that it often posts a very high asvc_t.


What is the output from 'zpool status' for this pool?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS extremely slow performance

2009-12-31 Thread Emily Grettel

Hello!

 


 This could be a broken disk, or it could be some other
 hardware/software/firmware issue. Check the errors on the
 device with
 iostat -En


Here's the output:

 

c7t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD10EADS-00L Revision: 1A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t2d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t3d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t4d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t5d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c7t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD740GD-00FL Revision: 8F33 Serial No:
Size: 74.36GB 74355769344 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 6 Predictive Failure Analysis: 0
c0t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD10EADS-00P Revision: 0A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0
c3t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0
c3t1d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0
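The per-device counters above can be condensed to one line each; a sketch over a saved copy of the output (the /tmp path is illustrative, and the sample reuses two of the devices above). Nonzero "Illegal Request" counts alongside zero media/hard/transport errors are commonly benign (e.g. unsupported commands) rather than a sign of failing media, though that's a general observation, not a diagnosis:

```shell
# One summary line per device from saved 'iostat -En' output.
cat > /tmp/iostat_en.sample <<'EOF'
c7t1d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD10EADS-00L Revision: 1A01 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4 Predictive Failure Analysis: 0
c3t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD7500AAKS-0 Revision: 4G30 Serial No:
Size: 750.16GB 750156374016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 5 Predictive Failure Analysis: 0
EOF

# On the device header line, field 7 is the hard-error count and
# field 10 the transport-error count; field 3 of the "Illegal Request:"
# line is that counter.
awk '/^c[0-9]/            { dev = $1; hard = $7; trans = $10 }
     /^Illegal Request:/  { printf "%s hard=%s transport=%s illegal=%s\n",
                            dev, hard, trans, $3 }' /tmp/iostat_en.sample
```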


 You should also check the fma logs:
 fmadm faulty

 

Empty


 fmdump -eV
 


This turned out to be huge. But they're mostly something like this:

 

 

Nov 13 2009 10:15:41.883716494 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x7cfde552fd100401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xda1d003c03abad23
                vdev = 0x4389ee65271b9187
        (end detector)

        pool = tank
        pool_guid = 0xda1d003c03abad23
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x4389ee65271b9187
        vdev_type = replacing
        parent_guid = 0x79c2f2cf0b81ae5a
        parent_type = raidz
        zio_err = 0
        zio_offset = 0xae9b3fa00
        zio_size = 0x6600
        zio_objset = 0x24
        zio_object = 0x1b2
        zio_level = 0
        zio_blkid = 0x635
        __ttl = 0x1
        __tod = 0x4afc971d 0x34ac718e
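When the verbose dump is too big to read, the one-line form ('fmdump -e', one ereport per line with the class as the last field) can be summarized by class; a sketch over sample lines shaped like that output (the /tmp path is illustrative, and all rows except the checksum class are hypothetical):

```shell
# Count ereports by class from saved one-line 'fmdump -e' output.
cat > /tmp/fmdump.sample <<'EOF'
Nov 13 2009 10:15:41.883716494 ereport.fs.zfs.checksum
Nov 13 2009 10:15:41.883716495 ereport.fs.zfs.checksum
Nov 13 2009 10:16:02.100000000 ereport.fs.zfs.io
EOF

# $NF is the ereport class; print most frequent classes first.
awk '{ c[$NF]++ } END { for (k in c) print c[k], k }' /tmp/fmdump.sample | sort -rn
```

A dump dominated by ereport.fs.zfs.checksum with no corresponding driver/transport ereports fits the picture of data getting silently corrupted in flight.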

 

 

Thanks for helping and telling me about those commands :-)

 

The scrub I started last night is still running, it usually takes about 8 
hours. Will post the results.

 

- Em

 

 
 From: richard.ell...@gmail.com
 To: mi...@task.gda.pl
 Date: Thu, 31 Dec 2009 08:37:03 -0800
 CC: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] ZFS extremely slow performance
 
 On Dec 31, 2009, at 2:49 AM, Robert Milkowski wrote:
 
 
  judging by a *very* quick glance it looks like you have an issue 
  with c3t0d0 device which is responding very slowly.
 
 Yes, there is an I/O stuck on the device which is not getting serviced.
 See below...
 
 
  -- 
  Robert Milkowski
  http://milek.blogspot.com
 
 
  
_
If It Exists, You'll Find it on SEEK Australia's #1 job site
http://clk.atdmt.com/NMN/go/157639755/direct/01/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss