Re: [zfs-discuss] How to find poor performing disks
Scott Lawson writes:
> Also you may wish to look at the output of 'iostat -xnce 1' as well.
>
> You can post those to the list if you have a specific problem.
>
> You want to be looking for error counts increasing and specifically 'asvc_t'
> for the service times on the disks. A higher number for asvc_t may help to
> isolate poorly performing individual disks.

I blast the pool with dd and look for drives that are *always* active while
others in the same group have completed their transaction group and get no
more activity. Within a group, drives should be getting the same amount of
data per 5 seconds (zfs_txg_synctime), and the ones that are always active
are the ones slowing you down. If whole groups are unbalanced, that's a sign
that they have different amounts of free space, and the expectation is that
you will be gated by the speed of the group that needs to catch up.

-r

> Scott Meilicke wrote:
> > You can try:
> >
> > zpool iostat pool_name -v 1
> >
> > This will show you IO on each vdev at one second intervals. Perhaps you
> > will see different IO behavior on any suspect drive.
> >
> > -Scott
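A minimal sketch of the dd-blast approach described above (the pool mount
point, file name, and sizes are illustrative assumptions, not from the
original post):

    # Blast the pool with sequential writes ('/tank/blastfile' is a
    # placeholder path on the pool under test).
    dd if=/dev/zero of=/tank/blastfile bs=1M count=8192 &

    # In another terminal, watch per-disk activity once a second. Within
    # a raidz group, the drive whose %b and asvc_t stay high after its
    # peers have gone idle is the one gating the transaction group.
    iostat -xnce 1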
Re: [zfs-discuss] How to find poor performing disks
Running "iostat -nxce 1", I saw write sizes alternate between two raidz groups in the same pool. At one time, drives on cotroller 1 have larger writes (3-10 times) than ones on controller2: extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.00.00.00.0 0.0 0.00.00.0 0 0 0 0 0 0 fd0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c1t1d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c0t10d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c0t11d0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c3t0d0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c4t0d0 0.09.00.04.0 0.0 0.00.00.5 0 0 1 0 0 1 c0t12d0 0.09.00.04.0 0.0 0.00.00.1 0 0 1 0 0 1 c0t13d0 0.09.00.04.5 0.0 0.00.00.1 0 0 1 0 0 1 c0t14d0 0.08.00.04.5 0.0 0.00.00.2 0 0 1 0 0 1 c0t15d0 0.09.00.03.5 0.0 0.00.00.1 0 0 1 0 0 1 c0t16d0 0.09.00.03.5 0.0 0.00.00.1 0 0 1 0 0 1 c0t17d0 0.0 20.00.0 56.5 0.0 0.00.00.2 0 0 1 0 0 1 c2t6d0 0.0 20.00.0 55.0 0.0 0.00.00.3 0 0 1 0 0 1 c2t7d0 0.0 20.00.0 53.5 0.0 0.00.00.2 0 0 1 0 0 1 c2t8d0 0.0 20.00.0 53.0 0.0 0.00.00.3 0 0 1 0 0 1 c2t9d0 0.0 20.00.0 55.5 0.0 0.00.00.2 0 0 1 0 0 1 c2t10d0 0.0 20.00.0 55.0 0.0 0.00.00.3 0 0 1 0 0 1 c2t11d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c2t12d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c2t13d0 cpu us sy wt id 0 47 0 53 extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.00.00.00.0 0.0 0.00.00.0 0 0 0 0 0 0 fd0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c1t1d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c0t10d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c0t11d0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c3t0d0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c4t0d0 0.08.00.0 18.5 0.0 0.00.00.2 0 0 1 0 0 1 c0t12d0 0.08.00.0 18.5 0.0 0.00.00.3 0 0 1 0 0 1 c0t13d0 0.0 11.00.0 20.5 0.0 0.00.00.3 0 0 1 0 0 1 c0t14d0 0.0 12.00.0 20.5 0.0 0.00.00.3 0 0 1 0 0 1 c0t15d0 0.08.00.0 19.0 0.0 0.00.00.2 0 0 1 0 0 1 c0t16d0 0.08.00.0 18.5 0.0 0.00.00.2 0 0 1 0 0 1 c0t17d0 0.0 21.00.0 66.0 0.0 0.00.00.4 0 1 1 0 0 1 c2t6d0 0.0 21.00.0 66.0 0.0 0.00.00.3 0 0 1 0 0 1 c2t7d0 0.0 21.00.0 65.5 0.0 0.00.00.3 0 0 1 0 0 1 c2t8d0 0.0 20.00.0 64.0 0.0 0.00.00.4 0 0 1 0 0 1 c2t9d0 0.0 21.00.0 65.0 0.0 0.00.00.4 0 0 1 0 0 1 c2t10d0 0.0 21.00.0 64.0 0.0 0.00.00.3 0 0 1 0 0 1 c2t11d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c2t12d0 0.00.00.00.0 0.0 0.00.00.0 0 0 1 0 0 1 c2t13d0 cpu us sy wt id 0 23 0 77 At other time, drives on controller2 have larger writes (3-10 times) than the ones on controller1: extended device statistics errors --- r/sw/s kr/s kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device 0.00.00.00.0 0.0 0.00.00.0 0 0 0 0 0 0 fd0 0.00.00.00.0 0.0 0.00.00.0 0 0 2 0 0 2 c1t1d0 0.00.00.00.0 0.0 0.00.00
Re: [zfs-discuss] How to find poor performing disks
Maybe you can run a DTrace probe using Chime?

http://blogs.sun.com/observatory/entry/chime

Initial Traces -> Device IO
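If you'd rather not use the GUI, a raw DTrace one-liner over the io provider
gives a similar per-device view (a sketch, not taken from the Chime post):

    # Count I/O completions per device until Ctrl-C, then print totals;
    # a lagging disk shows up with fewer completions than its peers.
    dtrace -n 'io:::done { @[args[1]->dev_statname] = count(); }'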
Re: [zfs-discuss] How to find poor performing disks
Also you may wish to look at the output of 'iostat -xnce 1' as well.

You can post those to the list if you have a specific problem.

You want to be looking for error counts increasing and specifically 'asvc_t'
for the service times on the disks. A higher number for asvc_t may help to
isolate poorly performing individual disks.

Scott Meilicke wrote:
> You can try:
>
> zpool iostat pool_name -v 1
>
> This will show you IO on each vdev at one second intervals. Perhaps you
> will see different IO behavior on any suspect drive.
>
> -Scott
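For a one-shot view of the same error counters, iostat's -E option prints
cumulative per-device totals (a usage sketch):

    # Soft, hard, and transport error totals since boot; a disk whose
    # counts keep climbing between runs is a likely culprit.
    iostat -En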
Re: [zfs-discuss] How to find poor performing disks
You can try:

zpool iostat pool_name -v 1

This will show you IO on each vdev at one second intervals. Perhaps you will
see different IO behavior on any suspect drive.

-Scott
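A concrete invocation, with options placed before the pool name per the
zpool man page ('tank' is a placeholder):

    # Per-vdev read/write operations and bandwidth, refreshed every
    # second; a raidz member consistently slower than its siblings
    # deserves a closer look.
    zpool iostat -v tank 1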
[zfs-discuss] How to find poor performing disks
Hi,

I'd appreciate it if anyone can point me at how to identify poor-performing
disks that might have dragged down the performance of the pool. Also, the
system logged the following error about one of the drives. Does it show the
disk was having a problem?

Aug 17 13:45:56 zfs1.domain.com scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@6/pci1000,3...@0 (mpt1):
Aug 17 13:45:56 zfs1.domain.com Disconnected command timeout for Target 10
Aug 17 13:45:56 zfs1.domain.com scsi: [ID 365881 kern.info] /p...@0,0/pci8086,2...@6/pci1000,3...@0 (mpt1):
Aug 17 13:45:56 zfs1.domain.com Log info 3114 received for target 10.
Aug 17 13:45:56 zfs1.domain.com scsi_status=0, ioc_status=8048, scsi_state=c
Aug 17 13:45:56 zfs1.domain.com scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@6/pci1000,3...@0/s...@a,0 (sd15):
Aug 17 13:45:56 zfs1.domain.com SCSI transport failed: reason 'reset': retrying command
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,2...@6/pci1000,3...@0/s...@a,0 (sd15):
Aug 17 13:45:59 zfs1.domain.com Error for Command: read(10)    Error Level: Retryable
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]  Requested Block: 715872929    Error Block: 715872929
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]  Vendor: ATA    Serial Number: WD-WCAP
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
Aug 17 13:45:59 zfs1.domain.com scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
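One way to see whether the OS has recorded persistent problems for that
drive, beyond the one-off bus reset above, is the fault management logs (a
sketch; the sd15 / Target 10 identity comes from the messages quoted above):

    # Raw error telemetry; repeated ereports against the same disk
    # suggest a failing drive or path rather than a transient reset.
    fmdump -eV

    # Anything FMA has actually diagnosed as faulty.
    fmadm faulty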