[zfs-discuss] Data distribution not even between vdevs
Hi list,

My zfs write performance is poor and I need your help.

I created a zpool with 2 raidz1 vdevs. When the space was about to be used up, I added another 2 raidz1 vdevs to extend the zpool. After some days the zpool was almost full, so I removed some old data. But now, as shown below, the first 2 raidz1 vdevs are about 78% used while the last 2 raidz1 vdevs are about 93% used.

I have this line in /etc/system:

    set zfs:metaslab_df_free_pct=4

So the performance degradation will happen when vdev usage is above 90%. All my files are small files, about 150KB each.

Now the questions are:

1. Should I balance the data between the vdevs by copying the data and then removing the copies located on the last 2 vdevs?
2. Is there any method to automatically re-balance the data, or any better solution to resolve this problem?

root@nas-01:~# zpool iostat -v
                                         capacity     operations    bandwidth
pool                                   used  avail   read  write   read  write
-------------------------------------  -----  -----  -----  -----  -----  -----
datapool                              21.3T  3.93T     26     96  81.4K  2.81M
  raidz1                              4.93T  1.39T      8     28  25.7K   708K
    c3t600221900085486703B2490FB009d0     -      -      3     10   216K   119K
    c3t600221900085486703B4490FB063d0     -      -      3     10   214K   119K
    c3t6002219000852889055F4CB79C10d0     -      -      3     10   214K   119K
    c3t600221900085486703B8490FB0FFd0     -      -      3     10   215K   119K
    c3t600221900085486703BA490FB14Fd0     -      -      3     10   215K   119K
    c3t6002219000852889041C490FAFA0d0     -      -      3     10   215K   119K
    c3t600221900085486703C0490FB27Dd0     -      -      3     10   214K   119K
  raidz1                              4.64T  1.67T      8     32  24.6K   581K
    c3t600221900085486703C2490FB2BFd0     -      -      3     10   224K  98.2K
    c3t6002219000852889041F490FAFD0d0     -      -      3     10   222K  98.2K
    c3t60022190008528890428490FB0D8d0     -      -      3     10   222K  98.2K
    c3t60022190008528890422490FB02Cd0     -      -      3     10   223K  98.3K
    c3t60022190008528890425490FB07Cd0     -      -      3     10   223K  98.3K
    c3t60022190008528890434490FB24Ed0     -      -      3     10   223K  98.3K
    c3t6002219000852889043949100968d0     -      -      3     10   224K  98.2K
  raidz1                              5.88T   447G      5     17  16.0K  67.7K
    c3t6002219000852889056B4CB79D66d0     -      -      3     12   215K  12.2K
    c3t600221900085486704B94CB79F91d0     -      -      3     12   216K  12.2K
    c3t600221900085486704BB4CB79FE1d0     -      -      3     12   214K  12.2K
    c3t600221900085486704BD4CB7A035d0     -      -      3     12   215K  12.2K
    c3t600221900085486704BF4CB7A0ABd0     -      -      3     12   216K  12.2K
    c3t6002219000852889055C4CB79BB8d0     -      -      3     12   214K  12.2K
    c3t600221900085486704C14CB7A0FDd0     -      -      3     12   215K  12.2K
  raidz1                              5.88T   441G      4      1  14.9K  12.4K
    c3t6002219000852889042B490FB124d0     -      -      1      1   131K  2.33K
    c3t600221900085486704C54CB7A199d0     -      -      1      1   132K  2.33K
    c3t600221900085486704C74CB7A1D5d0     -      -      1      1   130K  2.33K
    c3t600221900085288905594CB79B64d0     -      -      1      1   133K  2.33K
    c3t600221900085288905624CB79C86d0     -      -      1      1   132K  2.34K
    c3t600221900085288905654CB79CCCd0     -      -      1      1   131K  2.34K
    c3t600221900085288905684CB79D1Ed0     -      -      1      1   132K  2.33K
  c3t6B8AC6FF837605864DC9E9F1d0          0   928G      0  16289  1.47M
-------------------------------------  -----  -----  -----  -----  -----  -----
root@nas-01:~#

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
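[Editor's note: the per-vdev imbalance can be quantified directly from the capacity columns of the `zpool iostat -v` output above. A quick sketch, using the used/avail figures copied from that output; the raidz1-N labels are just positional names for this illustration:]

```python
# Per-vdev usage from the capacity columns of `zpool iostat -v` above.
vdevs = {
    "raidz1-0": (4.93, 1.39),        # used TB, avail TB
    "raidz1-1": (4.64, 1.67),
    "raidz1-2": (5.88, 447 / 1024),  # 447G avail
    "raidz1-3": (5.88, 441 / 1024),  # 441G avail
}

for name, (used, avail) in vdevs.items():
    pct = 100 * used / (used + avail)
    print(f"{name}: {pct:.1f}% used")
```

The last two vdevs sit above the ~90% point at which the allocator behavior changes, which matches the write-performance complaint.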
[zfs-discuss] solaris 10u8 hangs with message Disconnected command timeout for Target 0
Hi,

My Solaris storage hangs. I logged in to the console and there were messages [1] displayed on it. I can't log in to the console and it seems IO is totally blocked.

The system is Solaris 10u8 on a Dell R710 with a Dell MD3000 disk array. 2 HBA cables connect the server to the MD3000. The symptom is random.

It would be much appreciated if anyone can help me out.

Regards,
Ding

[1]
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /pci@0,0/pci8086,3410@9/pci8086,32c@0/pci1028,1f04@8 (mpt1):
Aug 16 13:14:16 nas-hz-02       Disconnected command timeout for Target 0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a44b8f0ded (sd47):
Aug 16 13:14:16 nas-hz-02       Error for Command: write(10)    Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi:         Requested Block: 1380679073    Error Block: 1380679073
Aug 16 13:14:16 nas-hz-02 scsi:         Vendor: DELL    Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi:         Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi:         ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029e4b8f0d61 (sd41):
Aug 16 13:14:16 nas-hz-02       Error for Command: write(10)    Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi:         Requested Block: 1380679072    Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi:         Vendor: DELL    Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi:         Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi:         ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802a24b8f0dc5 (sd45):
Aug 16 13:14:16 nas-hz-02       Error for Command: write(10)    Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi:         Requested Block: 1380679073    Error Block: 1380679073
Aug 16 13:14:16 nas-hz-02 scsi:         Vendor: DELL    Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi:         Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi:         ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa18029c4b8f0d35 (sd39):
Aug 16 13:14:16 nas-hz-02       Error for Command: write(10)    Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi:         Requested Block: 1380679072    Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi:         Vendor: DELL    Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi:         Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi:         ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
Aug 16 13:14:16 nas-hz-02 scsi: WARNING: /scsi_vhci/disk@g60026b900053aa1802984b8f0cd2 (sd35):
Aug 16 13:14:16 nas-hz-02       Error for Command: write(10)    Error Level: Retryable
Aug 16 13:14:16 nas-hz-02 scsi:         Requested Block: 1380679072    Error Block: 1380679072
Aug 16 13:14:16 nas-hz-02 scsi:         Vendor: DELL    Serial Number:
Aug 16 13:14:16 nas-hz-02 scsi:         Sense Key: Unit Attention
Aug 16 13:14:16 nas-hz-02 scsi:         ASC: 0x29 (device internal reset), ASCQ: 0x4, FRU: 0x0
[zfs-discuss] zfs usable space?
I have 15x1TB disks; each disk's usable space should be 1TB = 10^12 bytes = 10^12/1024^3 GiB = 931GiB, as shown by the format command:

# echo | format | grep MD
       3. c4t60026B900053AA1502C74B8F0EADd0 DELL-MD3000-0735-931.01GB
       4. c4t60026B900053AA1502C94B8F0EE3d0 DELL-MD3000-0735-931.01GB
       5. c4t60026B900053AA1502CB4B8F0F0Dd0 DELL-MD3000-0735-931.01GB
       6. c4t60026B900053AA1502CD4B8F0F3Dd0 DELL-MD3000-0735-931.01GB
       7. c4t60026B900053AA1502CF4B8F0F6Dd0 DELL-MD3000-0735-931.01GB
       8. c4t60026B900053AA1502D14B8F0F9Cd0 DELL-MD3000-0735-931.01GB
       9. c4t60026B900053AA1502D34B8F0FC8d0 DELL-MD3000-0735-931.01GB
      10. c4t60026B900053AA1802A04B8F0D91d0 DELL-MD3000-0735-931.01GB
      11. c4t60026B900053AA1802A24B8F0DC5d0 DELL-MD3000-0735-931.01GB
      12. c4t60026B900053AA1802A44B8F0DEDd0 DELL-MD3000-0735-931.01GB
      13. c4t60026B900053AA18029C4B8F0D35d0 DELL-MD3000-0735-931.01GB
      14. c4t60026B900053AA18029E4B8F0D61d0 DELL-MD3000-0735-931.01GB
      15. c4t60026B900053AA18036E4DBF6BA6d0 DELL-MD3000-0735-931.01GB
      16. c4t60026B900053AA1802984B8F0CD2d0 DELL-MD3000-0735-931.01GB
      17. c4t60026B900053AA1503074B901CF3d0 DELL-MD3000-0735-931.01GB
#

I created 2 raidz1 vdevs (7 disks each) and 1 global hot spare in the zpool:

        NAME                                   STATE   READ WRITE CKSUM
        datapool                               ONLINE     0     0     0
          raidz1                               ONLINE     0     0     0
            c4t60026B900053AA1502C74B8F0EADd0  ONLINE     0     0     0
            c4t60026B900053AA1502C94B8F0EE3d0  ONLINE     0     0     0
            c4t60026B900053AA1502CB4B8F0F0Dd0  ONLINE     0     0     0
            c4t60026B900053AA1502CD4B8F0F3Dd0  ONLINE     0     0     0
            c4t60026B900053AA1502CF4B8F0F6Dd0  ONLINE     0     0     0
            c4t60026B900053AA1502D14B8F0F9Cd0  ONLINE     0     0     0
            c4t60026B900053AA1502D34B8F0FC8d0  ONLINE     0     0     0
          raidz1                               ONLINE     0     0     0
            spare                              ONLINE     0     0     7
              c4t60026B900053AA1802A04B8F0D91d0  ONLINE  10     0     0  194K resilvered
              c4t60026B900053AA18036E4DBF6BA6d0  ONLINE   0     0     0  531G resilvered
            c4t60026B900053AA1802A24B8F0DC5d0  ONLINE     0     0     0
            c4t60026B900053AA1802A44B8F0DEDd0  ONLINE     0     0     0
            c4t60026B900053AA1503074B901CF3d0  ONLINE     0     0     0
            c4t60026B900053AA18029C4B8F0D35d0  ONLINE     0     0     0
            c4t60026B900053AA18029E4B8F0D61d0  ONLINE     0     0     0
            c4t60026B900053AA1802984B8F0CD2d0  ONLINE     0     0     0
        spares
          c4t60026B900053AA18036E4DBF6BA6d0    INUSE   currently in use

I expect the zpool to have 14*931/1024 = 12.7TB of space, but actually it only has 12.6TB:

# zpool list
NAME       SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
datapool  12.6T  9.96T  2.66T   78%  ONLINE  -
#

And I expect the zfs usable space to be 12*931/1024 = 10.91TB, but actually there is only 10.58TB of zfs space.

Can anyone explain where the disk space goes?

Regards,
Ding
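[Editor's note: most of the gap is decimal-vs-binary units plus raidz1 parity; the remainder is ZFS overhead such as vdev labels, metaslab size rounding, and the small slice of pool space ZFS reserves for its own metadata. A back-of-the-envelope check, my own arithmetic rather than zfs output:]

```python
GIB = 1024**3

# A marketing "1TB" disk is 10^12 bytes, i.e. ~931 GiB,
# matching format's reported 931.01GB.
disk_gib = 1e12 / GIB
print(f"per-disk: {disk_gib:.2f} GiB")

# zpool SIZE counts parity: 14 disks across the two raidz1 vdevs
# (the hot spare is excluded).
print(f"expected zpool size: {14 * disk_gib / 1024:.1f} TiB")

# Usable zfs space excludes one parity disk per raidz1 vdev: 12 data disks.
print(f"expected usable:     {12 * disk_gib / 1024:.2f} TiB")
```

The remaining few hundred GB between the expected 10.91TB and the observed 10.58TB is plausibly explained by per-vdev labels, metaslab alignment rounding, and the pool metadata reservation, though the exact split depends on the zpool version.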
Re: [zfs-discuss] Weird write performance problem
On 06/08/2011 12:12 PM, Donald Stahl wrote:
>> One day, the write performance of zfs degraded. The write performance
>> decreased from 60MB/s to about 6MB/s in sequential writes.
>> Command: date;dd if=/dev/zero of=block bs=1024*128 count=1;date
> See this thread:
> http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45
> and search in the page for: metaslab_min_alloc_size
> Try adjusting the metaslab size and see if it fixes your performance problem.
> -Don

metaslab_min_alloc_size is not in use when the block allocator is the dynamic block allocator [1], so it is not a tunable parameter in my case. Thanks anyway.

[1] http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c#496
Re: [zfs-discuss] Weird write performance problem
For now, I find that it takes a long time in the function metaslab_block_picker in metaslab.c. I guess there may be many AVL search actions. I'm still not sure what causes the AVL searches and whether there is any parameter to tune for it. Any suggestions?

On 06/08/2011 05:57 PM, Markus Kovero wrote:
> Hi, also see:
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg45408.html
> We hit this with Sol11 though, not sure if it's possible with Sol10.
>
> Yours
> Markus Kovero
>
> -----Original Message-----
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ding Honghui
> Sent: 8. kesäkuuta 2011 6:07
> To: zfs-discuss@opensolaris.org
> Subject: [zfs-discuss] Weird write performance problem
>
> Hi,
>
> I'm seeing weird write performance and need your help.
>
> One day, the write performance of zfs degraded. The write performance
> decreased from 60MB/s to about 6MB/s in sequential writes.
> Command: date;dd if=/dev/zero of=block bs=1024*128 count=1;date
>
> The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks.
> The OS is Solaris 10U8, zpool version 15 and zfs version 4.
>
> I ran DTrace to trace the write latency:
>
> fbt:zfs:zfs_write:entry
> {
>         self->ts = timestamp;
> }
>
> fbt:zfs:zfs_write:return
> /self->ts/
> {
>         @time = quantize(timestamp - self->ts);
>         self->ts = 0;
> }
>
> It shows:
>
>            value  ------------- Distribution ------------- count
>             8192 |                                         0
>            16384 |                                         16
>            32768 |                                         3270
>            65536 |@@@                                      898
>           131072 |@@@                                      985
>           262144 |                                         33
>           524288 |                                         1
>          1048576 |                                         1
>          2097152 |                                         3
>          4194304 |                                         0
>          8388608 |@                                        180
>         16777216 |                                         33
>         33554432 |                                         0
>         67108864 |                                         0
>        134217728 |                                         0
>        268435456 |                                         1
>        536870912 |                                         1
>       1073741824 |                                         2
>       2147483648 |                                         0
>       4294967296 |                                         0
>       8589934592 |                                         0
>      17179869184 |                                         2
>      34359738368 |                                         3
>      68719476736 |                                         0
>
> Compared to a storage that works well (1 MD3000), where the max write
> time of zfs_write is 4294967296 ns, it is about 10 times faster.
>
> Any suggestions?
>
> Thanks,
> Ding
Re: [zfs-discuss] Weird write performance problem
On 06/08/2011 04:05 PM, Tomas Ögren wrote:
> On 08 June, 2011 - Donald Stahl sent me these 0,6K bytes:
>>> One day, the write performance of zfs degraded. The write performance
>>> decreased from 60MB/s to about 6MB/s in sequential writes.
>>> Command: date;dd if=/dev/zero of=block bs=1024*128 count=1;date
>> See this thread:
>> http://www.opensolaris.org/jive/thread.jspa?threadID=139317&tstart=45
>> And search in the page for: metaslab_min_alloc_size
>> Try adjusting the metaslab size and see if it fixes your performance problem.
> And if pool usage is >90%, then there's another problem (a change of the
> algorithm for finding free space).
>
> /Tomas

Tomas,

Thanks for your suggestion. You are right. I tuned the parameter metaslab_df_free_pct from 35 to 4 to reduce this problem some days ago. The performance stayed good for about 1 week, then degraded again.

And I'm still not sure how many operations run into the best-fit block allocation policy and how many run into the first-fit block allocation policy in the current situation.

It would be much appreciated if you can help.

Regards,
Ding
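[Editor's note: as commonly described for the DF (dynamic-fit) allocator, metaslab_df_free_pct is the switch point: while a metaslab has more than that percentage free, allocation uses cheap first-fit; below it, allocation falls back to an expensive best-fit search. A toy model of that switch, not the real metaslab.c code, just an illustration of what the tunable means:]

```python
def alloc_policy(free_bytes, ms_size, metaslab_df_free_pct=4):
    """Toy model of the DF allocator's policy switch: first-fit while the
    metaslab is mostly free, best-fit (a slow, exhaustive search) once
    free space falls below metaslab_df_free_pct percent."""
    free_pct = 100 * free_bytes / ms_size
    return "first-fit" if free_pct >= metaslab_df_free_pct else "best-fit"

MS = 64 * 2**30  # 64G metaslabs, as on this pool

print(alloc_policy(21 * 2**30, MS))  # ~33% free -> first-fit
print(alloc_policy(2 * 2**30, MS))   # ~3% free  -> best-fit
```

This is also why dropping the tunable from 35 to 4 helps only temporarily: metaslabs between 4% and 35% free go back to first-fit, until they too fill past the new threshold.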
Re: [zfs-discuss] Weird write performance problem
On 06/08/2011 09:15 PM, Donald Stahl wrote:
>> metaslab_min_alloc_size is not in use when the block allocator is the
>> dynamic block allocator [1], so it is not a tunable parameter in my case.
> May I ask where it says this is not a tunable in that case? I've read
> through the code and I don't see what you are talking about.
>
> The problem you are describing, including the long time in the function
> metaslab_block_picker, exactly matches the block picker trying to find a
> large enough block and failing.
>
> What value do you get when you run:
> echo metaslab_min_alloc_size/K | mdb -kw
> ?
> You can always try setting it via:
> echo metaslab_min_alloc_size/Z 1000 | mdb -kw
> and if that doesn't work, set it right back.
>
> I'm not familiar with the specifics of Solaris 10u8, so perhaps this is
> not a tunable in that version, but if it is, I would suggest you try
> changing it. If your performance is as bad as you say, then it can't
> hurt to try it.
> -Don

Thanks very much, Don.

In Solaris 10u8:

root@nas-hz-01:~# uname -a
SunOS nas-hz-01 5.10 Generic_141445-09 i86pc i386 i86pc
root@nas-hz-01:~# echo metaslab_min_alloc_size/K | mdb -kw
mdb: failed to dereference symbol: unknown symbol name
root@nas-hz-01:~#

The pool version is 15 and the zfs version is 4.

The parameter is valid on my OpenIndiana build 148, where the zpool version is 28 and the zfs version is 5:

ops@oi:~$ echo metaslab_min_alloc_size/Z 1000 | pfexec mdb -kw
metaslab_min_alloc_size:        0x1000    =    0x1000
ops@oi:~$

I'm not sure which version introduced the parameter. Should I run this OpenIndiana? Any suggestions?

Regards,
Ding
Re: [zfs-discuss] Weird write performance problem
On 06/09/2011 12:23 AM, Donald Stahl wrote:
>> Another (less satisfying) workaround is to increase the amount of free
>> space in the pool, either by reducing usage or adding more storage.
>> Observed behavior is that allocation is fast until usage crosses a
>> threshold, then performance hits a wall.
> We actually tried this solution. We were at 70% usage and performance hit
> a wall. We figured it was because of the change of fit algorithm, so we
> added 16 2TB disks in mirrors (added 16TB to an 18TB pool). It made
> almost no difference in our pool performance. It wasn't until we told
> the metaslab allocator to stop looking for such large chunks that the
> problem went away.
>
>> The original poster's pool is about 78% full. If possible, try freeing
>> stuff until usage goes back under 75% or 70% and see if your
>> performance returns.
> Freeing stuff did fix the problem for us (temporarily) but only in an
> indirect way. When we freed up a bunch of space, the metaslab allocator
> was able to find large enough blocks to write to without searching all
> over the place. This would fix the performance problem until those large
> free blocks got used up. Then, even though we were below the usage
> problem threshold from earlier, we would still have the performance
> problem.
> -Don

Don,

From your words, my symptom is almost the same as yours.

We have examined the metaslab layout. When metaslab_df_free_pct was 35, there were 65 free metaslabs (64G each). The write performance was very low, and a rough test showed that no new free metaslab would be loaded and activated.

Then we tuned metaslab_df_free_pct to 4; the performance stayed good for 1 week and the number of free metaslabs dropped to 51. But now the write bandwidth is poor again (maybe I'd better trace the free space of each metaslab?).

Maybe there is some problem in the metaslab rating score (weight) used to select a metaslab, or in the block allocator algorithm?

Here is a snapshot of the metaslab layout; the last 51 metaslabs have 64G free space each:

vdev      offset      spacemap      free
------    ------      --------      ----
... snip ...
vdev  3   offset 270  spacemap 440  free 21.0G
vdev  3   offset 280  spacemap  31  free 7.36G
vdev  3   offset 290  spacemap  32  free 2.44G
vdev  3   offset 2a0  spacemap  33  free 2.91G
vdev  3   offset 2b0  spacemap  34  free 3.25G
vdev  3   offset 2c0  spacemap  35  free 3.03G
vdev  3   offset 2d0  spacemap  36  free 3.20G
vdev  3   offset 2e0  spacemap  90  free 3.28G
vdev  3   offset 2f0  spacemap  91  free 2.46G
vdev  3   offset 300  spacemap  92  free 2.98G
vdev  3   offset 310  spacemap  93  free 2.19G
vdev  3   offset 320  spacemap  94  free 2.42G
vdev  3   offset 330  spacemap  95  free 2.83G
vdev  3   offset 340  spacemap 252  free 41.6G
vdev  3   offset 350  spacemap   0  free 64G
vdev  3   offset 360  spacemap   0  free 64G
vdev  3   offset 370  spacemap   0  free 64G
vdev  3   offset 380  spacemap   0  free 64G
vdev  3   offset 390  spacemap   0  free 64G
vdev  3   offset 3a0  spacemap   0  free 64G
vdev  3   offset 3b0  spacemap   0  free 64G
vdev  3   offset 3c0  spacemap   0  free 64G
vdev  3   offset 3d0  spacemap   0  free 64G
vdev  3   offset 3e0  spacemap   0  free 64G
... snip ...
Re: [zfs-discuss] Weird write performance problem
On 06/09/2011 10:14 AM, Ding Honghui wrote:
> On 06/09/2011 12:23 AM, Donald Stahl wrote:
>>> Another (less satisfying) workaround is to increase the amount of free
>>> space in the pool, either by reducing usage or adding more storage.
>>> Observed behavior is that allocation is fast until usage crosses a
>>> threshold, then performance hits a wall.
>> We actually tried this solution. We were at 70% usage and performance
>> hit a wall. We figured it was because of the change of fit algorithm,
>> so we added 16 2TB disks in mirrors (added 16TB to an 18TB pool). It
>> made almost no difference in our pool performance. It wasn't until we
>> told the metaslab allocator to stop looking for such large chunks that
>> the problem went away.
>>
>>> The original poster's pool is about 78% full. If possible, try freeing
>>> stuff until usage goes back under 75% or 70% and see if your
>>> performance returns.
>> Freeing stuff did fix the problem for us (temporarily) but only in an
>> indirect way. When we freed up a bunch of space, the metaslab allocator
>> was able to find large enough blocks to write to without searching all
>> over the place. This would fix the performance problem until those
>> large free blocks got used up. Then, even though we were below the
>> usage problem threshold from earlier, we would still have the
>> performance problem.
>> -Don
>
> Don,
>
> From your words, my symptom is almost the same as yours.
>
> We have examined the metaslab layout. When metaslab_df_free_pct was 35,
> there were 65 free metaslabs (64G each). The write performance was very
> low, and a rough test showed that no new free metaslab would be loaded
> and activated.
>
> Then we tuned metaslab_df_free_pct to 4; the performance stayed good
> for 1 week and the number of free metaslabs dropped to 51. But now the
> write bandwidth is poor again (maybe I'd better trace the free space of
> each metaslab?).
>
> Maybe there is some problem in the metaslab rating score (weight) used
> to select a metaslab, or in the block allocator algorithm?
>
> Here is a snapshot of the metaslab layout; the last 51 metaslabs have
> 64G free space each:
>
> vdev      offset      spacemap      free
> ------    ------      --------      ----
> ... snip ...
> vdev  3   offset 270  spacemap 440  free 21.0G
> vdev  3   offset 280  spacemap  31  free 7.36G
> vdev  3   offset 290  spacemap  32  free 2.44G
> vdev  3   offset 2a0  spacemap  33  free 2.91G
> vdev  3   offset 2b0  spacemap  34  free 3.25G
> vdev  3   offset 2c0  spacemap  35  free 3.03G
> vdev  3   offset 2d0  spacemap  36  free 3.20G
> vdev  3   offset 2e0  spacemap  90  free 3.28G
> vdev  3   offset 2f0  spacemap  91  free 2.46G
> vdev  3   offset 300  spacemap  92  free 2.98G
> vdev  3   offset 310  spacemap  93  free 2.19G
> vdev  3   offset 320  spacemap  94  free 2.42G
> vdev  3   offset 330  spacemap  95  free 2.83G
> vdev  3   offset 340  spacemap 252  free 41.6G
> vdev  3   offset 350  spacemap   0  free 64G
> vdev  3   offset 360  spacemap   0  free 64G
> vdev  3   offset 370  spacemap   0  free 64G
> vdev  3   offset 380  spacemap   0  free 64G
> vdev  3   offset 390  spacemap   0  free 64G
> vdev  3   offset 3a0  spacemap   0  free 64G
> vdev  3   offset 3b0  spacemap   0  free 64G
> vdev  3   offset 3c0  spacemap   0  free 64G
> vdev  3   offset 3d0  spacemap   0  free 64G
> vdev  3   offset 3e0  spacemap   0  free 64G
> ... snip ...

I freed up some disk space (about 300GB), and the performance is back again. I'm sure the performance will degrade again soon.
[zfs-discuss] Weird write performance problem
Hi,

I'm seeing weird write performance and need your help.

One day, the write performance of zfs degraded. The write performance decreased from 60MB/s to about 6MB/s in sequential writes.

Command: date;dd if=/dev/zero of=block bs=1024*128 count=1;date

The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks. The OS is Solaris 10U8, zpool version 15 and zfs version 4.

I ran DTrace to trace the write latency:

fbt:zfs:zfs_write:entry
{
        self->ts = timestamp;
}

fbt:zfs:zfs_write:return
/self->ts/
{
        @time = quantize(timestamp - self->ts);
        self->ts = 0;
}

It shows:

           value  ------------- Distribution ------------- count
            8192 |                                         0
           16384 |                                         16
           32768 |                                         3270
           65536 |@@@                                      898
          131072 |@@@                                      985
          262144 |                                         33
          524288 |                                         1
         1048576 |                                         1
         2097152 |                                         3
         4194304 |                                         0
         8388608 |@                                        180
        16777216 |                                         33
        33554432 |                                         0
        67108864 |                                         0
       134217728 |                                         0
       268435456 |                                         1
       536870912 |                                         1
      1073741824 |                                         2
      2147483648 |                                         0
      4294967296 |                                         0
      8589934592 |                                         0
     17179869184 |                                         2
     34359738368 |                                         3
     68719476736 |                                         0

Compared to a storage that works well (1 MD3000), where the max write time of zfs_write is 4294967296 ns, it is about 10 times faster.

Any suggestions?

Thanks,
Ding
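[Editor's note: for readers unfamiliar with the output, DTrace's quantize() buckets each zfs_write latency (in nanoseconds) into power-of-two bins, so the rows at 17179869184 and 34359738368 above are individual writes taking tens of seconds. A minimal Python equivalent of the bucketing, to make the bin edges concrete:]

```python
from collections import Counter

def quantize(values):
    """Bucket values into power-of-two bins, like DTrace's quantize():
    a value v >= 1 lands in the bucket labeled with the largest power
    of two that is <= v."""
    buckets = Counter()
    for v in values:
        b = 1
        while b * 2 <= v:
            b *= 2
        buckets[b] += 1
    return dict(sorted(buckets.items()))

# A 35-second zfs_write lands in the 34359738368 (2**35) ns bucket.
print(quantize([50_000, 60_000, 35_000_000_000]))
```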
Re: [zfs-discuss] Weird write performance problem
And one comment:

When we do the write operation (by the dd command), heavy read operations increase from zero to about 3M for each disk, and the write bandwidth is poor. The disk IO %b increases from 0 to about 60. I don't understand why this happens.

                                         capacity     operations    bandwidth
pool                                   used  avail   read  write   read  write
-------------------------------------  -----  -----  -----  -----  -----  -----
datapool                              19.8T  5.48T    543     47  1.74M  5.89M
  raidz1                              5.64T   687G    146     13   480K  1.66M
    c3t600221900085486703B2490FB009d0     -      -     49     13  3.26M   293K
    c3t600221900085486703B4490FB063d0     -      -     48     13  3.19M   296K
    c3t6002219000852889055F4CB79C10d0     -      -     48     13  3.19M   293K
    c3t600221900085486703B8490FB0FFd0     -      -     50     13  3.28M   284K
    c3t600221900085486703BA490FB14Fd0     -      -     50     13  3.31M   287K
    c3t6002219000852889041C490FAFA0d0     -      -     49     14  3.27M   297K
    c3t600221900085486703C0490FB27Dd0     -      -     48     14  3.24M   300K
  raidz1                              5.73T   594G    102      7   337K   996K
    c3t600221900085486703C2490FB2BFd0     -      -     52      5  3.59M   166K
    c3t6002219000852889041F490FAFD0d0     -      -     54      5  3.72M   166K
    c3t60022190008528890428490FB0D8d0     -      -     55      5  3.79M   166K
    c3t60022190008528890422490FB02Cd0     -      -     52      5  3.57M   166K
    c3t60022190008528890425490FB07Cd0     -      -     53      5  3.64M   166K
    c3t60022190008528890434490FB24Ed0     -      -     55      5  3.76M   166K
    c3t6002219000852889043949100968d0     -      -     55      5  3.83M   166K
  raidz1                              5.81T   519G    117     10   388K  1.26M
    c3t6002219000852889056B4CB79D66d0     -      -     46      9  3.09M   215K
    c3t600221900085486704B94CB79F91d0     -      -     44      9  2.91M   215K
    c3t600221900085486704BB4CB79FE1d0     -      -     44      9  2.97M   224K
    c3t600221900085486704BD4CB7A035d0     -      -     44      9  2.96M   215K
    c3t600221900085486704BF4CB7A0ABd0     -      -     44      9  2.97M   216K
    c3t6002219000852889055C4CB79BB8d0     -      -     45      9  3.04M   215K
    c3t600221900085486704C14CB7A0FDd0     -      -     46      9  3.02M   215K
  raidz1                              2.59T  3.72T    176     16   581K  2.00M
    c3t6002219000852889042B490FB124d0     -      -     48      5  3.21M   342K
    c3t600221900085486704C54CB7A199d0     -      -     46      5  2.99M   342K
    c3t600221900085486704C74CB7A1D5d0     -      -     49      5  3.27M   342K
    c3t600221900085288905594CB79B64d0     -      -     46      6  3.00M   342K
    c3t600221900085288905624CB79C86d0     -      -     47      6  3.11M   342K
    c3t600221900085288905654CB79CCCd0     -      -     50      6  3.29M   342K
    c3t600221900085288905684CB79D1Ed0     -      -     45      5  2.98M   342K
  c3t6B8AC6FF837605864DC9E9F1d0          4K   928G      0      0      0      0
-------------------------------------  -----  -----  -----  -----  -----  -----
^C
root@nas-hz-01:~#

On 06/08/2011 11:07 AM, Ding Honghui wrote:
> Hi,
>
> I'm seeing weird write performance and need your help.
>
> One day, the write performance of zfs degraded. The write performance
> decreased from 60MB/s to about 6MB/s in sequential writes.
> Command: date;dd if=/dev/zero of=block bs=1024*128 count=1;date
>
> The hardware configuration is 1 Dell MD3000 and 1 MD1000 with 30 disks.
> The OS is Solaris 10U8, zpool version 15 and zfs version 4.
>
> I ran DTrace to trace the write latency:
>
> fbt:zfs:zfs_write:entry
> {
>         self->ts = timestamp;
> }
>
> fbt:zfs:zfs_write:return
> /self->ts/
> {
>         @time = quantize(timestamp - self->ts);
>         self->ts = 0;
> }
>
> It shows:
>
>            value  ------------- Distribution ------------- count
>             8192 |                                         0
>            16384 |                                         16
>            32768 |                                         3270
>            65536 |@@@                                      898
>           131072 |@@@                                      985
>           262144 |                                         33
>           524288 |                                         1
>          1048576 |                                         1
>          2097152 |                                         3
>          4194304 |                                         0
>          8388608 |@                                        180
>         16777216 |                                         33
>         33554432 |                                         0