Re: [zfs-discuss] Re: slow reads question...
Harley Gorrell writes:

> On Fri, 22 Sep 2006, [EMAIL PROTECTED] wrote:
>> Are you just trying to measure ZFS's read performance here?
>
> That is what I started looking at. We scrounged around and found a set of
> 300GB drives to replace the old ones we started with. Comparing these new
> drives to the old ones:
>
> Old 36GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    2m31.991s
| user    0m0.007s
| sys     0m0.923s

> Newer 300GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    0m8.425s
| user    0m0.010s
| sys     0m1.809s

> At this point I am pretty happy.

This looks like on the second run you had lots more free memory and mkfile
completed at near-memcpy speed. Something is awry on the first pass, though.
Running "zpool iostat 1" can shed some light on this: in the second case the
I/O will keep going after the mkfile completes. For the first one, there may
have been an interaction with not-yet-finished I/O loads?

-r

> I am wondering if there is something other than capacity and seek time
> which has changed between the drives. Would a different SCSI command set
> or features have this dramatic a difference?
>
> thanks!,
> harley.
Re: [zfs-discuss] Re: slow reads question...
On Mon, 25 Sep 2006, Roch wrote:

> This looks like on the second run you had lots more free memory and
> mkfile completed at near-memcpy speed.

Both times the system was near idle.

> Something is awry on the first pass, though. Running "zpool iostat 1"
> can shed some light on this: in the second case the I/O will keep going
> after the mkfile completes. For the first one, there may have been an
> interaction with not-yet-finished I/O loads?

The old drives aren't in the system any more, but I did try this with the
new drives. I ran "mkfile -v 1g zeros-1g" a couple of times while
"zpool iostat -v 1" was running in another window. There were about seven
intervals like the first one below, where it is writing to disk. The
next-to-last is where the bandwidth drops, as there isn't enough I/O to fill
out that second, followed by intervals of no I/O. I didn't see any
write-behind -- once the I/O was done I didn't see more until I started
something else.

|                 capacity     operations    bandwidth
| pool          used  avail   read  write   read  write
| -----------  -----  -----  -----  -----  -----  -----
| tank         26.1G  1.34T      0  1.13K      0   134M
|   raidz1     26.1G  1.34T      0  1.13K      0   134M
|     c0t1d0       -      -      0    367      0  33.6M
|     c0t2d0       -      -      0    377      0  35.5M
|     c0t3d0       -      -      0    401      0  35.0M
|     c0t4d0       -      -      0    411      0  36.0M
|     c0t5d0       -      -      0    424      0  34.9M
| -----------  -----  -----  -----  -----  -----  -----
|
|                 capacity     operations    bandwidth
| pool          used  avail   read  write   read  write
| -----------  -----  -----  -----  -----  -----  -----
| tank         26.4G  1.34T      0  1.01K    560   118M
|   raidz1     26.4G  1.34T      0  1.01K    560   118M
|     c0t1d0       -      -      0    307      0  29.6M
|     c0t2d0       -      -      0    309      0  27.6M
|     c0t3d0       -      -      0    331      0  28.1M
|     c0t4d0       -      -      0    338  35.0K  27.0M
|     c0t5d0       -      -      0    338  35.0K  28.3M
| -----------  -----  -----  -----  -----  -----  -----
|
|                 capacity     operations    bandwidth
| pool          used  avail   read  write   read  write
| -----------  -----  -----  -----  -----  -----  -----
| tank         26.4G  1.34T      0      0      0      0
|   raidz1     26.4G  1.34T      0      0      0      0
|     c0t1d0       -      -      0      0      0      0
|     c0t2d0       -      -      0      0      0      0
|     c0t3d0       -      -      0      0      0      0
|     c0t4d0       -      -      0      0      0      0
|     c0t5d0       -      -      0      0      0      0
| -----------  -----  -----  -----  -----  -----  -----

As things stand now, I am happy. I do wonder what accounts for the
improvement -- seek time, transfer rate, disk cache, or something else?
Does anyone have a dtrace script to measure this which they would share?

harley.
Re: [zfs-discuss] Re: slow reads question...
Harley Gorrell wrote:

> I do wonder what accounts for the improvement -- seek time, transfer
> rate, disk cache, or something else? Does anyone have a dtrace script
> to measure this which they would share?

You might also be seeing the effects of defect management. As drives get
older, they tend to find and repair more defects. This will slow the
performance of the drive, though I've not seen it to this sort of extreme.
You might infer this from a dtrace script which would record the service
time per iop -- in which case you may see some iops with much larger service
times than normal. I would expect this to be a second-order effect.

Meanwhile, you should check to make sure you're transferring data at the
rate you think (SCSI autonegotiates data transfer rates). If you know the
model number, you can get the rotational speed and average seek times to see
if they are radically different for the two disk types.

-- richard
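A rough sketch of the kind of per-iop service-time measurement Richard
describes, using the standard DTrace io provider (the keys, aggregation, and
formatting here are illustrative, not a script from this thread); it prints a
per-device histogram of service times when interrupted:

| # dtrace -q -n '
|   io:::start
|   {
|           /* note when each buffer is handed to the device */
|           start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
|   }
|   io:::done
|   /start[args[0]->b_edev, args[0]->b_blkno]/
|   {
|           /* per-device distribution of service times, in microseconds */
|           @svc[args[1]->dev_statname] = quantize(
|               (timestamp - start[args[0]->b_edev, args[0]->b_blkno]) / 1000);
|           start[args[0]->b_edev, args[0]->b_blkno] = 0;
|   }'

A device whose histogram grows a long tail of multi-millisecond I/Os while
its siblings stay tight would be the one spending time on retries or defect
remapping.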
[zfs-discuss] Re: slow reads question...
ZFS uses a 128k block size. If you change dd to use bs=128k, do you observe
any performance improvement?

| # time dd if=zeros-10g of=/dev/null bs=8k count=102400
| 102400+0 records in
| 102400+0 records out
|
| real    1m8.763s
| user    0m0.104s
| sys     0m1.759s

It's also worth noting that this dd used less system and user time than the
read from the raw device, yet took longer in real time.
Re: [zfs-discuss] Re: slow reads question...
On Fri, 22 Sep 2006, johansen wrote:

> ZFS uses a 128k block size. If you change dd to use bs=128k, do you
> observe any performance improvement?

I had tried other sizes with much the same results, but hadn't gone as large
as 128k. With bs=128k it gets worse:

| # time dd if=zeros-10g of=/dev/null bs=128k count=102400
| 81920+0 records in
| 81920+0 records out
|
| real    2m19.023s
| user    0m0.105s
| sys     0m8.514s

> It's also worth noting that this dd used less system and user time than
> the read from the raw device, yet took longer in real time.

I think some of the blocks might be cached, as I have run this a number of
times. I really don't know how the time might be accounted for -- however,
the real time is correct, as that is what I see while waiting for the command
to complete.

Is there any other info I can provide which would help?

harley.
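One way to take cached blocks out of a read test (a rough sketch, assuming
the pool is named tank as in the iostat output earlier in the thread and that
nothing else is using it) is to export and re-import the pool so the file has
to come back from disk rather than from the ARC:

| # zpool export tank
| # zpool import tank
| # time dd if=zeros-10g of=/dev/null bs=128k

You would need to cd back into the filesystem after the import, since the
mount goes away across the export.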
Re: [zfs-discuss] Re: slow reads question...
Harley:

> I had tried other sizes with much the same results, but hadn't gone as
> large as 128k. With bs=128k it gets worse:

| # time dd if=zeros-10g of=/dev/null bs=128k count=102400
| 81920+0 records in
| 81920+0 records out
|
| real    2m19.023s
| user    0m0.105s
| sys     0m8.514s

I may have done my math wrong, but if we assume that the real time is the
actual amount of time we spent performing the I/O (which may be incorrect),
haven't you done better here?

In this case you pushed 81920 128k records in ~139 seconds -- approx
75437 KB/s. Using ZFS with an 8k bs, you pushed 102400 8k records in ~68
seconds -- approx 12047 KB/s. Using the raw device, you pushed 102400 8k
records in ~23 seconds -- approx 35617 KB/s. I may have missed something
here, but isn't this newest number the highest performance so far?

What does iostat(1M) say about your disk read performance?

> Is there any other info I can provide which would help?

Are you just trying to measure ZFS's read performance here? It might be
interesting to change your outfile (of=) argument and see if we're actually
running into some other performance problem. If you change of=/tmp/zeros,
does performance improve or degrade? Likewise, if you write the file out to
another disk (UFS, ZFS, whatever), does this improve performance?

-j
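For instance (illustrative invocations, not commands from the thread),
extended iostat in another window during the dd shows per-device throughput
and service times, and pointing the output at tmpfs takes the target disk out
of the equation; count=8192 keeps the /tmp copy to 1GB:

| # iostat -xnz 5
| # time dd if=zeros-10g of=/tmp/zeros bs=128k count=8192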
Re: [zfs-discuss] Re: slow reads question...
On Fri, 22 Sep 2006, [EMAIL PROTECTED] wrote:

> Are you just trying to measure ZFS's read performance here?

That is what I started looking at. We scrounged around and found a set of
300GB drives to replace the old ones we started with. Comparing these new
drives to the old ones:

Old 36GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    2m31.991s
| user    0m0.007s
| sys     0m0.923s

Newer 300GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    0m8.425s
| user    0m0.010s
| sys     0m1.809s

At this point I am pretty happy. I am wondering if there is something other
than capacity and seek time which has changed between the drives. Would a
different SCSI command set or features have this dramatic a difference?

thanks!,
harley.
Re: [zfs-discuss] Re: slow reads question...
Harley:

> Old 36GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    2m31.991s
| user    0m0.007s
| sys     0m0.923s

> Newer 300GB drives:

| # time mkfile -v 1g zeros-1g
| zeros-1g 1073741824 bytes
|
| real    0m8.425s
| user    0m0.010s
| sys     0m1.809s

This is a pretty dramatic difference. What type of drives were your old 36GB
drives?

> I am wondering if there is something other than capacity and seek time
> which has changed between the drives. Would a different SCSI command set
> or features have this dramatic a difference?

I'm hardly the authority on hardware, but there are a couple of
possibilities. Your newer drives may have a write cache. It's also quite
likely that the newer drives have a faster rotational speed and shorter seek
times.

If you subtract the usr + sys time from the real time in these measurements,
I suspect the result is the amount of time you were actually waiting for the
I/O to finish. In the first case, you spent about 99% of your total time
waiting for stuff to happen, whereas in the second case it was only about 78%
of your overall time.

-j
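As a quick check of that arithmetic with bc, using the times quoted above
(the fractions are derived here, not output from the original poster's
machine):

| # echo 'scale=3; (151.991 - 0.007 - 0.923) / 151.991' | bc
| .993
| # echo 'scale=3; (8.425 - 0.010 - 1.809) / 8.425' | bc
| .784

So roughly 99% of the old-drive run and 78% of the new-drive run was spent
waiting on I/O.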