Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
queuing theory should explain this rather nicely. iostat measures %busy by counting whether there is an entry in the queue on each clock tick. There are two queues, one in the controller and one on the disk. As you can clearly see, the way ZFS pushes the load is very different from dd on the raw device or UFS.
 -- richard

Marko Milisavljevic wrote:
> I am very grateful to everyone who took the time to run a few tests to help
> me figure out what is going on. As per j's suggestions, I tried some
> simultaneous reads, and a few other things, and I am getting interesting
> and confusing results. [...]
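Richard's two queues map directly onto iostat's columns: wait/wsvc_t/%w describe requests sitting in the host-side queue (what he calls the controller queue), while actv/asvc_t/%b describe requests outstanding at the device itself. A minimal way to watch both, plus ZFS's own view, while a test runs (pool name assumed):

iostat -xnz 3           # wait/wsvc_t = host/driver queue; actv/asvc_t = on-device queue
zpool iostat -v tank 3  # per-vdev bandwidth and IOPS as ZFS sees them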
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
> > *sata_hba_list::list sata_hba_inst_t satahba_next | ::print
> > sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* |
> > ::grep ".!=0" | ::print sata_cport_info_t cport_devp.cport_sata_drive |
> > ::print -a sata_drive_info_t satadrv_features_support satadrv_settings
> > satadrv_features_enabled
>
> This gives me "mdb: failed to dereference symbol: unknown symbol name".

You may not have the SATA module installed. If you type:

::modinfo ! grep sata

and don't get any output, your sata driver is attached some other way. My apologies for the confusion.

-K
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
On 5/15/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> > Each drive is freshly formatted with one 2G file copied to it.
>
> How are you creating each of these files?

zpool create tank c0d0 c0d1; zfs create tank/test; cp ~/bigfile /tank/test/

The actual content of the file is random junk from /dev/random.

> Also, would you please include the output from the isalist(1) command?

pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86

> Have you double-checked that this isn't a measurement problem by measuring
> zfs with zpool iostat (see zpool(1M)) and verifying that outputs from both
> iostats match?

Both give the same kB/s.

> How much memory is in this box?

1.5G; I can see in /var/adm/messages that it is recognized.

> As root, type mdb -k, and then at the ">" prompt that appears, enter the
> following command (this is one very long line):
>
> *sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t
> satahba_dev_port | ::array void* 32 | ::print void* | ::grep ".!=0" |
> ::print sata_cport_info_t cport_devp.cport_sata_drive | ::print -a
> sata_drive_info_t satadrv_features_support satadrv_settings
> satadrv_features_enabled

This gives me "mdb: failed to dereference symbol: unknown symbol name". I don't know enough about the syntax here to try to isolate which token it is complaining about. But I don't know if my PCI/SATA card is going through the sd driver, if that is what the commands above assume... my understanding is that the sil3114 goes through the ata driver, as per this blog: http://blogs.sun.com/mlf/entry/ata_on_solaris_x86_at

If there is any other testing I can do, I would be happy to.
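One hedged way to confirm which driver stack a disk is attached through is to look at the physical device path behind the /dev/dsk link and at prtconf's driver bindings. The paths below are only illustrative, not taken from Marko's box; the real node names will differ:

ls -l /dev/dsk/c0d1p0
# a path like .../pci-ide@X/ide@0/cmdk@1,0:q suggests the legacy ata/cmdk stack
# a path like .../pci@X/sata@0/disk@1,0:q suggests the SATA framework (sd on top of a sata HBA driver)
prtconf -D | grep -i -e ata -e sata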
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
> Each drive is freshly formatted with one 2G file copied to it.

How are you creating each of these files? Also, would you please include the output from the isalist(1) command?

> These are snapshots of iostat -xnczpm 3 captured somewhere in the
> middle of the operation.

Have you double-checked that this isn't a measurement problem by measuring zfs with zpool iostat (see zpool(1M)) and verifying that the outputs from both iostats match?

> single drive, zfs file
>     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
>   258.3    0.0 33066.6   0.0 33.0  2.0  127.7    7.7 100 100 c0d1
>
> Now that is odd. Why so much waiting? Also, unlike with raw or UFS,
> kr/s / r/s gives 128K, as I would imagine it should.

Not sure. If we can figure out why ZFS is slower than raw disk access in your case, it may explain why you're seeing these results.

> What if we read a UFS file from the PATA disk and ZFS from SATA:
>     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
>   792.8    0.0 44092.9   0.0  0.0  1.8    0.0    2.2   1  98 c1d0
>   224.0    0.0 28675.2   0.0 33.0  2.0  147.3    8.9 100 100 c0d0
>
> Now that is confusing! Why did SATA/ZFS slow down too? I've retried this
> a number of times, not a fluke.

This could be cache interference. ZFS and UFS use different caches. How much memory is in this box?

> I have no idea what to make of all this, except that ZFS has a problem
> with this hardware/drivers that UFS and other traditional file systems
> don't. Is it a bug in the driver that ZFS is inadvertently exposing? A
> specific feature that ZFS assumes the hardware to have, but it doesn't?
> Who knows!

This may be a more complicated interaction than just ZFS and your hardware. There are a number of layers of drivers underneath ZFS that may also be interacting with your hardware in an unfavorable way.

If you'd like to do a little poking with MDB, we can see the features that your SATA disks claim they support. As root, type mdb -k, and then at the ">" prompt that appears, enter the following command (this is one very long line):

*sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* | ::grep ".!=0" | ::print sata_cport_info_t cport_devp.cport_sata_drive | ::print -a sata_drive_info_t satadrv_features_support satadrv_settings satadrv_features_enabled

This should show satadrv_features_support, satadrv_settings, and satadrv_features_enabled for each SATA disk on the system. The values for these variables are defined in:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/sata/impl/sata.h

This is the relevant snippet for interpreting these values:

/*
 * Device feature_support (satadrv_features_support)
 */
#define SATA_DEV_F_DMA                  0x01
#define SATA_DEV_F_LBA28                0x02
#define SATA_DEV_F_LBA48                0x04
#define SATA_DEV_F_NCQ                  0x08
#define SATA_DEV_F_SATA1                0x10
#define SATA_DEV_F_SATA2                0x20
#define SATA_DEV_F_TCQ                  0x40    /* Non NCQ tagged queuing */

/*
 * Device features enabled (satadrv_features_enabled)
 */
#define SATA_DEV_F_E_TAGGED_QING        0x01    /* Tagged queuing enabled */
#define SATA_DEV_F_E_UNTAGGED_QING      0x02    /* Untagged queuing enabled */

/*
 * Drive settings flags (satadrv_settings)
 */
#define SATA_DEV_READ_AHEAD             0x0001  /* Read Ahead enabled */
#define SATA_DEV_WRITE_CACHE            0x0002  /* Write cache ON */
#define SATA_DEV_SERIAL_FEATURES        0x8000  /* Serial ATA feat. enabled */
#define SATA_DEV_ASYNCH_NOTIFY          0x2000  /* Asynch-event enabled */

This may give us more information if this is indeed a problem with hardware/drivers supporting the right features.
-j
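j's mask list makes the decoding mechanical. As a purely hypothetical worked example (0x3f is an invented value, not output from Marko's machine), satadrv_features_support = 0x3f would decode as DMA | LBA28 | LBA48 | NCQ | SATA1 | SATA2. A small bash sketch of the same check:

#!/bin/bash
# substitute the value mdb actually printed for your drive
val=0x3f
for pair in DMA:0x01 LBA28:0x02 LBA48:0x04 NCQ:0x08 SATA1:0x10 SATA2:0x20 TCQ:0x40; do
    name=${pair%%:*}; mask=${pair##*:}
    # print the flag name if its bit is set in the reported value
    if (( val & mask )); then
        echo "SATA_DEV_F_${name}"
    fi
done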
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
I am very grateful to everyone who took the time to run a few tests to help me figure out what is going on. As per j's suggestions, I tried some simultaneous reads, and a few other things, and I am getting interesting and confusing results.

All tests are done using two Seagate 320G drives on sil3114. In each test I am using dd if=<disk or file> of=/dev/null bs=128k count=10000. Each drive is freshly formatted with one 2G file copied to it. That way dd from the raw disk and from the file are using roughly the same area of the disk. I tried raw, zfs and ufs, single drives and two simultaneously (just executing dd commands in separate terminal windows). These are snapshots of iostat -xnczpm 3 captured somewhere in the middle of the operation. I am not bothering to report CPU% as it never rose over 50%, and was uniformly proportional to reported throughput.

single drive, raw:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1378.4    0.0 77190.7   0.0  0.0  1.7    0.0    1.2   0  98 c0d1

single drive, ufs file:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1255.1    0.0 69949.6   0.0  0.0  1.8    0.0    1.4   0 100 c0d0

Small slowdown, but pretty good.

single drive, zfs file:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
   258.3    0.0 33066.6   0.0 33.0  2.0  127.7    7.7 100 100 c0d1

Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s / r/s gives 128K, as I would imagine it should.

simultaneous raw:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
   797.0    0.0 44632.0   0.0  0.0  1.8    0.0    2.3   0 100 c0d0
   795.7    0.0 44557.4   0.0  0.0  1.8    0.0    2.3   0 100 c0d1

This PCI interface seems to be saturated at 90MB/s. Adequate if the goal is to serve files on a gigabit SOHO network.

simultaneous raw on c0d1 and ufs on c0d0:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
   722.4    0.0 40246.8   0.0  0.0  1.8    0.0    2.5   0 100 c0d0
   717.1    0.0 40156.2   0.0  0.0  1.8    0.0    2.5   0  99 c0d1

Hmm, can no longer get the 90MB/sec.

simultaneous zfs on c0d1 and raw on c0d0:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
     0.0    0.7     0.0   1.8  0.0  0.0    0.0    0.1   0   0 c1d0
   334.9    0.0 18756.0   0.0  0.0  1.9    0.0    5.5   0  97 c0d0
   172.5    0.0 22074.6   0.0 33.0  2.0  191.3   11.6 100 100 c0d1

Everything is slow. What happens if we throw the onboard IDE interface into the mix?

simultaneous raw SATA and raw PATA:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1036.3    0.3 58033.9   0.3  0.0  1.6    0.0    1.6   0  99 c1d0
  1422.6    0.0 79668.3   0.0  0.0  1.6    0.0    1.1   1  98 c0d0

Both at maximum throughput.

Read ZFS on the SATA drive and raw disk on the PATA interface:
     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
  1018.9    0.3 57056.1   4.0  0.0  1.7    0.0    1.7   0  99 c1d0
   268.4    0.0 34353.1   0.0 33.0  2.0  122.9    7.5 100 100 c0d0

SATA is slower with ZFS, as expected by now, but ATA remains at full speed. So they are operating quite independently. Except... what if we read a UFS file from the PATA disk and ZFS from SATA:

     r/s    w/s    kr/s  kw/s wait actv wsvc_t asvc_t  %w  %b device
   792.8    0.0 44092.9   0.0  0.0  1.8    0.0    2.2   1  98 c1d0
   224.0    0.0 28675.2   0.0 33.0  2.0  147.3    8.9 100 100 c0d0

Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a number of times; it is not a fluke.

Finally, after reviewing all this, I've noticed another interesting bit... whenever I read from raw disks or UFS files, SATA or PATA, kr/s over r/s is 56k, suggesting that the underlying I/O system is using that as some kind of native block size (even though dd is requesting 128k). But when reading ZFS files, this always comes to 128k, which is expected, since that is the ZFS default (and the same thing happens regardless of bs= in dd).
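A back-of-the-envelope read of the two single-drive cases above, using Little's law (throughput is roughly outstanding requests divided by service time) - just a sanity check on the iostat columns, not a claim about what the driver is doing internally:

  raw:  actv / asvc_t = 1.7 / 1.2 ms ~ 1,400 IOPS x  56 KB ~ 78 MB/s   (iostat: 1378.4 r/s, 77190.7 kr/s)
  zfs:  actv / asvc_t = 2.0 / 7.7 ms ~   260 IOPS x 128 KB ~ 33 MB/s   (iostat:  258.3 r/s, 33066.6 kr/s)

In both cases the device itself only has about two requests outstanding; the difference is that each 128 KB ZFS read spends ~7.7 ms at the drive plus ~128 ms in the 33-deep host queue (wait/wsvc_t), while the 56 KB raw reads never queue at all.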
On the theory that my system just doesn't like 128k reads (I'm desperate!), and that this would explain the whole slowdown and the wait/wsvc_t column, I tried changing recsize to 32k and rewriting the test file. However, accessing ZFS files continues to show 128k reads, and it is just as slow. Is there a way to either confirm that the ZFS file in question is indeed written with 32k records or, even better, to force ZFS to use 56k when accessing the disk? Or perhaps I just misunderstand the implications of the iostat output.

I've repeated each of these tests a few times and double-checked, and the numbers, although snapshots of a point in time, fairly represent averages. I have no idea what to make of all this, except that ZFS has a problem with this hardware/drivers that UFS and other traditional file systems don't. Is it a bug in the driver that ZFS is inadvertently exposing? A specific feature that ZFS assumes the hardware to have, but it doesn't? Who knows! I will have to give up on Solaris/ZFS on this hardware for now,
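On the recordsize question, two hedged checks (both commands exist on Nevada builds of that era, though the exact output format may differ; dataset name assumed):

zfs get recordsize tank/test    # the property the dataset is set to now
zdb -dddd tank/test             # per-object dump; the "dblk" column shows the block size each file was actually written with

Note that a file keeps the block size it was created with: setting recordsize=32k and then rewriting an existing file in place will not shrink its 128K blocks, so the file has to be created fresh after the property change. Also, even with 32K records, the vdev layer can aggregate adjacent reads back up to 128K before they reach the disk, so iostat may keep showing 128K transfers either way.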
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
On Mon, 14 May 2007, Marko Milisavljevic wrote:
> Thank you, Al.
>
> Would you mind also doing:
>
> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000

# ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000

real       20.046
user        0.013
sys         3.568

> to see the raw performance of the underlying hardware.

Regards,

Al Hopper
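Assuming that run moved the same 1,280,000 KB as the other dd tests in the thread, the raw rate off that one disk works out to roughly:

echo "scale=1; 1280000 / 20.046 / 1024" | bc    # ~62.3 MB/s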
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko, I tried this experiment again using 1 disk and got nearly identical times:

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       21.4
user        0.0
sys         2.4

$ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       21.0
user        0.0
sys         0.7

> [I]t is not possible for dd to meaningfully access multiple-disk
> configurations without going through the file system. I find it
> curious that there is such a large slowdown by going through the file
> system (with a single drive configuration), especially compared to UFS
> or ext3.

Comparing a filesystem to raw dd access isn't a completely fair comparison either. Few filesystems actually lay out all of their data and metadata so that every read is a completely sequential read.

> I simply have a small SOHO server and I am trying to evaluate which OS to
> use to keep a redundant disk array. With unreliable consumer-level hardware,
> ZFS and the checksum feature are very interesting and the primary selling
> point compared to a Linux setup, for as long as ZFS can generate enough
> bandwidth from the drive array to saturate a single gigabit ethernet.

I would take Bart's recommendation and go with Solaris on something like a dual-core box with 4 disks.

> My hardware at the moment is the "wrong" choice for Solaris/ZFS - a PCI 3114
> SATA controller on a 32-bit AthlonXP, according to many posts I found.

Bill Moore lists some controller recommendations here:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

> However, since dd over the raw disk is capable of extracting 75+MB/s from
> this setup, I keep feeling that surely I must be able to get at least that
> much from reading a pair of striped or mirrored ZFS drives. But I can't -
> single drive or 2-drive stripes or mirrors, I only get around 34MB/s going
> through ZFS. (I made sure the mirror was rebuilt and I resilvered the
> stripes.)

Maybe this is a problem with your controller? What happens when you have two simultaneous dd's to different disks running? This would simulate the case where you're reading from the two disks at the same time.

-j
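A minimal sketch of j's two-disks-at-once suggestion on the Sil3114 box (device names taken from Marko's iostat output; the p0 suffix and the count are assumptions - use whatever node and size the single-drive runs used):

dd if=/dev/dsk/c0d0p0 of=/dev/null bs=128k count=10000 &
dd if=/dev/dsk/c0d1p0 of=/dev/null bs=128k count=10000 &
wait
# meanwhile, in another terminal:
iostat -xnz 3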
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Don't know how much this will help, but my results:

Ultra 20 we just got at work:
# uname -a
SunOS unknown 5.10 Generic_118855-15 i86pc i386 i86pc

raw disk:
dd if=/dev/dsk/c1d0s6 of=/dev/null bs=128k count=10000
0.00s user 2.16s system 14% cpu 15.131 total
1,280,000k in 15.131 seconds = 84768k/s

through filesystem:
dd if=testfile of=/dev/null bs=128k count=10000
0.01s user 0.88s system 4% cpu 19.666 total
1,280,000k in 19.666 seconds = 65087k/s

AMD64 FreeBSD 7 on a Lenovo something or other, Athlon X2 3800+:
uname -a
FreeBSD 7.0-CURRENT-200705 FreeBSD 7.0-CURRENT-200705 #0: Fri May 11 14:41:37 UTC 2007 root@:/usr/src/sys/amd64/compile/ZFS amd64

raw disk:
dd if=/dev/ad6p1 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 17.126926 secs (76529787 bytes/sec) (74735k/s)

filesystem:
# dd of=/dev/null if=testfile bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 17.174395 secs (76318263 bytes/sec) (74529k/s)

Odd, to say the least, since "du" for instance is faster on Solaris ZFS... FWIW, FreeBSD is running version 6 of ZFS and the unpatched but _new_ Ultra 20 is running version 2 of ZFS according to zdb. Make sure you're all patched up?
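On the "version 6 vs version 2" aside, a hedged way to check the on-disk pool version on either OS (the output wording differs between releases):

zpool upgrade          # with no arguments, reports the ZFS version each pool is running
zdb | grep -i version  # zdb with no arguments dumps the cached pool configs, including the version field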
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Thank you, Ian. You are getting ZFS over a 2-disk RAID-0 to be twice as fast as a dd raw disk read on one disk, which sounds more encouraging. But there is something odd with dd from the raw drive - it is only 28MB/s or so, if I divided that right? I would expect it to be around 100MB/s on 10K drives, or at least that should be roughly the potential throughput rate. Compare that to the throughput from the ZFS 2-disk RAID-0, which is showing 57MB/s. Any idea why the raw dd read is so slow?

Also, I wonder if everyone is using a different dd command than I am - I get a summary line that shows elapsed time and MB/s.

On 5/14/07, Ian Collins <[EMAIL PROTECTED]> wrote:
> Testing on an old Athlon MP box, two U160 10K SCSI drives.
> [...]
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Right now, the AthlonXP machine is booted into Linux, and I'm getting the same raw speed as when it is in Solaris, from the PCI Sil3114 with a Seagate 320G (7200.10):

dd if=/dev/sdb of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 16.7756 seconds, 78.1 MB/s

sudo dd if=./test.mov of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 24.2731 seconds, 54.0 MB/s  <-- some overhead compared to raw speed of the same disk above

same machine, onboard ATA, Seagate 120G:

dd if=/dev/hda of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 22.5892 seconds, 58.0 MB/s

On another machine with a Pentium D 3.0GHz and ICH7 onboard SATA in AHCI mode, running Darwin OS, from a Seagate 500G (7200.10):

dd if=/dev/rdisk0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 17.697512 secs (74062388 bytes/sec)

same disk, access through the file system (HFS+):

dd if=./Summer\ 2006\ with\ Cohen\ 4 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 20.381901 secs (64308035 bytes/sec)  <- very small overhead compared to raw access above!

same Intel machine, Seagate 200G (7200.8, I think):

dd if=/dev/rdisk1 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes transferred in 20.850229 secs (62863578 bytes/sec)

Modern disk drives are definitely fast, pushing close to 80MB/s raw performance, and some file systems can get over 85% of that with simple sequential access. So far, on these particular hardware and software combinations, I have, as filesystem performance as a percentage of raw disk performance for sequential uncached reads:

HFS+: 86%
ext3 and UFS: 70%
ZFS: 45%

On 5/14/07, Richard Elling <[EMAIL PROTECTED]> wrote:
> Could you post iostat data for these runs? Also, as I suggested
> previously, try with checksum off. [...]
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> To reply to my own message, this article offers lots of insight into why
> dd access directly through the raw disk is fast, while accessing a file
> through the file system may be slow.
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that ZFS
> is under half the speed of raw disk access? What speeds are other people
> getting trying to dd a file through the zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using the
> default ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000

Testing on an old Athlon MP box, two U160 10K SCSI drives.

bash-3.00# time dd if=/dev/dsk/c2t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real    0m44.470s
user    0m0.018s
sys     0m8.290s

time dd if=/test/play/sol-nv-b62-x86-dvd.iso of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real    0m22.714s
user    0m0.020s
sys     0m3.228s

zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0

Ian
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> I missed an important conclusion from j's data, and that is that single
> disk raw access gives him 56MB/s, and the RAID-0 array gives him
> 961/46 = 21MB/s per disk, which comes in at 38% of potential performance.
> That is in the ballpark of the 45% of potential performance I am seeing
> with my puny setup of single or dual drives. Of course, I don't expect a
> complex file system to match raw disk dd performance, but it doesn't
> compare favourably to common file systems like UFS or ext3, so the
> question remains, is ZFS overhead normally this big? That would mean that
> one needs at least a 4-5 way stripe to generate enough data to saturate
> gigabit ethernet, compared to a 2-3 way stripe on a "lesser" filesystem, a
> possibly important consideration in a SOHO situation.

I don't see this on my system, but it has more CPU (dual core 2.6 GHz). It saturates a GB net w/ 4 drives & samba, not working hard at all. A thumper does 2 GB/sec with 2 dual core CPUs.

Do you have compression enabled? This can be a choke point for weak CPUs.

- Bart

Bart Smaalders
Solaris Kernel Performance
[EMAIL PROTECTED]
http://blogs.sun.com/barts
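Bart's compression question is quick to rule out (pool name assumed to be tank, as elsewhere in the thread):

zfs get compression tank       # single dataset
zfs get -r compression tank    # every dataset in the pool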
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Thank you, Al.

Would you mind also doing:

ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=10000

to see the raw performance of the underlying hardware.

On 5/14/07, Al Hopper <[EMAIL PROTECTED]> wrote:
> # ptime dd if=./allhomeal20061209_01.tar of=/dev/null bs=128k count=10000
> [...]
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Marko Milisavljevic wrote:
> I missed an important conclusion from j's data, and that is that single
> disk raw access gives him 56MB/s, and the RAID-0 array gives him
> 961/46 = 21MB/s per disk, which comes in at 38% of potential performance.
> That is in the ballpark of the 45% of potential performance I am seeing
> with my puny setup of single or dual drives. Of course, I don't expect a
> complex file system to match raw disk dd performance, but it doesn't
> compare favourably to common file systems like UFS or ext3, so the
> question remains, is ZFS overhead normally this big? That would mean that
> one needs at least a 4-5 way stripe to generate enough data to saturate
> gigabit ethernet, compared to a 2-3 way stripe on a "lesser" filesystem, a
> possibly important consideration in a SOHO situation.

Could you post iostat data for these runs? Also, as I suggested previously, try with checksum off. AthlonXP doesn't have a reputation as a speed demon.

BTW, for 7,200 rpm drives, which are typical in desktops, 56 MBytes/s isn't bad. The media speed will range from perhaps [30-40]-[60-75] MBytes/s judging from a quick scan of disk vendor datasheets. In other words, it would not surprise me to see a 4-5 way stripe being required to keep a GbE saturated.

-- richard
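Richard's checksum suggestion as a test-only toggle (dataset and file names are placeholders; not something to leave off outside of benchmarking):

zfs set checksum=off tank/test
# recopy the test file: the property only affects newly written blocks,
# so existing blocks keep (and keep verifying) their checksums
cp /path/to/bigfile /tank/test/bigfile.nocksum
# rerun the dd read, then turn checksums back on
zfs set checksum=on tank/test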
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
On Mon, 14 May 2007, Marko Milisavljevic wrote:

> To reply to my own message, this article offers lots of insight into why
> dd access directly through the raw disk is fast, while accessing a file
> through the file system may be slow.
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that ZFS
> is under half the speed of raw disk access? What speeds are other people
> getting trying to dd a file through the zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using the
> default ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000
>
> If you could please post your MB/s and show the output of zpool status so
> we can see your disk configuration, I would appreciate it. Please use a
> file that is 100MB or more - the result will be too random with small
> files. Also make sure zfs is not caching the file already!

# ptime dd if=./allhomeal20061209_01.tar of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real        6.407
user        0.008
sys         1.624

  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0

3-way mirror:
10000+0 records in
10000+0 records out

real       12.500
user        0.007
sys         1.216

2-way mirror:
10000+0 records in
10000+0 records out

real       18.356
user        0.006
sys         0.935

# psrinfo -v
Status of virtual processor 0 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:21.
  The i386 processor operates at 2009 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:24.
  The i386 processor operates at 2009 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:26.
  The i386 processor operates at 2009 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:28.
  The i386 processor operates at 2009 MHz, and has an i387 compatible floating point processor.

> What I am seeing is that ZFS performance for sequential access is about
> 45% of raw disk access, while UFS (as well as ext3 on Linux) is around
> 70%. For a workload consisting mostly of reading large files sequentially,
> it would seem then that ZFS is the wrong tool performance-wise. But it
> could be just my setup, so I would appreciate more data points.

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
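Converting Al's times to throughput, assuming the same 1,280,000 KB (10000 x 128k) transfer used elsewhere in the thread - rough numbers only:

  5-disk raidz1:  1,280,000 KB /  6.407 s ~ 195 MB/s
  3-way mirror:   1,280,000 KB / 12.500 s ~ 100 MB/s
  2-way mirror:   1,280,000 KB / 18.356 s ~  68 MB/s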
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
I missed an important conclusion from j's data, and that is that single disk raw access gives him 56MB/s, and the RAID-0 array gives him 961/46 = 21MB/s per disk, which comes in at 38% of potential performance. That is in the ballpark of the 45% of potential performance I am seeing with my puny setup of single or dual drives. Of course, I don't expect a complex file system to match raw disk dd performance, but it doesn't compare favourably to common file systems like UFS or ext3, so the question remains, is ZFS overhead normally this big? That would mean that one needs at least a 4-5 way stripe to generate enough data to saturate gigabit ethernet, compared to a 2-3 way stripe on a "lesser" filesystem, a possibly important consideration in a SOHO situation.

On 5/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> This certainly isn't the case on my machine. [...]
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
Thank you for those numbers. I should have mentioned that I was mostly interested in single disk or small array performance, as it is not possible for dd to meaningfully access multiple-disk configurations without going through the file system. I find it curious that there is such a large slowdown by going through the file system (with a single drive configuration), especially compared to UFS or ext3.

I simply have a small SOHO server and I am trying to evaluate which OS to use to keep a redundant disk array. With unreliable consumer-level hardware, ZFS and the checksum feature are very interesting and the primary selling point compared to a Linux setup, for as long as ZFS can generate enough bandwidth from the drive array to saturate a single gigabit ethernet.

My hardware at the moment is the "wrong" choice for Solaris/ZFS - a PCI 3114 SATA controller on a 32-bit AthlonXP, according to many posts I found. However, since dd over the raw disk is capable of extracting 75+MB/s from this setup, I keep feeling that surely I must be able to get at least that much from reading a pair of striped or mirrored ZFS drives. But I can't - single drive or 2-drive stripes or mirrors, I only get around 34MB/s going through ZFS. (I made sure the mirror was rebuilt and I resilvered the stripes.)

Everything is a stock Nevada b63 installation, so I haven't messed it up with misguided tuning attempts. Don't know if it matters, but the test file was created originally from /dev/random. Compression is off, and everything is default. CPU utilization remains low at all times (haven't seen it go over 25%).

On 5/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> This certainly isn't the case on my machine. [...]
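For reference, the two layouts being compared here boil down to (device names from the iostat output earlier in the thread; pool name illustrative):

zpool create tank c0d0 c0d1           # 2-drive dynamic stripe
zpool create tank mirror c0d0 c0d1    # 2-drive mirror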
Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
This certainly isn't the case on my machine.

$ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real        1.3
user        0.0
sys         1.2

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       22.3
user        0.0
sys         2.2

This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool. My pool is configured as a 46 disk RAID-0 stripe. I'm going to omit the zpool status output for the sake of brevity.

> What I am seeing is that ZFS performance for sequential access is
> about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> around 70%. For a workload consisting mostly of reading large files
> sequentially, it would seem then that ZFS is the wrong tool
> performance-wise. But it could be just my setup, so I would
> appreciate more data points.

This isn't what we've observed in much of our performance testing. It may be a problem with your config, although I'm not an expert on storage configurations. Would you mind providing more details about your controller, disks, and machine setup?

-j
[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?
To reply to my own message, this article offers lots of insight into why dd access directly through the raw disk is fast, while accessing a file through the file system may be slow.

http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1

So, I guess what I'm wondering now is, does it happen to everyone that ZFS is under half the speed of raw disk access? What speeds are other people getting trying to dd a file through the zfs file system? Something like

dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using the default ZFS block size)

how does that compare to:

dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=10000

If you could please post your MB/s and show the output of zpool status so we can see your disk configuration, I would appreciate it. Please use a file that is 100MB or more - the result will be too random with small files. Also make sure zfs is not caching the file already!

What I am seeing is that ZFS performance for sequential access is about 45% of raw disk access, while UFS (as well as ext3 on Linux) is around 70%. For a workload consisting mostly of reading large files sequentially, it would seem then that ZFS is the wrong tool performance-wise. But it could be just my setup, so I would appreciate more data points.
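A hedged recipe for the "make sure zfs is not caching the file already" part (paths and pool name are placeholders): create an incompressible file of a couple of GB, then export and re-import the pool so the ARC no longer holds it before the timed read:

dd if=/dev/urandom of=/pool/mount/file bs=128k count=16384   # ~2 GB; /dev/urandom won't block the way /dev/random can
zpool export pool
zpool import pool
ptime dd if=/pool/mount/file of=/dev/null bs=128k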