I am very grateful to everyone who took the time to run a few tests to help me figure out what is going on. As per j's suggestions, I tried some simultaneous reads, and a few other things, and I am getting interesting and confusing results.
All tests were done using two Seagate 320G drives on a sil3114. In each test I ran dd if=... of=/dev/null bs=128k count=10000. Each drive was freshly formatted with one 2G file copied to it, so that dd from the raw disk and dd from the file use roughly the same area of the disk. I tried raw, ZFS and UFS, on single drives and on two drives simultaneously (just executing the dd commands in separate terminal windows). Below are snapshots of iostat -xnczpm 3 captured somewhere in the middle of each run. I am not reporting CPU% as it never rose over 50% and was uniformly proportional to the reported throughput.

single drive, raw:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1378.4    0.0 77190.7    0.0  0.0  1.7    0.0    1.2   0  98 c0d1

single drive, UFS file:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1255.1    0.0 69949.6    0.0  0.0  1.8    0.0    1.4   0 100 c0d0

A small slowdown, but pretty good.

single drive, ZFS file:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1

Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s divided by r/s gives 128K here, as I would imagine it should.

simultaneous raw:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  797.0    0.0 44632.0    0.0  0.0  1.8    0.0    2.3   0 100 c0d0
  795.7    0.0 44557.4    0.0  0.0  1.8    0.0    2.3   0 100 c0d1

The PCI interface seems to be saturated at about 90MB/s. Adequate if the goal is to serve files on a gigabit SOHO network.

simultaneous raw on c0d1 and UFS on c0d0:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  722.4    0.0 40246.8    0.0  0.0  1.8    0.0    2.5   0 100 c0d0
  717.1    0.0 40156.2    0.0  0.0  1.8    0.0    2.5   0  99 c0d1

Hmm, I can no longer get the 90MB/s.

simultaneous ZFS on c0d1 and raw on c0d0:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.7     0.0    1.8  0.0  0.0    0.0    0.1   0   0 c1d0
  334.9    0.0 18756.0    0.0  0.0  1.9    0.0    5.5   0  97 c0d0
  172.5    0.0 22074.6    0.0 33.0  2.0  191.3   11.6 100 100 c0d1

Everything is slow.
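The average request size the driver actually saw can be read straight off these numbers: kr/s divided by r/s. A quick check against the two single-drive samples above (raw vs. ZFS):

```shell
# Average read size = kr/s / r/s, using the single-drive iostat samples above.
awk 'BEGIN { printf "raw: %.0fK  zfs: %.0fK\n", 77190.7/1378.4, 33066.6/258.3 }'
# prints: raw: 56K  zfs: 128K
```

So the raw reads are being issued as ~56K transfers, while the ZFS reads match the 128K default recordsize.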
What happens if we throw the onboard IDE interface into the mix?

simultaneous raw SATA and raw PATA:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1036.3    0.3 58033.9    0.3  0.0  1.6    0.0    1.6   0  99 c1d0
 1422.6    0.0 79668.3    0.0  0.0  1.6    0.0    1.1   1  98 c0d0

Both at maximum throughput.

ZFS file on the SATA drive and raw disk on the PATA interface:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1018.9    0.3 57056.1    4.0  0.0  1.7    0.0    1.7   0  99 c1d0
  268.4    0.0 34353.1    0.0 33.0  2.0  122.9    7.5 100 100 c0d0

SATA is slower with ZFS, as expected by now, but PATA remains at full speed. So they are operating quite independently. Except... what if we read a UFS file from the PATA disk and a ZFS file from SATA:

    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
  224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0

Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a number of times; it is not a fluke.

Finally, after reviewing all this, I noticed another interesting bit: whenever I read from raw disks or UFS files, SATA or PATA, kr/s over r/s comes to 56K, suggesting that the underlying I/O system is using that as some kind of native block size (even though dd is requesting 128k). But when reading ZFS files, it always comes to 128k, which is expected, since that is the ZFS default (and the same thing happens regardless of bs= in dd). On the theory that my system just doesn't like 128k reads (I'm desperate!), and that this would explain the whole slowdown and the wait/wsvc_t columns, I tried changing recordsize to 32k and rewriting the test file. However, accessing the ZFS file continues to show 128k reads, and it is just as slow. Is there a way either to confirm that the ZFS file in question is indeed written with 32k records or, even better, to force ZFS to use 56k when accessing the disk? Or perhaps I just misunderstand the implications of the iostat output.
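For the "is it really 32k records" question, a sketch of how one could check (the dataset name tank/test is a placeholder, and the zdb output format shown in the comment is from memory, so treat it as an assumption): recordsize only applies to blocks written after the property is changed, so the file has to be rewritten, and then zdb can dump the file's block pointers to show the sizes actually on disk.

```shell
# 'tank/test' is a hypothetical dataset name; substitute your own.
zfs set recordsize=32k tank/test
zfs get recordsize tank/test                    # confirm the property took
cp /tank/test/testfile /tank/test/testfile.32k  # only blocks written from now on use 32k
ls -i /tank/test/testfile.32k                   # inode number = zdb object number
zdb -ddddd tank/test <object>                   # block pointers should read 8000L/8000P (32k)
```

If the rewritten file still shows 128K (20000L) blocks, the recordsize change did not take effect for it.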
I've repeated each of these tests a few times and double-checked; the numbers, although snapshots of a point in time, fairly represent the averages. I have no idea what to make of all this, except that ZFS has a problem with this hardware/drivers that UFS and other traditional file systems don't. Is it a bug in the driver that ZFS is inadvertently exposing? A specific feature that ZFS assumes the hardware has, but it doesn't? Who knows! I will have to give up on Solaris/ZFS on this hardware for now, but I hope to try it again sometime in the future. I'll give FreeBSD/ZFS a spin to see if it fares better (although at this point in its development it is probably more risky than just sticking with Linux and missing out on ZFS).

(Another contributor suggested turning checksumming off; it made no difference. Same for atime. Compression was always off.)

On 5/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Marko,

I tried this experiment again using 1 disk and got nearly identical times:

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       21.4
user        0.0
sys         2.4

$ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k count=10000
10000+0 records in
10000+0 records out

real       21.0
user        0.0
sys         0.7

> [I]t is not possible for dd to meaningfully access multiple-disk
> configurations without going through the file system. I find it
> curious that there is such a large slowdown by going through file
> system (with single drive configuration), especially compared to UFS
> or ext3.

Comparing a filesystem to raw dd access isn't a completely fair comparison either. Few filesystems actually lay out all of their data and metadata so that every read is a completely sequential read.

> I simply have a small SOHO server and I am trying to evaluate which OS to
> use to keep a redundant disk array. With unreliable consumer-level hardware,
> ZFS and the checksum feature are very interesting and the primary selling
> point compared to a Linux setup, for as long as ZFS can generate enough
> bandwidth from the drive array to saturate single gigabit ethernet.

I would take Bart's recommendation and go with Solaris on something like a dual-core box with 4 disks.

> My hardware at the moment is the "wrong" choice for Solaris/ZFS - PCI 3114
> SATA controller on a 32-bit AthlonXP, according to many posts I found.

Bill Moore lists some controller recommendations here:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

> However, since dd over raw disk is capable of extracting 75+MB/s from this
> setup, I keep feeling that surely I must be able to get at least that much
> from reading a pair of striped or mirrored ZFS drives. But I can't - single
> drive or 2-drive stripes or mirrors, I only get around 34MB/s going through
> ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)
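For reference, those two times work out to essentially identical throughput; dd moved 10000 reads of 128K = 1250 MB in each case:

```shell
# 10000 x 128K = 1,280,000 KB = 1250 MB; divide by the wall-clock times above.
awk 'BEGIN { mb = 10000 * 128 / 1024
             printf "raw: %.1f MB/s  zfs: %.1f MB/s\n", mb/21.4, mb/21.0 }'
# prints: raw: 58.4 MB/s  zfs: 59.5 MB/s
```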
Maybe this is a problem with your controller? What happens when you have two simultaneous dd's from different disks running? This would simulate the case where you're reading from the two disks at the same time.

-j
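The suggested test is just two backgrounded dd's plus a wait. A sketch, with scratch files standing in for the raw devices so the commands can be tried anywhere; point the reads at /dev/dsk/c0d0 and /dev/dsk/c0d1 (or your own device paths) to reproduce the real experiment, and watch iostat -xnczpm 3 in another terminal while it runs:

```shell
# Scratch files stand in for the two disks in this demo.
f1=$(mktemp); f2=$(mktemp)
dd if=/dev/zero of="$f1" bs=128k count=8 2>/dev/null
dd if=/dev/zero of="$f2" bs=128k count=8 2>/dev/null

# Two concurrent sequential readers, one per "disk".
dd if="$f1" of=/dev/null bs=128k 2>/dev/null &
dd if="$f2" of=/dev/null bs=128k 2>/dev/null &
wait
echo "both reads finished"
rm -f "$f1" "$f2"
```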
_______________________________________________ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss