Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-18 Thread Richard Elling

Queuing theory should explain this rather nicely.  iostat measures
%busy by counting whether there is an entry in the queue on each clock
tick.  There are two queues: one in the controller and one on the
disk.  As you can clearly see, the way ZFS pushes the load is very
different from dd or UFS.
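
(Editor's note, as a hedged reference for reading the numbers below and
assuming the standard iostat -x column meanings: wait/wsvc_t describe the
queue held above the device, while actv/asvc_t describe commands outstanding
on the device itself, e.g. in the output of

   iostat -xnczpm 3

a large wait count with a small actv count points at queuing in the
driver/controller rather than at the disk.)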
 -- richard

Marko Milisavljevic wrote:
I am very grateful to everyone who took the time to run a few tests to 
help me figure what is going on. As per j's suggestions, I tried some 
simultaneous reads, and a few other things, and I am getting interesting 
and confusing results.


All tests are done using two Seagate 320G drives on sil3114. In each 
test I am using dd if= of=/dev/null bs=128k count=1. Each drive 
is freshly formatted with one 2G file copied to it. That way dd from raw 
disk and from file are using roughly same area of disk. I tried using 
raw, zfs and ufs, single drives and two simultaneously (just executing 
dd commands in separate terminal windows). These are snapshots of iostat 
-xnczpm 3 captured somewhere in the middle of the operation. I am not 
bothering to report CPU% as it never rose over 50%, and was uniformly 
proportional to reported throughput.


single drive raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1378.4    0.0 77190.7    0.0  0.0  1.7    0.0    1.2   0  98 c0d1

single drive, ufs file
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1255.1    0.0 69949.6    0.0  0.0  1.8    0.0    1.4   0 100 c0d0

Small slowdown, but pretty good.

single drive, zfs file
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1

Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s 
/ r/s gives 256K, as I would imagine it should.


simultaneous raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  797.0    0.0 44632.0    0.0  0.0  1.8    0.0    2.3   0 100 c0d0
  795.7    0.0 44557.4    0.0  0.0  1.8    0.0    2.3   0 100 c0d1

This PCI interface seems to be saturated at 90MB/s. Adequate if the goal 
is to serve files on gigabit SOHO network.


simultaneous raw on c0d1 and ufs on c0d0:
extended device statistics 
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  722.4    0.0 40246.8    0.0  0.0  1.8    0.0    2.5   0 100 c0d0
  717.1    0.0 40156.2    0.0  0.0  1.8    0.0    2.5   0  99 c0d1

hmm, can no longer get the 90MB/sec.

simultaneous zfs on c0d1 and raw on c0d0:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.7    0.0    1.8  0.0  0.0    0.0    0.1   0   0 c1d0
  334.9    0.0 18756.0    0.0  0.0  1.9    0.0    5.5   0  97 c0d0
  172.5    0.0 22074.6    0.0 33.0  2.0  191.3   11.6 100 100 c0d1

Everything is slow.

What happens if we throw onboard IDE interface into the mix?
simultaneous raw SATA and raw PATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1036.3    0.3 58033.9    0.3  0.0  1.6    0.0    1.6   0  99 c1d0
 1422.6    0.0 79668.3    0.0  0.0  1.6    0.0    1.1   1  98 c0d0

Both at maximum throughput.

Read ZFS on SATA drive and raw disk on PATA interface:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1018.9    0.3 57056.1    4.0  0.0  1.7    0.0    1.7   0  99 c1d0
  268.4    0.0 34353.1    0.0 33.0  2.0  122.9    7.5 100 100 c0d0

SATA is slower with ZFS as expected by now, but ATA remains at full 
speed. So they are operating quite independently. Except...


What if we read a UFS file from the PATA disk and ZFS from SATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
  224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0
 
Now that is confusing! Why did SATA/ZFS slow down too? I've retried this 
a number of times, not a fluke.


Finally, after reviewing all this, I've noticed another interesting 
bit... whenever I read from raw disks or UFS files, SATA or PATA, kr/s 
over r/s is 56k, suggesting that the underlying I/O system is using that as 
some kind of native block size (even though dd is requesting 128k)? 
But when reading ZFS files, this always comes to 128k, which is 
expected, since that is the ZFS default (and the same thing happens 
regardless of bs= in dd). On the theory that my system just doesn't like 128k reads 
(I'm desperate!), and that this would explain the whole slowdown and 
wait/wsvc_t column, I tried changing recsize to 32k and rewriting the 
test file. However, accessing ZFS files continues to show 128k reads, 
and it is just as slow. Is there a way to either confirm that the ZFS 
file in question is indeed written with 32k records or, even better, to 
force ZFS to use 56k when accessing the disk? Or perhaps I just 
misunderstand the implications of the iostat output.


I've repeated each of these tests a few times and doublechecked, and the 
numbers, although snapshots of a point in time, fairly represent averages.

Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-16 Thread johansen-osdev
> >*sata_hba_list::list sata_hba_inst_t satahba_next | ::print 
> >sata_hba_inst_t satahba_dev_port | ::array void* 32 | ::print void* | 
> >::grep ".!=0" | ::print sata_cport_info_t cport_devp.cport_sata_drive | 
> >::print -a sata_drive_info_t satadrv_features_support satadrv_settings 
> >satadrv_features_enabled

> This gives me "mdb: failed to dereference symbol: unknown symbol
> name". 

You may not have the SATA module installed.  If you type:

::modinfo !  grep sata

and don't get any output, your sata driver is attached some other way.

My apologies for the confusion.

-K
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-15 Thread Marko Milisavljevic

On 5/15/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> Each drive is freshly formatted with one 2G file copied to it.

How are you creating each of these files?


zpool create tank c0d0 c0d1; zfs create tank/test; cp ~/bigfile /tank/test/
The actual content of the file is random junk from /dev/random.
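
(For anyone reproducing this, a test file like that could be generated with
something along these lines - the size and name are arbitrary, and
/dev/urandom is used instead of /dev/random so the read doesn't block on
the entropy pool:

   dd if=/dev/urandom of=~/bigfile bs=128k count=16384   # roughly 2 GB of incompressible data
)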


Also, would you please include the output from the isalist(1) command?


pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86


Have you double-checked that this isn't a measurement problem by
measuring zfs with zpool iostat (see zpool(1M)) and verifying that
outputs from both iostats match?


Both give same kb/s.


How much memory is in this box?


1.5g, I can see in /var/adm/messages that it is recognized.


As root, type mdb -k, and then at the ">" prompt that appears, enter the
following command (this is one very long line):

*sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t 
satahba_dev_port | ::array void* 32 | ::print void* | ::grep ".!=0" | ::print 
sata_cport_info_t cport_devp.cport_sata_drive | ::print -a sata_drive_info_t 
satadrv_features_support satadrv_settings satadrv_features_enabled


This gives me "mdb: failed to dereference symbol: unknown symbol
name". I don't know enough about the syntax here to try to isolate
which token it is complaining about. But, I don't know if my PCI/SATA
card is going through sd driver, if that is what commands above
assume... my understanding is that sil3114 goes through ata driver, as
per this blog: http://blogs.sun.com/mlf/entry/ata_on_solaris_x86_at

If there is any other testing I can do, I would be happy to.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-15 Thread johansen-osdev
> Each drive is freshly formatted with one 2G file copied to it. 

How are you creating each of these files?

Also, would you please include the output from the isalist(1) command?

> These are snapshots of iostat -xnczpm 3 captured somewhere in the
> middle of the operation.

Have you double-checked that this isn't a measurement problem by
measuring zfs with zpool iostat (see zpool(1M)) and verifying that
outputs from both iostats match?
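
(For example, with a pool named tank - the pool name here is just a
placeholder:

   zpool iostat -v tank 3    # per-vdev bandwidth as ZFS sees it
   iostat -xnczpm 3          # the same interval as seen by the disk driver

and compare the read-bandwidth columns while the dd is running.)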

> single drive, zfs file
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1
> 
> Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s /
> r/s gives 256K, as I would imagine it should.

Not sure.  If we can figure out why ZFS is slower than raw disk access
in your case, it may explain why you're seeing these results.

> What if we read a UFS file from the PATA disk and ZFS from SATA:
>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>   792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
>   224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0
> 
> Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a
> number of times, not a fluke.

This could be cache interference.  ZFS and UFS use different caches.

How much memory is in this box?

> I have no idea what to make of all this, except that ZFS has a problem
> with this hardware/drivers that UFS and other traditional file systems
> don't. Is it a bug in the driver that ZFS is inadvertently exposing? A
> specific feature that ZFS assumes the hardware to have, but it doesn't? Who
> knows!

This may be a more complicated interaction than just ZFS and your
hardware.  There are a number of layers of drivers underneath ZFS that
may also be interacting with your hardware in an unfavorable way.

If you'd like to do a little poking with MDB, we can see the features
that your SATA disks claim they support.

As root, type mdb -k, and then at the ">" prompt that appears, enter the
following command (this is one very long line):

*sata_hba_list::list sata_hba_inst_t satahba_next | ::print sata_hba_inst_t 
satahba_dev_port | ::array void* 32 | ::print void* | ::grep ".!=0" | ::print 
sata_cport_info_t cport_devp.cport_sata_drive | ::print -a sata_drive_info_t 
satadrv_features_support satadrv_settings satadrv_features_enabled

This should show satadrv_features_support, satadrv_settings, and
satadrv_features_enabled for each SATA disk on the system.

The values for these variables are defined in:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/sys/sata/impl/sata.h

this is the relevant snippet for interpreting these values:

/*
 * Device feature_support (satadrv_features_support)
 */
#define SATA_DEV_F_DMA              0x01
#define SATA_DEV_F_LBA28            0x02
#define SATA_DEV_F_LBA48            0x04
#define SATA_DEV_F_NCQ              0x08
#define SATA_DEV_F_SATA1            0x10
#define SATA_DEV_F_SATA2            0x20
#define SATA_DEV_F_TCQ              0x40    /* Non NCQ tagged queuing */

/*
 * Device features enabled (satadrv_features_enabled)
 */
#define SATA_DEV_F_E_TAGGED_QING    0x01    /* Tagged queuing enabled */
#define SATA_DEV_F_E_UNTAGGED_QING  0x02    /* Untagged queuing enabled */

/*
 * Drive settings flags (satdrv_settings)
 */
#define SATA_DEV_READ_AHEAD         0x0001  /* Read Ahead enabled */
#define SATA_DEV_WRITE_CACHE        0x0002  /* Write cache ON */
#define SATA_DEV_SERIAL_FEATURES    0x8000  /* Serial ATA feat. enabled */
#define SATA_DEV_ASYNCH_NOTIFY      0x2000  /* Asynch-event enabled */

This may give us more information if this is indeed a problem with
hardware/drivers supporting the right features.
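
As a purely illustrative example of decoding these values (the numbers below
are made up, not taken from this system): a drive reporting
satadrv_features_support = 0x2f would be claiming
DMA | LBA28 | LBA48 | NCQ | SATA2 (0x01 + 0x02 + 0x04 + 0x08 + 0x20), and
satadrv_settings = 0x8003 would mean read-ahead and the write cache are
enabled along with the serial ATA features.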

-j
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic

I am very grateful to everyone who took the time to run a few tests to help
me figure what is going on. As per j's suggestions, I tried some
simultaneous reads, and a few other things, and I am getting interesting and
confusing results.

All tests are done using two Seagate 320G drives on sil3114. In each test I
am using dd if= of=/dev/null bs=128k count=1. Each drive is freshly
formatted with one 2G file copied to it. That way dd from raw disk and from
file are using roughly same area of disk. I tried using raw, zfs and ufs,
single drives and two simultaneously (just executing dd commands in separate
terminal windows). These are snapshots of iostat -xnczpm 3 captured
somewhere in the middle of the operation. I am not bothering to report CPU%
as it never rose over 50%, and was uniformly proportional to reported
throughput.

single drive raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1378.4    0.0 77190.7    0.0  0.0  1.7    0.0    1.2   0  98 c0d1

single drive, ufs file
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1255.1    0.0 69949.6    0.0  0.0  1.8    0.0    1.4   0 100 c0d0

Small slowdown, but pretty good.

single drive, zfs file
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1

Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s /
r/s gives 256K, as I would imagine it should.

simultaneous raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  797.0    0.0 44632.0    0.0  0.0  1.8    0.0    2.3   0 100 c0d0
  795.7    0.0 44557.4    0.0  0.0  1.8    0.0    2.3   0 100 c0d1

This PCI interface seems to be saturated at 90MB/s. Adequate if the goal is
to serve files on gigabit SOHO network.

simultaneous raw on c0d1 and ufs on c0d0:
    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  722.4    0.0 40246.8    0.0  0.0  1.8    0.0    2.5   0 100 c0d0
  717.1    0.0 40156.2    0.0  0.0  1.8    0.0    2.5   0  99 c0d1

hmm, can no longer get the 90MB/sec.

simultaneous zfs on c0d1 and raw on c0d0:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.7    0.0    1.8  0.0  0.0    0.0    0.1   0   0 c1d0
  334.9    0.0 18756.0    0.0  0.0  1.9    0.0    5.5   0  97 c0d0
  172.5    0.0 22074.6    0.0 33.0  2.0  191.3   11.6 100 100 c0d1

Everything is slow.

What happens if we throw onboard IDE interface into the mix?
simultaneous raw SATA and raw PATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1036.3    0.3 58033.9    0.3  0.0  1.6    0.0    1.6   0  99 c1d0
 1422.6    0.0 79668.3    0.0  0.0  1.6    0.0    1.1   1  98 c0d0

Both at maximum throughput.

Read ZFS on SATA drive and raw disk on PATA interface:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1018.9    0.3 57056.1    4.0  0.0  1.7    0.0    1.7   0  99 c1d0
  268.4    0.0 34353.1    0.0 33.0  2.0  122.9    7.5 100 100 c0d0

SATA is slower with ZFS as expected by now, but ATA remains at full speed.
So they are operating quite independently. Except...

What if we read a UFS file from the PATA disk and ZFS from SATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
  224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0

Now that is confusing! Why did SATA/ZFS slow down too? I've retried this a
number of times, not a fluke.

Finally, after reviewing all this, I've noticed another interesting bit...
whenever I read from raw disks or UFS files, SATA or PATA, kr/s over r/s is
56k, suggesting that the underlying I/O system is using that as some kind of
native block size (even though dd is requesting 128k)? But when reading ZFS
files, this always comes to 128k, which is expected, since that is the ZFS
default (and the same thing happens regardless of bs= in dd). On the theory that
my system just doesn't like 128k reads (I'm desperate!), and that this would
explain the whole slowdown and wait/wsvc_t column, I tried changing recsize
to 32k and rewriting the test file. However, accessing ZFS files continues
to show 128k reads, and it is just as slow. Is there a way to either confirm
that the ZFS file in question is indeed written with 32k records or, even
better, to force ZFS to use 56k when accessing the disk? Or perhaps I just
misunderstand the implications of the iostat output.
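
(One way to sanity-check this - sketched under the assumption that the
dataset is tank/test: recordsize only applies to blocks written after the
property is changed, so the file has to be rewritten, and zdb can then show
the block sizes ZFS actually used for the object:

   zfs set recordsize=32k tank/test
   cp /tank/test/bigfile /tank/test/bigfile.32k   # rewrite so the new recordsize takes effect
   zdb -dddd tank/test                            # object dump includes the data block size

The dataset and file names above are placeholders.)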

I've repeated each of these tests a few times and doublechecked, and the
numbers, although snapshots of a point in time, fairly represent averages.

I have no idea what to make of all this, except that ZFS has a problem
with this hardware/drivers that UFS and other traditional file systems
don't. Is it a bug in the driver that ZFS is inadvertently exposing? A
specific feature that ZFS assumes the hardware to have, but it doesn't? Who
knows! I will have to give up on Solaris/ZFS on this hardware for now,

Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Al Hopper
On Mon, 14 May 2007, Marko Milisavljevic wrote:

> Thank you, Al.
>
> Would you mind also doing:
>
> ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=1

# ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=1

real   20.046
user0.013
sys 3.568


> to see the raw performance of underlying hardware.

Regards,

Al Hopper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread johansen-osdev
Marko,

I tried this experiment again using 1 disk and got nearly identical
times:

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real   21.4
user0.0
sys 2.4

$ /usr/bin/time dd if=/test/filebench/testfile of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real   21.0
user0.0
sys 0.7


> [I]t is not possible for dd to meaningfully access multiple-disk
> configurations without going through the file system. I find it
> curious that there is such a large slowdown by going through file
> system (with single drive configuration), especially compared to UFS
> or ext3.

Comparing a filesystem to raw dd access isn't a completely fair
comparison either.  Few filesystems actually lay out all of their data
and metadata so that every read is a completely sequential read.

> I simply have a small SOHO server and I am trying to evaluate which OS to
> use to keep a redundant disk array. With unreliable consumer-level hardware,
> ZFS and the checksum feature are very interesting and the primary selling
> point compared to a Linux setup, for as long as ZFS can generate enough
> bandwidth from the drive array to saturate single gigabit ethernet.

I would take Bart's recommendation and go with Solaris on something like a
dual-core box with 4 disks.

> My hardware at the moment is the "wrong" choice for Solaris/ZFS - PCI 3114
> SATA controller on a 32-bit AthlonXP, according to many posts I found.

Bill Moore lists some controller recommendations here:

http://mail.opensolaris.org/pipermail/zfs-discuss/2006-March/016874.html

> However, since dd over raw disk is capable of extracting 75+MB/s from this
> setup, I keep feeling that surely I must be able to get at least that much
> from reading a pair of striped or mirrored ZFS drives. But I can't - single
> drive or 2-drive stripes or mirrors, I only get around 34MB/s going through
> ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)

Maybe this is a problem with your controller?  What happens when you
have two simultaneous dd's to different disks running?  This would
simulate the case where you're reading from the two disks at the same
time.
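
For instance (the device names below are placeholders for whatever c#t#d#
disks you have):

   dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=10000 &
   dd if=/dev/dsk/c0t1d0 of=/dev/null bs=128k count=10000 &
   wait

while watching iostat -xn in another terminal.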

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Nick G
Don't know how much this will help, but my results:

Ultra 20 we just got at work:

 # uname -a
SunOS unknown 5.10 Generic_118855-15 i86pc i386 i86pc

raw disk
dd if=/dev/dsk/c1d0s6 of=/dev/null bs=128k count=1  0.00s user 2.16s system 
14% cpu 15.131 total

1,280,000k in 15.131 seconds
84768k/s

through filesystem
dd if=testfile of=/dev/null bs=128k count=1  0.01s user 0.88s system 4% cpu 
19.666 total

1,280,000k in 19.666 seconds
65087k/s


AMD64 FreeBSD 7 on a Lenovo something or other, Athlon X2 3800+

 uname -a 
FreeBSD  7.0-CURRENT-200705 FreeBSD 7.0-CURRENT-200705 #0: Fri May 11 14:41:37 
UTC 2007 root@:/usr/src/sys/amd64/compile/ZFS  amd64

raw disk
dd if=/dev/ad6p1 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 17.126926 secs (76529787 bytes/sec)
(74735k/s)

filesystem
# dd of=/dev/null if=testfile bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 17.174395 secs (76318263 bytes/sec)
(74529k/s)

Odd to say the least since "du" for instance is faster on Solaris ZFS...

FWIW FreeBSD is running version 6 of ZFS and the unpatched but _new_ Ultra 20
is running version 2 of ZFS, according to zdb.


Make sure you're all patched up?
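
(To check which on-disk pool version you are running - assuming the usual
tools are in the path:

   zpool upgrade       # lists pools that are not at the current version
   zpool upgrade -v    # shows the versions this build supports
)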
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic

Thank you, Ian,

You are getting ZFS over 2-disk RAID-0 to be twice as fast as dd raw disk
read on one disk, which sounds more encouraging. But, there is something odd
with dd from raw drive - it is only 28MB/s or so, if I divided that right? I
would expect it to be around 100MB/s on 10K drives, or at least that should
be roughly potential throughput rate. Compared to throughput from ZFS 2-disk
RAID-0 which is showing 57MB/s. Any idea why raw dd read is so slow?

Also, I wonder if everyone is using a different dd command than I am - I get
a summary line that shows elapsed time and MB/s.

On 5/14/07, Ian Collins <[EMAIL PROTECTED]> wrote:


Marko Milisavljevic wrote:
> To reply to my own message this article offers lots of insight into
why dd access directly through raw disk is fast, while accessing a file
through the file system may be slow.
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that
ZFS is under half the speed of raw disk access? What speeds are other people
getting trying to dd a file through zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using
default ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=1
>
>
Testing on a old Athlon MP box, two U160 10K SCSI drives.

bash-3.00# time dd if=/dev/dsk/c2t0d0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real0m44.470s
user0m0.018s
sys 0m8.290s

time dd if=/test/play/sol-nv-b62-x86-dvd.iso of=/dev/null bs=128k
count=1
1+0 records in
1+0 records out

real0m22.714s
user0m0.020s
sys 0m3.228s

zpool status
  pool: test
state: ONLINE
scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
testONLINE   0 0 0
  mirrorONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0

Ian


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic

Right now, the AthlonXP machine is booted into Linux, and I'm getting the same
raw speed as when it is in Solaris, from PCI Sil3114 with Seagate 320G (
7200.10):

dd if=/dev/sdb of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes (1.3 GB) copied, 16.7756 seconds, 78.1 MB/s

sudo dd if=./test.mov of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes (1.3 GB) copied, 24.2731 seconds, 54.0 MB/s <-- some
overhead compared to raw speed of same disk above

same machine, onboard ATA, Seagate 120G:
dd if=/dev/hda of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes (1.3 GB) copied, 22.5892 seconds, 58.0 MB/s

On another machine with Pentium D 3.0GHz and ICH7 onboard SATA in AHCI mode,
running Darwin OS:

from a Seagate 500G (7200.10):
dd if=/dev/rdisk0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 17.697512 secs (74062388 bytes/sec)

same disk, access through file system (HFS+)
dd if=./Summer\ 2006\ with\ Cohen\ 4 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 20.381901 secs (64308035 bytes/sec) <- very
small overhead compared to raw access above!

same Intel machine, Seagate 200G (7200.8, I think):
dd if=/dev/rdisk1 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out
131072 bytes transferred in 20.850229 secs (62863578 bytes/sec)

Modern disk drives are definitely fast and pushing close to 80MB/s raw
performance. And some file systems can get over 85% of that with simple
sequential access. So far, on these particular hardware and software
combinations, filesystem performance as a percentage of raw disk
performance for sequential uncached reads comes to:

HFS+: 86%
ext3 and UFS: 70%
ZFS: 45%

On 5/14/07, Richard Elling <[EMAIL PROTECTED]> wrote:


Marko Milisavljevic wrote:
> I missed an important conclusion from j's data, and that is that single
> disk raw access gives him 56MB/s, and RAID 0 array gives him
> 961/46=21MB/s per disk, which comes in at 38% of potential performance.
> That is in the ballpark of getting 45% of potential performance, as I am
> seeing with my puny setup of single or dual drives. Of course, I don't
> expect a complex file system to match raw disk dd performance, but it
> doesn't compare favourably to common file systems like UFS or ext3, so
> the question remains, is ZFS overhead normally this big? That would mean
> that one needs to have at least 4-5 way stripe to generate enough data
> to saturate gigabit ethernet, compared to 2-3 way stripe on a "lesser"
> filesystem, a possibly important consideration in SOHO situation.

Could you post iostat data for these runs?

Also, as I suggested previously, try with checksum off.  AthlonXP doesn't
have a reputation as a speed demon.

BTW, for 7,200 rpm drives, which are typical in desktops, 56 MBytes/s
isn't bad.  The media speed will range from perhaps [30-40]-[60-75]
MBytes/s
judging from a quick scan of disk vendor datasheets.  In other words, it
would not surprise me to see 4-5 way stripe being required to keep a
GbE saturated.
  -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Ian Collins
Marko Milisavljevic wrote:
> To reply to my own message this article offers lots of insight into why 
> dd access directly through raw disk is fast, while accessing a file through 
> the file system may be slow. 
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that ZFS is 
> under half the speed of raw disk access? What speeds are other people getting 
> trying to dd a file through zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using default 
> ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=1
>
>   
Testing on a old Athlon MP box, two U160 10K SCSI drives.

bash-3.00# time dd if=/dev/dsk/c2t0d0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real0m44.470s
user0m0.018s
sys 0m8.290s

 time dd if=/test/play/sol-nv-b62-x86-dvd.iso of=/dev/null bs=128k
count=1
1+0 records in
1+0 records out

real0m22.714s
user0m0.020s
sys 0m3.228s

 zpool status
  pool: test
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
testONLINE   0 0 0
  mirrorONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0

Ian

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Bart Smaalders

Marko Milisavljevic wrote:
I missed an important conclusion from j's data, and that is that single 
disk raw access gives him 56MB/s, and RAID 0 array gives him 
961/46=21MB/s per disk, which comes in at 38% of potential performance. 
That is in the ballpark of getting 45% of potential performance, as I am 
seeing with my puny setup of single or dual drives. Of course, I don't 
expect a complex file system to match raw disk dd performance, but it 
doesn't compare favourably to common file systems like UFS or ext3, so 
the question remains, is ZFS overhead normally this big? That would mean 
that one needs to have at least 4-5 way stripe to generate enough data 
to saturate gigabit ethernet, compared to 2-3 way stripe on a "lesser" 
filesystem, a possibly important consideration in SOHO situation.





I don't see this on my system, but it has more CPU (dual
core 2.6 GHz).  It saturates a GB net w/ 4 drives & samba,
not working hard at all.  A thumper does 2 GB/sec w 2 dual
core CPUs.

Do you have compression enabled?  This can be a choke point
for weak CPUs.
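
(A quick way to check, with the pool name being a placeholder:

   zfs get compression,checksum tank
)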

- Bart


Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic

Thank you, Al.

Would you mind also doing:

ptime dd if=/dev/dsk/c2t1d0 of=/dev/null bs=128k count=1

to see the raw performance of underlying hardware.

On 5/14/07, Al Hopper <[EMAIL PROTECTED]> wrote:


# ptime dd if=./allhomeal20061209_01.tar of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real6.407
user0.008
sys 1.624

  pool: tank
state: ONLINE
scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0
c2t2d0  ONLINE   0 0 0
c2t3d0  ONLINE   0 0 0
c2t4d0  ONLINE   0 0 0

3-way mirror:

1+0 records in
1+0 records out

real   12.500
user0.007
sys 1.216

2-way mirror:

1+0 records in
1+0 records out

real   18.356
user0.006
sys 0.935


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Richard Elling

Marko Milisavljevic wrote:
I missed an important conclusion from j's data, and that is that single 
disk raw access gives him 56MB/s, and RAID 0 array gives him 
961/46=21MB/s per disk, which comes in at 38% of potential performance. 
That is in the ballpark of getting 45% of potential performance, as I am 
seeing with my puny setup of single or dual drives. Of course, I don't 
expect a complex file system to match raw disk dd performance, but it 
doesn't compare favourably to common file systems like UFS or ext3, so 
the question remains, is ZFS overhead normally this big? That would mean 
that one needs to have at least 4-5 way stripe to generate enough data 
to saturate gigabit ethernet, compared to 2-3 way stripe on a "lesser" 
filesystem, a possibly important consideration in SOHO situation.


Could you post iostat data for these runs?

Also, as I suggested previously, try with checksum off.  AthlonXP doesn't
have a reputation as a speed demon.
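
(A quick, reversible test - assuming the dataset is called tank/test, and
noting that the setting only affects newly written blocks, so the test file
should be rewritten after changing it:

   zfs set checksum=off tank/test
   # rerun the dd against a freshly copied file, then restore the default:
   zfs set checksum=on tank/test
)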

BTW, for 7,200 rpm drives, which are typical in desktops, 56 MBytes/s
isn't bad.  The media speed will range from perhaps [30-40]-[60-75] MBytes/s
judging from a quick scan of disk vendor datasheets.  In other words, it
would not surprise me to see 4-5 way stripe being required to keep a
GbE saturated.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Al Hopper
On Mon, 14 May 2007, Marko Milisavljevic wrote:

> To reply to my own message this article offers lots of insight into why 
> dd access directly through raw disk is fast, while accessing a file through 
> the file system may be slow.
>
> http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1
>
> So, I guess what I'm wondering now is, does it happen to everyone that ZFS is 
> under half the speed of raw disk access? What speeds are other people getting 
> trying to dd a file through zfs file system? Something like
>
> dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using default 
> ZFS block size)
>
> how does that compare to:
>
> dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=1
>
> If you could please post your MB/s and show output of zpool status so we
> can see your disk configuration I would appreciate it. Please use a file
> that is 100MB or more - the result will be too random with small files. Also
> make sure zfs is not caching the file already!

# ptime dd if=./allhomeal20061209_01.tar of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real6.407
user0.008
sys 1.624

  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c2t0d0  ONLINE   0 0 0
c2t1d0  ONLINE   0 0 0
c2t2d0  ONLINE   0 0 0
c2t3d0  ONLINE   0 0 0
c2t4d0  ONLINE   0 0 0

3-way mirror:

1+0 records in
1+0 records out

real   12.500
user0.007
sys 1.216

2-way mirror:

1+0 records in
1+0 records out

real   18.356
user0.006
sys 0.935


# psrinfo -v
Status of virtual processor 0 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:21.
  The i386 processor operates at 2009 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:24.
  The i386 processor operates at 2009 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:26.
  The i386 processor operates at 2009 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 05/14/2007 17:31:18
  on-line since 05/03/2007 08:01:28.
  The i386 processor operates at 2009 MHz,
and has an i387 compatible floating point processor.


> What I am seeing is that ZFS performance for sequential access is about 45% 
> of raw disk access, while UFS (as well as ext3 on Linux) is around 70%. For 
> workload consisting mostly of reading large files sequentially, it would seem 
> then that ZFS is the wrong tool performance-wise. But, it could be just my 
> setup, so I would appreciate more data points.
>

Regards,

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
   Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic

I missed an important conclusion from j's data, and that is that single disk
raw access gives him 56MB/s, and RAID 0 array gives him 961/46=21MB/s per
disk, which comes in at 38% of potential performance. That is in the
ballpark of getting 45% of potential performance, as I am seeing with my
puny setup of single or dual drives. Of course, I don't expect a complex
file system to match raw disk dd performance, but it doesn't compare
favourably to common file systems like UFS or ext3, so the question remains,
is ZFS overhead normally this big? That would mean that one needs to have at
least 4-5 way stripe to generate enough data to saturate gigabit ethernet,
compared to 2-3 way stripe on a "lesser" filesystem, a possibly important
consideration in SOHO situation.

On 5/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


This certainly isn't the case on my machine.

$ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k
count=1
1+0 records in
1+0 records out

real1.3
user0.0
sys 1.2

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real   22.3
user0.0
sys 2.2

This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.

My pool is configured into a 46 disk RAID-0 stripe.  I'm going to omit
the zpool status output for the sake of brevity.

> What I am seeing is that ZFS performance for sequential access is
> about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> around 70%. For workload consisting mostly of reading large files
> sequentially, it would seem then that ZFS is the wrong tool
> performance-wise. But, it could be just my setup, so I would
> appreciate more data points.

This isn't what we've observed in much of our performance testing.
It may be a problem with your config, although I'm not an expert on
storage configurations.  Would you mind providing more details about
your controller, disks, and machine setup?

-j


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic

Thank you for those numbers.

I should have mentioned that I was mostly interested in single disk or small
array performance, as it is not possible for dd to meaningfully access
multiple-disk configurations without going through the file system. I find
it curious that there is such a large slowdown by going through file system
(with single drive configuration), especially compared to UFS or ext3.

I simply have a small SOHO server and I am trying to evaluate which OS to
use to keep a redundant disk array. With unreliable consumer-level hardware,
ZFS and the checksum feature are very interesting and the primary selling
point compared to a Linux setup, for as long as ZFS can generate enough
bandwidth from the drive array to saturate single gigabit ethernet.

My hardware at the moment is the "wrong" choice for Solaris/ZFS - PCI 3114
SATA controller on a 32-bit AthlonXP, according to many posts I found.
However, since dd over raw disk is capable of extracting 75+MB/s from this
setup, I keep feeling that surely I must be able to get at least that much
from reading a pair of striped or mirrored ZFS drives. But I can't - single
drive or 2-drive stripes or mirrors, I only get around 34MB/s going through
ZFS. (I made sure mirror was rebuilt and I resilvered the stripes.)
Everything is stock Nevada b63 installation, so I haven't messed it up with
misguided tuning attempts. Don't know if it matters, but test file was
created originally from /dev/random. Compression is off, and everything is
default. CPU utilization remains low at all times (haven't seen it go over
25%).

On 5/14/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:


This certainly isn't the case on my machine.

$ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k
count=1
1+0 records in
1+0 records out

real1.3
user0.0
sys 1.2

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real   22.3
user0.0
sys 2.2

This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.

My pool is configured into a 46 disk RAID-0 stripe.  I'm going to omit
the zpool status output for the sake of brevity.

> What I am seeing is that ZFS performance for sequential access is
> about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> around 70%. For workload consisting mostly of reading large files
> sequentially, it would seem then that ZFS is the wrong tool
> performance-wise. But, it could be just my setup, so I would
> appreciate more data points.

This isn't what we've observed in much of our performance testing.
It may be a problem with your config, although I'm not an expert on
storage configurations.  Would you mind providing more details about
your controller, disks, and machine setup?

-j


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread johansen-osdev
This certainly isn't the case on my machine.

$ /usr/bin/time dd if=/test/filebench/largefile2 of=/dev/null bs=128k 
count=1
1+0 records in
1+0 records out

real1.3
user0.0
sys 1.2

# /usr/bin/time dd if=/dev/dsk/c0t0d0 of=/dev/null bs=128k count=1
1+0 records in
1+0 records out

real   22.3
user0.0
sys 2.2

This looks like 56 MB/s on the /dev/dsk and 961 MB/s on the pool.

My pool is configured into a 46 disk RAID-0 stripe.  I'm going to omit
the zpool status output for the sake of brevity.

> What I am seeing is that ZFS performance for sequential access is
> about 45% of raw disk access, while UFS (as well as ext3 on Linux) is
> around 70%. For workload consisting mostly of reading large files
> sequentially, it would seem then that ZFS is the wrong tool
> performance-wise. But, it could be just my setup, so I would
> appreciate more data points.

This isn't what we've observed in much of our performance testing.
It may be a problem with your config, although I'm not an expert on
storage configurations.  Would you mind providing more details about
your controller, disks, and machine setup?

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-14 Thread Marko Milisavljevic
To reply to my own message this article offers lots of insight into why dd 
access directly through raw disk is fast, while accessing a file through the 
file system may be slow. 

http://www.informit.com/articles/printerfriendly.asp?p=606585&rl=1

So, I guess what I'm wondering now is, does it happen to everyone that ZFS is 
under half the speed of raw disk access? What speeds are other people getting 
trying to dd a file through zfs file system? Something like

dd if=/pool/mount/file of=/dev/null bs=128k (assuming you are using default ZFS 
block size)

how does that compare to:

dd if=/dev/dsk/diskinzpool of=/dev/null bs=128k count=1

If you could please post your MB/s and show output of zpool status so we can 
see your disk configuration I would appreciate it. Please use a file that is 
100MB or more - the result will be too random with small files. Also make sure zfs is 
not caching the file already!
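
(One blunt but effective way to make sure nothing is cached - assuming the
pool is simply named pool - is to export and re-import it between runs,
which discards the ARC contents for that pool:

   zpool export pool && zpool import pool
)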

What I am seeing is that ZFS performance for sequential access is about 45% of 
raw disk access, while UFS (as well as ext3 on Linux) is around 70%. For 
workload consisting mostly of reading large files sequentially, it would seem 
then that ZFS is the wrong tool performance-wise. But, it could be just my 
setup, so I would appreciate more data points.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss