Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread erik.ableson

On 7 août 09, at 02:03, Stephen Green wrote:

I used a 2GB ram disk (the machine has 12GB of RAM) and this jumped  
the backup up to somewhere between 18-40MB/s, which means that I'm  
only a couple of hours away from finishing my backup.  This is, as  
far as I can tell, magic (since I started this message nearly 10GB  
of data have been transferred, when it took from 6am this morning to  
get to 20GB.)


The transfer speed drops like crazy when the write to disk happens,  
but it jumps right back up afterwards.


If you want to perhaps reuse the slog later (ram disks are not  
preserved over reboot) write the slog volume out to disk and dump  
it back in after restarting.

dd if=/dev/ramdisk/slog of=/root/slog.dd


Now my only question is:  what do I do when it's done?  If I reboot  
and the ram disk disappears, will my tank be dead? Or will it just  
continue without the slog?  I realize that I'm probably totally  
boned if the system crashes, so I'm copying off the stuff that I  
really care about to another pool (the Mac's already been backed up  
to a USB drive.)


Have I meddled in the affairs of wizards?  Is ZFS subtle and quick  
to anger?


You have a number of options to preserve the current state of affairs  
and be able to reboot the OpenSolaris server if required.


The absolute safest bet would be the following (sketched as commands after  
the list), but the resilvering will take a while before you'll be able to shut down:


create a file of the same size as the ramdisk on the rpool volume
replace the ramdisk slog with the 2G file (zpool replace poolname
/dev/ramdisk/slog /root/slogtemp)

wait for the resilver/replacement operation to run its course
reboot
create a new ramdisk (same size, as always)
replace the file slog with the newly created ramdisk
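
A minimal sketch of that sequence, assuming a pool named poolname and the  
2G ramdisk slog from earlier (the file path is just an example):

# create a 2 GB backing file on rpool, same size as the ramdisk
mkfile 2g /root/slogtemp

# swap the ramdisk slog for the file and watch the replace/resilver complete
zpool replace poolname /dev/ramdisk/slog /root/slogtemp
zpool status poolname

# after the reboot, move the slog back onto a fresh ramdisk
ramdiskadm -a slog 2g
zpool replace poolname /root/slogtemp /dev/ramdisk/slog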

If your machine reboots unexpectedly things are a little dicier, but  
you should still be able to get things back online.  If you did a dump  
of the ramdisk via dd to a file it should contain the correct  
signature and be recognized by ZFS.  There are no guarantees about the  
state of the data, though: anything that was still in flight on the  
ramdisk when it stopped will be lost, and I'm not sure how the pool will  
deal with this.  But in a pinch, you should be able to  
either replace the missing ramdisk device with the dd file copy of the  
ramdisk (make a copy first, just in case) or mount a new ramdisk, and  
dd the contents of the file back to the device and then import the pool.
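
In commands, that crash recovery would look roughly like this (a sketch  
using the names from above; no guarantees, as noted):

# recreate a ramdisk of the same size and push the saved image back onto it
ramdiskadm -a slog 2g
dd if=/root/slog.dd of=/dev/ramdisk/slog

# then try to bring the pool back
zpool import poolname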


Cheers,

Erik
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread Stephen Green

erik.ableson wrote:

On 7 août 09, at 02:03, Stephen Green wrote:


Man, that looks so nice I think I'll change my mail client to do dates 
in French :-)


Now my only question is:  what do I do when it's done?  If I reboot 
and the ram disk disappears, will my tank be dead? Or will it just 
continue without the slog?  I realize that I'm probably totally boned 
if the system crashes, so I'm copying off the stuff that I really care 
about to another pool (the Mac's already been backed up to a USB drive.)


You have a number of options to preserve the current state of affairs 
and be able to reboot the OpenSolaris server if required.


The absolute safest bet would be the following, but the resilvering will 
take a while before you'll be able to shutdown:


create a file of the same size of the ramdisk on the rpool volume
replace the ramdisk slog with the 2G file (zpool replace poolname 
/dev/ramdisk/slog /root/slogtemp)

wait for the resilver/replacement operation to run its course
reboot
create a new ramdisk (same size, as always)
replace the file slog with the newly created ramdisk


Would having an slog as a file on a different pool provide anywhere near 
the same improvement that I saw by adding a ram disk? Would it affect 
the typical performance (i.e., reading and writing files in my editor) 
adversely?


That is, could I move the slog to a file and then just leave it there so 
that I don't have trouble across reboots?  I could then just use the 
ramdisk when big things happened on the MacBook.


If your machine reboots unexpectedly things are a little dicier, but you 
should still be able to get things back online.  If you did a dump of 
the ramdisk via dd to a file it should contain the correct signature and 
be recognized by ZFS.  Now there will be no guarantees to the state of 
the data since if there was anything actively used on the ramdisk when 
it stopped you'll lose data and I'm not sure how the pool will deal with 
this.  But in a pinch, you should be able to either replace the missing 
ramdisk device with the dd file copy of the ramdisk (make a copy first, 
just in case) or mount a new ramdisk, and dd the contents of the file 
back to the device and then import the pool.


So, I take it if I just do a shutdown, the slog will be emptied 
appropriately to the pool, but then at startup the slog device will be 
missing and the system won't be able to import that pool.


If I dd the ramdisk to a file, I suppose that I should use a file on my 
rpool, right?


Thanks for the advice, I think it might be time to convince the wife 
that I need to buy an SSD.  Anyone have recommendations for a reasonably 
priced SSD for a home box?


Steve



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread Stephen Green

Stephen Green wrote:
Thanks for the advice, I think it might be time to convince the wife 
that I need to buy an SSD.  Anyone have recommendations for a reasonably 
priced SSD for a home box?


For example, does anyone know if something like:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820227436

manufacturers homepage:

http://www.ocztechnology.com/products/solid_state_drives/ocz_minipci_express_ssd-sata_

would work in OpenSolaris?  It (apparently) just looks like a SATA disk 
on the PCIe bus, and the package that they ship it in doesn't look big 
enough to have a driver disk in it (and the manufacturer doesn't provide 
drivers on their Web site.)


Compatibility aside, would a 16GB SSD on a SATA port be a good solution 
to my problem? My box is a bit shy on SATA ports, but I've got lots of 
PCI ports.  Should I get two?


It's only $60, so not such a troublesome sell to my wife.

Steve


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread Scott Meilicke
Note - this has a mini PCIe interface, not PCIe.

I had the 64GB version in a Dell Mini 9. While it was great for its small 
size, low power and low heat characteristics (no fan on the Mini 9!), it was 
only faster than the striped sata drives in my mac pro when it came to random 
reads. Everything else was slower, sometimes by a lot, as measured by XBench. 
Unfortunately I no longer have the numbers to share. I see the sustained writes 
listed as up to 25 MB/s, and bursts up to 51 MB/s.

That said, I have read of people having good luck with fast CF cards (no ref, 
sorry). So maybe this will be just fine :) 

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread Stephen Green

Stephen Green wrote:
Oh, and for those following along at home, the re-silvering of the slog 
to a file is proceeding well.  72% done in 25 minutes.


And, for the purposes of the archives, the re-silver finished in 34 
minutes and I successfully removed the RAM disk.  Thanks, Erik for the 
eminently followable instructions.


Also, I got my wife to agree to a new SSD, so I presume that I can 
simply do the re-silver with the new drive when it arrives.


Can I replace a log with a larger one?  Can I partition the SSD (looks 
like I'll be getting a 32GB one) and use half for cache and half for 
log?  Even if I can, should I?
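
For reference, splitting one SSD between log and cache is usually done by  
carving two slices and adding each separately - a minimal sketch only,  
assuming a pool named tank and a hypothetical device c9t0d0 that has  
already been sliced with format:

# half the SSD as a dedicated log, the other half as L2ARC cache
zpool add tank log c9t0d0s0
zpool add tank cache c9t0d0s1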


Steve

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-07 Thread Stephen Green

Stephen Green wrote:
Also, I got my wife to agree to a new SSD, so I presume that I can 
simply do the re-silver with the new drive when it arrives.


And the last thing for today, I ended up getting:

http://www.newegg.com/Product/Product.aspx?Item=N82E16820609330

which is 16GB and should be sufficient to my needs.  I'll let you know 
how it works out.  Suggestions as to pre/post installation IO tests welcome.
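
For a rough before/after comparison, repeating the dd test used earlier in  
this thread from the Mac against the iSCSI volume would do - a sketch only  
(path and size are examples; the Time Machine backup itself is the more  
realistic sync-heavy workload):

time dd if=/dev/zero of=/Volumes/test/testfile bs=1m count=2048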


Steve
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-06 Thread Stephen Green

erik.ableson wrote:
You're running into the same problem I had with 2009.06, as they have 
corrected a bug where the iSCSI target prior to 
2009.06 didn't completely honor SCSI sync commands issued by the initiator.


I think I've hit the same thing. I'm using an iscsi volume as the target 
for Time Machine backups for my new Mac Book Pro using the GlobalSAN 
initiator.  Running against an iscsi volume on my zfs pool, with both 
the Mac and the Solaris box on gigE, I was seeing the Time Machine 
backup (of 90GB of data) running at about 600-700KB (yes, KB) per second.


This would mean a backup time on the order of (optimistically) 45 hours, 
so I decided to give your suggestion a go.


For my freewheeling home use where everything gets tried, crashed, 
patched and put back together with baling twine (and is backed up 
elsewhere...) I've mounted a RAM disk of 1Gb which is attached to the 
pool as a ZIL and you see the performance run in cycles where the ZIL 
loads up to saturation, flushes out to disk and keeps going. I did write 
a script to regularly dd the ram disk device out to a file so that I can 
recreate with the appropriate signature if I have to reboot the osol 
box. This is used with the GlobalSAN initiator on OS X as well as 
various Windows and Linux machines, physical and VM.


Assuming this is a test system that you're playing with and you can 
destroy the pool with impunity, and you don't have an SSD lying around 
to test with, try the following :


ramdiskadm -a slog 2g (or whatever size you can manage reasonably with 
the available physical RAM - try vmstat 1 2 to determine available memory)

zpool add poolname log /dev/ramdisk/slog


I used a 2GB ram disk (the machine has 12GB of RAM) and this jumped the 
backup up to somewhere between 18-40MB/s, which means that I'm only a 
couple of hours away from finishing my backup.  This is, as far as I can 
tell, magic (since I started this message nearly 10GB of data have been 
transferred, when it took from 6am this morning to get to 20GB.)


The transfer speed drops like crazy when the write to disk happens, but 
it jumps right back up afterwards.


If you want to perhaps reuse the slog later (ram disks are not preserved 
over reboot) write the slog volume out to disk and dump it back in after 
restarting.

 dd if=/dev/ramdisk/slog of=/root/slog.dd


Now my only question is:  what do I do when it's done?  If I reboot and 
the ram disk disappears, will my tank be dead? Or will it just continue 
without the slog?  I realize that I'm probably totally boned if the 
system crashes, so I'm copying off the stuff that I really care about to 
another pool (the Mac's already been backed up to a USB drive.)


Have I meddled in the affairs of wizards?  Is ZFS subtle and quick to anger?

Steve
--
Stephen Green
http://blogs.sun.com/searchguy

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Ross Walker wrote:

On Aug 4, 2009, at 8:36 PM, Carson Gaspar car...@taltos.org wrote:


Ross Walker wrote:

I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.

...
there, dedicated slog device with NVRAM speed. It would be even  
better to have a pair of SSDs behind the NVRAM, but it's hard to  
find compatible SSDs for these controllers, Dell currently doesn't  
even support SSDs in their RAID products :-(


Isn't the PERC 6/e just a re-branded LSI? LSI added SSD support  
recently.


Yes, but the LSI support of SSDs is on later controllers.


Sure that's not just a firmware issue?

My PERC 6/E seems to support SSD's : 


# ./MegaCli -AdpAllInfo -a2 | grep -i ssd
Enable Copyback to SSD on SMART Error   : No
Enable SSD Patrol Read  : No
Allow SSD SAS/SATA Mix in VD : No
Allow HDD/SSD Mix in VD  : No


Controller info : 
   Versions


Product Name: PERC 6/E Adapter
Serial No   : 
FW Package Build: 6.0.3-0002

Mfg. Data

Mfg. Date   : 06/08/07
Rework Date : 06/08/07
Revision No : 
Battery FRU : N/A


Image Versions in Flash:

FW Version : 1.11.82-0473
BIOS Version   : NT13-2
WebBIOS Version: 1.1-32-e_11-Rel
Ctrl-R Version : 1.01-010B
Boot Block Version : 1.00.00.01-0008


I currently have 2 x Intel X25-E (32 GB) as dedicated slogs and 1 x
Intel X25-M (80 GB) for the L2ARC behind a PERC 6/i on my Dell R905
testbox.

So far there have been no problems with them.



-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Ross Walker wrote:
On Aug 4, 2009, at 10:22 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:



On Tue, 4 Aug 2009, Ross Walker wrote:
Are you sure that it is faster than an SSD?  The data is indeed  
pushed closer to the disks, but there may be considerably more  
latency associated with getting that data into the controller  
NVRAM cache than there is into a dedicated slog SSD.


I don't see how, as the SSD is behind a controller it still must  
make it to the controller.


If you take a look at 'iostat -x' output you will see that the  
system knows about a queue for each device.  If it was any other  
way, then a slow device would slow down access to all of the other  
devices.  If there is concern about lack of bandwidth (PCI-E?) to  
the controller, then you can use a separate controller for the SSDs.


It's not bandwidth. Though with a lot of mirrors that does become a  
concern.


Well the duplexing benefit you mention does hold true. That's a  
complex real-world scenario that would be hard to benchmark in  
production.


But easy to see the effects of.


I actually meant to say, hard to bench out of production.

Tests done by others show a considerable NFS write speed advantage  
when using a dedicated slog SSD rather than a controller's NVRAM  
cache.


I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.


I get 47.9 MB/s (60.7 MB/s peak) here too (also with 512MB NVRAM),  
but that is not very good when the network is good for 100 MB/s.   
With an SSD, some other folks here are getting essentially network  
speed.


In testing with ram disks I was only able to get a max of around 60MB/s  
with 4k block sizes, with 4 outstanding.


I can do 64k blocks now and get around 115MB/s.


I just ran some filebench microbenchmarks against my 10 Gbit testbox
which is a Dell R905, 4 x 2.5 Ghz AMD Quad Core CPU's and 64 GB RAM.

My current pool is comprised of 7 mirror vdevs (SATA disks), 2 Intel
X25-E as slogs and 1 Intel X25-M for the L2ARC.

The pool is a MD1000 array attached to a PERC 6/E using 2 SAS cables.

The nic's are ixgbe based.

Here are the numbers : 

Randomwrite benchmark - via 10Gbit NFS : 
IO Summary: 4483228 ops, 73981.2 ops/s, (0/73981 r/w) 578.0mb/s, 44us cpu/op, 0.0ms latency


Randomread benchmark - via 10Gbit NFS :
IO Summary: 7663903 ops, 126467.4 ops/s, (126467/0 r/w) 988.0mb/s, 5us cpu/op, 
0.0ms latency

The real question is whether these numbers can be trusted - I am currently
preparing new test runs with other software to be able to do a
comparison. 
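
For anyone wanting to reproduce something similar, filebench's interactive  
mode can drive such a run - a minimal sketch only, assuming the NFS mount  
is at /mnt/nfs-test (these are not my exact workload parameters):

# filebench
filebench> load randomwrite
filebench> set $dir=/mnt/nfs-test
filebench> run 60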

There is still bus and controller plus SSD latency. I suppose one  
could use a pair of disks as an slog mirror, enable NVRAM just for  
those and let the others do write-through with their disk caches


But this encounters the problem that when the NVRAM becomes full  
then you hit the wall of synchronous disk write performance.  With  
the SSD slog, the write log can be quite large and disk writes are  
then done in a much more efficient ordered fashion similar to non-sync  
writes.


Yes, you have a point there.

So, what SSD disks do you use?

-Ross


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Ross Walker wrote:

On Aug 4, 2009, at 10:17 PM, James Lever j...@jamver.id.au wrote:



On 05/08/2009, at 11:41 AM, Ross Walker wrote:


What is your recipe for these?


There wasn't one! ;)

The drive I'm using is a Dell badged Samsung MCCOE50G5MPQ-0VAD3.


So the key is the drive needs to have the Dell badging to work?

I called my rep about getting a Dell badged SSD and he told me they  
didn't support those in MD series enclosures so therefore were  
unavailable.


If the Dell-branded SSDs are Samsungs then you might want to search
the archives - if I remember correctly there were mentions of
less-than-desired performance with them, but I cannot recall the
details.



Maybe it's time for a new account rep.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Joseph L. Casale
Quick snipped from zpool iostat :

   mirror    1.12G   695G      0      0      0      0
     c8t12d0     -       -     0      0      0      0
     c8t13d0     -       -     0      0      0      0
   c7t2d0       4K   29.0G     0  1.56K      0   200M
   c7t3d0       4K   29.0G     0  1.58K      0   202M

The disks on c7 are both Intel X25-E 

Henrik,
So the SATA discs are in the MD1000 behind the PERC 6/E and how
have you configured/attached the 2 SSD slogs and L2ARC drive? If
I understand you, you have used 14 of the 15 slots in the MD so
I assume you have the 3 SSD's in the R905, what controller are
they running on?

Thanks!
jlc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-05 Thread Henrik Johansen

Joseph L. Casale wrote:

Quick snipped from zpool iostat :

  mirror    1.12G   695G      0      0      0      0
    c8t12d0     -       -     0      0      0      0
    c8t13d0     -       -     0      0      0      0
  c7t2d0       4K   29.0G     0  1.56K      0   200M
  c7t3d0       4K   29.0G     0  1.58K      0   202M

The disks on c7 are both Intel X25-E 


Henrik,
So the SATA discs are in the MD1000 behind the PERC 6/E and how
have you configured/attached the 2 SSD slogs and L2ARC drive? If
 I understand you, you have used 14 of the 15 slots in the MD so
I assume you have the 3 SSD's in the R905, what controller are
they running on?


The internal PERC 6/i controller - but I've had them on the PERC 6/E
during other test runs since I have a couple of spare MD1000's at hand. 


Both controllers work well with the SSD's.


Thanks!
jlc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Charles Baker
 My testing has shown some serious problems with the
 iSCSI implementation for OpenSolaris.
 
 I setup a VMware vSphere 4 box with RAID 10
 direct-attached storage and 3 virtual machines:
 - OpenSolaris 2009.06 (snv_111b) running 64-bit
 - CentOS 5.3 x64 (ran yum update)
 - Ubuntu Server 9.04 x64 (ran apt-get upgrade)
 
 I gave each virtual 2 GB of RAM, a 32 GB drive and
 setup a 16 GB iSCSI target on each (the two Linux vms
 used iSCSI Enterprise Target 0.4.16 with blockio).
 VMware Tools was installed on each. No tuning was
 done on any of the operating systems.
 
 I ran two tests for write performance - one on the
 server itself and one from my Mac connected via
 Gigabit (mtu of 1500) iSCSI connection using
 globalSAN’s latest initiator.
 
 Here’s what I used on the servers:
 time dd if=/dev/zero of=/root/testfile bs=1048576k
 count=4
 and the Mac OS with the iSCSI connected drive
 (formatted with GPT / Mac OS Extended journaled):
 time dd if=/dev/zero of=/Volumes/test/testfile
 bs=1048576k count=4
 
 The results were very interesting (all calculations
 using 1 MB = 1,048,576 bytes)
 
 For OpenSolaris, the local write performance averaged
 86 MB/s. I turned on lzjb compression for rpool (zfs
 set compression=lzjb rpool) and it went up to 414
 MB/s since I’m writing zeros). The average
 performance via iSCSI was an abysmal 16 MB/s (even
 with compression turned on - with it off, 13 MB/s).
 
 For CentOS (ext3), local write performance averaged
 141 MB/s. iSCSI performance was 78 MB/s (almost as
 fast as local ZFS performance on the OpenSolaris
 server when compression was turned off).
 
 Ubuntu Server (ext4) had 150 MB/s for the local
 write. iSCSI performance averaged 80 MB/s.
 
 One of the main differences between the three virtual
 machines was that the iSCSI target on the Linux
 machines used partitions with no file system. On
 OpenSolaris, the iSCSI target created sits on top of
 ZFS. That creates a lot of overhead (although you do
 get some great features).
 
 Since all the virtual machines were connected to the
 same switch (with the same MTU), had the same amount
 of RAM, used default configurations for the operating
 systems, and sat on the same RAID 10 storage, I’d say
 it was a pretty level playing field. 
 
 While jumbo frames will help iSCSI performance, it
 won’t overcome inherent limitations of the iSCSI
 target’s implementation.

cross-posting with zfs-discuss.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread erik.ableson
You're running into the same problem I had with 2009.06, as they have  
corrected a bug where the iSCSI target prior to 2009.06 didn't  
completely honor SCSI sync commands issued by the initiator.


Some background :

Discussion:
http://opensolaris.org/jive/thread.jspa?messageID=388492

corrected bug
http://bugs.opensolaris.org/view_bug.do?bug_id=6770534

The upshot is that unless you have an SSD (or other high speed  
dedicated device) attached as a ZIL (or slog) on 2009.06 you won't see  
anywhere near the local speed performance that the storage is capable  
of since you're forcing individual transactions all the way down to  
disk and back up before moving onto the next SCSI block command.


This iSCSI performance profile is currently specific to 2009.06 and  
does not occur on 2008.11.  As a stopgap (since I don't have a budget  
for SSDs right now) I'm keeping my production servers on 2008.11  
(taking into account the additional potential risk, but these are  
machines with battery backed SAS cards in a conditioned data center).  
These machines are serving up iSCSI to ESX 3.5 and ESX 4 servers.


For my freewheeling home use where everything gets tried, crashed,  
patched and put back together with baling twine (and is backed up  
elsewhere...) I've mounted a RAM disk of 1Gb which is attached to the  
pool as a ZIL and you see the performance run in cycles where the ZIL  
loads up to saturation, flushes out to disk and keeps going. I did  
write a script to regularly dd the ram disk device out to a file so  
that I can recreate with the appropriate signature if I have to reboot  
the osol box. This is used with the GlobalSAN initiator on OS X as  
well as various Windows and Linux machines, physical and VM.


Assuming this is a test system that you're playing with and you can  
destroy the pool with impunity, and you don't have an SSD lying around  
to test with, try the following :


ramdiskadm -a slog 2g (or whatever size you can manage reasonably with  
the available physical RAM - try vmstat 1 2 to determine available  
memory)

zpool add poolname log /dev/ramdisk/slog
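
Once added, the ramdisk should show up under a separate logs section - a  
quick check (sketch, same pool name as above):

zpool status poolname
zpool iostat -v poolname 5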

If you want to perhaps reuse the slog later (ram disks are not  
preserved over reboot) write the slog volume out to disk and dump it  
back in after restarting.

 dd if=/dev/ramdisk/slog of=/root/slog.dd

All of the above assumes that you are not doing this stuff against  
rpool.  I think that attaching a volatile log device to your boot pool  
would result in a machine that can't mount the root zfs volume.


It's easiest to monitor from the Mac (I find) so try your test again  
with the Activity Monitor showing network traffic and you'll see that  
it goes to a wire speed ceiling while it's filling up the ZIL and once  
it's saturated your traffic will drop to near nothing, and then pick  
up again after a few seconds. If you don't saturate the ZIL you'll see  
continuous speed data transfer.


Cheers,

Erik

On 4 août 09, at 15:57, Charles Baker wrote:


My testing has shown some serious problems with the
iSCSI implementation for OpenSolaris.

I setup a VMware vSphere 4 box with RAID 10
direct-attached storage and 3 virtual machines:
- OpenSolaris 2009.06 (snv_111b) running 64-bit
- CentOS 5.3 x64 (ran yum update)
- Ubuntu Server 9.04 x64 (ran apt-get upgrade)

I gave each virtual 2 GB of RAM, a 32 GB drive and
setup a 16 GB iSCSI target on each (the two Linux vms
used iSCSI Enterprise Target 0.4.16 with blockio).
VMware Tools was installed on each. No tuning was
done on any of the operating systems.

I ran two tests for write performance - one on the
server itself and one from my Mac connected via
Gigabit (mtu of 1500) iSCSI connection using
globalSAN’s latest initiator.

Here’s what I used on the servers:
time dd if=/dev/zero of=/root/testfile bs=1048576k
count=4
and the Mac OS with the iSCSI connected drive
(formatted with GPT / Mac OS Extended journaled):
time dd if=/dev/zero of=/Volumes/test/testfile
bs=1048576k count=4

The results were very interesting (all calculations
using 1 MB = 1,048,576 bytes)

For OpenSolaris, the local write performance averaged
86 MB/s. I turned on lzjb compression for rpool (zfs
set compression=lzjb rpool) and it went up to 414
MB/s (since I’m writing zeros). The average
performance via iSCSI was an abysmal 16 MB/s (even
with compression turned on - with it off, 13 MB/s).

For CentOS (ext3), local write performance averaged
141 MB/s. iSCSI performance was 78 MB/s (almost as
fast as local ZFS performance on the OpenSolaris
server when compression was turned off).

Ubuntu Server (ext4) had 150 MB/s for the local
write. iSCSI performance averaged 80 MB/s.

One of the main differences between the three virtual
machines was that the iSCSI target on the Linux
machines used partitions with no file system. On
OpenSolaris, the iSCSI target created sits on top of
ZFS. That creates a lot of overhead (although you do
get some great features).

Since all the virtual machines were connected to the
same 

Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker
On Tue, Aug 4, 2009 at 10:40 AM, erik.ablesoneable...@mac.com wrote:
 You're running into the same problem I had with 2009.06, as they have
 corrected a bug where the iSCSI target prior to
 2009.06 didn't completely honor SCSI sync commands issued by the initiator.
 Some background :
 Discussion:
 http://opensolaris.org/jive/thread.jspa?messageID=388492
 corrected bug
 http://bugs.opensolaris.org/view_bug.do?bug_id=6770534

But this MUST happen. If it doesn't then you are playing Russian
Roulette with your data, as a kernel panic can cause a loss of up to
1/8 of the size of your system's RAM (ZFS lazy write cache) of your
iSCSI target's data!

 The upshot is that unless you have an SSD (or other high speed dedicated
 device) attached as a ZIL (or slog) on 2009.06 you won't see anywhere near
 the local speed performance that the storage is capable of since you're
 forcing individual transactions all the way down to disk and back up before
 moving onto the next SCSI block command.

Actually I recommend using a controller with an NVRAM cache on it, say
256MB-512MB (or more).

This is much faster than an SSD and has the advantage that the ZIL is
striped across the pool, making ZIL reads much faster!

You don't need to use the hardware RAID; export the drives as JBOD or
individual RAID0 LUNs and make a zpool out of them.
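
A sketch of that layout, with every disk exported as its own single-drive  
LUN behind the NVRAM-backed controller (pool and device names are  
hypothetical):

# each cXtYd0 is one drive exported by the controller as a single-disk RAID0 LUN
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0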

 This iSCSI performance profile is currently specific to 2009.06 and does not
 occur on 2008.11.  As a stopgap (since I don't have a budget for SSDs right
 now) I'm keeping my production servers on 2008.11 (taking into account the
 additional potential risk, but these are machines with battery backed SAS
 cards in a conditioned data center). These machines are serving up iSCSI to
 ESX 3.5 and ESX 4 servers.

God I hope not.

Tick-tock, eventually you will corrupt your iSCSI data with that
setup, it's not a matter of if, it's a matter of when.

 For my freewheeling home use where everything gets tried, crashed, patched
 and put back together with baling twine (and is backed up elsewhere...) I've
 mounted a RAM disk of 1Gb which is attached to the pool as a ZIL and you see
 the performance run in cycles where the ZIL loads up to saturation, flushes
 out to disk and keeps going. I did write a script to regularly dd the ram
 disk device out to a file so that I can recreate with the appropriate
 signature if I have to reboot the osol box. This is used with the GlobalSAN
 initiator on OS X as well as various Windows and Linux machines, physical
 and VM.
 Assuming this is a test system that you're playing with and you can destroy
 the pool with impunity, and you don't have an SSD lying around to test with,
 try the following :
 ramdiskadm -a slog 2g (or whatever size you can manage reasonably with the
 available physical RAM - try vmstat 1 2 to determine available memory)
 zpool add poolname log /dev/ramdisk/slog
 If you want to perhaps reuse the slog later (ram disks are not preserved
 over reboot) write the slog volume out to disk and dump it back in after
 restarting.
  dd if=/dev/ramdisk/slog of=/root/slog.dd

You might as well use a ramdisk ZIL in production too with 2008.11 ZVOLs.

 All of the above assumes that you are not doing this stuff against rpool.  I
 think that attaching a volatile log device to your boot pool would result in
 a machine that can't mount the root zfs volume.

I think you can re-create the ramdisk and do a replace to bring the pool online.

Just don't do it with your rpool or you will be in a world of hurt.
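
For what it's worth, that recovery would look roughly like the sketch below  
(untested, using the same names as Erik's example, and - as said - not for  
rpool):

ramdiskadm -a slog 2g
zpool replace poolname /dev/ramdisk/slog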

 It's easiest to monitor from the Mac (I find) so try your test again with
 the Activity Monitor showing network traffic and you'll see that it goes to
 a wire speed ceiling while it's filling up the ZIL and once it's saturated
 your traffic will drop to near nothing, and then pick up again after a few
 seconds. If you don't saturate the ZIL you'll see continuous speed data
 transfer.

I also use a network activity monitor for a quick estimate of throughput
while running. Works well on Mac, Windows (Task Manager) or Linux
(a variety: GUI sysstat, ntop, etc).

-Ross
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker
On Tue, Aug 4, 2009 at 9:57 AM, Charles Bakerno-re...@opensolaris.org wrote:
 My testing has shown some serious problems with the
 iSCSI implementation for OpenSolaris.

 I setup a VMware vSphere 4 box with RAID 10
 direct-attached storage and 3 virtual machines:
 - OpenSolaris 2009.06 (snv_111b) running 64-bit
 - CentOS 5.3 x64 (ran yum update)
 - Ubuntu Server 9.04 x64 (ran apt-get upgrade)

 I gave each virtual 2 GB of RAM, a 32 GB drive and
 setup a 16 GB iSCSI target on each (the two Linux vms
 used iSCSI Enterprise Target 0.4.16 with blockio).
 VMware Tools was installed on each. No tuning was
 done on any of the operating systems.

 I ran two tests for write performance - one on the
 server itself and one from my Mac connected via
 Gigabit (mtu of 1500) iSCSI connection using
 globalSAN’s latest initiator.

 Here’s what I used on the servers:
 time dd if=/dev/zero of=/root/testfile bs=1048576k
 count=4
 and the Mac OS with the iSCSI connected drive
 (formatted with GPT / Mac OS Extended journaled):
 time dd if=/dev/zero of=/Volumes/test/testfile
 bs=1048576k count=4

 The results were very interesting (all calculations
 using 1 MB = 1,048,576 bytes)

 For OpenSolaris, the local write performance averaged
 86 MB/s. I turned on lzjb compression for rpool (zfs
 set compression=lzjb rpool) and it went up to 414
 MB/s (since I’m writing zeros). The average
 performance via iSCSI was an abysmal 16 MB/s (even
 with compression turned on - with it off, 13 MB/s).

 For CentOS (ext3), local write performance averaged
 141 MB/s. iSCSI performance was 78 MB/s (almost as
 fast as local ZFS performance on the OpenSolaris
 server when compression was turned off).

 Ubuntu Server (ext4) had 150 MB/s for the local
 write. iSCSI performance averaged 80 MB/s.

 One of the main differences between the three virtual
 machines was that the iSCSI target on the Linux
 machines used partitions with no file system. On
 OpenSolaris, the iSCSI target created sits on top of
 ZFS. That creates a lot of overhead (although you do
 get some great features).

 Since all the virtual machines were connected to the
 same switch (with the same MTU), had the same amount
 of RAM, used default configurations for the operating
 systems, and sat on the same RAID 10 storage, I’d say
 it was a pretty level playing field.

 While jumbo frames will help iSCSI performance, it
 won’t overcome inherent limitations of the iSCSI
 target’s implementation.

If you want to host your VMs from Solaris (Open or not) use NFS right
now as the iSCSI implementation is still quite a bit immature and
won't perform nearly as well as the Linux implementation. Until
comstar stabilizes and replaces iscsitgt I would hold off on iSCSI on
Solaris.
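
The NFS side is simple to set up on OpenSolaris - a minimal sketch, with a  
hypothetical dataset name (ESX then mounts it as an NFS datastore):

zfs create tank/vmstore
zfs set sharenfs=rw tank/vmstore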

-Ross
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker
On Tue, Aug 4, 2009 at 11:21 AM, Ross Walkerrswwal...@gmail.com wrote:
 On Tue, Aug 4, 2009 at 9:57 AM, Charles Bakerno-re...@opensolaris.org wrote:
 My testing has shown some serious problems with the
 iSCSI implementation for OpenSolaris.

 I setup a VMware vSphere 4 box with RAID 10
 direct-attached storage and 3 virtual machines:
 - OpenSolaris 2009.06 (snv_111b) running 64-bit
 - CentOS 5.3 x64 (ran yum update)
 - Ubuntu Server 9.04 x64 (ran apt-get upgrade)

 I gave each virtual 2 GB of RAM, a 32 GB drive and
 setup a 16 GB iSCSI target on each (the two Linux vms
 used iSCSI Enterprise Target 0.4.16 with blockio).
 VMware Tools was installed on each. No tuning was
 done on any of the operating systems.

 I ran two tests for write performance - one on the
 server itself and one from my Mac connected via
 Gigabit (mtu of 1500) iSCSI connection using
 globalSAN’s latest initiator.

 Here’s what I used on the servers:
 time dd if=/dev/zero of=/root/testfile bs=1048576k
 count=4
 and the Mac OS with the iSCSI connected drive
 (formatted with GPT / Mac OS Extended journaled):
 time dd if=/dev/zero of=/Volumes/test/testfile
 bs=1048576k count=4

 The results were very interesting (all calculations
 using 1 MB = 1,048,576 bytes)

 For OpenSolaris, the local write performance averaged
 86 MB/s. I turned on lzjb compression for rpool (zfs
 set compression=lzjb rpool) and it went up to 414
 MB/s (since I’m writing zeros). The average
 performance via iSCSI was an abysmal 16 MB/s (even
 with compression turned on - with it off, 13 MB/s).

 For CentOS (ext3), local write performance averaged
 141 MB/s. iSCSI performance was 78 MB/s (almost as
 fast as local ZFS performance on the OpenSolaris
 server when compression was turned off).

 Ubuntu Server (ext4) had 150 MB/s for the local
 write. iSCSI performance averaged 80 MB/s.

 One of the main differences between the three virtual
 machines was that the iSCSI target on the Linux
 machines used partitions with no file system. On
 OpenSolaris, the iSCSI target created sits on top of
 ZFS. That creates a lot of overhead (although you do
 get some great features).

 Since all the virtual machines were connected to the
 same switch (with the same MTU), had the same amount
 of RAM, used default configurations for the operating
 systems, and sat on the same RAID 10 storage, I’d say
 it was a pretty level playing field.

 While jumbo frames will help iSCSI performance, it
 won’t overcome inherent limitations of the iSCSI
 target’s implementation.

 If you want to host your VMs from Solaris (Open or not) use NFS right
 now as the iSCSI implementation is still quite a bit immature and
 won't perform nearly as well as the Linux implementation. Until
 comstar stabilizes and replaces iscsitgt I would hold off on iSCSI on
 Solaris.

This sounds crazy, but I was wondering if someone has tried running
Linux iSCSI from within a domU in Xen on OpenSolaris 2009.06 to a ZVOL
on dom0.

Of course the zpool still needs NVRAM or SSD ZIL to perform well, but
if the Xen dom0 is stable and the crossbow networking works well,
this could allow the best of both worlds.

-Ross
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Bob Friesenhahn

On Tue, 4 Aug 2009, Ross Walker wrote:


But this MUST happen. If it doesn't then you are playing Russian
Roulette with your data, as a kernel panic can cause a loss of up to
1/8 of the size of your system's RAM (ZFS lazy write cache) of your
iSCSI target's data!


The actual risk (with recent zfs) seems to be 7/8ths RAM (not 1/8), 
sufficient data to accomplish 5 seconds of 100% write, or up to 30 
seconds of aggregation time.  On a large memory system with high 
performance I/O, this can represent a huge amount (gigabytes) of data.



Actually I recommend using a controller with an NVRAM cache on it, say
256MB-512MB (or more).

This is much faster than an SSD and has the advantage that the ZIL is
striped across the pool, making ZIL reads much faster!


Are you sure that it is faster than an SSD?  The data is indeed pushed 
closer to the disks, but there may be considerably more latency 
associated with getting that data into the controller NVRAM cache than 
there is into a dedicated slog SSD.  Remember that only synchronous 
writes go into the slog but all writes must pass through the 
controller's NVRAM, and so synchronous writes may need to wait for 
other I/Os to make it to controller NVRAM cache before their turn 
comes.  There may also be read requests queued in the same I/O channel 
which are queued before the synchronous write request.


Tests done by others show a considerable NFS write speed advantage 
when using a dedicated slog SSD rather than a controller's NVRAM 
cache.


The slog SSD is a dedicated function device so there is minimal 
access latency.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker
On Aug 4, 2009, at 1:35 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:



On Tue, 4 Aug 2009, Ross Walker wrote:


But this MUST happen. If it doesn't then you are playing Russian
Roulette with your data, as a kernel panic can cause a loss of up to
1/8 of the size of your system's RAM (ZFS lazy write cache) of your
iSCSI target's data!


The actual risk (with recent zfs) seems to be 7/8ths RAM (not 1/8),  
sufficient data to accomplish 5 seconds of 100% write, or up to 30  
seconds of aggregation time.  On a large memory system with high  
performance I/O, this can represent a huge amount (gigabytes) of data.


Yikes! Worse than I thought.

Actually I recommend using a controller with an NVRAM cache on it,  
say

256MB-512MB (or more).

This is much faster than an SSD and has the advantage that the ZIL is
striped across the pool, making ZIL reads much faster!


Are you sure that it is faster than an SSD?  The data is indeed  
pushed closer to the disks, but there may be considerably more  
latency associated with getting that data into the controller NVRAM  
cache than there is into a dedicated slog SSD.


I don't see how, as the SSD is behind a controller it still must make  
it to the controller.


Remember that only synchronous writes go into the slog but all  
writes must pass through the controller's NVRAM, and so synchronous  
writes may need to wait for other I/Os to make it to controller  
NVRAM cache before their turn comes.  There may also be read  
requests queued in the same I/O channel which are queued before the  
synchronous write request.


Well the duplexing benefit you mention does hold true. That's a  
complex real-world scenario that would be hard to benchmark in  
production.


Tests done by others show a considerable NFS write speed advantage  
when using a dedicated slog SSD rather than a controller's NVRAM  
cache.


I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.


The slog SSD is a dedicated function device so there is minimal  
access latency.


There is still bus and controller plus SSD latency. I suppose one  
could use a pair of disks as an slog mirror, enable NVRAM just for  
those and let the others do write-through with their disk caches  
enabled and there, dedicated slog device with NVRAM speed. It would be  
even better to have a pair of SSDs behind the NVRAM, but it's hard to  
find compatible SSDs for these controllers, Dell currently doesn't  
even support SSDs in their RAID products :-(


-Ross
 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Mr liu
What shall I do? My server does not support SSDs. Should I go back to 2008.11?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread James Lever


On 05/08/2009, at 10:36 AM, Carson Gaspar wrote:

Isn't the PERC 6/e just a re-branded LSI? LSI added SSD support  
recently.


Yep, it's a MegaRAID device.

I have been using one with a Samsung SSD in RAID0 mode (to avail  
myself of the cache) recently with great success.


cheers,
James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker

On Aug 4, 2009, at 8:36 PM, Carson Gaspar car...@taltos.org wrote:


Ross Walker wrote:

I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.

...
there, dedicated slog device with NVRAM speed. It would be even  
better to have a pair of SSDs behind the NVRAM, but it's hard to  
find compatible SSDs for these controllers, Dell currently doesn't  
even support SSDs in their RAID products :-(


Isn't the PERC 6/e just a re-branded LSI? LSI added SSD support  
recently.


Yes, but the LSI support of SSDs is on later controllers.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker

On Aug 4, 2009, at 9:18 PM, James Lever j...@jamver.id.au wrote:



On 05/08/2009, at 10:36 AM, Carson Gaspar wrote:

Isn't the PERC 6/e just a re-branded LSI? LSI added SSD support  
recently.


Yep, it's a MegaRAID device.

I have been using one with a Samsung SSD in RAID0 mode (to avail  
myself of the cache) recently with great success.


Which model?

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread James Lever


On 05/08/2009, at 11:36 AM, Ross Walker wrote:


Which model?


PERC 6/E w/512MB BBWC.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread James Lever


On 05/08/2009, at 11:41 AM, Ross Walker wrote:


What is your recipe for these?


There wasn't one! ;)

The drive I'm using is a Dell badged Samsung MCCOE50G5MPQ-0VAD3.

cheers,
James

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker

On Aug 4, 2009, at 9:55 PM, Carson Gaspar car...@taltos.org wrote:


Ross Walker wrote:

On Aug 4, 2009, at 8:36 PM, Carson Gaspar car...@taltos.org wrote:


Isn't the PERC 6/e just a re-branded LSI? LSI added SSD support  
recently.

Yes, but the LSI support of SSDs is on later controllers.


Please cite your source for that statement.

The PERC 6/e is an LSI 1078. The LSI web site has firmware updates  
that explicitly reference SSDs for that chip. See:


http://www.lsi.com/DistributionSystem/AssetDocument/11.0.1-0013_SAS_FW_Image_APP-1.40.42-0615.txt
http://www.lsi.com/DistributionSystem/AssetDocument/11.0.1-0008_SAS_FW_Image_APP-1.40.32-0580.txt

Yes, they _say_ they're only for the LSI 8[78]xx cards, but they  
should work with _any_ 1078 based controller. To quote the above:


Command syntax:  MegaCli -adpfwflash -f SAS1078_FW_Image.rom -a0


I tried that and while it looked like it took (the BIOS reported the new  
LSI firmware version) it still didn't recognize my SSDs after the reboot.


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Bob Friesenhahn

On Tue, 4 Aug 2009, Ross Walker wrote:
Are you sure that it is faster than an SSD?  The data is indeed pushed 
closer to the disks, but there may be considerably more latency associated 
with getting that data into the controller NVRAM cache than there is into a 
dedicated slog SSD.


I don't see how, as the SSD is behind a controller it still must make it to 
the controller.


If you take a look at 'iostat -x' output you will see that the system 
knows about a queue for each device.  If it was any other way, then a 
slow device would slow down access to all of the other devices.  If 
there is concern about lack of bandwidth (PCI-E?) to the controller, 
then you can use a separate controller for the SSDs.


Well the duplexing benefit you mention does hold true. That's a complex 
real-world scenario that would be hard to benchmark in production.


But easy to see the effects of.

Tests done by others show a considerable NFS write speed advantage when 
using a dedicated slog SSD rather than a controller's NVRAM cache.


I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential write). 
It's a Dell PERC 6/e with 512MB onboard.


I get 47.9 MB/s (60.7 MB/s peak) here too (also with 512MB NVRAM), but 
that is not very good when the network is good for 100 MB/s.  With an 
SSD, some other folks here are getting essentially network speed.


There is still bus and controller plus SSD latency. I suppose one 
could use a pair of disks as an slog mirror, enable NVRAM just for 
those and let the others do write-through with their disk caches


But this encounters the problem that when the NVRAM becomes full then 
you hit the wall of synchronous disk write performance.  With the SSD 
slog, the write log can be quite large and disk writes are then done 
in a much more efficient ordered fashion similar to non-sync writes.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker

On Aug 4, 2009, at 10:17 PM, James Lever j...@jamver.id.au wrote:



On 05/08/2009, at 11:41 AM, Ross Walker wrote:


What is your recipe for these?


There wasn't one! ;)

The drive I'm using is a Dell badged Samsung MCCOE50G5MPQ-0VAD3.


So the key is the drive needs to have the Dell badging to work?

I called my rep about getting a Dell badged SSD and he told me they  
didn't support those in MD series enclosures so therefore were  
unavailable.


Maybe it's time for a new account rep.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pool iscsi /zfs performance in opensolaris 0906

2009-08-04 Thread Ross Walker
On Aug 4, 2009, at 10:22 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
 wrote:



On Tue, 4 Aug 2009, Ross Walker wrote:
Are you sure that it is faster than an SSD?  The data is indeed  
pushed closer to the disks, but there may be considerably more  
latency associated with getting that data into the controller  
NVRAM cache than there is into a dedicated slog SSD.


I don't see how, as the SSD is behind a controller it still must  
make it to the controller.


If you take a look at 'iostat -x' output you will see that the  
system knows about a queue for each device.  If it was any other  
way, then a slow device would slow down access to all of the other  
devices.  If there is concern about lack of bandwidth (PCI-E?) to  
the controller, then you can use a separate controller for the SSDs.


It's not bandwidth. Though with a lot of mirrors that does become a  
concern.


Well the duplexing benefit you mention does hold true. That's a  
complex real-world scenario that would be hard to benchmark in  
production.


But easy to see the effects of.


I actually meant to say, hard to bench out of production.

Tests done by others show a considerable NFS write speed advantage  
when using a dedicated slog SSD rather than a controller's NVRAM  
cache.


I get pretty good NFS write speeds with NVRAM (40MB/s 4k sequential  
write). It's a Dell PERC 6/e with 512MB onboard.


I get 47.9 MB/s (60.7 MB/s peak) here too (also with 512MB NVRAM),  
but that is not very good when the network is good for 100 MB/s.   
With an SSD, some other folks here are getting essentially network  
speed.


In testing with ram disks I was only able to get a max of around 60MB/s  
with 4k block sizes, with 4 outstanding.


I can do 64k blocks now and get around 115MB/s.

There is still bus and controller plus SSD latency. I suppose one  
could use a pair of disks as an slog mirror, enable NVRAM just for  
those and let the others do write-through with their disk caches


But this encounters the problem that when the NVRAM becomes full  
then you hit the wall of synchronous disk write performance.  With  
the SSD slog, the write log can be quite large and disk writes are  
then done in a much more efficient ordered fashion similar to non-sync  
writes.


Yes, you have a point there.

So, what SSD disks do you use?

-Ross


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss