[zfs-discuss] Best way/issues with large ZFS send?

2011-02-16 Thread Eff Norwood
I'm preparing to replicate about 200TB of data between two data centers using 
zfs send. We have ten 10TB zpools that are further broken down into zvols of 
various sizes in each data center. One DC is primary, the other will be the 
replication target, and there is plenty of bandwidth between them (10 gig dark 
fiber).

Are there any gotchas that I should be aware of? Also, at what level should I 
be taking the snapshot to do the zfs send? At the primary pool level or at the 
zvol level? Since the targets are to be exact replicas, I presume at the 
primary pool level (e.g. tank) rather than for every zvol (e.g. 
tank/prod/vol1)?
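
For the pool-level approach, here is a rough sketch of what I have in mind - 
the snapshot names, the target host "dr-host" and the use of ssh as transport 
are placeholders, not a tested procedure:

  # recursive snapshot of everything in the pool
  zfs snapshot -r tank@repl-2011-02-16
  # full replication stream, received into the remote pool unmounted
  zfs send -R tank@repl-2011-02-16 | ssh dr-host zfs receive -Fdu tank

  # later runs send only the changes since the previous snapshot
  zfs snapshot -r tank@repl-2011-02-17
  zfs send -R -I tank@repl-2011-02-16 tank@repl-2011-02-17 | ssh dr-host zfs receive -Fdu tank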

This is all using Solaris 11 Express, snv_151a.

Thanks,

Eff


Re: [zfs-discuss] Lower latency ZIL Option?: SSD behind Controller BB Write Cache

2011-01-27 Thread Eff Norwood
We tried all combinations of OCZ SSDs, including their PCI-based SSDs, and they 
do NOT work as a ZIL. After a very short time performance degrades horribly, 
and the OCZ drives eventually fail completely. We also tried Intel, which 
performed a little better and didn't flat out fail over time, but those still 
did not work out as a ZIL. We use the DDRdrive X1 now for all of our ZIL 
applications and could not be happier. The cards are great, support is great 
and performance is incredible. We use them to provide NFS storage to 50K VMware 
VDI users. As you stated, the DDRdrive is ideal. Go with that and you'll be 
very happy you did!


Re: [zfs-discuss] Lower latency ZIL Option?: SSD behind Controller BB Write Cache

2011-01-27 Thread Eff Norwood
They have been incredibly reliable, with zero downtime or issues. As a result, 
we use two of them striped in every system. For one application outside of VDI 
we use a pair of them mirrored, but that is very unusual and was driven by the 
customer, not us.
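
For reference, this is roughly how the two layouts are built - the pool and 
device names here are made up:

  # two separate log devices; writes are load balanced (striped) across them
  zpool add tank log c1t0d0 c1t1d0

  # the mirrored variant we use for that one customer
  zpool add tank log mirror c1t0d0 c1t1d0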


[zfs-discuss] Does a zvol use the zil?

2010-10-21 Thread Eff Norwood
Let me frame this specifically in the context of VMware ESXi 4.x. If I create a 
zvol and give it to ESXi via iSCSI, our experience has been that it is very 
fast and guest response is excellent. If we use NFS without a ZIL accelerator 
(we use the DDRdrive X1 == awesome), NFS performance is not very good, because 
VMware issues sync (Stable = FSYNC) writes. Once we enable our ZIL accelerator, 
NFS performance is approximately as fast as iSCSI. Enabling or disabling the 
ZIL accelerator has no measurable impact on iSCSI performance for us.

So does a zvol use the ZIL or not? If it does, then iSCSI performance seems 
like it should also be slower without a ZIL accelerator, but it isn't. If it 
doesn't, is it true that if the power goes off while I'm writing over iSCSI and 
I have no battery-backed HBA or RAID card, I'll lose data?
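
One way I could check is to count ZIL commits directly while running each 
workload - a rough DTrace sketch, assuming the fbt provider is available on 
the box:

  # count zil_commit() calls per second while writing over iSCSI, then over NFS
  dtrace -qn 'fbt::zil_commit:entry { @c = count(); }
      tick-1s { printa("zil_commit/sec: %@d\n", @c); clear(@c); }'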


Re: [zfs-discuss] SSD partitioned into multiple L2ARC read cache

2010-10-19 Thread Eff Norwood
We tried this in our environment and found that it didn't work out: the more 
partitions we used, the slower it went. We decided to just use the entire SSD 
as a read cache and it worked fine. It still has the TRIM issue, of course, 
until the next version.
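
What we ended up with is simply the whole device as a cache vdev - a minimal 
sketch, with the pool and device names as placeholders:

  # give the entire SSD to the pool as L2ARC
  zpool add tank cache c2t0d0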


Re: [zfs-discuss] Bursty writes - why?

2010-10-12 Thread Eff Norwood
The NFS client in this case was the VMware ESXi 4.1 release build. What 
happened is that the file uploader behavior was changed in 4.1 to prevent I/O 
contention with the VM guests. That means when you upload something to the 
datastore, it only sends chunks of the file instead of streaming it all at once 
like ESXi 4.0 did. To end users something appeared to be broken, because file 
uploads now took 95 seconds instead of 30. It turns out that is by design in 
4.1. This is the behavior *only* for the uploader and not for the VM guests; 
their I/O is as expected.

I have to say, as a side note, the DDRdrive X1s make a night and day difference 
with VMware. If you use VMware via NFS, I highly recommend the X1s as the ZIL. 
Otherwise the VMware O_SYNC (Stable = FSYNC) writes will kill your performance 
dead. We also tried SSDs as the ZIL, which worked OK until they got full; then 
performance tanked. As I have posted before: SSDs as your ZIL - don't do it!


Re: [zfs-discuss] Bursty writes - why?

2010-10-07 Thread Eff Norwood
The NFS client that we're using always uses O_SYNC, which is why it was 
critical for us to use the DDRdrive X1 as the ZIL. I wasn't clear about the 
whole system we're using; my apologies. It is:

OpenSolaris SNV_134
Motherboard: SuperMicro X8DAH
RAM: 72GB
CPU: Dual Intel 5503 @ 2.0GHz
ZIL: DDRdrive X1 (two of these, independent and not mirrored)
Drives: 24 x Seagate 1TB SAS, 7200 RPM
Network connected via 3 x gigabit links as LACP + 1 gigabit backup, IPMP on top 
of those.

The output I posted is from zpool iostat, and I used it because it corresponds 
to what users are seeing. Whenever zpool iostat shows write activity, file 
copies to the system proceed as expected. As soon as zpool iostat shows no 
activity, the writes all pause. The simple test case is to copy a CD-ROM ISO 
image to the server while running zpool iostat.
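
Concretely, the reproduction looks like this - the mount point and ISO name 
below are just placeholders:

  # on the server
  zpool iostat xpool 1

  # on an NFS client, copy any large ISO into the exported filesystem
  cp some-dvd-image.iso /net/nfs-server/xpool/test/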


Re: [zfs-discuss] Bursty writes - why?

2010-10-07 Thread Eff Norwood
Figured it out - it was the NFS client. I used snoop and then some dtrace magic 
to prove that the client (which was using O_SYNC) was sending very bursty 
requests to the server. I tried a number of other NFS clients with O_SYNC as 
well and got excellent performance once they were configured correctly. Just 
for fun I disabled the pair of DDRdrive X1s that I use for the ZIL, and 
performance tanked across the board when using O_SYNC. I can't recommend the 
DDRdrive X1 enough as a ZIL! There is a great article on this behavior here: 
http://blogs.sun.com/brendan/entry/slog_screenshots

Thanks for the help all!


[zfs-discuss] Bursty writes - why?

2010-10-06 Thread Eff Norwood
I have a 24 x 1TB system being used as an NFS file server: Seagate SAS disks 
connected via an LSI 9211-8i SAS controller, laid out as 2 x 11-disk RAIDZ2 
plus 2 spares. I am using 2 x DDRdrive X1s as the ZIL. Whenever we write 
anything to it, the writes are always very bursty, like this:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0    232      0  29.0M
xpool        488K  20.0T      0    101      0  12.7M
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0     50      0  6.37M
xpool        488K  20.0T      0    477      0  59.7M
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool        488K  20.0T      0      0      0      0
xpool       74.7M  20.0T      0    702      0  76.2M
xpool       74.7M  20.0T      0    577      0  72.2M
xpool       74.7M  20.0T      0    110      0  13.9M
xpool       74.7M  20.0T      0      0      0      0
xpool       74.7M  20.0T      0      0      0      0
xpool       74.7M  20.0T      0      0      0      0
xpool       74.7M  20.0T      0      0      0      0

Whenever you see 0, the write is just hanging. What I would like to see is at 
least some writing happening every second. What can I look at to track down 
this issue?
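
In case it helps, this is how I am watching it - a rough sketch; adjust the 
pool name and interval as needed:

  # pool-level view, one-second samples, per-vdev breakdown
  zpool iostat -v xpool 1

  # device-level view for comparison
  iostat -xn 1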

Thanks


Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-30 Thread Eff Norwood
As I said, by all means please try it and post your benchmarks for the first 
hour, first day, first week and then first month. The data will be of interest 
to you. On a subjective basis, if you feel that an SSD is working just fine as 
your ZIL, run with it. Good luck!


Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-28 Thread Eff Norwood
I can't think of an easy way to measure pages that have not been consumed, 
since that is really an SSD controller function that is hidden from the OS, 
with the added variable of over-provisioning on top of it. If anyone would like 
to really get into what's going on inside an SSD that makes it a bad choice for 
a ZIL, you can start here:

http://en.wikipedia.org/wiki/TRIM_%28SSD_command%29

and

http://en.wikipedia.org/wiki/Write_amplification

Those will be more than you ever wanted to know. :)


Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-27 Thread Eff Norwood
Saso is correct - ESX/ESXi always uses sync (Stable = FSYNC) for all of its 
writes, and that is for sure your performance killer. Do a snoop and grep for 
the sync writes and you'll see them coming from VMware. We use DDRdrives in our 
production VMware storage and they are excellent for solving this problem. Our 
cluster supports 50,000 users and we've had no issues at all. Do not use an SSD 
for the ZIL - as soon as it fills up you will be very unhappy.
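
Something like this is what I mean by the snoop check - the interface name is 
a placeholder and the exact summary text may vary, but stable NFSv3 writes 
show up flagged FSYNC:

  # watch NFS RPCs on the wire and pick out the stable (sync) writes
  snoop -r -d e1000g0 rpc nfs | grep FSYNC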


Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-27 Thread Eff Norwood
David asked me what I meant by "filled up". If you make the unwise decision to 
use an SSD as your ZIL, at some point days to weeks after you install it, all 
of the pages will have been allocated and you will suddenly find the device to 
be slower than a conventional disk drive. This is due to the way SSDs work. A 
great write-up about how this works is here:

http://www.anandtech.com/show/2738/8

The industry workaround for this issue is called TRIM, and AFAIK the current 
implementation of TRIM in Solaris does not work for ZIL devices, only for pool 
devices. If it did, SSDs would not be such a bad option, but the DDRdrive is so 
much better that I wouldn't waste the time. If you don't believe me, try it and 
post your benchmarks for hour one, day one and week one. ;)


Re: [zfs-discuss] VM's on ZFS - 7210

2010-08-27 Thread Eff Norwood
By all means, please try it to validate this yourself, and post your results 
from hour one, day one and week one. In the ZIL use case the data set is small, 
but from the SSD's perspective it is a small, ever-changing data set. The SSD 
does not know that it can release previously written pages, and without TRIM 
there is no way to tell it to. That means every time a ZIL write happens, new 
SSD pages are consumed. After some amount of time all of those empty pages will 
have been consumed, and the SSD will have to go into the read-erase-write 
cycle, which is incredibly slow and is the whole point of TRIM.
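
To put rough numbers on it (illustrative only - page and erase block sizes 
vary by drive and controller):

  page size:        4 KB
  erase block size: 512 KB (128 pages)
  worst case, once no pre-erased pages remain:
    one 4 KB ZIL write -> read the 512 KB block, erase it, program 512 KB back
    write amplification ~ 512 KB / 4 KB = 128x, with a multi-millisecond erase
    sitting in the latency path of a synchronous write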

I can assure you, from extensive benchmarking of all the major SSDs in the role 
of a ZIL, that you will eventually not be happy. Depending on your use case it 
might take months, but eventually all of those free pages will be consumed, and 
read-erase-write is how the SSD world works after that - unless you have TRIM, 
which we don't yet.


Re: [zfs-discuss] ZFS and VMware

2010-08-13 Thread Eff Norwood
Don't waste your time on anything other than the DDRdrive for an NFS ZIL. If 
it's RAM based it might work, but why risk it; if it's an SSD, forget it. No 
SSD will work well as the ZIL long term. Short term the only SSD to consider 
would be Intel, but again, long term even that will not work out for you. The 
100% write characteristics of the ZIL are an SSD's worst-case scenario, 
especially without TRIM support. We have tried them all - Samsung, SanDisk, 
OCZ - and none of them worked out. In particular, anything SandForce 1500 based 
was the worst, so avoid those at all costs if you dare to try an SSD ZIL. 
Don't. :)

As for the queue depths, here's the command from the ZFS Evil Tuning Guide:

echo zfs_vdev_max_pending/W0t10 | mdb -kw

The W0t10 part is what you change: W0t35 (decimal 35, the old default number of 
outstanding I/Os per vdev) was the old value, and 10 is the new one. For our 
NFS environment we found W0t2 was best, based on looking at the actual I/O with 
dtrace scripts. Email me if you want those scripts. They are here, but need to 
be edited before they work:

http://blogs.sun.com/chrisg/entry/latency_bubble_in_your_io
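
For completeness, this is roughly how we check and set zfs_vdev_max_pending - 
the mdb write only lasts until reboot, so the /etc/system line makes it 
persistent (the value 2 is just what worked for our NFS load; treat it as an 
example):

  # read the current value (prints decimal)
  echo zfs_vdev_max_pending/D | mdb -k

  # change it on the running kernel
  echo zfs_vdev_max_pending/W0t2 | mdb -kw

  # make it stick across reboots - add this line to /etc/system:
  set zfs:zfs_vdev_max_pending = 2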


Re: [zfs-discuss] ZFS and VMware

2010-08-12 Thread Eff Norwood
We are doing NFS with VMware 4.0U2 in production, 50K users, using OpenSolaris 
snv_134 on SuperMicro boxes with SATA drives. Yes, I am crazy. Our experience 
has been that iSCSI for ESXi 4.x is fast and works well with minimal fussing - 
until there is a problem. When that problem happens, getting at the data on 
VMFS LUNs, even with the free Java VMFS utility, is problematic at best and 
game over at worst.

With NFS, data access in problem situations is a non-event: snapshots happen 
and everyone is happy. The problem with it is the VMware NFS client, which 
makes every write an F_SYNC write, and that kills NFS performance dead. To get 
around that we're using DDRdrive X1s for our ZIL, and the problem is solved. I 
have not looked at the NFS client changes in 4.1; perhaps it's better, or at 
least tuneable, now.

I would recommend NFS as the overall strategy, but you must get a good ZIL 
device to make that happen. Do not disable the ZIL. Do make sure you set your 
I/O queue depths correctly.


Re: [zfs-discuss] Best usage of SSD-disk in ZFS system

2010-08-06 Thread Eff Norwood
Our experience has been that a new, out-of-the-box SSD works well for the ZIL, 
but as soon as it's completely full, performance drops to slower than a regular 
SAS hard drive. That is due to the write penalty in their fundamental design, 
their LBA map strategy, and the TRIM support that has not yet been released (to 
me at least) in OpenSolaris. Considering this, we only use (safe) DRAM-based 
products for our ZILs, like the DDRdrive X1, which is incredible. For the 
L2ARC, SSDs are OK, but again, once they are full you still incur the write 
penalty when writing to the cache so you can read from it later, which in some 
cases is also slower than a regular SAS drive. More RAM costs more but works 
better.

I would recommend that you test not just initially but for at least a week. 
That should give you time to fill up the drive and see what I call its native 
performance, meaning how fast it can do the read-erase-write operation under 
full load.
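
While the soak test runs, watching the log (or cache) device on its own makes 
the fall-off easy to spot - the pool name is a placeholder:

  # the per-vdev view shows the slog/cache device separately from the data disks
  zpool iostat -v tank 1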


[zfs-discuss] Victor L could you *please* help me with ZFS bug #6924390 - dedup hosed it

2010-02-11 Thread Eff Norwood
OpenSolaris snv_131 (the problem is still present in snv_132) on an X4500, bug #6924390

Victor,

In researching this issue I see that you know ZFS really well. I would very 
much appreciate your help, and this problem seems interesting.

I created a large zpool named xpool and then created three filesystems on that 
pool called vms, bkp and alt. Of course I enabled dedup for the entire zpool - 
why not. Then today we decided to delete bkp, which was a 13TB filesystem with 
around 900GB of data in it. And now I am very familiar with bug #6924390.

When I try to import the pool it seems to hang, but it's really just going very 
slowly. Someone from OpenSolaris calculated that it might take two weeks to 
import, so now I know why it's a bug.

My main objective is to rescue the data in vms - none of the rest matters. I am 
currently booted into snv_132 via the ILOM and can also boot it onto the 
network for ssh access. Thank you very much in advance!