Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-22 Thread Bob Friesenhahn

On Fri, 21 May 2010, David Dyer-Bennet wrote:


To be comfortable (I don't ask for "know for a certainty"; I'm not sure
that exists outside of faith), I want a claim by the manufacturer and
multiple outside tests in significant journals -- which could be the
blog of somebody I trusted, as well as actual magazines and such.
Ideally, certainly if it's important, I'd then verify the tests myself.


For me, "know for a certainty" means that the feature is clearly 
specified in the formal specification sheet for the product, and the 
vendor has historically published reliable specification sheets. 
This may not be the same as money in the bank, but it is better than 
relying on thoughts from some blog posting.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-22 Thread Bob Friesenhahn

On Fri, 21 May 2010, Brandon High wrote:


My understanding is that the controller contains enough cache to
buffer a complete erase block's worth of data, eliminating the
read / erase / write cycle that a partial block write entails.
It's reported to do a copy-on-write, so it doesn't need to do a read
of existing blocks when making changes, which gives it such high IOPS;
even random writes are turned into sequential writes of entire erase
blocks (much like how ZFS works). The generous spare area is used to
ensure that there are always full pages free to write to. (Some
vendors are releasing consumer drives with 60/120/240 GB, using 7%
reserved space rather than the 27% that the original drives ship
with.)


FLASH is useless as working space since it does not behave like RAM, so
every SSD needs to have some RAM for temporary storage of data.  This
COW approach seems nice except that it would appear to inflate 
performance by only considering a specific magic block size and 
alignment.  Other block sizes and alignments would require that 
existing data be read so that the new block content can be 
constructed.  Also, the blazing fast write speed (which depends on 
plenty of already erased blocks) would stop once the spare space in 
the SSD has been consumed.
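
To make the read-modify-write point concrete, here is a deliberately
simplified toy model in Python (illustrative only, not tied to any real
controller; the 128 KB erase-block size is an assumption): a write that
covers a whole erase block can be programmed directly, while a smaller or
misaligned write forces the old contents to be read back and merged first.

ERASE_BLOCK = 128 * 1024   # assumed erase-block size, purely illustrative

class ToyFlash:
    def __init__(self, nblocks):
        self.blocks = [None] * nblocks          # None means "erased"
        self.reads = self.erases = self.programs = 0

    def write(self, idx, data, offset=0):
        old = self.blocks[idx]
        if old is not None and len(data) < ERASE_BLOCK:
            self.reads += 1                     # read back the rest of the block
            data = old[:offset] + data + old[offset + len(data):]
        else:
            data = data.ljust(ERASE_BLOCK, b'\0')
        if old is not None:
            self.erases += 1                    # erase before reprogramming in place
        self.blocks[idx] = data
        self.programs += 1

flash = ToyFlash(4)
flash.write(0, b'x' * ERASE_BLOCK)              # full, aligned block: no read needed
flash.write(0, b'y' * 4096)                     # partial update: read + erase + program
print(flash.reads, flash.erases, flash.programs)   # -> 1 1 2

A controller with a pool of pre-erased spare blocks can redirect the
full-block case somewhere else entirely (the copy-on-write behavior
described above), which is why the blazing speed should indeed taper off
once that spare space is gone.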


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-22 Thread Bob Friesenhahn

On Fri, 21 May 2010, Don wrote:

You know- it would probably be sufficient to provide the SSD with 
_just_ a big capacitor bank. If the host lost power it would stop 
writing and if the SSD still had power it would probably use the 
idle time to flush its buffers. Then there would be world peace!


This makes the assumption that an SSD will want to flush its write 
cache as soon as possible rather than just letting it sit there 
waiting for more data.  This is probably not a good assumption.  If 
the OS sends 512 bytes of data but the SSD block size is 4K, it is 
reasonable for the SSD to wait for 3584 more contiguous bytes of data 
before it bothers to write anything.


Writes increase the wear on the flash, and writes require a slow erase
cycle, so it is reasonable for SSDs to buffer as much data in their
write cache as possible before writing anything.  An advanced SSD
could write non-contiguous sectors in an SSD page and then use a sort
of lookup table to know where the sectors actually are.  Regardless,
under slow write conditions, it is definitely valuable to buffer
the data for a while in the hope that more related data will appear,
or the data might even be overwritten.
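
A rough sketch in Python of the buffering-plus-lookup-table idea described
above (illustrative only; the 512-byte sector and 4 KB page sizes are
assumptions): small host writes accumulate until a page's worth is
available, then get packed into one flash page, with a table recording
where each logical sector actually landed.

SECTOR = 512
PAGE = 4096                        # assumed flash page size, purely illustrative

class ToySsd:
    def __init__(self):
        self.pending = {}          # buffered sectors, keyed by LBA
        self.remap = {}            # LBA -> (flash page, slot) lookup table
        self.next_page = 0
        self.pages_programmed = 0

    def host_write(self, lba, data):
        self.pending[lba] = data   # an overwrite just replaces the buffered copy
        if len(self.pending) * SECTOR >= PAGE:
            self.flush()

    def flush(self):
        # Pack whatever is buffered, contiguous or not, into one flash page
        # and remember where each sector went.
        for slot, lba in enumerate(sorted(self.pending)):
            self.remap[lba] = (self.next_page, slot)
        self.next_page += 1
        self.pages_programmed += 1
        self.pending.clear()

ssd = ToySsd()
for lba in (7, 3, 1000, 42, 7, 9, 11, 13, 21):   # scattered LBAs, one overwrite
    ssd.host_write(lba, b'x' * SECTOR)
print(ssd.pages_programmed, len(ssd.pending))    # -> 1 0: nine writes, one page programmed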


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Miika Vesti

If you do not care about this NFS problem (or the others) then maybe
you can just disable the ZIL.  It is a matter of working through step
1.  Working through STEP 1 might be ``doesn't affect us.  Disable
ZIL.''  Or it might be ``get slog with supercap''.  STEP 1 will never
be ``plug in OCZ Vertex cheaposlog that ignores cacheflush'' if you
are doing it right.  And Step 2 has nothing to do with anything yet
until we finish STEP 1 and the insane failure cases.


AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile 
NAND grid. Whether it respects or ignores the cache flush seems irrelevant.


There has been previous discussion about this: 
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702


I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
cache, but take a hunk of flash to use as scratch space instead. Which
means that they'll be OK for ZIL use.

Also:
http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html

Another benefit of SandForce's architecture is that the SSD keeps 
information on the NAND grid and removes the need for a separate cache 
buffer DRAM module. The result is a faster transaction, albeit at the 
expense of total storage capacity.


So if I interpret them correctly, what they chose to do with the 
current incarnation of the architecture is actually reserve some of the 
primary memory capacity for I/O transaction management.


In plain English, if the system gets interrupted either by power or by 
a crash, when it initializes the next time, it can read from its 
transaction space and resume where it left off. This makes it durable.
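
Conceptually that is ordinary write-ahead logging: record the intent
durably first, apply it second, and on restart replay whatever is left in
the log. A minimal sketch of the pattern in Python (nothing to do with
SandForce's actual internals, purely to illustrate the recovery idea):

import json, os

LOG = 'txn.log'                        # stands in for the "transaction space"

def apply_update(state_file, update):
    state = json.load(open(state_file)) if os.path.exists(state_file) else {}
    state.update(update)
    with open(state_file, 'w') as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())

def commit(state_file, update):
    with open(LOG, 'a') as log:        # 1. make the intent durable
        log.write(json.dumps(update) + '\n')
        log.flush()
        os.fsync(log.fileno())
    apply_update(state_file, update)   # 2. then update the real data

def recover(state_file):
    # After a crash or power loss, replay the log; reapplying is harmless here.
    if os.path.exists(LOG):
        for line in open(LOG):
            apply_update(state_file, json.loads(line))
        os.remove(LOG)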


So, OCZ Vertex 2 seems to be a good choice for ZIL.


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Attila Mravik
 AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND
 grid. Whether it respects or ignores the cache flush seems irrelevant.

 There has been previous discussion about this:
 http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
 cache, but take a hunk of flash to use as scratch space instead. Which
 means that they'll be OK for ZIL use.

 Also:
 http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html

 Another benefit of SandForce's architecture is that the SSD keeps
 information on the NAND grid and removes the need for a separate cache
 buffer DRAM module. The result is a faster transaction, albeit at the
 expense of total storage capacity.

 So if I interpret them correctly, what they chose to do with the current
 incarnation of the architecture is actually reserve some of the primary
 memory capacity for I/O transaction management.

 In plain English, if the system gets interrupted either by power or by a
 crash, when it initializes the next time, it can read from its transaction
 space and resume where it left off. This makes it durable.


Here is a detailed explanation of the SandForce controllers:
http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal

So the SF-1500 is enterprise class and relies on a supercap, while the
SF-1200 is consumer class and does not.

The SF-1200 firmware on the other hand doesn’t assume the presence of
a large capacitor to keep the controller/NAND powered long enough to
complete all writes in the event of a power failure. As such it does
more frequent check pointing and doesn’t guarantee the write in
progress will complete before it’s acknowledged.

As I understand it, the SF-1200 will ack a sync write only after it
is written to flash, thus reducing write performance.
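
That difference is easy to measure before trusting a drive as a slog. A
rough sketch in Python (the device path and counts are placeholders; point
it at a scratch device or file you can afford to overwrite): a drive that
really commits each O_SYNC write to flash will report far fewer IOPS than
one that acknowledges out of a volatile cache.

import os, time

PATH = '/dev/rdsk/c1t1d0s0'        # placeholder: a scratch device or test file
COUNT = 2000
BUF = b'\0' * 4096

fd = os.open(PATH, os.O_WRONLY | os.O_SYNC)   # each write must be stable when ack'd
start = time.time()
for _ in range(COUNT):
    os.write(fd, BUF)
os.close(fd)
elapsed = time.time() - start
print('%d sync writes in %.2fs: %.0f IOPS' % (COUNT, elapsed, COUNT / elapsed))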

There is an interesting part about firmware: OCZ has an exclusive
firmware in the Vertex 2 series which is based on the SF-1200 but whose
random write IOPS is not capped at 10K, while other vendors and other
SSDs from OCZ using the SF-1200 are capped, unless they sell the drive
with the RC firmware, which is for OEM evaluation and not production
ready but does not contain the IOPS cap.


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Kyle McDonald
SNIP a whole lot of ZIL/SLOG discussion

Hi guys.

yep I know about the ZIL, and SSD Slogs.

While setting Nexenta up it offered to disable the ZIL entirely. For
now I left it on. In the end (hopefully only for specific filesystems,
once that feature is released) I'll end up disabling the ZIL for our
software builds since:

1) The builds are disposable - We only need to save them if they finish,
and we can restart them if needed.
2) The build servers are not on UPS so a power failure is likely to make
the clients lose all state and need to restart anyway.

But this issue I've seen with Nexenta is not due to the ZIL. It runs
until it literally crashes the machine. It's not just slow; it brings
the machine to its knees. I believe it does have something to do with
exhausting memory though. As Erast says, it may be the isp driver (though
I've used that on b130 of SXCE without issues), or who knows what else.

I did download some updates from Nexenta yesterday. I'm going to try to
retest today or tomorrow.

 -Kyle



Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Bob Friesenhahn

On Fri, 21 May 2010, Miika Vesti wrote:

AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND 
grid. Whether it respects or ignores the cache flush seems irrelevant.


There has been previous discussion about this: 
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702


I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
cache, but take a hunk of flash to use as scratch space instead. Which
means that they'll be OK for ZIL use.

So, OCZ Vertex 2 seems to be a good choice for ZIL.


There seem to be quite a lot of blind assumptions in the above.  The
only good choice for a ZIL device is one you know about for a certainty,
not one based on assumptions from 3rd-party articles and blog postings.
Otherwise it is like assuming that if you jump through an open window
there will be firemen down below to catch you.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread David Dyer-Bennet

On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote:
 On Fri, 21 May 2010, Miika Vesti wrote:

 AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile
 NAND
 grid. Whether it respects or ignores the cache flush seems irrelevant.

 There has been previous discussion about this:
 http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
 cache, but take a hunk of flash to use as scratch space instead. Which
 means that they'll be OK for ZIL use.

 So, OCZ Vertex 2 seems to be a good choice for ZIL.

 There seem to be quite a lot of blind assumptions in the above.  The
 only good choice for a ZIL device is one you know about for a certainty,
 not one based on assumptions from 3rd-party articles and blog postings.
 Otherwise it is like assuming that if you jump through an open window
 there will be firemen down below to catch you.

Just how DOES one know something for a certainty, anyway?  I've seen LOTS
of people mess up performance testing in ways that gave them very wrong
answers; relying solely on your own testing is as foolish as relying on a
couple of random blog posts.

To be comfortable (I don't ask for "know for a certainty"; I'm not sure
that exists outside of faith), I want a claim by the manufacturer and
multiple outside tests in significant journals -- which could be the
blog of somebody I trusted, as well as actual magazines and such. 
Ideally, certainly if it's important, I'd then verify the tests myself.

There aren't enough hours in the day, so I often get by with less.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Miika Vesti
This is interesting. I thought all Vertex 2 SSDs were good choices for ZIL 
but this does not seem to be the case.


According to http://www.legitreviews.com/article/1208/1/ Vertex 2 LE, 
Vertex 2 Pro and Vertex 2 EX are SF-1500 based but Vertex 2 (without any 
suffix) is SF-1200 based.


Here is the table:
Model          Controller   Max Read   Max Write   IOPS
Vertex 2       SF-1200      270 MB/s   260 MB/s    9500
Vertex 2 LE    SF-1500      270 MB/s   250 MB/s    ?
Vertex 2 Pro   SF-1500      280 MB/s   270 MB/s    19000
Vertex 2 EX    SF-1500      280 MB/s   270 MB/s    25000

On 21.05.2010 17:09, Attila Mravik wrote:

AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile NAND
grid. Whether it respects or ignores the cache flush seems irrelevant.

There has been previous discussion about this:
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
cache, but take a hunk of flash to use as scratch space instead. Which
means that they'll be OK for ZIL use.

Also:
http://www.techspot.com/news/37729-ocz-vertex-2-pro-100gb-ssd-review.html

Another benefit of SandForce's architecture is that the SSD keeps
information on the NAND grid and removes the need for a separate cache
buffer DRAM module. The result is a faster transaction, albeit at the
expense of total storage capacity.

So if I interpret them correctly, what they chose to do with the current
incarnation of the architecture is actually reserve some of the primary
memory capacity for I/O transaction management.

In plain English, if the system gets interrupted either by power or by a
crash, when it initializes the next time, it can read from its transaction
space and resume where it left off. This makes it durable.



Here is a detailed explanation of the SandForce controllers:
http://www.anandtech.com/show/3661/understanding-sandforces-sf1200-sf1500-not-all-drives-are-equal

So the SF-1500 is enterprise class and relies on a supercap, while the
SF-1200 is consumer class and does not.

The SF-1200 firmware on the other hand doesn’t assume the presence of
a large capacitor to keep the controller/NAND powered long enough to
complete all writes in the event of a power failure. As such it does
more frequent check pointing and doesn’t guarantee the write in
progress will complete before it’s acknowledged.

As I understand it, the SF-1200 will ack a sync write only after it
is written to flash, thus reducing write performance.

There is an interesting part about firmware: OCZ has an exclusive
firmware in the Vertex 2 series which is based on the SF-1200 but whose
random write IOPS is not capped at 10K, while other vendors and other
SSDs from OCZ using the SF-1200 are capped, unless they sell the drive
with the RC firmware, which is for OEM evaluation and not production
ready but does not contain the IOPS cap.


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Brandon High
On Thu, May 20, 2010 at 2:23 PM, Miika Vesti miika.ve...@trivore.com wrote:
 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
 cache, but take a hunk of flash to use as scratch space instead. Which
 means that they'll be OK for ZIL use.

I've read conflicting reports that the controller contains a small
DRAM cache. So while it doesn't rely on an external DRAM cache, it
does have one: http://www.legitreviews.com/article/1299/2/
As we noted, the Vertex 2 doesn't have any cache chips on it; that is
because the SandForce controller itself is said to carry a small cache
inside that is a number of megabytes in size.

 Another benefit of SandForce's architecture is that the SSD keeps
 information on the NAND grid and removes the need for a separate cache
 buffer DRAM module. The result is a faster transaction, albeit at the
 expense of total storage capacity.

Again, conflicting reports indicate otherwise.
http://www.legitreviews.com/article/1299/2/
That adds up to 128GB of storage space, but only 93.1GB of it will be
usable space! The 'hidden' capacity is used for wear leveling, which
is crucial to keeping SSDs running as long as possible.

My understanding is that the controller contains enough cache to
buffer a complete erase block's worth of data, eliminating the
read / erase / write cycle that a partial block write entails.
It's reported to do a copy-on-write, so it doesn't need to do a read
of existing blocks when making changes, which gives it such high IOPS;
even random writes are turned into sequential writes of entire erase
blocks (much like how ZFS works). The generous spare area is used to
ensure that there are always full pages free to write to. (Some
vendors are releasing consumer drives with 60/120/240 GB, using 7%
reserved space rather than the 27% that the original drives ship
with.)
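
The 93.1GB usable figure and the 27% spare-area figure quoted earlier are
consistent with each other if the review's "GB" is really GiB: 128 GiB of
raw NAND behind a drive that exposes 100 decimal GB. A quick sanity check:

GIB = 2 ** 30
GB = 10 ** 9

def spare_fraction(raw_gib, advertised_gb):
    # Fraction of the raw NAND held back for wear leveling and spare pages.
    raw, usable = raw_gib * GIB, advertised_gb * GB
    return (raw - usable) / float(raw)

# 100 GB SandForce drive assumed to be built from 128 GiB of NAND:
print('usable: %.1f GiB' % (100 * GB / float(GIB)))           # ~93.1
print('spare:  %.0f%%' % (100 * spare_fraction(128, 100)))    # ~27%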

With an unexpected power loss, you could still lose any data that's
cached in the controller, or any uncommitted changes that have been
partially written to the NAND.

I hate having to rely on sites like Legit Reviews and Anandtech for
technical data, but there don't seem to be non-fanboy sites doing
comprehensive reviews of the drives ...

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Miles Nordin
 dd == David Dyer-Bennet d...@dd-b.net writes:

dd Just how DOES one know something for a certainty, anyway?

science.

Do a test like Lutz did on the X25-M G2.  See list archives, 2010-01-10.
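
For anyone who wants to reproduce that kind of test, the usual shape of it
(a hedged sketch, not Lutz's actual procedure) is: write monotonically
increasing sequence numbers with synchronous semantics, note the last
number acknowledged, cut power mid-run, and after reboot verify that every
acknowledged record survived.

import os, struct, sys

REC = 512                          # one record per 512-byte sector

def writer(dev):
    fd = os.open(dev, os.O_WRONLY | os.O_SYNC)   # each write must be stable when ack'd
    seq = 0
    while True:
        os.write(fd, struct.pack('<Q', seq).ljust(REC, b'\0'))
        print(seq)                 # the last number printed is the last acknowledged write
        sys.stdout.flush()
        seq += 1                   # ...pull the plug on the drive while this runs...

def verify(dev, last_acked):
    # After reboot: every record up to the last acknowledged one must be intact.
    with open(dev, 'rb') as f:
        for n in range(last_acked + 1):
            (val,) = struct.unpack('<Q', f.read(REC)[:8])
            if val != n:
                print('LOST acknowledged write %d' % n)
                return
    print('all %d acknowledged writes survived' % (last_acked + 1))

A device that ignores cache flushes typically shows a gap here: records it
acknowledged just before the power pull are simply gone.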




Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread Don
 Now, if someone would make a Battery FOB, that gives broken SSD 60
 seconds of power, then we could use the consumer  SSD's in servers
 again with real value instead of CYA value.
You know- it would probably be sufficient to provide the SSD with _just_ a big 
capacitor bank. If the host lost power it would stop writing and if the SSD 
still had power it would probably use the idle time to flush its buffers. Then 
there would be world peace!

Yeah- got a little carried away there. Still this seems like an experiment I'm 
going to have to try on my home server out of curiosity more than anything else 
:)
-- 
This message posted from opensolaris.org


[zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread Kyle McDonald
Hi all,

I recently installed Nexenta Community 3.0.2 on one of my servers:

IBM eSeries X346
2.8Ghz Xeon
12GB DDR2 RAM
1 builtin BGE interface for management
4 port Intel GigE card aggregated for Data
IBM ServeRAID 7k with 256MB BB Cache (isp driver)
  6 RAID0 single drive LUNs (so I can use the cache)
1 18GB LUN for the rpool
5 300GB LUN for the data pool
1 RAIDZ1 pool from the 5 300GB drives.
  4 test filesystems
1 No Dedup, No Compression
1 DeDup, No Compression
1 No DeDup, Compression
1 DeDup, Compression

This is pretty old hardware, so I wasn't expecting miracles, but I
thought I'd give it a shot.
My workload is NFS service to software build servers (cvs checkouts,
untarring files, compiling, etc.). I'm hoping the many CVS checkout trees
will lend themselves to DeDup well, and I know source code should
compress easily.

I set up one client with a single GigE connection, mounted the four file
systems (plus one from the netapp we have here) and proceeded to write a
loop to time both un-tarring the gcc-4.3.3 sources to those 5
filesystems and to 1 local directory, and to rm -rf the sources too.
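
For reference, a loop along these lines is easy to script; the sketch below
is Python with placeholder paths (it is not the actual script used here,
and it uses the tarfile module rather than /usr/bin/tar):

import os, shutil, tarfile, time

TARBALL = '/var/tmp/gcc-4.3.3.tar.gz'          # placeholder paths, adjust to taste
TARGETS = ['/mnt/nodedup-nocomp', '/mnt/dedup-nocomp',
           '/mnt/nodedup-comp',   '/mnt/dedup-comp',
           '/mnt/netapp',         '/var/tmp/local']

for mnt in TARGETS:
    dest = os.path.join(mnt, 'gcc-test')
    os.makedirs(dest)
    t0 = time.time()
    with tarfile.open(TARBALL) as tf:
        tf.extractall(dest)                    # the untar half of the test
    untar = time.time() - t0
    t0 = time.time()
    shutil.rmtree(dest)                        # the rm -rf half of the test
    print('%-22s untar %6.1fs  rm %6.1fs' % (mnt, untar, time.time() - t0))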

The untar took 28 seconds (and the removal 10 seconds) in the local dir;
then, on the first ZFS/NFS filesystem mount, it took basically forever and
hung the Nexenta server. I was watching it go on the web admin page and
it all looked fine for a while, then the client started reporting 'NFS
Server not responding, still trying...' For a while there were also
'NFS Server OK' messages too, and the Web GUI remained responsive.
Eventually the OK messages stopped, and the Web GUI froze.

I went and rebooted the NFS client, thinking that if the requests stopped
the server might catch up, but it never started responding again.

I was only untarring a file. How did this bring the machine down?
I hadn't even gotten to the FS's that had DeDup or Compression turned
on, so those shouldn't have affected things - yet.

Any ideas?

  -Kyle





Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread Erast

Hi Kyle,

Very likely you hit a driver bug in isp. After the reboot, take a
look at the /var/adm/messages file - anything related might shed some light.


I wouldn't suspect the Intel GigE card - it's a fairly good one and the
driver is very stable.


Also, some upgrades have been posted; make sure the kernel displays 134e
after the reboot into the new upgrade checkpoint. The upgrade command:


nmc$ setup appliance upgrade

On 05/20/2010 08:05 AM, Kyle McDonald wrote:

Hi all,

I recently installed Nexenta Community 3.0.2 on one of my servers:

IBM eSeries X346
2.8Ghz Xeon
12GB DDR2 RAM
1 builtin BGE interface for management
4 port Intel GigE card aggregated for Data
IBM ServeRAID 7k with 256MB BB Cache (isp driver)
  6 RAID0 single drive LUNs (so I can use the cache)
1 18GB LUN for the rpool
5 300GB LUN for the data pool
1 RAIDZ1 pool from the 5 300GB drives.
  4 test filesystems
1 No Dedup, No Compression
1 DeDup, No Compression
1 No DeDup, Compression
1 DeDup, Compression

This is pretty old hardware, so I wasn't expecting miracles, but I
thought I'd give it a shot.
My workload is NFS service to software build servers (cvs checkouts,
untarring files, compiling, etc.). I'm hoping the many CVS checkout trees
will lend themselves to DeDup well, and I know source code should
compress easily.

I set up one client with a single GigE connection, mounted the four file
systems (plus one from the netapp we have here) and proceeded to write a
loop to time both un-tarring the gcc-4.3.3 sources to those 5
filesystems and to 1 local directory, and to rm -rf the sources too.

The untar took 28 seconds (and the removal 10 seconds) in the local dir;
then, on the first ZFS/NFS filesystem mount, it took basically forever and
hung the Nexenta server. I was watching it go on the web admin page and
it all looked fine for a while, then the client started reporting 'NFS
Server not responding, still trying...' For a while there were also
'NFS Server OK' messages too, and the Web GUI remained responsive.
Eventually the OK messages stopped, and the Web GUI froze.

I went and rebooted the NFS client, thinking that if the requests stopped
the server might catch up, but it never started responding again.

I was only untarring a file. How did this bring the machine down?
I hadn't even gotten to the FS's that had DeDup or Compression turned
on, so those shouldn't have affected things - yet.

Any ideas?

   -Kyle





Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread Travis Tabbal
Disable ZIL and test again. NFS does a lot of sync writes, which kills 
performance. Disabling ZIL (or using the synchronicity option if a build with 
that ever comes out) will prevent that behavior, and should get your NFS 
performance close to local. It's up to you if you want to leave it that way. 
There are reasons not to as well. NFS clients can get corrupted views of the 
filesystem should the server go down before a write flush is completed. ZIL 
prevents that problem. In my case, the clients aren't on a UPS while the server 
is, so it's not an issue. :)
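
If you want to see how much of the pain really is the sync writes on a
given mount before changing anything, a crude comparison is to create a
batch of small files with and without fsync and time both runs. A rough
Python sketch (the target directory is a placeholder; numbers over NFS are
only indicative):

import os, sys, time

def make_files(directory, count, do_fsync):
    t0 = time.time()
    for i in range(count):
        fd = os.open(os.path.join(directory, 'f%05d' % i),
                     os.O_WRONLY | os.O_CREAT, 0o644)
        os.write(fd, b'x' * 8192)
        if do_fsync:
            os.fsync(fd)           # force the data to stable storage before continuing
        os.close(fd)
    return time.time() - t0

target = sys.argv[1]               # e.g. a test directory on the NFS mount
print('no fsync: %5.1fs' % make_files(target, 500, False))
print('fsync:    %5.1fs' % make_files(target, 500, True))
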
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread Roy Sigurd Karlsbakk
- Travis Tabbal tra...@tabbal.net wrote:

 Disable ZIL and test again. NFS does a lot of sync writes and kills
 performance. Disabling ZIL (or using the synchronicity option if a
 build with that ever comes out) will prevent that behavior, and should
 get your NFS performance close to local. It's up to you if you want to
 leave it that way. There are reasons not to as well. NFS clients can
 get corrupted views of the filesystem should the server go down before
 a write flush is completed. ZIL prevents that problem. In my case, the
 clients aren't on a UPS while the server is, so it's not an issue. :)

Disabling the ZIL is, according to ZFS best practice, NOT recommended. Get an 
SSD for the ZIL instead, preferably mirrored. You won't need a lot; the ZIL 
never uses more than half the RAM size.
 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented 
intelligibly. It is an elementary imperative for all pedagogues to avoid 
excessive use of idioms of foreign origin. In most cases adequate and 
relevant synonyms exist in Norwegian.


Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread David Magda
On Thu, May 20, 2010 13:58, Roy Sigurd Karlsbakk wrote:
 - Travis Tabbal tra...@tabbal.net wrote:

 Disable ZIL and test again. NFS does a lot of sync writes and kills
 performance. Disabling ZIL (or using the synchronicity option if a
 build with that ever comes out) will prevent that behavior, and should
 get your NFS performance close to local. It's up to you if you want to
 leave it that way. There are reasons not to as well. NFS clients can
 get corrupted views of the filesystem should the server go down before
 a write flush is completed. ZIL prevents that problem. In my case, the
 clients aren't on a UPS while the server is, so it's not an issue. :)

 Disabling ZIL is, according to ZFS best practice, NOT recommended. Get
 some SSD for the Zil instead, preferably mirrored. You won't need a lot,
 ZIL never uses more than half the RAM size

Disabling the ZIL is an easy way to TEST whether a ZIL would be helpful.
If things speed up after turning it off, then you'd turn it back on, and
go and purchase an SSD.

There's no sense spending money if it won't fix the problem.


To the OP, see Section 2.7 (Disabling the ZIL (Don't)) of:

   http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

As mentioned, you do NOT want to run with this in production, but it is a
quick way to check.



Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-20 Thread Miles Nordin
 rsk == Roy Sigurd Karlsbakk r...@karlsbakk.net writes:
 dm == David Magda dma...@ee.ryerson.ca writes:
 tt == Travis Tabbal tra...@tabbal.net writes:

   rsk Disabling ZIL is, according to ZFS best practice, NOT
   rsk recommended.

dm As mentioned, you do NOT want to run with this in production,
dm but it is a quick way to check.

REPEAT: I disagree.

Once you associate the disasterizing and dire warnings from the
developer's advice-wiki with the specific problems that ZIL-disabling
causes for real sysadmins rather than abstract notions of ``POSIX'' or
``the application'', a lot more people end up wanting to disable their
ZIL's.

In fact, most of the SSD's sold seem to be relying on exactly the
trick disabled-ZIL ZFS does for much of their high performance, if not
their feasibility within their price bracket period: provide a
guarantee of write ordering without durability, and many applications
are just, poof, happy.

If the SSD's arrange that no writes are reordered across a SYNC CACHE,
but don't bother actually providing durability, end uzarZ will ``OMG
windows fast and no corruption.'' -- ssd sales.

The ``do-not-disable-buy-SSD!!!1!'' advice thus translates to ``buy
one of these broken SSD's, and you will be basically happy.  Almost
everyone is.  When you aren't, we can blame the SSD instead of ZFS.''
All that bottlenecked host-SSD SATA traffic is just CYA and of no
real value (except for kernel panics).


Now, if someone would make a Battery FOB, that gives broken SSD 60
seconds of power, then we could use the consumer crap SSD's in servers
again with real value instead of CYA value.  FOB should work like
this:

== RUNNING ==                  SATA port: pass    power to SSD: on
    -- input power lost ----------------------> POWER-LOST HOLD-DOWN

== POWER-LOST HOLD-DOWN ==     SATA port: block   power to SSD: on
    -- 60 seconds elapsed --------------------> POWER OFF
    -- input power restored ------------------> POWER-RESTORED HOLD-DOWN

== POWER OFF ==                                   power to SSD: off
    -- input power restored ------------------> POWER-RESTORED HOLD-DOWN

== POWER-RESTORED HOLD-DOWN ==                    power to SSD: off
    -- battery recharged ---------------------> RUNNING

The device must know when its battery has gone bad and stick itself in
``power restored hold down'' state.  Knowing when the battery is bad
may require more states to test the battery, but this is the general
idea.
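
The same machine is small enough to sketch directly. A toy model in Python
(purely illustrative, with the battery-health check reduced to a flag):

# Toy model of the proposed battery FOB, following the state diagram above.
class BatteryFob:
    def __init__(self, battery_ok=True):
        self.battery_ok = battery_ok
        self.state = 'RUNNING'                        # SATA: pass, SSD power: on

    def on_event(self, event):
        s = self.state
        if s == 'RUNNING' and event == 'input power lost':
            self.state = 'POWER-LOST HOLD-DOWN'       # SATA: block, SSD power: on
        elif s == 'POWER-LOST HOLD-DOWN' and event == '60 seconds elapsed':
            self.state = 'POWER OFF'                  # SSD power: off
        elif s in ('POWER-LOST HOLD-DOWN', 'POWER OFF') and event == 'input power restored':
            self.state = 'POWER-RESTORED HOLD-DOWN'   # SSD stays off until recharged
        elif s == 'POWER-RESTORED HOLD-DOWN' and event == 'battery recharged':
            # A FOB that knows its battery has gone bad parks itself here forever.
            if self.battery_ok:
                self.state = 'RUNNING'
        return self.state

fob = BatteryFob()
for ev in ('input power lost', '60 seconds elapsed',
           'input power restored', 'battery recharged'):
    print(ev, '->', fob.on_event(ev))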

I think it would be much cheaper to build an SSD with supercap, and
simpler because you can assume the supercap is good forever instead of
testing it.  However because of ``market forces'' the FOB approach
might sell for cheaper because the FOB cannot be tied to the SSD and
used as a way to segment the market.  If there are 2 companies making
only FOB's and not making SSD's, only then competition will work like
people want it to.  Otherwise FOBs will be $1000 or something because
only ``enterprise'' users are smart/dumb enough to demand them.

Normally I would have a problem that the FOB and SSD are separable,
but see, the FOB and SSD can be put together with double-sided tape:
the tape only has to hold for 60 seconds after $event, and there's no
way to separate the two by tripping over a cord.  You can safely move
SSD+FOB from one chassis to another without fearing all is lost if you
jiggle the connection.  I think it's okay overall.

tt This risk is mostly mitigated by UPS backup and auto-shutdown
tt when the UPS detects power loss, correct?

no no it's about cutting off a class of failure cases and constraining
ourselves to relatively sane forms of failure.  We are not haggling
about NO FAILURES EVAR yet.  First, for STEP 1 we isolate the insane
kinds of failure that cost us days or months of data rather than just
a few seconds, the kinds that call for crazy unplannable ad-hoc
recovery methods like `Viktor plz help me' and ``is anyone here a
Postgres data recovery expert?'' and ``is there a way I can invalidate
the batch of billing auth requests I uploaded yesterday so I can rerun
it without double-billing anyone?''  For STEP 1 we make the insane
fail almost impossible through clever software and planning.  A UPS
never never ever qualifies as ``almost impossible''.  

Then, once that's done, we come back for STEP 2 where we try to
minimize the sane failures also, and for step 2 things like UPS might
be useful.  For STEP 2 it makes sense to talk about percent
availability, probability of failure, length of time to recover from
Scenario X.  but in STEP 1 all the failures are insane