Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Bob Friesenhahn

On Tue, 10 Nov 2009, Tim Cook wrote:


My personal thought would be that it doesn't really make sense to 
even have it, at least for readzilla.  In theory, you always want 
the SSD to be full, or nearly full, as it's a cache.  The whole 
point of TRIM, from my understanding, is to speed up the drive by 
zeroing out unused blocks so the next time you try to write to
them, they don't have to be cleared, then written to.  When dealing
with a cache, there shouldn't (again in theory) be any free blocks;
a warmed cache should be full of data.


This thought is wrong because SSDs actually have many more blocks than
they admit to in their declared size.  The extreme or
enterprise units will have more extra blocks.  These extra blocks
are necessary in order to replace failing blocks, and to spread the
write load over many more underlying blocks, and thereby decrease the 
chance of failure.  If a FLASH block is to be overwritten, then the 
device can reassign the old FLASH block to the spare pool, and update 
its tables so that a different FLASH block (from the spare pool) is 
used for the write.
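
To make the remapping concrete, here is a minimal Python sketch of the idea, not any vendor's actual firmware. The class name ToyFlash, the pool sizes, and the choice to recycle spares in first-in, first-out order are all assumptions made for illustration.

```python
class ToyFlash:
    """Toy flash translation layer: logical blocks map onto a larger physical pool."""

    def __init__(self, declared_blocks, spare_blocks):
        total = declared_blocks + spare_blocks
        # Every logical block starts out mapped to its own physical block.
        self.mapping = {lba: lba for lba in range(declared_blocks)}
        # Physical blocks beyond the declared size form the spare pool.
        self.spare_pool = list(range(declared_blocks, total))
        self.erase_counts = [0] * total

    def overwrite(self, lba):
        """Redirect an overwrite of `lba` to a spare block and retire the old one."""
        old = self.mapping[lba]
        new = self.spare_pool.pop(0)   # take the longest-waiting spare (assumed policy)
        self.mapping[lba] = new        # update the table
        self.erase_counts[old] += 1    # the old block is erased off the write path
        self.spare_pool.append(old)    # and rejoins the spare pool
        return new


flash = ToyFlash(declared_blocks=4, spare_blocks=2)
for _ in range(6):
    flash.overwrite(0)   # hammer a single logical block
```

After six overwrites of logical block 0, the erases are spread over three physical blocks (two each) instead of landing six times on one, which is the wear-spreading effect described above.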


Logzilla is kind of in the same boat; it should constantly be
filling and emptying as new data comes in.  I'd imagine the TRIM
would just add unnecessary overhead.  It could in theory help there
by zeroing out blocks ahead of time before a new batch of writes
comes in if you have a period of little I/O.  My thought is it would
be far more work than it's worth, but I'll let the coders decide 
that one.


The problem with TRIM is that its goal is to decrease write latency 
at low/medium writing loads, or at high load for a short duration.  It 
does not do anything to increase maximum sustained write performance 
since the maximum write performance then depends on how fast the 
device can erase blocks.  Some server environments will write to the
device at close to 100% load most of the time, especially with
relatively slow devices like the X25-E.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Tim Cook
On Wed, Nov 11, 2009 at 11:51 AM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Tue, 10 Nov 2009, Tim Cook wrote:


 My personal thought would be that it doesn't really make sense to even
 have it, at least for readzilla.  In theory, you always want the SSD to be
 full, or nearly full, as it's a cache.  The whole point of TRIM, from my
 understanding, is to speed up the drive by zeroing out unused blocks so the
 next time you try to write to them, they don't have to be cleared, then
 written to.  When dealing with a cache, there shouldn't (again in theory) be
 any free blocks; a warmed cache should be full of data.


 This thought is wrong because SSDs actually have many more blocks than they
 admit to in their declared size.  The extreme or enterprise units
 will have more extra blocks.  These extra blocks are necessary in order to
 replace failing blocks, and to spread the write load over many more
 underlying blocks, and thereby decrease the chance of failure.  If a FLASH
 block is to be overwritten, then the device can reassign the old FLASH block
 to the spare pool, and update its tables so that a different FLASH block
 (from the spare pool) is used for the write.


I'm well aware of the fact that SSD mfg's put extra blocks into the device
to increase both performance and MTBF.  I'm not sure how that invalidates
what I've said though, or even plays a role, and you haven't done a very
good job of explaining why you think I'm wrong.  TRIM is simply letting the
device know that a block has been deleted from the OS perspective.  In a
caching scenario, you aren't deleting anything, you're continually
over-writing.  How exactly do you foresee TRIM being useful when the command
wouldn't even be invoked?






 Logzilla is kind of in the same boat; it should constantly be filling and
 emptying as new data comes in.  I'd imagine the TRIM would just add
 unnecessary overhead.  It could in theory help there by zeroing out blocks
 ahead of time before a new batch of writes comes in if you have a period of
 little I/O.  My thought is it would be far more work than it's worth, but
 I'll let the coders decide that one.


 The problem with TRIM is that its goal is to decrease write latency at
 low/medium writing loads, or at high load for a short duration.  It does not
 do anything to increase maximum sustained write performance since the
 maximum write performance then depends on how fast the device can erase
 blocks.  Some server environments will write to the device at close to 100%
 load most of the time, especially with relatively slow devices like the X25-E.


Right... you just repeated what I said with different wording.

--Tim


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Nicolas Williams
On Mon, Sep 07, 2009 at 09:58:19AM -0700, Richard Elling wrote:
 I only know of hole punching in the context of networking. ZFS doesn't
 do networking, so the pedantic answer is no.

But a VDEV may be an iSCSI device, thus there can be networking below
ZFS.

For some iSCSI targets (including ZVOL-based ones) a hole punching
operation can be very useful since it explicitly tells the backend that
some contiguous block of space can be released for allocation to others.

Nico


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Bob Friesenhahn

On Wed, 11 Nov 2009, Tim Cook wrote:


I'm well aware of the fact that SSD mfg's put extra blocks into the 
device to increase both performance and MTBF.  I'm not sure how that 
invalidates what I've said though, or even plays a role, and you
haven't done a very good job of explaining why you think I'm wrong.  
TRIM is simply letting the device know that a block has been deleted 
from the OS perspective.  In a caching scenario, you aren't deleting 
anything, you're continually over-writing.  How exactly do you 
foresee TRIM being useful when the command wouldn't even be invoked?


The act of over-writing requires erasing.  If the cache is going to 
expire seldom-used data, it could potentially use TRIM to start 
erasing pages while the new data is being retrieved from primary 
storage.


Regardless, it seems that smarter FLASH storage device design 
eliminates most of the value offered by TRIM.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-10 Thread George Janczuk
I've been following the use of SSD with ZFS and HSPs for some time now, and I 
am working (in an architectural capacity) with one of our IT guys to set up our 
own ZFS HSP (using a J4200 connected to an X2270).

The best practice seems to be to use an Intel X25-M for the L2ARC (Readzilla) 
and an Intel X25-E for the ZIL/SLOG (Logzilla).

However, whilst Intel's G2 devices and updated firmware are a BIG thing in the
Windows 7 world, I have heard pretty much nothing about them when Intel's SSDs
are used in a ZFS HSP. In particular, does ZFS use or support the TRIM command?
Is it even relevant or useful in a hierarchical (vs. primary) storage context?

Any comment would be appreciated. Some comment from the Fishworks guys in 
particular would be great!
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-10 Thread Tim Cook
On Tue, Nov 10, 2009 at 6:51 PM, George Janczuk 
geor...@objectconsulting.com.au wrote:

 I've been following the use of SSD with ZFS and HSPs for some time now, and
 I am working (in an architectural capacity) with one of our IT guys to set
 up our own ZFS HSP (using a J4200 connected to an X2270).

 The best practice seems to be to use an Intel X25-M for the L2ARC
 (Readzilla) and an Intel X25-E for the ZIL/SLOG (Logzilla).

 However, whilst Intel's G2 devices and updated firmware are a BIG thing in
 the Windows 7 world, I have heard pretty much nothing about them when
 Intel's SSDs are used in a ZFS HSP. In particular, does ZFS use or support
 the TRIM command? Is it even relevant or useful in a hierarchical (vs.
 primary) storage context?

 Any comment would be appreciated. Some comment from the Fishworks guys in
 particular would be great!



My personal thought would be that it doesn't really make sense to even have
it, at least for readzilla.  In theory, you always want the SSD to be full,
or nearly full, as it's a cache.  The whole point of TRIM, from my
understanding, is to speed up the drive by zeroing out unused blocks so the
next time you try to write to them, they don't have to be cleared, then
written to.  When dealing with a cache, there shouldn't (again in theory) be
any free blocks; a warmed cache should be full of data.

Logzilla is kind of in the same boat; it should constantly be filling and
emptying as new data comes in.  I'd imagine the TRIM would just add
unnecessary overhead.  It could in theory help there by zeroing out blocks
ahead of time before a new batch of writes comes in if you have a period of
little I/O.  My thought is it would be far more work than it's worth, but
I'll let the coders decide that one.

--Tim


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-08 Thread Chris Csanady
2009/9/7 Ritesh Raj Sarraf r...@researchut.com:
 The Discard/Trim command is also available as part of the SCSI standard now.

 Now, if you look from a SAN perspective, you will need a little of both.
 Filesystems will need to be able to deallocate blocks and then the same 
 should be triggered as a SCSI Trim to the Storage Controller.
 For a virtualized environment, the filesystem should be able to punch holes 
 into virt image files.

 F_FREESP is only on XFS to my knowledge.

I found F_FREESP while looking through the OpenSolaris source, and it
is supported on all filesystems which implement VOP_SPACE.  (I was
initially investigating what it would take to transform writes of
zeroed blocks into block frees on ZFS.  Although it would not appear
to be too difficult, I'm not sure if it would be worth complicating
the code paths.)
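
As a rough illustration of the "zeroed writes become block frees" idea, here is a minimal Python sketch. It is not ZFS code: the function name plan_write, the 4096-byte block size, and the ("free" / "write") plan format are all hypothetical, and the sketch only handles block-aligned writes.

```python
BLOCK_SIZE = 4096  # hypothetical block size, chosen only for the example


def plan_write(offset, data, block_size=BLOCK_SIZE):
    """Split a block-aligned write into ("free", offset) and ("write", offset) steps."""
    if offset % block_size or len(data) % block_size:
        raise ValueError("this sketch only handles block-aligned writes")
    zero_block = bytes(block_size)
    plan = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # An all-zero block carries no information, so it could be freed
        # (turned into a hole) instead of being stored.
        action = "free" if block == zero_block else "write"
        plan.append((action, offset + start))
    return plan


payload = bytes(BLOCK_SIZE) + b"x" * BLOCK_SIZE + bytes(BLOCK_SIZE)
plan = plan_write(8192, payload)
```

Here the first and third blocks are all zeros, so the plan frees them and writes only the middle block. The comparison against a zero block on every write is the kind of extra work in the write path that the paragraph above is wary of.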

 So how does ZFS tackle the above 2 problems ?

At least for file backed filesystems, ZFS already does its part.  It
is the responsibility of the hypervisor to execute the mentioned
fcntl(), whether it is triggered by a TRIM or whatever else.  ZFS does
not use TRIM itself, though it is not recommended to run ZFS on top of
files anyway, nor is there a need to for virtualization purposes.

It does appear that the ATA TRIM command should be used with great
care though, or avoided altogether.  Not only does it need to wait
for the entire queue to empty, it can cause a delay of ~100ms if you
issue TRIMs without enough time elapsed between them.  (See the thread linked from
the article I mentioned.)

As far as I can tell, Solaris is missing the equivalent of a
DKIOCDISCARD ioctl().  Something like that should be implemented to
allow recovery of space on zvols and iSCSI backing stores. (Though,
the latter would require implementing the SCSI TRIM support as well,
if I understand correctly.)

Chris


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Richard Elling

On Sep 7, 2009, at 3:49 AM, Sriram Narayanan wrote:

Folks:

I gave a presentation last weekend on how one could use Zones, ZFS and
Crossbow to recreate deployment scenarios on one's computer (to the
extent possible).

I've received the following question, and would like to ask the ZFS
Community for answers.

-- Sriram


-- Forwarded message --
From: Ritesh Raj Sarraf r...@researchut.com
Date: Mon, Sep 7, 2009 at 2:20 PM
Subject: [ilugb] Does ZFS support Hole Punching/Discard
To: ilug-bengal...@googlegroups.com


Thanks to Sriram for the nice walk through on Beyond localhost.

There was one item I forgot to ask. Does ZFS support Hole Punching ?


I only know of hole punching in the context of networking. ZFS doesn't
do networking, so the pedantic answer is no.

After pushing off to BP is when I remembered this issue. Here's a link about
this issue and its state in Linux.
http://lwn.net/Articles/293658/


This is an article about the new TRIM command. It would be important for
file systems which write their metadata to the same physical location or use
a MRU replacement algorithm. But ZFS is copy-on-write, so the metadata is
allocated from free space and ZFS is transactional, not directly MRU. It is not
expected that this problem would affect ZFS file systems for an extended
period of time, which you can further extend by using snapshots.

Interesting sidebar: you can measure how many times a block is rewritten in
a Solaris system, but the data collection and analysis is a rather large task.
I don't know of anyone patient enough to do it long enough to get near the
endurance of an SSD.
 -- richard



Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Bob Friesenhahn

On Mon, 7 Sep 2009, Richard Elling wrote:


This is an article about the new TRIM command. It would be important for
file systems which write their metadata to the same physical location or use
a MRU replacement algorithm. But ZFS is copy-on-write, so the metadata is
allocated from free space and ZFS is transactional, not directly MRU. It is


The purpose of the TRIM command is to allow the FLASH device to 
reclaim and erase storage at its leisure so that the writer does not 
need to wait for erasure once the device becomes full.  Otherwise the 
FLASH device does not know when an area stops being used.
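
A small Python sketch of why that knowledge matters, under simplifying assumptions: before an erase block can be reclaimed, the device has to copy out every page it still believes is live. The class name ToyEraseBlock and the page counts are made up for the example.

```python
class ToyEraseBlock:
    """One erase block as the device sees it: the pages it must treat as live."""

    def __init__(self, written_pages):
        self.believed_live = set(written_pages)

    def trim(self, pages):
        """Host tells the device these pages no longer hold useful data."""
        self.believed_live -= set(pages)

    def reclaim_cost(self):
        """Pages that must be copied elsewhere before the block can be erased."""
        return len(self.believed_live)


# The file system wrote pages 0..7 and has since freed pages 2..7.
without_trim = ToyEraseBlock(range(8))
with_trim = ToyEraseBlock(range(8))
with_trim.trim(range(2, 8))
```

Without TRIM the device still has to preserve all 8 pages when it reclaims the block; with TRIM it only has to preserve 2, and it can do the erase whenever it is idle.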


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Richard Elling

On Sep 7, 2009, at 10:20 AM, Bob Friesenhahn wrote:


On Mon, 7 Sep 2009, Richard Elling wrote:


This is an article about the new TRIM command. It would be important for
file systems which write their metadata to the same physical location or use
a MRU replacement algorithm. But ZFS is copy-on-write, so the metadata is
allocated from free space and ZFS is transactional, not directly MRU. It is


The purpose of the TRIM command is to allow the FLASH device to  
reclaim and erase storage at its leisure so that the writer does not  
need to wait for erasure once the device becomes full.  Otherwise  
the FLASH device does not know when an area stops being used.


Yep, it is there to try and solve the problem of rewrites in a small area,
smaller than the bulk erase size.  While it would be trivial to traverse the
spacemap and TRIM the free blocks, it might not improve performance
for COW file systems. My crystal ball says smarter flash controllers or a
form of managed flash will win and obviate the need for TRIM entirely.
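
For what "traverse the free space and TRIM it" could look like in outline, here is a minimal Python sketch. It does not use the real ZFS spacemap structures: the free space is just a list of block numbers, and issue_trim(start, length) is a hypothetical callback standing in for whatever discard call a driver would expose.

```python
def free_extents(free_blocks):
    """Coalesce free block numbers into sorted (start, length) extents."""
    extents = []
    for block in sorted(set(free_blocks)):
        if extents and block == extents[-1][0] + extents[-1][1]:
            # Block directly follows the previous extent, so extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((block, 1))
    return extents


def trim_free_space(free_blocks, issue_trim):
    """Call the hypothetical `issue_trim(start, length)` once per free extent."""
    for start, length in free_extents(free_blocks):
        issue_trim(start, length)


issued = []
trim_free_space([7, 3, 4, 5, 10, 11], lambda s, n: issued.append((s, n)))
```

The six free blocks collapse into three ranges, (3, 3), (7, 1) and (10, 2), so three discard requests would be issued rather than six. Whether issuing them actually helps a copy-on-write file system is the open question raised above.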
 -- richard



Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Bob Friesenhahn

On Mon, 7 Sep 2009, Richard Elling wrote:


Yep, it is there to try and solve the problem of rewrites in a small area,
smaller than the bulk erase size.  While it would be trivial to traverse the
spacemap and TRIM the free blocks, it might not improve performance
for COW file systems. My crystal ball says smarter flash controllers or a
form of managed flash will win and obviate the need for TRIM entirely.


Without TRIM there is no way for the FLASH device to know that a 
region of data is free and can be reclaimed.  It is pretty difficult 
to be intelligent without that.


Regardless, TRIM only improves the perception of performance under
relatively light loads where the device is able to erase blocks faster
than writes arrive.  This is important for PCs, where perception is
everything.  It does not improve sustained maximum write throughput.
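
The light-load versus sustained-load distinction can be seen in a toy Python model. Everything here is an assumption for illustration: time is in abstract ticks, rates are in blocks per tick, and pool_limit stands for however many pre-erased blocks the device can hold ready.

```python
def simulate(write_demand, erase_rate, pre_erased, pool_limit, ticks):
    """Return the blocks actually written in each tick."""
    written = []
    pool = pre_erased
    for _ in range(ticks):
        # Background erasure refills the pool of ready blocks, up to its limit.
        pool = min(pool_limit, pool + erase_rate)
        # Writes can only land on blocks that are already erased.
        done = min(write_demand, pool)
        pool -= done
        written.append(done)
    return written


light = simulate(write_demand=2, erase_rate=3, pre_erased=10, pool_limit=10, ticks=50)
heavy = simulate(write_demand=5, erase_rate=3, pre_erased=10, pool_limit=10, ticks=50)
```

Under the light load every tick completes its 2 writes without waiting. Under the heavy load the pre-erased pool absorbs the first few ticks at 5 blocks per tick, then throughput settles at the erase rate of 3. Having blocks erased ahead of time helps the burst and does nothing for the sustained figure.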


As far as your crystal ball goes, people here might be interested in 
this article about an apparent Sun product which had its documentation 
released to the Sun web site a bit too early:


  http://www.theregister.co.uk/2009/09/03/sun_flash_array/

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Richard Elling

On Sep 7, 2009, at 11:48 AM, Bob Friesenhahn wrote:

On Mon, 7 Sep 2009, Richard Elling wrote:


Yep, it is there to try and solve the problem of rewrites in a small area,
smaller than the bulk erase size.  While it would be trivial to traverse the
spacemap and TRIM the free blocks, it might not improve performance
for COW file systems. My crystal ball says smarter flash controllers or a
form of managed flash will win and obviate the need for TRIM entirely.


Without TRIM there is no way for the FLASH device to know that a  
region of data is free and can be reclaimed.  It is pretty difficult  
to be intelligent without that.


Yes, it is a trade-off for the page size mismatch. But you could manage this
by reading the page, erasing, and writing... as long as you have some sort
of nonvolatility arrangement -- hence a managed solution. TRIM just tries to
eliminate the need for a nonvolatile buffer by pushing that decision to the OS.
The interesting question is what happens when the important page is never
free? I presume you just get stuck being slow.
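
The read, erase, write cycle mentioned above can be sketched in a few lines of Python. The sizes are tiny on purpose and the function name rewrite_page is made up; a real device works on much larger erase blocks.

```python
PAGE = 4  # bytes per page, deliberately tiny for the example


def rewrite_page(erase_block, page_index, new_page):
    """Rewrite one page of an erase block via read, erase, and program."""
    if len(new_page) != PAGE:
        raise ValueError("new_page must be exactly one page long")
    buffered = bytearray(erase_block)            # 1. read the whole block
    start = page_index * PAGE
    buffered[start:start + PAGE] = new_page      # 2. modify the copy
    erase_block[:] = b"\xff" * len(erase_block)  # 3. erase (flash erases to all ones)
    erase_block[:] = buffered                    # 4. program the block back
    return erase_block


block = bytearray(b"aaaabbbbccccdddd")
rewrite_page(block, 2, b"XXXX")
```

Between steps 3 and 4 the only copy of the untouched pages lives in the buffer, which is why the buffer has to survive a power loss in a managed design.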

Regardless, TRIM only improves the perception of performance under
relatively light loads where the device is able to erase blocks faster
than writes arrive.  This is important for PCs, where perception is
everything.  It does not improve sustained maximum write throughput.


As far as your crystal ball goes, people here might be interested in  
this article about an apparent Sun product which had its  
documentation released to the Sun web site a bit too early:


 http://www.theregister.co.uk/2009/09/03/sun_flash_array/


The Flash Modules are well known and used in several products already.
http://www.sun.com/storage/flash/module.jsp

The rest is just packaging... (another famous last words :-)
 -- richard



Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Chris Csanady
2009/9/7 Richard Elling richard.ell...@gmail.com:
 On Sep 7, 2009, at 10:20 AM, Bob Friesenhahn wrote:

 The purpose of the TRIM command is to allow the FLASH device to reclaim
 and erase storage at its leisure so that the writer does not need to wait
 for erasure once the device becomes full.  Otherwise the FLASH device does
 not know when an area stops being used.

 Yep, it is there to try and solve the problem of rewrites in a small area,
 smaller than the bulk erase size.  While it would be trivial to traverse the
 spacemap and TRIM the free blocks, it might not improve performance
 for COW file systems. My crystal ball says smarter flash controllers or a
 form of managed flash will win and obviate the need for TRIM entirely.
  -- richard

I agree with this sentiment, although I still look forward to it being obviated
by a better memory technology instead, like PRAM.  In any case, the ATA
TRIM command may not be so useful after all, as it can't be queued:

http://lwn.net/Articles/347511/

As an aside, after a bit of digging, I came across fcntl(F_FREESP).
This will at least allow you to put the sparse back into sparse files if you
so desire.  Unfortunately, I don't see any way to do this for a zvol.

Chris


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-09-07 Thread Ritesh Raj Sarraf
The Discard/Trim command is also available as part of the SCSI standard now.

Now, if you look from a SAN perspective, you will need a little of both.
Filesystems will need to be able to deallocate blocks and then the same should 
be triggered as a SCSI Trim to the Storage Controller.
For a virtualized environment, the filesystem should be able to punch holes 
into virt image files.

F_FREESP is only on XFS to my knowledge.

So how does ZFS tackle the above 2 problems ?