Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Bob Friesenhahn

On Wed, 5 May 2010, Edward Ned Harvey wrote:


In the L2ARC (cache) there is no ability to mirror, because cache device
removal has always been supported.  You can't mirror a cache device, because
you don't need it.


How do you know that I don't need it?  The ability seems useful to me.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Tomas Ögren
On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:

 On Wed, 5 May 2010, Edward Ned Harvey wrote:

 In the L2ARC (cache) there is no ability to mirror, because cache device
 removal has always been supported.  You can't mirror a cache device, because
 you don't need it.

 How do you know that I don't need it?  The ability seems useful to me.

The gain is quite minimal.. If the first device fails (which doesn't
happen too often I hope), then it will be read from the normal pool once
and then stored in ARC/L2ARC again. It just behaves like a cache miss
for that specific block... If this happens often enough to become a
performance problem, then you should throw away that L2ARC device
because it's broken beyond usability.
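
If you want to gauge how much the pool actually leans on the L2ARC (and how
hard losing a cache device would hurt), the arcstats kstats give a rough idea.
A minimal sketch; the counters of interest are l2_hits/l2_misses:

    # all L2ARC-related ARC counters
    kstat -m zfs -n arcstats | grep l2_

    # just the hit/miss counters
    kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses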

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Robert Milkowski

On 06/05/2010 15:31, Tomas Ögren wrote:

On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:

   

On Wed, 5 May 2010, Edward Ned Harvey wrote:
 

In the L2ARC (cache) there is no ability to mirror, because cache device
removal has always been supported.  You can't mirror a cache device, because
you don't need it.
   

How do you know that I don't need it?  The ability seems useful to me.
 

The gain is quite minimal.. If the first device fails (which doesn't
happen too often I hope), then it will be read from the normal pool once
and then stored in ARC/L2ARC again. It just behaves like a cache miss
for that specific block... If this happens often enough to become a
performance problem, then you should throw away that L2ARC device
because it's broken beyond usability.

   


Well, if an L2ARC device fails there might be an unacceptable drop in 
delivered performance.
If it were mirrored, then the drop would usually be much smaller, or there 
would be no drop at all if the mirror had an option to read from only one side.


Being able to mirror the L2ARC might be especially useful once a persistent 
L2ARC is implemented, since after a node restart or a resource failover in a 
cluster the L2ARC will be kept warm. Then the only thing which might affect 
L2 performance considerably would be an L2ARC device failure...



--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Brandon High
On Wed, May 5, 2010 at 8:47 PM, Michael Sullivan
michael.p.sulli...@mac.com wrote:
 While it explains how to implement these, there is no information regarding 
 failure of a device in a striped L2ARC set of SSD's.  I have been hard 
 pressed to find this information anywhere, short of testing it myself, but I 
 don't have the necessary hardware in a lab to test correctly.  If someone has 
 pointers to references, could you please provide them to chapter and verse, 
 rather than the advice to Go read the manual.

Yes, but the answer is in the man page. So reading it is a good idea:

If a read error is encountered on a cache device, that read I/O is
reissued to the original storage pool  device,  which  might be part
of a mirrored or raidz configuration.

 I'm running 2009.11 which is the latest OpenSolaris.  I should have made that 
 clear, and that I don't intend this to be on Solaris 10 system, and am 
 waiting for the next production build anyway.  As you say, it does not exist 
 in 2009.06, this is not the latest production Opensolaris which is 2009.11, 
 and I'd be more interested in its behavior than an older release.

The latest is b134, which contains many, many fixes over 2009.11,
though it's a dev release.

 From the information I've been reading about the loss of a ZIL device, it 
 will be relocated to the storage pool it is assigned to.  I'm not sure which 
 version this is in, but it would be nice if someone could provide the release 
 number it is included in (and actually works), it would be nice.  Also, will 
 this functionality be included in the mythical 2010.03 release?

It went in somewhere around b118, I think, so it will be in the
next scheduled release.

 Also, I'd be interested to know what features along these lines will be 
 available in 2010.03 if it ever sees the light of day.

Look at the latest dev release. b134 was originally slated to be
2010.03, so the feature set of the final release should be very close.

 So what you are saying is that if a single device fails in a striped L2ARC 
 VDEV, then the entire VDEV is taken offline and the fallback is to simply use 
 the regular ARC and fetch from the pool whenever there is a cache miss.

The strict interpretation of the documentation is that the read is
re-issued. My understanding is that the block that failed to be read
would then be read from the original pool.

 Or, does what you are saying here mean that if I have a 4 SSD's in a stripe 
 for my L2ARC, and one device fails, the L2ARC will be reconfigured 
 dynamically using the remaining SSD's for L2ARC.

Auto-healing in zfs would resilver the block that failed to be read,
either onto the same device or another cache device in the pool,
exactly as if a read failed on a normal pool device. It wouldn't
reconfigure the cache devices, but each failed read would cause the
blocks to be reallocated to a functioning device which has the same
effect in the end.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Michael Sullivan
Everyone,

Thanks for the help.  I really appreciate it.

Well, I actually walked through the source code with an associate today and we 
found out how things work by looking at the code.

It appears that L2ARC is just assigned in round-robin fashion.  If a device 
goes offline, ZFS moves on to the next one and marks the failed device as offline.  The 
failure to retrieve the requested object is treated like a cache miss and 
everything goes along its merry way, as far as we can tell.

I would have hoped it to be different in some way.  For instance, if the L2ARC were 
striped for performance reasons, that would be really cool, using those devices as an 
extension of the VM model it is based on.  That would mean using the L2ARC as an 
extension of the virtual address space and striping it to make it more efficient.  
Way cool.  If it took out the bad device and reconfigured the stripe, that would be 
even cooler.  Replacing it with a hot spare, cooler still.  However, it appears from 
the source code that the L2ARC is just a (sort of) jumbled collection of ZFS objects.  
Yes, it gives you better performance if you have it, but it doesn't really use it in 
a way you might expect from something as cool as ZFS.

I understand why it is read-only, and why it invalidates its cached copy when a write 
occurs; that is to be expected for any object written.

If an object is not there because of a failure or because it has been removed 
from the cache, it is treated as a cache miss, all well and good - go fetch 
from the pool.

I also understand why the ZIL is important and that it should be mirrored if it 
is to be on a separate device.  Though I'm wondering how it is handled 
internally when there is a failure of one of its default devices; then again, 
in that case it's on a regular pool and should be redundant enough, with only some 
degradation in speed.
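
(For what it's worth, attaching the slog as a mirror is a one-liner; the pool 
and device names below are just placeholders:)

    # add a mirrored log device pair to an existing pool
    zpool add tank log mirror c4t0d0 c5t0d0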

Breaking these devices out from their default locations is great for 
performance, and I understand.  I just wish the knowledge of how they work and 
their internal mechanisms were not so much of a black box.  Maybe that is due 
to the speed at which ZFS is progressing and the features it adds with each 
subsequent release.

Overall, I am very impressed with ZFS, its flexibility and, even more so, its 
breaking of all the rules about how storage should be managed, and I really like 
it.  I have yet to see anything come close in its approach to disk data 
management.  Let's just hope it keeps moving forward; it is truly a unique way 
to view disk storage.

Anyway, sorry for the ramble, but to everyone, thanks again for the answers.

Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 7 May 2010, at 00:00 , Robert Milkowski wrote:

 On 06/05/2010 15:31, Tomas Ögren wrote:
 On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:
 
   
 On Wed, 5 May 2010, Edward Ned Harvey wrote:
 
 In the L2ARC (cache) there is no ability to mirror, because cache device
 removal has always been supported.  You can't mirror a cache device, 
 because
 you don't need it.
   
 How do you know that I don't need it?  The ability seems useful to me.
 
 The gain is quite minimal.. If the first device fails (which doesn't
 happen too often I hope), then it will be read from the normal pool once
 and then stored in ARC/L2ARC again. It just behaves like a cache miss
 for that specific block... If this happens often enough to become a
 performance problem, then you should throw away that L2ARC device
 because it's broken beyond usability.
 
   
 
 Well if a L2ARC device fails there might be an unacceptable drop in delivered 
 performance.
 If it were mirrored than a drop usually would be much smaller or there could 
 be no drop if a mirror had an option to read only from one side.
 
 Being able to mirror L2ARC might especially be useful once a persistent L2ARC 
 is implemented as after a node restart or a resource failover in a cluster 
 L2ARC will be kept warm. Then the only thing which might affect L2 
 performance considerably would be a L2ARC device failure...
 
 
 -- 
 Robert Milkowski
 http://milek.blogspot.com
 


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Marc Nicholas
Hi Michael,

What makes you think striping the SSDs would be faster than round-robin?

-marc

On Thu, May 6, 2010 at 1:09 PM, Michael Sullivan michael.p.sulli...@mac.com
 wrote:

 Everyone,

 Thanks for the help.  I really appreciate it.

 Well, I actually walked through the source code with an associate today and
 we found out how things work by looking at the code.

 It appears that L2ARC is just assigned in round-robin fashion.  If a device
 goes offline, then it goes to the next and marks that one as offline.  The
 failure to retrieve the requested object is treated like a cache miss and
 everything goes along its merry way, as far as we can tell.

 I would have hoped it to be different in some way.  Like if the L2ARC was
 striped for performance reasons, that would be really cool and using that
 device as an extension of the VM model it is modeled after.  Which would
 mean using the L2ARC as an extension of the virtual address space and
 striping it to make it more efficient.  Way cool.  If it took out the bad
 device and reconfigured the stripe device, that would be even way cooler.
  Replacing it with a hot spare more cool too.  However, it appears from the
 source code that the L2ARC is just a (sort of) jumbled collection of ZFS
 objects.  Yes, it gives you better performance if you have it, but it
 doesn't really use it in a way you might expect something as cool as ZFS
 does.

 I understand why it is read only, and it invalidates it's cache when a
 write occurs, to be expected for any object written.

 If an object is not there because of a failure or because it has been
 removed from the cache, it is treated as a cache miss, all well and good -
 go fetch from the pool.

 I also understand why the ZIL is important and that it should be mirrored
 if it is to be on a separate device.  Though I'm wondering how it is handled
 internally when there is a failure of one of it's default devices, but then
 again, it's on a regular pool and should be redundant enough, only just some
 degradation in speed.

 Breaking these devices out from their default locations is great for
 performance, and I understand.  I just wish the knowledge of how they work
 and their internal mechanisms were not so much of a black box.  Maybe that
 is due to the speed at which ZFS is progressing and the features it adds
 with each subsequent release.

 Overall, I am very impressed with ZFS, its flexibility and even more so,
 it's breaking all the rules about how storage should be managed and I really
 like it.  I have yet to see anything to come close in its approach to disk
 data management.  Let's just hope it keeps moving forward, it is truly a
 unique way to view disk storage.

 Anyway, sorry for the ramble, but to everyone, thanks again for the
 answers.

 Mike

 ---
 Michael Sullivan
 michael.p.sulli...@me.com
 http://www.kamiogi.net/
 Japan Mobile: +81-80-3202-2599
 US Phone: +1-561-283-2034

 On 7 May 2010, at 00:00 , Robert Milkowski wrote:

  On 06/05/2010 15:31, Tomas Ögren wrote:
  On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:
 
 
  On Wed, 5 May 2010, Edward Ned Harvey wrote:
 
  In the L2ARC (cache) there is no ability to mirror, because cache
 device
  removal has always been supported.  You can't mirror a cache device,
 because
  you don't need it.
 
  How do you know that I don't need it?  The ability seems useful to me.
 
  The gain is quite minimal.. If the first device fails (which doesn't
  happen too often I hope), then it will be read from the normal pool once
  and then stored in ARC/L2ARC again. It just behaves like a cache miss
  for that specific block... If this happens often enough to become a
  performance problem, then you should throw away that L2ARC device
  because it's broken beyond usability.
 
 
 
  Well if a L2ARC device fails there might be an unacceptable drop in
 delivered performance.
  If it were mirrored than a drop usually would be much smaller or there
 could be no drop if a mirror had an option to read only from one side.
 
  Being able to mirror L2ARC might especially be useful once a persistent
 L2ARC is implemented as after a node restart or a resource failover in a
 cluster L2ARC will be kept warm. Then the only thing which might affect L2
 performance considerably would be a L2ARC device failure...
 
 
  --
  Robert Milkowski
  http://milek.blogspot.com
 


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Michael Sullivan
Hi Marc,

Well, if you are striping over multiple devices then your I/O should be spread 
over the devices and you should be reading them all simultaneously rather than 
just accessing a single device.  Traditional striping would cut each read to roughly 
1/n of the time rather than 1/1, where n is the number of disks the stripe is 
spread across.

The round-robin access I am referring to is the way the L2ARC vdevs appear to 
be accessed.  Any given object will be taken from a single device rather 
than from several devices simultaneously, which would otherwise increase the I/O 
throughput.  So, theoretically, a stripe spread over 4 disks would give 4 times 
the performance of reading from a single disk.  This also assumes 
the controller can handle multiple I/Os, or that you are striped over 
different disk controllers for each disk in the stripe.

SSD's are fast, but if I can read a block from more devices simultaneously, it 
will cut the latency of the overall read.
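
One way to sanity-check how reads are actually spread across the cache devices 
is to watch per-device I/O while the workload runs. A rough sketch, with a 
hypothetical pool name:

    # per-vdev bandwidth and IOPS (cache devices are listed separately), every 5 s
    zpool iostat -v tank 5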

On 7 May 2010, at 02:57 , Marc Nicholas wrote:

 Hi Michael,
 
 What makes you think striping the SSDs would be faster than round-robin?
 
 -marc
 
 On Thu, May 6, 2010 at 1:09 PM, Michael Sullivan michael.p.sulli...@mac.com 
 wrote:
 Everyone,
 
 Thanks for the help.  I really appreciate it.
 
 Well, I actually walked through the source code with an associate today and 
 we found out how things work by looking at the code.
 
 It appears that L2ARC is just assigned in round-robin fashion.  If a device 
 goes offline, then it goes to the next and marks that one as offline.  The 
 failure to retrieve the requested object is treated like a cache miss and 
 everything goes along its merry way, as far as we can tell.
 
 I would have hoped it to be different in some way.  Like if the L2ARC was 
 striped for performance reasons, that would be really cool and using that 
 device as an extension of the VM model it is modeled after.  Which would mean 
 using the L2ARC as an extension of the virtual address space and striping it 
 to make it more efficient.  Way cool.  If it took out the bad device and 
 reconfigured the stripe device, that would be even way cooler.  Replacing it 
 with a hot spare more cool too.  However, it appears from the source code 
 that the L2ARC is just a (sort of) jumbled collection of ZFS objects.  Yes, 
 it gives you better performance if you have it, but it doesn't really use it 
 in a way you might expect something as cool as ZFS does.
 
 I understand why it is read only, and it invalidates it's cache when a write 
 occurs, to be expected for any object written.
 
 If an object is not there because of a failure or because it has been removed 
 from the cache, it is treated as a cache miss, all well and good - go fetch 
 from the pool.
 
 I also understand why the ZIL is important and that it should be mirrored if 
 it is to be on a separate device.  Though I'm wondering how it is handled 
 internally when there is a failure of one of it's default devices, but then 
 again, it's on a regular pool and should be redundant enough, only just some 
 degradation in speed.
 
 Breaking these devices out from their default locations is great for 
 performance, and I understand.  I just wish the knowledge of how they work 
 and their internal mechanisms were not so much of a black box.  Maybe that is 
 due to the speed at which ZFS is progressing and the features it adds with 
 each subsequent release.
 
 Overall, I am very impressed with ZFS, its flexibility and even more so, it's 
 breaking all the rules about how storage should be managed and I really like 
 it.  I have yet to see anything to come close in its approach to disk data 
 management.  Let's just hope it keeps moving forward, it is truly a unique 
 way to view disk storage.
 
 Anyway, sorry for the ramble, but to everyone, thanks again for the answers.
 
 Mike
 
 ---
 Michael Sullivan
 michael.p.sulli...@me.com
 http://www.kamiogi.net/
 Japan Mobile: +81-80-3202-2599
 US Phone: +1-561-283-2034
 
 On 7 May 2010, at 00:00 , Robert Milkowski wrote:
 
  On 06/05/2010 15:31, Tomas Ögren wrote:
  On 06 May, 2010 - Bob Friesenhahn sent me these 0,6K bytes:
 
 
  On Wed, 5 May 2010, Edward Ned Harvey wrote:
 
  In the L2ARC (cache) there is no ability to mirror, because cache device
  removal has always been supported.  You can't mirror a cache device, 
  because
  you don't need it.
 
  How do you know that I don't need it?  The ability seems useful to me.
 
  The gain is quite minimal.. If the first device fails (which doesn't
  happen too often I hope), then it will be read from the normal pool once
  and then stored in ARC/L2ARC again. It just behaves like a cache miss
  for that specific block... If this happens often enough to become a
  performance problem, then you should throw away that L2ARC device
  because it's broken beyond usability.
 
 
 
  Well if a L2ARC device fails there might be an unacceptable drop in 
  delivered performance.
  If it were mirrored than a 

Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Giovanni Tirloni
On Thu, May 6, 2010 at 1:18 AM, Edward Ned Harvey solar...@nedharvey.comwrote:

  From the information I've been reading about the loss of a ZIL device,
 What the heck?  Didn't I just answer that question?
 I know I said this is answered in ZFS Best Practices Guide.

 http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

 Prior to pool version 19, if you have an unmirrored log device that fails,
 your whole pool is permanently lost.
 Prior to pool version 19, mirroring the log device is highly recommended.
 In pool version 19 or greater, if an unmirrored log device fails during
 operation, the system reverts to the default behavior, using blocks from
 the
 main storage pool for the ZIL, just as if the log device had been
 gracefully
 removed via the zpool remove command.



This week I've had a bad experience replacing a SSD device that was in a
hardware RAID-1 volume. While rebuilding, the source SSD failed and the
volume was brought off-line by the controller.

The server kept working just fine but seemed to have switched from the
30-second interval to all writes going directly to the disks. I could
confirm this with iostat.

We've had some compatibility issues between LSI MegaRAID cards and a few
MTRON SSDs and I didn't believe the SSD had really died. So I brought it
off-line and back on-line and everything started to work.

ZFS showed the log device c3t1d0 as removed. After the RAID-1 volume was
back I replaced that device with itself and a resilver process started. I
don't know what it was resilvering against but it took 2h10min. I should
have probably tried a zpool offline/online too.

So I think if a log device fails AND you have to import your pool later
(server rebooted, etc.)... then you lose your pool (prior to version 19).
Right?

This happened on OpenSolaris 2009.6.
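
For anyone wanting to check whether their own pool is past the critical 
version before trusting an unmirrored slog, something like this (pool name 
is only an example):

    zpool get version tank   # pool version 19+ survives slog failure/removal
    zpool upgrade -v         # lists the versions this build supports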

-- 
Giovanni


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Bob Friesenhahn

On Fri, 7 May 2010, Michael Sullivan wrote:


Well, if you are striping over multiple devices the you I/O should be spread 
over the devices and you
should be reading them all simultaneously rather than just accessing a single 
device.  Traditional
striping would give 1/n performance improvement rather than 1/1 where n is the 
number of disks the
stripe is spread across.


This is true.  Use of mirroring also improves performance since a 
mirror multiplies the read performance for the same data.  The value 
of the various approaches likely depends on the total size of the 
working set and the number of simultaneous requests.


Currently available L2ARC SSD devices are very good with a high number 
of I/Os, but they are quite a bottleneck for bulk reads as compared 
to the ARC in RAM.
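
A quick way to see how much of the working set has spilled past RAM onto the 
cache devices (byte counters; just a sketch):

    kstat -p zfs:0:arcstats:size zfs:0:arcstats:l2_size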


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Brandon High
On Thu, May 6, 2010 at 11:08 AM, Michael Sullivan
michael.p.sulli...@mac.com wrote:
 The round-robin access I am referring to, is the way the L2ARC vdevs appear
 to be accessed.  So, any given object will be taken from a single device
 rather than from several devices simultaneously, thereby increasing the I/O
 throughput.  So, theoretically, a stripe spread over 4 disks would give 4

I believe that the L2ARC behaves the same as a pool with multiple
top-level vdevs. It's not typical striping, where every write goes to
all devices. Writes may go to only one device, or may avoid a device
entirely while using several others. The decision about where to place
data is made at write time, so no fixed-width stripes are created at
allocation time.

In your example, if the file had at least four blocks, it is likely
that they would be spread across the four top-level vdevs.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Robert Milkowski

On 06/05/2010 19:08, Michael Sullivan wrote:

Hi Marc,

Well, if you are striping over multiple devices the you I/O should be 
spread over the devices and you should be reading them all 
simultaneously rather than just accessing a single device. 
 Traditional striping would give 1/n performance improvement rather 
than 1/1 where n is the number of disks the stripe is spread across.


The round-robin access I am referring to, is the way the L2ARC vdevs 
appear to be accessed.  So, any given object will be taken from a 
single device rather than from several devices simultaneously, thereby 
increasing the I/O throughput.  So, theoretically, a stripe spread 
over 4 disks would give 4 times the performance as opposed to reading 
from a single disk.  This also assumes the controller can handle 
multiple I/O as well or that you are striped over different disk 
controllers for each disk in the stripe.


SSD's are fast, but if I can read a block from more devices 
simultaneously, it will cut the latency of the overall read.




Keep in mind that the largest block is currently 128KB and you always 
need to read an entire block.
Splitting a block across several L2ARC devices would probably decrease 
performance, and it would invalidate all cached blocks if even a single L2ARC 
device died. Additionally, having each block on only one L2ARC device 
still allows reading from all of the L2ARC devices at the same time.
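
(The 128KB figure is just the recordsize cap, which you can confirm per 
dataset; the dataset name here is only an example:)

    zfs get recordsize tank   # 128K by default, and the upper bound on block size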


--
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread Richard Elling
On May 6, 2010, at 11:08 AM, Michael Sullivan wrote:
 Well, if you are striping over multiple devices the you I/O should be spread 
 over the devices and you should be reading them all simultaneously rather 
 than just accessing a single device.  Traditional striping would give 1/n 
 performance improvement rather than 1/1 where n is the number of disks the 
 stripe is spread across.

In theory, for bandwidth, yes, striping does improve by N.  For latency, 
striping
adds little, and in some cases is worse.  ZFS dynamic stripe tries to balance 
this 
tradeoff towards latency for HDDs by grouping blocks so that only one 
seek+rotate
is required. More below...

 The round-robin access I am referring to, is the way the L2ARC vdevs appear 
 to be accessed.  

RAID-0 striping is also round-robin.

 So, any given object will be taken from a single device rather than from 
 several devices simultaneously, thereby increasing the I/O throughput.  So, 
 theoretically, a stripe spread over 4 disks would give 4 times the 
 performance as opposed to reading from a single disk.  This also assumes the 
 controller can handle multiple I/O as well or that you are striped over 
 different disk controllers for each disk in the stripe.

All modern controllers handle multiple, concurrent I/Os.

 SSD's are fast, but if I can read a block from more devices simultaneously, 
 it will cut the latency of the overall read.

OTOH, if you have to wait for N HDDs to seek+rotate, then the latency is that 
of the
slowest disk.  The classic analogy is: nine women cannot produce a baby in one 
month.
The difference is:

ZFS dynamic stripe:
    latency per I/O = fixed latency of one vdev + (size / min(media bandwidth, path bandwidth))

RAID-0:
    latency per I/O = max(fixed latency of devices) + (size / min((media bandwidth / N), path bandwidth))

For HDDs, the media bandwidth is around 100 MB/sec for many devices, far less than the
path bandwidth on a modern system.  For many SSDs, the media bandwidth is close to the
path bandwidth. Newer SSDs have media bandwidth > 3Gbps, but 6Gbps SAS is becoming
readily available. In other words, if the path bandwidth isn't a problem, and the media
bandwidth of an SSD is 3x that of an HDD, then the bandwidth requirement that dictated
RAID-0 for HDDs is reduced by a factor of 3. Yet another reason why HDDs lost the
performance battle.
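
To put some rough, purely illustrative numbers on the dynamic-stripe formula 
above (assuming ~10 ms seek+rotate and ~100 MB/s for an HDD, and ~0.1 ms access 
time and ~250 MB/s for a SATA SSD, for a single 128 KB block):

    HDD:  10 ms  + (128 KB / 100 MB/s) ≈ 10 ms  + 1.3 ms ≈ 11.3 ms
    SSD:  0.1 ms + (128 KB / 250 MB/s) ≈ 0.1 ms + 0.5 ms ≈ 0.6 ms

The transfer term is small in both cases, which is why spreading one block 
across more devices buys little latency.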

This is also why not many folks choose to use HDDs for L2ARC -- the latency 
gain over
the pool is marginal for HDDs.

This is also one reason why there is no concatenation in ZFS.
 -- richard

-- 
ZFS storage and performance consulting at http://www.RichardElling.com












Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-06 Thread BM
On Fri, May 7, 2010 at 4:57 AM, Brandon High bh...@freaks.com wrote:
 I believe that the L2ARC behaves the same as a pool with multiple
 top-level vdevs. It's not typical striping, where every write goes to
 all devices. Writes may go to only one device, or may avoid a device
 entirely while using several other. The decision about where to place
 data is done at write time, so no fixed width stripes are created at
 allocation time.

There's not much there to believe or disbelieve.

Each write to the L2ARC devices is grouped and sent in sequence. A queue
is used to coalesce them into larger, fewer chunks to write. The L2ARC
behaves in a rotor fashion, simply sweeping writes through the available
space. That's all the magic, nothing very special...

Answering Mike's main question, the behaviour on failure is quite
simple: once some L2ARC device(s) are gone, the others will continue to
function. Impact: a little performance loss, and some time is needed to
warm them up and sort things out again. No serious consequences or data
loss here.

Take care, folks.

-- 
Kind regards, BM

Things, that are stupid at the beginning, rarely ends up wisely.


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-05 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Michael Sullivan
 
 I have a question I cannot seem to find an answer to.

Google for "ZFS Best Practices Guide" (on solarisinternals).  I know this
answer is there.


 I know if I set up ZIL on SSD and the SSD goes bad, the the ZIL will be
 relocated back to the spool.  I'd probably have it mirrored anyway,
 just in case.  However you cannot mirror the L2ARC, so...

Careful.  The log device removal feature exists, and is present in the
developer builds of opensolaris today.  However, it's not included in
opensolaris 2009.06, and it's not included in the latest and greatest solaris
10 yet.  Which means, right now, if you lose an unmirrored ZIL (log) device,
your whole pool is lost, unless you're running a developer build of
opensolaris.


 What I want to know, is what happens if one of those SSD's goes bad?
 What happens to the L2ARC?  Is it just taken offline, or will it
 continue to perform even with one drive missing?

In the L2ARC (cache) there is no ability to mirror, because cache device
removal has always been supported.  You can't mirror a cache device, because
you don't need it.

If one of the cache devices fails, no harm is done.  That device goes
offline.  The rest stay online.
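
And since cache devices can be added and dropped freely, replacing a dead one
is trivial; hypothetical pool/device names:

    # drop the failed cache device, then add a replacement
    zpool remove tank c7t2d0
    zpool add tank cache c7t3d0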


 Sorry, if these questions have been asked before, but I cannot seem to
 find an answer.

Since you said this twice, I'll answer it twice.  ;-)
I think the best advice regarding cache/log device mirroring is in the ZFS
Best Practices Guide.



Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-05 Thread Michael Sullivan
Hi Ed,

Thanks for your answers.  Seem to make sense, sort of…

On 6 May 2010, at 12:21 , Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Michael Sullivan
 
 I have a question I cannot seem to find an answer to.
 
 Google for ZFS Best Practices Guide  (on solarisinternals).  I know this
 answer is there.
 

My Google is very strong and I have the Best Practices Guide committed to 
bookmark as well as most of it to memory.

While it explains how to implement these, there is no information regarding 
failure of a device in a striped L2ARC set of SSD's.  I have been hard pressed 
to find this information anywhere, short of testing it myself, but I don't have 
the necessary hardware in a lab to test correctly.  If someone has pointers to 
references, could you please provide them, chapter and verse, rather than the 
advice to "go read the manual".

 
 I know if I set up ZIL on SSD and the SSD goes bad, the the ZIL will be
 relocated back to the spool.  I'd probably have it mirrored anyway,
 just in case.  However you cannot mirror the L2ARC, so...
 
 Careful.  The log device removal feature exists, and is present in the
 developer builds of opensolaris today.  However, it's not included in
 opensolars 2009.06, and it's not included in the latest and greatest solaris
 10 yet.  Which means, right now, if you lose an unmirrored ZIL (log) device,
 your whole pool is lost, unless you're running a developer build of
 opensolaris.
 

I'm running 2009.11, which is the latest OpenSolaris.  I should have made that 
clear, and that I don't intend this to be on a Solaris 10 system; I am waiting 
for the next production build anyway.  As you say, it does not exist in 
2009.06, but that is not the latest production OpenSolaris, which is 2009.11, 
and I'd be more interested in its behavior than in an older release.

I am also well aware that losing a ZIL device will cause loss of the entire 
pool, which is why I would never have a ZIL device unless it was mirrored and 
on different controllers.

From the information I've been reading about the loss of a ZIL device, it will 
be relocated to the storage pool it is assigned to.  I'm not sure which 
version this is in; it would be nice if someone could provide the release 
number it is included in (and where it actually works).  Also, will this 
functionality be included in the mythical 2010.03 release?

Also, I'd be interested to know what features along these lines will be 
available in 2010.03 if it ever sees the light of day.

 
 What I want to know, is what happens if one of those SSD's goes bad?
 What happens to the L2ARC?  Is it just taken offline, or will it
 continue to perform even with one drive missing?
 
 In the L2ARC (cache) there is no ability to mirror, because cache device
 removal has always been supported.  You can't mirror a cache device, because
 you don't need it.
 
 If one of the cache devices fails, no harm is done.  That device goes
 offline.  The rest stay online.
 

So what you are saying is that if a single device fails in a striped L2ARC 
VDEV, then the entire VDEV is taken offline and the fallback is to simply use 
the regular ARC and fetch from the pool whenever there is a cache miss.

Or, does what you are saying here mean that if I have 4 SSD's in a stripe for 
my L2ARC, and one device fails, the L2ARC will be reconfigured dynamically 
using the remaining SSD's?

It would be good to get an answer to this from someone who has actually tested 
this or is more intimately familiar with the ZFS code rather than all the 
speculation I've been getting so far.

 
 Sorry, if these questions have been asked before, but I cannot seem to
 find an answer.
 
 Since you said this twice, I'll answer it twice.  ;-)
 I think the best advice regarding cache/log device mirroring is in the ZFS
 Best Practices Guide.
 

Been there read that, many, many times.  It's an invaluable reference, I agree.

Thanks

Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034



Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-05 Thread Edward Ned Harvey
 From: Michael Sullivan [mailto:michael.p.sulli...@mac.com]
 
 My Google is very strong and I have the Best Practices Guide committed
 to bookmark as well as most of it to memory.
 
 While it explains how to implement these, there is no information
 regarding failure of a device in a striped L2ARC set of SSD's.  I have

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Cache_Devices

It is not possible to mirror or use raidz on cache devices, nor is it
necessary. If a cache device fails, the data will simply be read from the
main pool storage devices instead.

I guess I didn't write this part, but:  If you have multiple cache devices,
they are all independent from each other.  Failure of one does not negate
the functionality of the others.

 
 I'm running 2009.11 which is the latest OpenSolaris.  

Quoi??  2009.06 is the latest available from opensolaris.com and
opensolaris.org.

If you want something newer, AFAIK, you have to go to developer build, such
as osol-dev-134

Sure you didn't accidentally get 2008.11?


 I am also well aware of the effect of losing a ZIL device will cause
 loss of the entire pool.  Which is why I would never have a ZIL device
 unless it was mirrored and on different controllers.

Um ... the log device is not special.  If you lose *any* unmirrored device,
you lose the pool.  Except for cache devices, or log devices on zpool >= 19.


 From the information I've been reading about the loss of a ZIL device,
 it will be relocated to the storage pool it is assigned to.  I'm not
 sure which version this is in, but it would be nice if someone could
 provide the release number it is included in (and actually works), it
 would be nice.  

What the heck?  Didn't I just answer that question?
I know I said this is answered in ZFS Best Practices Guide.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

Prior to pool version 19, if you have an unmirrored log device that fails,
your whole pool is permanently lost.
Prior to pool version 19, mirroring the log device is highly recommended.
In pool version 19 or greater, if an unmirrored log device fails during
operation, the system reverts to the default behavior, using blocks from the
main storage pool for the ZIL, just as if the log device had been gracefully
removed via the zpool remove command.
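
For reference, the graceful removal mentioned there is just (pool/device names 
are examples only):

    zpool remove tank c4t0d0   # removes the log device (needs pool version >= 19);
                               # cache devices can always be removed this way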


 Also, will this functionality be included in the
 mythical 2010.03 release?


Zpool 19 was released in build 125.  Oct 16, 2009.  You can rest assured it
will be included in 2010.03, or 04, or whenever that thing comes out.


 So what you are saying is that if a single device fails in a striped
 L2ARC VDEV, then the entire VDEV is taken offline and the fallback is
 to simply use the regular ARC and fetch from the pool whenever there is
 a cache miss.

It sounds like you're only going to believe it if you test it.  Go for it.
That's what I did before I wrote that section of the ZFS Best Practices
Guide.

In ZFS, there is no such thing as striping, although the term is commonly
used, because adding multiple devices creates all the benefit of striping
plus all the benefit of concatenation.  Colloquially, people think
concatenation is weird or unused or something, so they just naturally
gravitated to calling it a "stripe" in ZFS too, although that's not
technically correct according to the traditional RAID definition.  But
nobody bothered to create a new term, "stripecat" or whatever, for ZFS.


 Or, does what you are saying here mean that if I have a 4 SSD's in a
 stripe for my L2ARC, and one device fails, the L2ARC will be
 reconfigured dynamically using the remaining SSD's for L2ARC.

No reconfiguration necessary, because it's not a stripe.  It's 4 separate
devices, which ZFS can use simultaneously if it wants to.



Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-05 Thread Michael Sullivan
On 6 May 2010, at 13:18 , Edward Ned Harvey wrote:

 From: Michael Sullivan [mailto:michael.p.sulli...@mac.com]
 
 While it explains how to implement these, there is no information
 regarding failure of a device in a striped L2ARC set of SSD's.  I have
 
  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Cache_Devices
 
 It is not possible to mirror or use raidz on cache devices, nor is it
 necessary. If a cache device fails, the data will simply be read from the
 main pool storage devices instead.
 

I understand this.

 I guess I didn't write this part, but:  If you have multiple cache devices,
 they are all independent from each other.  Failure of one does not negate
 the functionality of the others.
 

Ok, this is what I wanted to know: the L2ARC devices assigned to the pool are 
not striped but are independent.  Loss of one drive will just cause a cache 
miss and force ZFS to go out to the pool for its objects.

But then I'm not talking about using RAIDZ on a cache device.  I'm talking 
about a striped device which would be RAID-0.  If the SSD's are all assigned to 
L2ARC, then they are not striped in any fashion (RAID-0), but are completely 
independent and the L2ARC will continue to operate, just missing a single SSD.

 
 I'm running 2009.11 which is the latest OpenSolaris.  
 
 Quoi??  2009.06 is the latest available from opensolaris.com and
 opensolaris.org.
 
 If you want something newer, AFAIK, you have to go to developer build, such
 as osol-dev-134
 
 Sure you didn't accidentally get 2008.11?
 

My mistake… snv_111b which is 2009.06.  I know it went up to 11 somewhere.

 
 I am also well aware of the effect of losing a ZIL device will cause
 loss of the entire pool.  Which is why I would never have a ZIL device
 unless it was mirrored and on different controllers.
 
 Um ... the log device is not special.  If you lose *any* unmirrored device,
 you lose the pool.  Except for cache devices, or log devices on zpool >= 19
 

Well, if I've got a separate ZIL which is mirrored for performance, and 
mirrored because I think my data is valuable and important, I will have 
something more than RAID-0 on my main storage pool too.  More than likely 
RAIDZ2 since I plan on using L2ARC to help improve performance along with 
separate SSD mirrored ZIL devices.

 
 From the information I've been reading about the loss of a ZIL device,
 it will be relocated to the storage pool it is assigned to.  I'm not
 sure which version this is in, but it would be nice if someone could
 provide the release number it is included in (and actually works), it
 would be nice.  
 
 What the heck?  Didn't I just answer that question?
 I know I said this is answered in ZFS Best Practices Guide.
  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices
 
 Prior to pool version 19, if you have an unmirrored log device that fails,
 your whole pool is permanently lost.
 Prior to pool version 19, mirroring the log device is highly recommended.
 In pool version 19 or greater, if an unmirrored log device fails during
 operation, the system reverts to the default behavior, using blocks from the
 main storage pool for the ZIL, just as if the log device had been gracefully
 removed via the zpool remove command.
 

No need to get defensive here; all I'm looking for is the zpool version number 
which supports it and the version of OpenSolaris which supports that zpool 
version.

I think that if you are building for performance, it would be almost intuitive 
to have a mirrored ZIL in the event of failure, and perhaps even a hot spare 
available as well.  I don't like the idea of my ZIL being transferred back to 
the pool, but having it transferred back is better than the alternative which 
would be data loss or corruption.

 
 Also, will this functionality be included in the
 mythical 2010.03 release?
 
 
 Zpool 19 was released in build 125.  Oct 16, 2009.  You can rest assured it
 will be included in 2010.03, or 04, or whenever that thing comes out.
 

Thanks, build 125.

 
 So what you are saying is that if a single device fails in a striped
 L2ARC VDEV, then the entire VDEV is taken offline and the fallback is
 to simply use the regular ARC and fetch from the pool whenever there is
 a cache miss.
 
 It sounds like you're only going to believe it if you test it.  Go for it.
 That's what I did before I wrote that section of the ZFS Best Practices
 Guide.
 
 In ZFS, there is no such thing as striping, although the term is commonly
 used, because adding multiple devices creates all the benefit of striping,
 plus all the benefit of concatenation, but colloquially, people think
 concatenation is weird or unused or something, so people just naturally
 gravitated to calling it a stripe in ZFS too, although that's not
 technically correct according to the traditional RAID definition.  But
 nobody bothered to create a new term stripecat or whatever, for ZFS.
 

Ummm, yes 

[zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-04 Thread Michael Sullivan
HI,

I have a question I cannot seem to find an answer to.

I know I can set up a stripe of L2ARC SSD's with say, 4 SSD's.

I know if I set up the ZIL on SSD and the SSD goes bad, the ZIL will be 
relocated back to the pool.  I'd probably have it mirrored anyway, just in 
case.  However you cannot mirror the L2ARC, so...

What I want to know, is what happens if one of those SSD's goes bad?  What 
happens to the L2ARC?  Is it just taken offline, or will it continue to perform 
even with one drive missing?

Sorry, if these questions have been asked before, but I cannot seem to find an 
answer.
Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034



Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-04 Thread Tomas Ögren
On 05 May, 2010 - Michael Sullivan sent me these 0,9K bytes:

 HI,
 
 I have a question I cannot seem to find an answer to.
 
 I know I can set up a stripe of L2ARC SSD's with say, 4 SSD's.
 
 I know if I set up ZIL on SSD and the SSD goes bad, the the ZIL will
 be relocated back to the spool.  I'd probably have it mirrored anyway,
 just in case.  However you cannot mirror the L2ARC, so...

Given a new enough OpenSolaris... Otherwise, your pool is screwed, IIRC.

 What I want to know, is what happens if one of those SSD's goes bad?
 What happens to the L2ARC?  Is it just taken offline, or will it
 continue to perform even with one drive missing?

L2ARC is a pure cache thing: if it gives bad data (checksum error), it
will be ignored; if you yank it, it will be ignored. It's very safe to
have crap hardware there (as long as it doesn't start messing up some
bus or similar). Cache devices can be added/removed at any time as well.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-04 Thread Freddie Cash
On Tue, May 4, 2010 at 12:16 PM, Michael Sullivan 
michael.p.sulli...@mac.com wrote:

 I have a question I cannot seem to find an answer to.

 I know I can set up a stripe of L2ARC SSD's with say, 4 SSD's.

 I know if I set up ZIL on SSD and the SSD goes bad, the the ZIL will be
 relocated back to the spool.  I'd probably have it mirrored anyway, just in
 case.  However you cannot mirror the L2ARC, so...

 What I want to know, is what happens if one of those SSD's goes bad?  What
 happens to the L2ARC?  Is it just taken offline, or will it continue to
 perform even with one drive missing?

 Sorry, if these questions have been asked before, but I cannot seem to find
 an answer.


Data in the L2ARC is checksummed.  If a checksum fails, or the device
disappears, data is read from the pool.  The L2ARC is essentially a
throw-away cache for reads.  If it's there, reads can be faster as data is
not pulled from disk.  If it's not there, data just gets pulled from disk as
per normal.

There's nothing really special about the L2ARC devices.

-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-04 Thread Marc Nicholas
The L2ARC will continue to function.

-marc

On 5/4/10, Michael Sullivan michael.p.sulli...@mac.com wrote:
 HI,

 I have a question I cannot seem to find an answer to.

 I know I can set up a stripe of L2ARC SSD's with say, 4 SSD's.

 I know if I set up ZIL on SSD and the SSD goes bad, the the ZIL will be
 relocated back to the spool.  I'd probably have it mirrored anyway, just in
 case.  However you cannot mirror the L2ARC, so...

 What I want to know, is what happens if one of those SSD's goes bad?  What
 happens to the L2ARC?  Is it just taken offline, or will it continue to
 perform even with one drive missing?

 Sorry, if these questions have been asked before, but I cannot seem to find
 an answer.
 Mike

 ---
 Michael Sullivan
 michael.p.sulli...@me.com
 http://www.kamiogi.net/
 Japan Mobile: +81-80-3202-2599
 US Phone: +1-561-283-2034



-- 
Sent from my mobile device


Re: [zfs-discuss] Loss of L2ARC SSD Behaviour

2010-05-04 Thread Michael Sullivan
Ok, thanks.

So, if I understand correctly, it will just remove the device from the VDEV and 
continue to use the good ones in the stripe.

Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Japan Mobile: +81-80-3202-2599
US Phone: +1-561-283-2034

On 5 May 2010, at 04:34 , Marc Nicholas wrote:

 The L2ARC will continue to function.
 
 -marc
 
 On 5/4/10, Michael Sullivan michael.p.sulli...@mac.com wrote:
 HI,
 
 I have a question I cannot seem to find an answer to.
 
 I know I can set up a stripe of L2ARC SSD's with say, 4 SSD's.
 
 I know if I set up ZIL on SSD and the SSD goes bad, the the ZIL will be
 relocated back to the spool.  I'd probably have it mirrored anyway, just in
 case.  However you cannot mirror the L2ARC, so...
 
 What I want to know, is what happens if one of those SSD's goes bad?  What
 happens to the L2ARC?  Is it just taken offline, or will it continue to
 perform even with one drive missing?
 
 Sorry, if these questions have been asked before, but I cannot seem to find
 an answer.
 Mike
 
 ---
 Michael Sullivan
 michael.p.sulli...@me.com
 http://www.kamiogi.net/
 Japan Mobile: +81-80-3202-2599
 US Phone: +1-561-283-2034
 
 
 
 -- 
 Sent from my mobile device
