Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-24 Thread Torrey McMahon

Gary Mills wrote:

On Wed, Jun 20, 2007 at 12:23:18PM -0400, Torrey McMahon wrote:
  

James C. McPherson wrote:


Roshan Perera wrote:
  

But Roshan, if your pool is not replicated from ZFS' point of view,
then all the multipathing and raid controller backup in the world will
not make a difference.
  

James, I agree from a ZFS point of view. However, from the EMC or the
customer point of view, they want to do the replication at the EMC level
and not from ZFS. By replicating at the ZFS level they will lose some
storage, and it doubles the replication. It's just that the customer is
used to working with Veritas and UFS and they don't want to change their
habits.

I just have to convince the customer to use ZFS replication.


that's a great shame because if they actually want
to make use of the features of ZFS such as replication,
then they need to be serious about configuring their
storage to play in the ZFS world and that means
replication that ZFS knows about.
  
Also, how does replication at the ZFS level use more storage - I'm 
assuming raw block - than at the array level?



SAN storage generally doesn't work that way.  They use some magical
redundancy scheme, which may be RAID-5 or WAFL, from which the Storage
Administrator carves out virtual disks.  These are best viewed as an
array of blocks.  All disk administration, such as replacing failed
disks, takes place on the storage device without affecting the virtual
disks.  There's no need for disk administration or additional
redundancy on the client side.  If more space is needed on the client,
the Storage Administrator simply expands the virtual disk by extending
its blocks.  ZFS needs to play nicely in this environment because
that's what's available in large organizations that have centralized
their storage.  Asking for raw disks doesn't work.
  


Are we talking about replication - I have a copy of my data on another 
system - or redundancy - I have a system where I can tolerate a local 
failure?


...and I understand the "ZFS has to play nice with HW RAID" argument. :)




Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-24 Thread Torrey McMahon

Victor Engle wrote:

On 6/20/07, Torrey McMahon [EMAIL PROTECTED] wrote:
Also, how does replication at the ZFS level use more storage - I'm
assuming raw block - then at the array level?



Just to add to the previous comments. In the case where you have a SAN
array providing storage to a host for use with ZFS, the SAN storage
really needs to be redundant in the array AND the zpools need to be
redundant pools.

The reason the SAN storage should be redundant is that SAN arrays are
designed to serve logical units. The logical units are usually
allocated from a RAID set, storage pool, or aggregate of some kind. The
array-side pool/aggregate may include 10 300GB disks and may have 100+
LUNs allocated from it, for example. If redundancy is not used in the
array-side pool/aggregate, then a single disk failure will kill 100+
LUNs at once.


That makes a lot of sense in configurations where an array is exporting 
LUNs built on RAID volumes to a set of heterogeneous hosts. If you're 
directly connected to a single box running ZFS, or a set of boxes running 
ZFS, you probably want to export something as close to the raw disks as 
possible while maintaining ZFS-level redundancy. (Like two R5 LUNs in a 
ZFS mirror.) Creating a RAID set, carving out lots of LUNs and then 
handing them all over to ZFS isn't going to buy you a lot and could 
cause performance issues. (LUN skew, for example.)
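
A minimal sketch of that layout, with hypothetical device names standing
in for the two RAID-5 LUNs:

   # Mirror two array-provided RAID-5 LUNs at the ZFS level
   # (c2t0d0 and c3t0d0 are hypothetical SAN LUN names)
   zpool create tank mirror c2t0d0 c3t0d0
   zpool status tank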



Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread Roshan Perera


 But Roshan, if your pool is not replicated from ZFS'
 point of view, then all the multipathing and raid
 controller backup in the world will not make a difference.

James, I agree from a ZFS point of view. However, from the EMC or the customer 
point of view, they want to do the replication at the EMC level and not from 
ZFS. By replicating at the ZFS level they will lose some storage, and it doubles 
the replication. It's just that the customer is used to working with Veritas and UFS 
and they don't want to change their habits. I just have to convince the 
customer to use ZFS replication.

Thanks again


 
 
 
 James C. McPherson
 --
 Solaris kernel software engineer
 Sun Microsystems
 


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread Roshan Perera
Hi all,

Is there a place where I can find a ZFS best practices guide for use against DMX, 
and a roadmap for ZFS?

Also, the customer is now looking at big ZFS installations in production. Would 
you guys happen to know, or tell me where I can find, details of the number of current 
installations? We are looking at almost 10 terabytes of data to be stored on 
DMX using ZFS (the customer is not comfortable with the RAID-Z solution, in addition 
to their best practice of RAID at the DMX level). Any feedback, experiences and, 
more importantly, gotchas will be much appreciated. 

Thanks in advance.

Roshan



- Original Message -
From: Roshan Perera [EMAIL PROTECTED]
Date: Wednesday, June 20, 2007 10:49 am
Subject: Re: [zfs-discuss] Re: ZFS - SAN and Raid
To: [EMAIL PROTECTED]
Cc: Bruce McAlister [EMAIL PROTECTED], zfs-discuss@opensolaris.org, Richard 
Elling [EMAIL PROTECTED]

 
 
  But Roshan, if your pool is not replicated from ZFS'
  point of view, then all the multipathing and raid
  controller backup in the world will not make a difference.
 
 James, I agree from a ZFS point of view. However, from the EMC or 
 the customer point of view, they want to do the replication at the 
 EMC level and not from ZFS. By replicating at the ZFS level they 
 will lose some storage, and it doubles the replication. It's just 
 that the customer is used to working with Veritas and UFS and they 
 don't want to change their habits. I just have to convince the 
 customer to use ZFS replication.
 
 Thanks again
 
 
  
  
  
  James C. McPherson
  --
  Solaris kernel software engineer
  Sun Microsystems
  


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread James C. McPherson

Roshan Perera wrote:

Hi all,

Is there a place where I can find a ZFS best practices guide for use against
DMX, and a roadmap for ZFS?
Also, the customer is now looking at big ZFS installations in production.
Would you guys happen to know, or tell me where I can find, details of the
number of current installations? We are looking at almost 10 terabytes of data
to be stored on DMX using ZFS (the customer is not comfortable with the RAID-Z
solution, in addition to their best practice of RAID at the DMX level). Any
feedback, experiences and, more importantly, gotchas will be much
appreciated.


http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
and

I know Ben Rockwood (now of Joyent) has blogged about how much
storage they're using, all managed with ZFS... I just can't
find the blog entry.

Hope this helps,
James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread James C. McPherson

Roshan Perera wrote:



But Roshan, if your pool is not replicated from ZFS' point of view,
then all the multipathing and raid controller backup in the world will
not make a difference.


James, I agree from a ZFS point of view. However, from the EMC or the
customer point of view, they want to do the replication at the EMC level
and not from ZFS. By replicating at the ZFS level they will lose some
storage, and it doubles the replication. It's just that the customer is
used to working with Veritas and UFS and they don't want to change their habits.
I just have to convince the customer to use ZFS replication.


Hi Roshan,
that's a great shame because if they actually want
to make use of the features of ZFS such as replication,
then they need to be serious about configuring their
storage to play in the ZFS world and that means
replication that ZFS knows about.



James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread Torrey McMahon

James C. McPherson wrote:

Roshan Perera wrote:



But Roshan, if your pool is not replicated from ZFS' point of view,
then all the multipathing and raid controller backup in the world will
not make a difference.


James, I agree from a ZFS point of view. However, from the EMC or the
customer point of view, they want to do the replication at the EMC level
and not from ZFS. By replicating at the ZFS level they will lose some
storage, and it doubles the replication. It's just that the customer is 
used to working with Veritas and UFS and they don't want to change their 
habits.

I just have to convince the customer to use ZFS replication.


Hi Roshan,
that's a great shame because if they actually want
to make use of the features of ZFS such as replication,
then they need to be serious about configuring their
storage to play in the ZFS world and that means
replication that ZFS knows about.



Also, how does replication at the ZFS level use more storage - I'm 
assuming raw block - than at the array level?



Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread Victor Engle

On 6/20/07, Torrey McMahon [EMAIL PROTECTED] wrote:
Also, how does replication at the ZFS level use more storage - I'm
assuming raw block - than at the array level?



Just to add to the previous comments. In the case where you have a SAN
array providing storage to a host for use with ZFS, the SAN storage
really needs to be redundant in the array AND the zpools need to be
redundant pools.

The reason the SAN storage should be redundant is that SAN arrays are
designed to serve logical units. The logical units are usually
allocated from a RAID set, storage pool, or aggregate of some kind. The
array-side pool/aggregate may include 10 300GB disks and may have 100+
LUNs allocated from it, for example. If redundancy is not used in the
array-side pool/aggregate, then a single disk failure will kill 100+
LUNs at once.

On 6/20/07, Torrey McMahon [EMAIL PROTECTED] wrote:

James C. McPherson wrote:
 Roshan Perera wrote:

 But Roshan, if your pool is not replicated from ZFS' point of view,
 then all the multipathing and raid controller backup in the world will
 not make a difference.

 James, I agree from a ZFS point of view. However, from the EMC or the
 customer point of view, they want to do the replication at the EMC level
 and not from ZFS. By replicating at the ZFS level they will lose some
 storage, and it doubles the replication. It's just that the customer is
 used to working with Veritas and UFS and they don't want to change their
 habits.
 I just have to convince the customer to use ZFS replication.

 Hi Roshan,
 that's a great shame because if they actually want
 to make use of the features of ZFS such as replication,
 then they need to be serious about configuring their
 storage to play in the ZFS world and that means
 replication that ZFS knows about.


Also, how does replication at the ZFS level use more storage - I'm
assuming raw block - than at the array level?


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-20 Thread Gary Mills
On Wed, Jun 20, 2007 at 12:23:18PM -0400, Torrey McMahon wrote:
 James C. McPherson wrote:
 Roshan Perera wrote:
 
 But Roshan, if your pool is not replicated from ZFS' point of view,
 then all the multipathing and raid controller backup in the world will
 not make a difference.
 
 James, I agree from a ZFS point of view. However, from the EMC or the
 customer point of view, they want to do the replication at the EMC level
 and not from ZFS. By replicating at the ZFS level they will lose some
 storage, and it doubles the replication. It's just that the customer is
 used to working with Veritas and UFS and they don't want to change their 
 habits.
 I just have to convince the customer to use ZFS replication.
 
 that's a great shame because if they actually want
 to make use of the features of ZFS such as replication,
 then they need to be serious about configuring their
 storage to play in the ZFS world and that means
 replication that ZFS knows about.
 
 Also, how does replication at the ZFS level use more storage - I'm 
 assuming raw block - than at the array level?

SAN storage generally doesn't work that way.  They use some magical
redundancy scheme, which may be RAID-5 or WAFL, from which the Storage
Administrator carves out virtual disks.  These are best viewed as an
array of blocks.  All disk administration, such as replacing failed
disks, takes place on the storage device without affecting the virtual
disks.  There's no need for disk administration or additional
redundancy on the client side.  If more space is needed on the client,
the Storage Administrator simply expands the virtual disk by extending
its blocks.  ZFS needs to play nicely in this environment because
that's what's available in large organizations that have centralized
their storage.  Asking for raw disks doesn't work.
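
In that model the host side often ends up as a single, non-redundant vdev,
something like this sketch (the device name is hypothetical):

   # One array-backed virtual disk; all redundancy lives in the array.
   # ZFS can detect corruption here but cannot self-heal user data.
   zpool create tank c4t60060480000290100491d0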

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Richard Elling

Victor Engle wrote:

Roshan,

As far as I know, there is no problem at all with using SAN storage
with ZFS and it does look like you were having an underlying problem
with either powerpath or the array.


Correct.  A write failed.


The best practices guide on OpenSolaris does recommend replicated
pools even if your backend storage is redundant. There are at least two
good reasons for that. ZFS needs a replica for the self-healing
feature to work. Also, there is no fsck-like tool for ZFS, so it is a
good idea to make sure self-healing can work.


Yes, currently ZFS on Solaris will panic if a non-redundant write fails.
This is known and being worked on, but there really isn't a good solution
if a write fails, unless you have some ZFS-level redundancy.
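
If a pool already sits on a single LUN, one hedge is to attach a second
LUN and let ZFS mirror it (device names hypothetical):

   # Turn the existing single-device vdev into a two-way mirror
   zpool attach tank c2t0d0 c3t0d0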

NB. fsck is not needed for ZFS because the on-disk format is always
consistent.  This is orthogonal to hardware faults.


I think I would first track down the cause of the messages just prior
to the ZFS write error, because even with replicated pools, if several
devices error at once, the pool could be lost.


Yes, multiple failures can cause data loss.  No magic here.
 -- richard


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Victor Engle


 The best practices guide on OpenSolaris does recommend replicated
 pools even if your backend storage is redundant. There are at least two
 good reasons for that. ZFS needs a replica for the self-healing
 feature to work. Also, there is no fsck-like tool for ZFS, so it is a
 good idea to make sure self-healing can work.




NB. fsck is not needed for ZFS because the on-disk format is always
consistent.  This is orthogonal to hardware faults.



I understand that the on-disk state is always consistent, but the
self-healing feature can correct blocks that have bad checksums if ZFS is
able to retrieve the block from a good replica. So even though the
filesystem is consistent, the data can be corrupt in non-redundant
pools. I am unsure of what happens with a non-redundant pool when a
block has a bad checksum, and perhaps you could clear that up. Does
this cause a problem for the pool, or is it limited to the file or
files affected by the bad block, with the pool otherwise online and
healthy?

Thanks,
Vic


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 attached below the errors. But the question still remains: is ZFS only happy
 with JBOD disks and not SAN storage with hardware RAID? Thanks 

ZFS works fine on our SAN here.  You do get a kernel panic (Solaris-10U3)
if a LUN disappears for some reason (without ZFS-level redundancy), but
I understand that bug is fixed in a Nevada build;  I'm hoping to see the
fix in Solaris-10U4.

Regards,

Marion




Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Richard Elling

Victor Engle wrote:


 The best practices guide on OpenSolaris does recommend replicated
 pools even if your backend storage is redundant. There are at least two
 good reasons for that. ZFS needs a replica for the self-healing
 feature to work. Also, there is no fsck-like tool for ZFS, so it is a
 good idea to make sure self-healing can work.




NB. fsck is not needed for ZFS because the on-disk format is always
consistent.  This is orthogonal to hardware faults.


I understand that the on-disk state is always consistent, but the
self-healing feature can correct blocks that have bad checksums if ZFS is
able to retrieve the block from a good replica.


Yes.  That is how it works.  By default, metadata is replicated.
For real data, you can use copies, mirroring, or raidz[12].

 So even though the
filesystem is consistent, the data can be corrupt in non-redundant
pools.


No.  If the data is corrupt and cannot be reconstructed, it is lost.
Recall that UFS's fsck only corrects file system metadata, not real
data.  Most file systems with any kind of performance work this
way.  ZFS is safer because of COW: ZFS won't overwrite existing data
and corrupt it -- but other file systems can (eg. UFS).


 I am unsure of what happens with a non-redundant pool when a
block has a bad checksum, and perhaps you could clear that up. Does
this cause a problem for the pool, or is it limited to the file or
files affected by the bad block, with the pool otherwise online and
healthy?


It depends on where the bad block is.  If it isn't being used, no foul[1].
If it is metadata, then we recover because of redundant metadata.  If it
is in a file with no redundancy (copies=1, by default) then an error will
be logged to FMA and the file name is visible to zpool status.  You can
decide if that file is important to you.
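
A quick way to see that list (pool name hypothetical):

   # -v prints the files affected by unrecoverable errors
   zpool status -v tank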

This is an area where there is continuing development, far beyond what
ZFS alone can do.  The ultimate goal is that we get to the point where
most faults can be tolerated.  No rest for the weary :-)

[1] this is different than software RAID systems which don't know if a
block is being used or not.  In ZFS, we only care about faults in blocks
which are being used, for the most part.
 -- richard


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Roshan Perera
Thanks for all your replies. A lot of info to take back. In this case it seems 
like emcp carried out a repair to a path to a LUN, followed by a panic. 

Jun  4 16:30:12 su621dwdb emcp: [ID 801593 kern.notice] Info: Assigned volume 
Symm 000290100491 vol 0ffe to

I don't think panic should be the answer in this type of scenario, as there is 
redundant path to the LUN and Hardware Raid is in place inside SAN. From what I 
gather there is work being carried out to find a better solution. What is the 
proposed solution or when it will be availble is the question ?

Thanks again.

Roshan


- Original Message -
From: Richard Elling [EMAIL PROTECTED]
Date: Tuesday, June 19, 2007 6:28 pm
Subject: Re: [zfs-discuss] Re: ZFS - SAN and Raid
To: Victor Engle [EMAIL PROTECTED]
Cc: Bruce McAlister [EMAIL PROTECTED], zfs-discuss@opensolaris.org, Roshan 
Perera [EMAIL PROTECTED]

 Victor Engle wrote:
  Roshan,
  
  As far as I know, there is no problem at all with using SAN storage
  with ZFS and it does look like you were having an underlying problem
  with either powerpath or the array.
 
 Correct.  A write failed.
 
  The best practices guide on OpenSolaris does recommend replicated
  pools even if your backend storage is redundant. There are at least two
  good reasons for that. ZFS needs a replica for the self-healing
  feature to work. Also, there is no fsck-like tool for ZFS, so it is a
  good idea to make sure self-healing can work.
 
 Yes, currently ZFS on Solaris will panic if a non-redundant write
 fails. This is known and being worked on, but there really isn't a
 good solution if a write fails, unless you have some ZFS-level redundancy.
 
 NB. fsck is not needed for ZFS because the on-disk format is always
 consistent.  This is orthogonal to hardware faults.
 
  I think I would first track down the cause of the messages just
  prior to the ZFS write error, because even with replicated pools,
  if several devices error at once, the pool could be lost.
 
 Yes, multiple failures can cause data loss.  No magic here.
  -- richard


Re: [zfs-discuss] Re: ZFS - SAN and Raid

2007-06-19 Thread Gary Mills
On Wed, Jun 20, 2007 at 11:16:39AM +1000, James C. McPherson wrote:
 Roshan Perera wrote:
 
 I don't think panic should be the answer in this type of scenario, as
 there is a redundant path to the LUN and hardware RAID is in place inside
 the SAN. From what I gather, there is work being carried out to find a
 better solution. The question is what the proposed solution is, and when
 it will be available.
 
 But Roshan, if your pool is not replicated from ZFS'
 point of view, then all the multipathing and raid
 controller backup in the world will not make a difference.

If the multipathing is working correctly, and one path to the data
remains intact, the SCSI level should retry the write error
successfully.  This certainly happens with UFS on our fibre-channel
SAN.  There's usually a SCSI bus reset message along with a message
about the failover to the other path.  Of course, once the SCSI level
exhausts its retries, something else has to happen, just as it would
with a physical disk.  This must be when ZFS causes a panic.
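
When chasing this kind of thing, the path state is worth checking first;
with EMC PowerPath, for instance, something like:

   # Show the state of every path PowerPath manages
   powermt display dev=all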

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-