Re: [zfs-discuss] zfs iscsi storage for virtual machines

2007-08-08 Thread Peter Baumgartner


 I have hundreds of Xen-based virtual machines running off a ZFS/iSCSI
 service; yes, it's viable. I can't speak for CentOS specifically; our
 infrastructure is using Debian Etch with our own build of Xen.


How does ZFS handle snapshots of large files like VM images? Is replication
done on the bit/block level or by file? In other words, does a snapshot of a
changed VM image take up the same amount of space as the image or only the
amount of space of the bits that have changed within the image?

I'm also strongly considering going with NFS or AFS instead of iSCSI so I
don't have to deal with management of an extra filesystem layer. The VMWare
community is split on which is faster. Are there any significant benefits to
either one on the ZFS side?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs iscsi storage for virtual machines

2007-08-08 Thread Neil Perrin

 How does ZFS handle snapshots of large files like VM images? Is 
 replication done on the bit/block level or by file? In other words, does 
 a snapshot of a changed VM image take up the same amount of space as the 
 image or only the amount of space of the bits that have changed within 
 the image?  

ZFS uses copy-on-write to implement snapshots.
No replication is done. When changes are made, only the
changed blocks take up new space (the original blocks are
retained by the snapshot).
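
A quick way to see this in practice (the pool and dataset names below are
invented for illustration; the commands are the standard ZFS CLI):

    zfs snapshot tank/vm01@before-patch
    # the guest keeps writing; only newly written blocks consume extra space
    zfs list -t snapshot -o name,used,refer tank/vm01@before-patch
    # USED grows only with the blocks that have diverged since the snapshot,
    # not with the full size of the VM image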

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs iscsi storage for virtual machines

2007-07-17 Thread Joshua . Goodall
[EMAIL PROTECTED] wrote on 17/07/2007 05:12:49 AM:

 I'm going to be setting up about 6 virtual machines (Windows & 
 Linux) in either VMWare Server or Xen on a CentOS 5 box. I'd like to
 connect to a ZFS iSCSI target to store the vm images and be able to 
 use zfs snapshots for backup. I have no experience with ZFS, so I 
 have a couple of questions before I move forward. 
 
 1. Is this a feasible setup? If not, is there any way to make 
 something like this work reliably?
 
 2. Since I'd most likely want to restore single machines at a time, 
 is it best to have a zpool for each machine? 
 
 Any insight is appreciated.

I have hundreds of Xen-based virtual machines running off a ZFS/iSCSI 
service; yes, it's viable. I can't speak for CentOS specifically; our 
infrastructure is using Debian Etch with our own build of Xen.

Your success criteria are:
1. Staggering virtual machine cron entries to avoid high I/O contention,
2. Reliable gigabit/10gigabit switching infrastructure. Skimp on this and 
your project is sunk. Use a high-quality managed switch, good NICs, and 
well manufactured cables.
3. If using a smart, battery-backed, order-preserving storage array, 
append set zfs:zfs_nocacheflush = 1 (sans quotes) to /etc/system.
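
For reference, item 3 amounts to adding a single line to /etc/system (it takes
effect after a reboot, and is only safe when the array cache really is
nonvolatile and order-preserving):

    set zfs:zfs_nocacheflush = 1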

Splitting your storage into multiple zpools will just cause wastage and 
administrative complexity.  I strongly advise using a single zpool.

JG



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs iscsi storage for virtual machines

2007-07-16 Thread Peter Baumgartner

I'm going to be setting up about 6 virtual machines (Windows & Linux) in
either VMWare Server or Xen on a CentOS 5 box. I'd like to connect to a ZFS
iSCSI target to store the vm images and be able to use zfs snapshots for
backup. I have no experience with ZFS, so I have a couple of questions
before I move forward.

1. Is this a feasible setup? If not, is there any way to make something like
this work reliably?

2. Since I'd most likely want to restore single machines at a time, is it
best to have a zpool for each machine?

Any insight is appreciated.

--
Pete
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs iscsi storage for virtual machines

2007-07-16 Thread Malachi de Ælfweald

I had originally considered something similar, but... for ZFS snapshot
abilities, I am leaning more towards zfs-hosted NFS... Most of the other VMs
(FreeBSD, for example) can install onto NFS, it wouldn't actually be going
over the network, and it would allow file-level restore instead of
drive-level restore.

Just my untested 2 cents

Malachi

On 7/16/07, Peter Baumgartner [EMAIL PROTECTED] wrote:


I'm going to be setting up about 6 virtual machines (Windows & Linux) in
either VMWare Server or Xen on a CentOS 5 box. I'd like to connect to a ZFS
iSCSI target to store the vm images and be able to use zfs snapshots for
backup. I have no experience with ZFS, so I have a couple of questions
before I move forward.

1. Is this a feasible setup? If not, is there any way to make something
like this work reliably?

2. Since I'd most likely want to restore single machines at a time, is it
best to have a zpool for each machine?

Any insight is appreciated.

--
Pete



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs iscsi storage for virtual machines

2007-07-16 Thread Richard Elling
Peter Baumgartner wrote:
 I'm going to be setting up about 6 virtual machines (Windows & Linux) in 
 either VMWare Server or Xen on a CentOS 5 box. I'd like to connect to a 
 ZFS iSCSI target to store the vm images and be able to use zfs snapshots 
 for backup. I have no experience with ZFS, so I have a couple of 
 questions before I move forward.
 
 1. Is this a feasible setup? If not, is there any way to make something 
 like this work reliably?

Use some form of data redundancy on the data.  The simplest is a mirrored
zpool.

 2. Since I'd most likely want to restore single machines at a time, is 
 it best to have a zpool for each machine?

I'd recommend one zpool, multiple file systems.  That way you can manage
each file system (iSCSI target) separately, but still have the flexibility
of a large zpool.
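
A minimal sketch of that layout (pool, zvol and device names here are
invented; shareiscsi was the contemporary ZFS property for exporting a zvol
as an iSCSI target, and on releases without it iscsitadm can be used instead):

    zpool create vmpool mirror c1t0d0 c1t1d0      # one pool, with ZFS-level redundancy
    zfs create -V 20g vmpool/vm-web01             # one zvol per virtual machine
    zfs create -V 20g vmpool/vm-db01
    zfs set shareiscsi=on vmpool/vm-web01         # export each zvol as its own iSCSI target
    zfs set shareiscsi=on vmpool/vm-db01
    zfs snapshot vmpool/vm-web01@nightly          # per-VM snapshots, per-VM restore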
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-29 Thread przemolicc
On Wed, Jun 28, 2006 at 03:30:28PM +0200, Robert Milkowski wrote:
 ppf What I wanted to point out is Al's example: he wrote about damaged data.
 ppf Data were damaged by firmware, _not_ the disk surface! In such a case ZFS
 ppf doesn't help. ZFS can detect (and repair) errors on the disk surface, bad
 ppf cables, etc. But it cannot detect and repair errors in its (ZFS) code.
 
 Not in its code but definitely in a firmware code in a controller.

As Jeff pointed out: if you mirror two different storage arrays.

przemol
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-29 Thread przemolicc
On Thu, Jun 29, 2006 at 10:01:15AM +0200, Robert Milkowski wrote:
 Hello przemolicc,
 
 Thursday, June 29, 2006, 8:01:26 AM, you wrote:
 
 ppf On Wed, Jun 28, 2006 at 03:30:28PM +0200, Robert Milkowski wrote:
  ppf What I wanted to point out is Al's example: he wrote about damaged
  ppf data. Data were damaged by firmware, _not_ the disk surface! In such a
  ppf case ZFS doesn't help. ZFS can detect (and repair) errors on the disk
  ppf surface, bad cables, etc. But it cannot detect and repair errors in its
  ppf (ZFS) code.
  
  Not in its code but definitely in a firmware code in a controller.
 
 ppf As Jeff pointed out: if you mirror two different storage arrays.
 
 Not only, I believe. There are some classes of problems where, even in one
 array, ZFS could help with firmware problems (with many controllers in an
 active-active config like Symmetrix).

Any real example ?

przemol
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] ZFS and Storage

2006-06-29 Thread Robert Milkowski
Hello przemolicc,

Thursday, June 29, 2006, 10:08:23 AM, you wrote:

ppf On Thu, Jun 29, 2006 at 10:01:15AM +0200, Robert Milkowski wrote:
 Hello przemolicc,
 
 Thursday, June 29, 2006, 8:01:26 AM, you wrote:
 
 ppf On Wed, Jun 28, 2006 at 03:30:28PM +0200, Robert Milkowski wrote:
  ppf What I wanted to point out is Al's example: he wrote about damaged
  ppf data. Data were damaged by firmware, _not_ the disk surface! In such a
  ppf case ZFS doesn't help. ZFS can detect (and repair) errors on the disk
  ppf surface, bad cables, etc. But it cannot detect and repair errors in its
  ppf (ZFS) code.
  
  Not in its code but definitely in a firmware code in a controller.
 
 ppf As Jeff pointed out: if you mirror two different storage arrays.
 
 Not only, I believe. There are some classes of problems where, even in one
 array, ZFS could help with firmware problems (with many controllers in an
 active-active config like Symmetrix).

ppf Any real example ?

I wouldn't say such problems are common.
The issue is that we don't know. From time to time some files are bad,
and sometimes fsck is needed for no apparent reason.

I think only the future will tell how and when ZFS will protect us.
All I can say is that there's big potential in ZFS.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] ZFS and Storage

2006-06-28 Thread Robert Milkowski
Hello przemolicc,

Wednesday, June 28, 2006, 10:57:17 AM, you wrote:

ppf On Tue, Jun 27, 2006 at 04:16:13PM -0500, Al Hopper wrote:
 Case in point, there was a gentleman who posted on the Yahoo Groups solx86
 list and described how faulty firmware on a Hitachi HDS system damaged a
 bunch of data.  The HDS system moves disk blocks around, between one disk
 and another, in the background, to optimize the filesystem layout.  Long
 after he had written data, blocks from one data set were intermingled with
 blocks for other data sets/files causing extensive data corruption.

ppf Al,

ppf the problem you described probably comes from failures in the firmware code,
ppf not a failure of the disk surface.  Sun's engineers can also make mistakes
ppf in ZFS code, right?

But the point is that ZFS should also detect such errors and take
proper action. Other filesystems can't.

And of course there are bugs in ZFS :P

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-28 Thread przemolicc
On Wed, Jun 28, 2006 at 02:23:32PM +0200, Robert Milkowski wrote:
 Hello przemolicc,
 
 Wednesday, June 28, 2006, 10:57:17 AM, you wrote:
 
 ppf On Tue, Jun 27, 2006 at 04:16:13PM -0500, Al Hopper wrote:
  Case in point, there was a gentleman who posted on the Yahoo Groups solx86
  list and described how faulty firmware on a Hitachi HDS system damaged a
  bunch of data.  The HDS system moves disk blocks around, between one disk
  and another, in the background, to optimize the filesystem layout.  Long
  after he had written data, blocks from one data set were intermingled with
  blocks for other data sets/files causing extensive data corruption.
 
 ppf Al,
 
 ppf the problem you described probably comes from failures in the firmware
 ppf code, not a failure of the disk surface.  Sun's engineers can also make
 ppf mistakes in ZFS code, right?
 
 But the point is that ZFS should also detect such errors and take
 proper action. Other filesystems can't.

Does it mean that ZFS can detect errors in ZFS's code itself ? ;-)

What I wanted to point out is Al's example: he wrote about damaged data. Data
were damaged by firmware, _not_ the disk surface! In such a case ZFS doesn't
help. ZFS can detect (and repair) errors on the disk surface, bad cables, etc.
But it cannot detect and repair errors in its (ZFS) code.

I am comparing firmware code to ZFS code.

przemol
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-28 Thread Jeremy Teo

Hello,


What I wanted to point out is Al's example: he wrote about damaged data. Data
were damaged by firmware, _not_ the disk surface! In such a case ZFS doesn't
help. ZFS can detect (and repair) errors on the disk surface, bad cables, etc.
But it cannot detect and repair errors in its (ZFS) code.

I am comparing firmware code to ZFS code.



Firmware doesn't do end to end checksumming. If ZFS code is buggy, the
checksums won't match up anyway, so you still detect errors.

Plus it is a lot easier to debug ZFS code than firmware.
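
In practice the mismatches show up when the pool is read or scrubbed, for
example (the pool name is hypothetical):

    zpool scrub tank
    zpool status -v tank   # the CKSUM column counts blocks that failed checksum verification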

--
Regards,
Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-28 Thread Jeff Victor

[EMAIL PROTECTED] wrote:

On Wed, Jun 28, 2006 at 02:23:32PM +0200, Robert Milkowski wrote:

What I wanted to point out is Al's example: he wrote about damaged data. Data
were damaged by firmware, _not_ the disk surface! In such a case ZFS doesn't
help. ZFS can detect (and repair) errors on the disk surface, bad cables, etc.
But it cannot detect and repair errors in its (ZFS) code.



If you mean ZFS doesn't help with firmware problems, that is not true. For 
example, if ZFS is mirroring a pool across two different storage arrays, a 
firmware error in one of them will cause problems that ZFS will detect when it 
tries to read the data. Further, ZFS would be able to correct the error by reading 
from the other mirror, unless the second array also suffered from a firmware error.
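
A sketch of that configuration (the two device names are placeholders for one
LUN from each array):

    zpool create tank mirror c2t50060E8000000001d0 c3t50060E8000000002d0
    # a block corrupted by one array's firmware fails its checksum on read;
    # ZFS returns the good copy from the other array and rewrites the bad copy
    zpool status tank      # repaired blocks show up in the CKSUM counters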


There are categories of problems that ZFS cannot handle, mostly regarding data
availability after catastrophes (as Richard E described), but ZFS can help with
many firmware problems.


--
--
Jeff VICTOR  Sun Microsystemsjeff.victor @ sun.com
OS AmbassadorSr. Technical Specialist
Solaris 10 Zones FAQ:http://www.opensolaris.org/os/community/zones/faq
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-28 Thread Casper . Dik


Depends on your definition of firmware. In higher-end arrays the data is
checksummed when it comes in and a hash is written when it gets to disk.
Of course this is nowhere near end to end, but it is better than nothing.


The checksum is often stored with the data (so if the data is not written,
or is written to the wrong location, the checksum still appears valid).

ZFS stores the checksum with the data pointer, so it knows more about
the data and whether it was proper.

ZFS also checksums before the data travels over the fabric.

... and code is code. Easier to debug is a context sensitive term.


Uhm, well, firmware, in production systems?

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-28 Thread Nagakiran







Depends on your definition of firmware. In higher-end arrays the data
is checksummed when it comes in and a hash is written when it gets to
disk. Of course this is nowhere near end to end, but it is better than
nothing.


... and code is code. Easier to debug is a context sensitive term.



It's unfortunate that so many posts hang on the code.
It's the design that protects your data, and with ZFS you have a better
design for data integrity. If the code is faulty, then that's a bug, and the
design should still protect you, unless your error detection and correction
logic itself is faulty.

(I mean, this is like the anti-corruption bureau being corrupt :-)).

There is a huge difference between the ability to detect corruption versus
not knowing that data is corrupted at all.

Now, whether the code lives up to the design is what real-world testing will
show; in most cases ZFS should help.

Kiran
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Mika Borner
The vdev can handle dynamic LUN growth, but the underlying VTOC or EFI label
may need to be zero'd and reapplied if you set up the initial vdev on a
slice.  If you introduced the entire disk to the pool you should be fine,
but I believe you'll still need to offline/online the pool.

Fine, at least the vdev can handle this...

I asked about this feature in October and hoped that it would be
implemented when integrating ZFS into Sol10U2 ...

http://www.opensolaris.org/jive/thread.jspa?messageID=11646

Does anybody know when this feature is finally coming?
It would keep the number of LUNs on the host low, especially as
device names can be really ugly (long!).
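
Until then, the workaround usually mentioned is the offline/online step noted
above: export and re-import the pool after growing the LUN so ZFS picks up the
new size (pool name hypothetical; later ZFS releases added an autoexpand
property and "zpool online -e" to avoid this):

    zpool export tank
    zpool import tank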

//Mika

# mv Disclaimer.txt /dev/null



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Mika Borner
but there may not be filesystem space for double the data.
Sounds like there is a need for a zfs-defragment-file utility
perhaps?
Or if you want to be politically cagey about naming choice, perhaps,
zfs-seq-read-optimize-file ?  :-)

For data warehouse and streaming applications a
seq-read optimization could bring additional performance. For
normal databases this should be benchmarked...

This brings me back to another question. We have a production database
that is cloned at every end of month for end-of-month processing
(currently with a feature on our storage array).

I'm thinking about a ZFS version of this task. Requirements: the
production database should not suffer from performance degradation
whilst running the clone in parallel. As ZFS does not clone all the
blocks, I wonder how much the production database will suffer from
sharing most of the data with the clone (concurrent access vs. caching).

Maybe we need a feature in ZFS to do a full clone (i.e. copy all
blocks) inside the pool if performance is an issue, just like the
Quick Copy vs. Shadow Image features on HDS arrays...
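
For comparison, the copy-on-write version of that monthly task is just a
snapshot plus a clone, and a full physical copy can be approximated with
send/receive (pool and dataset names below are made up):

    zfs snapshot dbpool/proddb@eom-2006-06
    zfs clone dbpool/proddb@eom-2006-06 dbpool/proddb-eom    # shares unchanged blocks with production
    # closer to Quick Copy/Shadow Image: a full, independent copy of the blocks
    zfs send dbpool/proddb@eom-2006-06 | zfs receive dbpool/proddb-eom-full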







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Roch

Philip Brown writes:
  Roch wrote:
   And, if the load can accommodate a reorder, to get top per-spindle
   read-streaming performance, a cp(1) of the file should do wonders
   on the layout.
  
  but there may not be filesystem space for double the data.
  Sounds like there is a need for a zfs-defragment-file utility perhaps?
  
  Or if you want to be politically cagey about naming choice, perhaps,
  
  zfs-seq-read-optimize-file ?  :-)
  

Possibly, or maybe using fcntl?

Now the goal is to take a file with scattered blocks and order
them into contiguous chunks. So this is contingent on the
existence of regions of free contiguous disk space. This
will get more difficult as we get close to full on the
storage.

-r





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Gregory Shaw
Most controllers support a background-scrub that will read a volume  
and repair any bad stripes.  This addresses the bad block issue in  
most cases.


It still doesn't help when a double-failure occurs.   Luckily, that's  
very rare.  Usually, in that case, you need to evacuate the volume  
and try to restore what was damaged.


On Jun 26, 2006, at 6:40 PM, Eric Schrock wrote:


On Mon, Jun 26, 2006 at 05:26:24PM -0600, Gregory Shaw wrote:


You're using hardware raid.  The hardware raid controller will  
rebuild
the volume in the event of a single drive failure.  You'd need to  
keep

on top of it, but that's a given in the case of either hardware or
software raid.


True for total drive failure, but note there are more failure modes
than that.  With hardware RAID, there is no way for the RAID controller
to know which block was bad, and therefore it cannot repair the block.
With RAID-Z, we have the integrated checksum and can do combinatorial
analysis to know not only which drive was bad, but what the data
_should_ be, and can repair it to prevent more corruption in the future.


- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/ 
eschrock


-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382  [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. - Linus  
Torvalds



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Torrey McMahon

Bart Smaalders wrote:

Gregory Shaw wrote:

On Tue, 2006-06-27 at 09:09 +1000, Nathan Kroenert wrote:

How would ZFS self heal in this case?




You're using hardware raid.  The hardware raid controller will rebuild
the volume in the event of a single drive failure.  You'd need to keep
on top of it, but that's a given in the case of either hardware or
software raid.

If you've got requirements for surviving an array failure, the
recommended solution in that case is to mirror between volumes on
multiple arrays.   I've always liked software raid (mirroring) in that
case, as no manual intervention is needed in the event of an array
failure.  Mirroring between discrete arrays is usually reserved for
mission-critical applications that cost thousands of dollars per hour in
downtime.



In other words, it won't.  You've spent the disk space, but
because you're mirroring in the wrong place (the raid array)
all ZFS can do is tell you that your data is gone.  With luck,
subsequent reads _might_ get the right data, but maybe not.


Careful here when you say wrong place. There are many scenarios where 
mirroring in the hardware is the correct way to go even when running ZFS 
on top of it.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Jeff Victor
Unfortunately, a storage-based RAID controller cannot detect errors which occurred 
between the filesystem layer and the RAID controller, in either direction - in or 
out.  ZFS will detect them through its use of checksums.


But ZFS can only fix them if it can access redundant bits.  It can't tell a 
storage device to provide the redundant bits, so it must use its own data 
protection system (RAIDZ or RAID1) in order to correct errors it detects.



Gregory Shaw wrote:
Most controllers support a background-scrub that will read a volume  and 
repair any bad stripes.  This addresses the bad block issue in  most cases.


It still doesn't help when a double-failure occurs.   Luckily, that's  
very rare.  Usually, in that case, you need to evacuate the volume  and 
try to restore what was damaged.


On Jun 26, 2006, at 6:40 PM, Eric Schrock wrote:


On Mon, Jun 26, 2006 at 05:26:24PM -0600, Gregory Shaw wrote:



You're using hardware raid.  The hardware raid controller will  rebuild
the volume in the event of a single drive failure.  You'd need to  keep
on top of it, but that's a given in the case of either hardware or
software raid.



True for total drive failure, but note there are more failure modes
than that.  With hardware RAID, there is no way for the RAID controller
to know which block was bad, and therefore it cannot repair the block.
With RAID-Z, we have the integrated checksum and can do combinatorial
analysis to know not only which drive was bad, but what the data
_should_ be, and can repair it to prevent more corruption in the future.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/ 
eschrock



-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382  [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. - Linus  
Torvalds



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
--
Jeff VICTOR  Sun Microsystemsjeff.victor @ sun.com
OS AmbassadorSr. Technical Specialist
Solaris 10 Zones FAQ:http://www.opensolaris.org/os/community/zones/faq
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Gregory Shaw
Not at all.  ZFS is a quantum leap in Solaris filesystem/VM  
functionality.


However, I don't see a lot of use for RAID-Z (or Z2) in large
enterprise customer situations.  For instance, does ZFS enable Sun
to walk into an account and say "You can now replace all of your
high-end (EMC) disk with JBOD"?  I don't think many customers would
bite on that.


RAID-Z is an excellent feature, however, it doesn't address many of  
the reasons for using high-end arrays:


- Exporting snapshots to alternate systems (for live database or  
backup purposes)

- Remote replication
- Sharing of storage among multiple systems (LUN masking and equivalent)
- Storage management (migration between tiers of storage)
- No-downtime failure replacement (the system doesn't even know)
- Clustering

I know that ZFS is still a work in progress, so some of the above may  
arrive in future versions of the product.


I see the RAID-Z[2] value in small-to-mid size systems where the  
storage is relatively small and you don't have high availability  
requirements.


On Jun 27, 2006, at 8:48 AM, Darren J Moffat wrote:

So everything you are saying seems to suggest you think ZFS was a  
waste of engineering time since hardware raid solves all the  
problems ?


I don't believe it does, but I'm no storage expert and maybe I've
drunk too much Kool-Aid.  I'm a software person, and for me ZFS is
brilliant: it is so much easier than managing any of the hardware
raid systems I've dealt with.


--
Darren J Moffat


-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382  [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. - Linus  
Torvalds



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Gregory Shaw
This is getting pretty picky.  You're saying that ZFS will detect any  
errors introduced after ZFS has gotten the data.  However, as stated  
in a previous post, that doesn't guarantee that the data given to ZFS  
wasn't already corrupted.


If you don't trust your storage subsystem, you're going to encounter  
issues regardless of the software used to store data.  We'll have to  
see if ZFS can 'save' customers in this situation.  I've found that  
regardless of the storage solution in question you can't anticipate  
all issues and when a brownout or other ugly loss-of-service occurs,  
you may or may not be intact, ZFS or no.


I've never seen a product that can deal with all possible situations.

On Jun 27, 2006, at 9:01 AM, Jeff Victor wrote:

Unfortunately, a storage-based RAID controller cannot detect errors  
which occurred between the filesystem layer and the RAID  
controller, in either direction - in or out.  ZFS will detect them  
through its use of checksums.


But ZFS can only fix them if it can access redundant bits.  It  
can't tell a storage device to provide the redundant bits, so it  
must use its own data protection system (RAIDZ or RAID1) in order  
to correct errors it detects.



Gregory Shaw wrote:
Most controllers support a background-scrub that will read a  
volume  and repair any bad stripes.  This addresses the bad block  
issue in  most cases.
It still doesn't help when a double-failure occurs.   Luckily,  
that's  very rare.  Usually, in that case, you need to evacuate  
the volume  and try to restore what was damaged.

On Jun 26, 2006, at 6:40 PM, Eric Schrock wrote:

On Mon, Jun 26, 2006 at 05:26:24PM -0600, Gregory Shaw wrote:



You're using hardware raid.  The hardware raid controller will   
rebuild
the volume in the event of a single drive failure.  You'd need  
to  keep

on top of it, but that's a given in the case of either hardware or
software raid.



True for total drive failure, but note there are more failure modes
than that.  With hardware RAID, there is no way for the RAID   
controller

to know which block was bad, and therefore cannot repair the block.
With RAID-Z, we have the integrated checksum and can do  
combinatorial

analysis to know not only which drive was bad, but what the data
_should_ be, and can repair it to prevent more corruption in the   
future.


- Eric

--
Eric Schrock, Solaris Kernel Development   http:// 
blogs.sun.com/ eschrock

-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382  [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. -  
Linus  Torvalds

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
--
Jeff VICTOR          Sun Microsystems          jeff.victor @ sun.com
OS Ambassador        Sr. Technical Specialist
Solaris 10 Zones FAQ: http://www.opensolaris.org/os/community/zones/faq
--



-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-8273
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive MS 4382  [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382 [EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. - Linus  
Torvalds



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Casper . Dik

This is getting pretty picky.  You're saying that ZFS will detect any  
errors introduced after ZFS has gotten the data.  However, as stated  
in a previous post, that doesn't guarantee that the data given to ZFS  
wasn't already corrupted.

But there's a big difference between the time ZFS gets the data
and the time your typical storage system gets it.

And your typical storage system does not store any information which
allows it to detect all but the most simple errors.

Storage systems are complicated and have many failure modes at many
different levels.

- disks not writing data or writing data in incorrect location
- disks not reporting failures when they occur
- bit errors in disk write buffers causing data corruption
- storage array software with bugs
- storage array with undetected hardware errors
- data corruption in the path (such as switches which mangle
  packets but keep the TCP checksum valid)


If you don't trust your storage subsystem, you're going to encounter  
issues regardless of the software use to store data.  We'll have to  
see if ZFS can 'save' customers in this situation.  I've found that  
regardless of the storage solution in question you can't anticipate  
all issues and when a brownout or other ugly loss-of-service occurs,  
you may or may not be intact, ZFS or no.

I've never seen a product that can deal with all possible situations.

ZFS attempts to deal with more problems than any of the current
existing solutions by giving end-to-end verification of the data.

One of the reasons why ZFS was created was a particular large customer
who had data corruption which occurred two years (!) before it was
detected.  The bad data had migrated and corrupted; the good data
was no longer available on backups (which weren't very relevant
anyway after such a long time).

ZFS tries to give one important guarantee: if the data is bad, we will
not return it.

One case in point is the person in MPK with a SATA controller which
corrupts memory; he didn't discover this using UFS (except for perhaps
a few strange events he noticed).  After switching to ZFS he started to
find corruption, so now he uses a self-healing ZFS mirror (or RAIDZ).

ZFS helps at the low end as much as it does at the high end.

I'll bet that ZFS will generate more calls about broken hardware
and fingers will be pointed at ZFS at first because it's the new
kid; it will be some time before people realize that the data was
rotting all along.

Casper
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Nicolas Williams
On Tue, Jun 27, 2006 at 09:41:10AM -0600, Gregory Shaw wrote:
 This is getting pretty picky.  You're saying that ZFS will detect any  
 errors introduced after ZFS has gotten the data.  However, as stated  
 in a previous post, that doesn't guarantee that the data given to ZFS  
 wasn't already corrupted.

There will always be some place where errors can be introduced and go on
undetected.  But some parts of the system are more error prone than
others, and ZFS targets the most error prone of them: rotating rust.

For the rest, make sure you have ECC memory, that you're using secure
NFS (with krb5i or krb5p), and the probability of undetectable data
corruption errors should be much closer to zero than what you'd get with
other systems.

That said, there's a proposal to add end-to-end data checksumming over
NFSv4 (see the IETF NFSv4 WG list archives).  That proposal can't
protect meta-data, and it doesn't remove any one type of data corruption
error on the client side, but it does on the server side.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Dale Ghent

Torrey McMahon wrote:

ZFS is great for the systems that can run it. However, any enterprise 
datacenter is going to be made up of many many hosts running many many 
OS. In that world you're going to consolidate on large arrays and use 
the features of those arrays where they cover the most ground. For 
example, if I've 100 hosts all running different OS and apps and I can 
perform my data replication and redundancy algorithms, in most cases 
Raid, in one spot then it will be much more cost efficient to do it there.


Exactly what I'm pondering.

In the near to mid term, Solaris with ZFS can be seen as sort of a 
storage virtualizer where it takes disks into ZFS pools and volumes and 
then presents them to other hosts and OSes via iSCSI, NFS, SMB and so 
on. At that point, those other OSes can enjoy the benefits of ZFS.
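
A rough sketch of that model (dataset names are invented; sharenfs is the
standard ZFS property, and shareiscsi was the contemporary way to export a
zvol as an iSCSI LUN):

    zfs create tank/builds
    zfs set sharenfs=on tank/builds           # served to NFS clients
    zfs create -V 50g tank/winhost1-lun
    zfs set shareiscsi=on tank/winhost1-lun   # served as an iSCSI LUN to another OS
    # every consumer indirectly gets ZFS pooling, checksumming and snapshots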


In the long term, it would be nice to see ZFS (or its concepts) 
integrated as the LUN provisioning and backing store mechanism on 
hardware RAID arrays themselves, supplanting the traditional RAID 
paradigms that have been in use for years.


/dale
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Torrey McMahon

Jason Schroeder wrote:

Torrey McMahon wrote:


[EMAIL PROTECTED] wrote:



I'll bet that ZFS will generate more calls about broken hardware
and fingers will be pointed at ZFS at first because it's the new
kid; it will be some time before people realize that the data was
rotting all along.




EhhhI don't think so. Most of our customers have HW arrays that 
have been scrubbing data for years and years as well as apps on the 
top that have been verifying the data. (Oracle for example.) Not to 
mention there will be a bit of time before people move over to ZFS in 
the high end.




Ahh... but there is the rub.  Today - you/we don't *really* know, do 
we?  Maybe there's bad juju blocks, maybe not.  Running ZFS, whether 
in a redundant vdev or not, will certainly turn the big spotlight on 
and give us the data that checksums matched, or they didn't.  



A spotlight on what? How is that data going to get into ZFS? The more I
think about this, the more I realize it's going to do little for existing
data sets. You're going to have to migrate that data from filesystem X 
into ZFS first. From that point on ZFS has no idea if the data was bad 
to begin with. If you can do an in place migration then you might be 
able to weed out some bad physical blocks/drives over time but I assert 
that the current disk scrubbing methodologies catch most of those.


Yes, it's great for new data sets where you started with ZFS. Sorry if I 
sound like I'm raining on the parade here folks. That's not the case, 
really, and I'm all for the great new features and EAU ZFS gives where 
applicable.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Darren J Moffat

Nicolas Williams wrote:

On Tue, Jun 27, 2006 at 09:41:10AM -0600, Gregory Shaw wrote:
This is getting pretty picky.  You're saying that ZFS will detect any  
errors introduced after ZFS has gotten the data.  However, as stated  
in a previous post, that doesn't guarantee that the data given to ZFS  
wasn't already corrupted.


There will always be some place where errors can be introduced and go on
undetected.  But some parts of the system are more error prone than
others, and ZFS targets the most error prone of them: rotating rust.

For the rest, make sure you have ECC memory, that you're using secure
NFS (with krb5i or krb5p), and the probability of undetectable data
corruption errors should be much closer to zero than what you'd get with
other systems.


Another alternative is using IPsec with just AH.

For the benefit of those outside of Sun MPK17 both krb5i and IPsec AH 
were used to diagnose and prove that we have a faulty router in a lab 
that was causing very strange build errors.  TCP/IP alone didn't catch 
the problems and sometimes they showed up with SCCS simple checksums and 
sometimes we had compile errors.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-27 Thread Darren J Moffat

Torrey McMahon wrote:

Darren J Moffat wrote:
So everything you are saying seems to suggest you think ZFS was a 
waste of engineering time since hardware raid solves all the problems ?


I don't believe it does, but I'm no storage expert and maybe I've drunk
too much Kool-Aid.  I'm a software person, and for me ZFS is brilliant: it
is so much easier than managing any of the hardware raid systems I've
dealt with.



ZFS is great for the systems that can run it. However, any enterprise 
datacenter is going to be made up of many many hosts running many many 
OS. In that world you're going to consolidate on large arrays and use 
the features of those arrays where they cover the most ground. For 
example, if I've 100 hosts all running different OS and apps and I can 
perform my data replication and redundancy algorithms, in most cases 
Raid, in one spot then it will be much more cost efficient to do it there.


but you still need a local file system on those systems in many cases.

So, back to where we started I guess: how to effectively use ZFS to
benefit Solaris (and the other platforms it gets ported to) while still
using hardware RAID, because you have no choice but to use it.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and Storage

2006-06-26 Thread Mika Borner
Hi

Now that Solaris 10 06/06 is finally downloadable I have some questions
about ZFS.

-We have a big storage system supporting RAID5 and RAID1. At the moment,
we only use RAID5 (for non-Solaris systems as well). We are thinking
about using ZFS on those LUNs instead of UFS. As ZFS on hardware RAID5
seems like overkill, an option would be to use RAID1 with RAID-Z. Then
again, this is a waste of space, as it needs more disks due to the
mirroring. Later on, we might be using asynchronous replication to
another storage system over the SAN, which means even more wasted space.
It looks as if storage virtualization as of today just doesn't work
nicely together with ZFS. What we would need is the ability to use JBODs.

-Does ZFS in the current version support LUN extension? With UFS, we
have to zero the VTOC, and then adjust the new disk geometry. How does
it look with ZFS?

-I've read the threads about zfs and databases. Still, I'm not 100%
convinced about read performance. Doesn't the fragmentation of the
large database files (because of the concept of COW) impact
read performance?

-Does anybody have any experience with database cloning using the ZFS
mechanism? What factors influence the performance when running the
cloned database in parallel?
-I really like the idea of keeping all needed database files together, to
allow fast and consistent cloning.

Thanks

Mika


# mv Disclaimer.txt /dev/null






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Roch

About:

  -I've read the threads about zfs and databases. Still I'm not 100%
  convinced about read performance. Doesn't the fragmentation of the
  large database files (because of the concept of COW) impact
  read performance?

I do need to get back to this thread. The way I am currently 
looking at this is this:

ZFS will perform great at doing the transaction
component (say the small (8K) O_DSYNC writes)
because the ZIL will aggregate them in fewer larger
I/Os and the block allocation will stream them to the 
surface.

On the other hand, read streaming will require a
good prefetch code (under review) to get the read
performance we want.


If the requirements balance random writes and read streaming, then ZFS
should be right there with the best FS. If the critical requirement focuses
exclusively on read-streaming a file that was written randomly and, in
addition, the number of spindles is limited, then that is not the
sweet spot of ZFS.  Read performance should still scale with the
number of spindles.  And, if the load can accommodate a reorder, to
get top per-spindle read-streaming performance,
a cp(1) of the file should do wonders on the layout.
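
Concretely, the cp(1) trick amounts to rewriting the file so the new copy is
allocated (mostly) contiguously; note it needs free space for a second copy
while it runs (paths below are made up):

    cp /tank/db/datafile.dbf /tank/db/datafile.dbf.seq
    mv /tank/db/datafile.dbf.seq /tank/db/datafile.dbf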


-r


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Gregory Shaw


On Jun 26, 2006, at 1:15 AM, Mika Borner wrote:


Hi

Now that Solaris 10 06/06 is finally downloadable I have some  
questions

about ZFS.

-We have a big storage sytem supporting RAID5 and RAID1. At the  
moment,

we only use RAID5 (for non-solaris systems as well). We are thinking
about using ZFS on those LUNs instead of UFS. As ZFS on Hardware RAID5
seems like overkill, an option would be to use RAID1 with RAID-Z. Then
again, this is a waist of space, as it needs more disks, due to the
mirroring. Later on, we might be using asynchronous replication to
another storage system using SAN, even more waste of space. This looks
somehow like storage virtualization as of today just doesn't work  
nicely

together. What we need, would be the feature to use JBODs.



If you've got hardware raid-5, why not just run regular (non-raid)  
pools on top of the raid-5?
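
For example (the LUN device name is a placeholder), a plain, non-redundant
pool on a single hardware RAID-5 LUN still gets checksum-based error
detection, though without ZFS-level redundancy it cannot self-heal what it
detects:

    zpool create datapool c4t0d0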


I wouldn't go back to JBOD.   Hardware arrays offer a number of  
advantages over JBOD:

- disk microcode management
- optimized access to storage
- large write caches
- RAID computation can be done in specialized hardware
	- SAN-based hardware products allow sharing of storage among  
multiple hosts.  This allows storage to be utilized more effectively.



-Does ZFS in the current version support LUN extension? With UFS, we
have to zero the VTOC, and then adjust the new disk geometry. How does
it look like with ZFS?



I don't understand what you're asking.  What problem is solved by  
zeroing the vtoc?



-I've read the threads about zfs and databases. Still I'm not 100%
convenienced about read performance. Doesn't the fragmentation of the
large database files (because of the concept of COW) impact
read-performance?



This is discussed elsewhere in the zfs discussion group.


-Does anybody have any experience in database cloning using the ZFS
mechanism? What factors influence the performance, when running the
cloned database in parallel?
-I really like the idea to keep all needed databasefiles together, to
allow fast and consistent cloning.

Thanks

Mika


# mv Disclaimer.txt /dev/null









-
Gregory Shaw, IT Architect
Phone: (303) 673-8273Fax: (303) 673-2773
ITCTO Group, Sun Microsystems Inc.
1 StorageTek Drive ULVL4-382  [EMAIL PROTECTED] (work)
Louisville, CO 80028-4382[EMAIL PROTECTED] (home)
When Microsoft writes an application for Linux, I've Won. - Linus  
Torvalds




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Philip Brown

Roch wrote:

And, ifthe load can accomodate   a
reorder, to  get top per-spindle read-streaming performance,
a cp(1) of the file should do wonders on the layout.



but there may not be filesystem space for double the data.
Sounds like there is a need for a zfs-defragment-file utility perhaps?

Or if you want to be politically cagey about naming choice, perhaps,

zfs-seq-read-optimize-file ?  :-)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Olaf Manczak

Eric Schrock wrote:

On Mon, Jun 26, 2006 at 05:26:24PM -0600, Gregory Shaw wrote:

You're using hardware raid.  The hardware raid controller will rebuild
the volume in the event of a single drive failure.  You'd need to keep
on top of it, but that's a given in the case of either hardware or
software raid.


True for total drive failure, but note there are more failure modes
than that.  With hardware RAID, there is no way for the RAID controller
to know which block was bad, and therefore cannot repair the block.
With RAID-Z, we have the integrated checksum and can do combinatorial
analysis to know not only which drive was bad, but what the data
_should_ be, and can repair it to prevent more corruption in the future.


Keep in mind that each disk data block is accompanied by a pretty
long error correction code (ECC) which allows for (a) verification
of data integrity (b) repair of lost/misread bits (typically up to
about 10% of the block data).

Therefore, in case of single block errors there are several possible
situations:

- non-recoverable errors - the amount of correct bits in the combined
  data + ECC is insufficient - such errors are visible to the RAID
  controller, the controller can use a redundant copy of the data, and
  the controller can perform the repair

- recoverable errors - some bits can't be read correctly but they
  can be reconstructed  using ECC - these errors are not directly
  visible to either the RAID controller or ZFS. However, the disks
  keep the count of recoverable errors so disk scrubbers can identify
  disk areas with rotten blocks and force block relocation

- silent data corruption - it can happen in memory before the data
  was written to disk, it can occur in the disk cache, it can be caused
  by a bug in disk firmware. Here the disk controller can't do
  anything and the end-to-end checksums, which ZFS offers,
  are the only solution.

-- Olaf

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Bart Smaalders

Gregory Shaw wrote:

On Tue, 2006-06-27 at 09:09 +1000, Nathan Kroenert wrote:

How would ZFS self heal in this case?




You're using hardware raid.  The hardware raid controller will rebuild
the volume in the event of a single drive failure.  You'd need to keep
on top of it, but that's a given in the case of either hardware or
software raid.

If you've got requirements for surviving an array failure, the
recommended solution in that case is to mirror between volumes on
multiple arrays.   I've always liked software raid (mirroring) in that
case, as no manual intervention is needed in the event of an array
failure.  Mirroring between discrete arrays is usually reserved for
mission-critical applications that cost thousands of dollars per hour in
downtime.



In other words, it won't.  You've spent the disk space, but
because you're mirroring in the wrong place (the raid array)
all ZFS can do is tell you that your data is gone.  With luck,
subsequent reads _might_ get the right data, but maybe not.

- Bart

--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Storage

2006-06-26 Thread Richard Elling

Olaf Manczak wrote:

Eric Schrock wrote:

On Mon, Jun 26, 2006 at 05:26:24PM -0600, Gregory Shaw wrote:

You're using hardware raid.  The hardware raid controller will rebuild
the volume in the event of a single drive failure.  You'd need to keep
on top of it, but that's a given in the case of either hardware or
software raid.


True for total drive failure, but note there are more failure modes
than that.  With hardware RAID, there is no way for the RAID controller
to know which block was bad, and therefore cannot repair the block.
With RAID-Z, we have the integrated checksum and can do combinatorial
analysis to know not only which drive was bad, but what the data
_should_ be, and can repair it to prevent more corruption in the future.


Keep in mind that each disk data block is accompanied by a pretty
long error correction code (ECC) which allows for (a) verification
of data integrity (b) repair of lost/misread bits (typically up to
about 10% of the block data).


AFAIK, typical disk ECC will correct 8 bytes.  I'd love for it to be
10% (51 bytes).  Do you have a pointer to such information?


Therefore, in case of single block errors there are several possible
situations:

- non-recoverable errors - the amount of correct bits in the combined
  data + ECC is insufficient - such errors are visible to the RAID
  controller, the controller can use a redundant copy of the data, and
  the controller can perform the repair

- recoverable errors - some bits can't be read correctly but they
  can be reconstructed  using ECC - these errors are not directly
  visible to either the RAID controller or ZFS. However, the disks
  keep the count of recoverable errors so disk scrubbers can identify
  disk areas with rotten blocks and force block relocation

- silent data corruption - it can happen in memory before the data
  was written to disk, it can occur in the disk cache, it can be caused
  by a bug in disk firmware. Here the disk controller can't do
  anything and the end-to-end checksums, which ZFS offers,
  are the only solution.


Another mode occurs when you use a format(1m)-like utility to scan
and repair disks.  For such utilities, if the data cannot be reconstructed
it is zero-filled.  If there was real data stored there, then ZFS will
detect it and the majority of other file systems will not detect it.
For an array, one should not be able to readily access such utilities,
and cause such corrective actions, but I would not bet the farm on it --
end-to-end error detection will always prevail.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss