Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-14 Thread JZ
Folks,
What can I post to the list to keep the discussion going?

Is this what you folks want to see? I shared this with King and High but
not with you folks:
http://www.excelsioritsolutions.com/jz/jzbrush/jzbrush.htm
This is not even IT stuff, so I never thought I should post it to the
list...

This is getting too strange for an open discussion.
Please, folks.

Best,
z 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-13 Thread JZ
Ok, so someone is doing IT and has questions.
Thank you!
[I did not post this using another name, because I am too honorable to do 
that.]

This is a list discussion; it should not be paused for one voice.

best,
z
[If Orvar has other questions that I have not addressed, please ask me 
off-list. It's ok.]


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-13 Thread JZ
Still not happy?
I guess I will have to do more spamming myself --

So I have to explain why I don't like Linux but do like MS and OpenSolaris?

I don't have any religious love for MS or Sun.
I just believe that talent is best utilized in an organized and systematic
fashion, to benefit the whole.
Leadership and talent management are important in any organization.

Linux has a community.
A community is not an organization, but can be guided by an organization.
OpenSolaris is guided by Sun, by folks I trust to lead in a constructive
fashion.

And MS Storage -- they had the courage years ago to say, we can do storage,
datacenter storage, just as well as we do desktop, because we can learn, and
we can dream!
And they strove, in an organized and systematic fashion, under some
constructive leadership.


Folks, please, I don't know why there have been no posts to the list.
I would be very arrogant to think I could cause silence in technology
discussions. Please.

Best,
z

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-13 Thread JZ

No one is working tonight?
Where are the discussions?
OK, I will not be picking on Orvar all the time, if that's why...

The Windows statements were heavy, but hey, I am at home, not at work; it was
just because Orvar was suffering.

Folks, are we not going to do IT just because I played with Orvar?



Ok, this is it for me tonight, no more spam. Happy?

Goodnight
z 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-13 Thread JZ
Ok, Orvar, that's why I always liked you. You really want to get to the
point, and you won't give up until you do.


So this is all about the Windows thing, huh?

Yes, I love MS Storage because we shared, and still share, a common dream.
Yes, I love King and High because they are not arrogant if you respect them.
And Yes, I would love to see OpenSolaris and Windows become the best 
alternatives, not Linux and Windows.


So now we can be all happy and do IT?
Fulfilling?

best,
z

- Original Message - 
From: "Orvar Korvar" 
To: 
Sent: Tuesday, January 13, 2009 3:46 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> Got some more information about HW raid vs ZFS:
> http://www.opensolaris.org/jive/thread.jspa?messageID=326654#326654
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-13 Thread Orvar Korvar
Got some more information about HW raid vs ZFS:
http://www.opensolaris.org/jive/thread.jspa?messageID=326654#326654
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-08 Thread JZ

[just for the beloved Orvar]

Ok, a rule of thumb to save you some open time -- anything with "z" or "j"
would probably be safe enough for your baby data.

And yeah, I manage my own lunch hours BTW.
:-)

best,
z

- Original Message - 
From: "Orvar Korvar" 

To: 
Sent: Thursday, January 08, 2009 10:01 AM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?



Thank you. How does raidz2 compare to raid-2? Safer? Less safe?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-08 Thread Will Murnane
On Thu, Jan 8, 2009 at 10:01, Orvar Korvar
 wrote:
> Thank you. How does raidz2 compare to raid-2? Safer? Less safe?
RAID-2 is much less used, for one; uses many more disks for parity, for
two; and is much slower in any application I can think of.
Suppose you have 11 100G disks.  RAID-2 would use 7 for data and 4 for
parity, total capacity 700G, and would be able to recover from any
single bit flip per data row (e.g., if any one disk were lost or
corrupted (!), it could recover its contents).  This is not done using
checksums, but rather ECC.  One could implement checksums on top of
this, I suppose.  A major downside of raid-2 is that "efficient" use
of space only happens when the raid groups are of size 2**k-1 for some
integer k; this is because the Hamming code includes parity bits at
certain intervals (see [1]).
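
A quick sketch of that overhead arithmetic (my own illustration of the
Hamming bound, not anything RAID-2- or ZFS-specific): for d data disks you
need the smallest r parity disks with 2**r >= d + r + 1, which is why the
11-disk example above splits into 7 data and 4 parity.

    def hamming_parity_disks(data_disks):
        """Smallest r such that 2**r >= data_disks + r + 1 (Hamming bound)."""
        r = 0
        while 2 ** r < data_disks + r + 1:
            r += 1
        return r

    # 7 data disks -> 4 parity disks, i.e. the 11-disk group above.
    # Groups of size 2**k - 1 (3, 7, 15, 31, ...) are the "efficient" points.
    for d in (4, 7, 11, 26):
        r = hamming_parity_disks(d)
        print(f"{d} data disks -> {r} parity disks, group size {d + r}")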

Raidz2, on the other hand, would take your 11 100G disks and use 9 for
data and 2 for parity, and put checksums on blocks.  This means that
recovering any two corrupt or missing disks (as opposed to one with
raid-2) is possible; with any two pieces of a block potentially
damaged, one can calculate all the possibilities for what the block
could have been before damage and accept the one whose calculated
checksum matches the stored one.  Thus, raidz2 is safer and more
storage-efficient than raid-2.
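
To make the "try the candidates and keep the one whose checksum matches"
idea concrete, here is a toy single-parity analogue (my own simplification
in Python, nothing like the real raidz code): XOR parity alone cannot say
which disk silently returned bad data, but a stored block checksum can pick
the right reconstruction.

    import hashlib
    from functools import reduce

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def checksum(chunks):
        return hashlib.sha256(b"".join(chunks)).hexdigest()

    def repair(chunks, parity, stored_checksum):
        """Assume each chunk in turn is the silently corrupted one, rebuild
        it from parity, and keep the candidate whose checksum matches."""
        for i in range(len(chunks)):
            others = [c for j, c in enumerate(chunks) if j != i]
            candidate = chunks[:i] + [reduce(xor, others, parity)] + chunks[i + 1:]
            if checksum(candidate) == stored_checksum:
                return candidate
        return None  # more damage than single parity can undo

    # Write: three data chunks, an XOR parity chunk, and a block checksum.
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = reduce(xor, data)
    stored = checksum(data)

    # Silent corruption: one chunk changes, no read error is reported.
    data[1] = b"BBBX"
    assert checksum(data) != stored
    assert repair(data, parity, stored) == [b"AAAA", b"BBBB", b"CCCC"]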

This is all mostly academic, as nobody uses raid-2.  It's only as safe
as raidz (it can repair one error, or detect two), and its space efficiency
for normal-sized arrays is fairly atrocious.  Use raidz{,2} and forget
about it.

Will

[1]: http://en.wikipedia.org/wiki/Hamming_code#General_algorithm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-08 Thread Scott Laird
RAID 2 is something weird that no one uses, and really only exists on
paper as part of Berkeley's original RAID paper, IIRC.  raidz2 is more
or less RAID 6, just like raidz is more or less RAID 5.  With raidz2,
you have to lose 3 drives per vdev before data loss occurs.


Scott

On Thu, Jan 8, 2009 at 7:01 AM, Orvar Korvar
 wrote:
> Thank you. How does raidz2 compare to raid-2? Safer? Less safe?
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-08 Thread Orvar Korvar
Thank you. How does raidz2 compare to raid-2? Safer? Less safe?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-07 Thread JZ
Folks, I have had much fun and caused much trouble.
I hope we have now learned the "open" spirit of storage.
I will be less involved with the list discussion going forward, since I, too,
have much work to do in my super domain.
[but I still have lunch hours, so be good!]

As I always say, thank you very much for the love and the tolerance!

Take good care of your baby data in 2009, open folks!

Best,
zStorageAnalyst 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread JZ
Ok, folks, new news -  [feel free to comment in any fashion, since I don't 
know how yet.]


EMC ACQUIRES OPEN-SOURCE ASSETS FROM SOURCELABS 
http://go.techtarget.com/r/5490612/6109175






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread JZ
[ok, no one replying, my spam then...]

Open folks just care about SMART so far.
http://www.mail-archive.com/linux-s...@vger.kernel.org/msg07346.html

Enterprise folks care more about spin-down.
(not an open thing yet, unless a new practical industry standard is here
that I don't know about. Yeah, right.)

best,
z

- Original Message - 
From: "Anton B. Rang" 
To: 
Sent: Tuesday, January 06, 2009 9:07 AM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> For SCSI disks (including FC), you would use the FUA bit on the read 
> command.
>
> For SATA disks ... does anyone care?  ;-)
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-06 Thread Anton B. Rang
For SCSI disks (including FC), you would use the FUA bit on the read command.

For SATA disks ... does anyone care?  ;-)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-05 Thread JZ
Darren, we have spent much time on this topic.

I have provided enough NetApp docs to you, and you seem to have studied them.
So please study the ZFS docs available at Sun.
Any thoughts you need folks to validate, please post.
But the list does not do the thinking for you.

The ways of implementing technologies are almost infinite, and things can go
wrong in different ways for different causes. And additional elements in the
solution will affect the overall chance of things going wrong.
That's as much as I can say, based on your question.

Goodnight!
z


- Original Message - 
From: "A Darren Dunham" 
To: 
Sent: Monday, January 05, 2009 2:42 AM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> On Sat, Jan 03, 2009 at 09:58:37PM -0500, JZ wrote:
>>> Under what situations would you expect any differences between the ZFS
>>> checksums and the Netapp checksums to appear?
>>>
>>> I have no evidence, but I suspect the only difference (modulo any bugs)
>>> is how the software handles checksum failures.
>
>> As I said, some NetApp folks I won't attack.
>> http://andyleonard.com/2008/03/05/on-parity-lost/
>>
>> And some I don't really care. RAID-DP was cool (but CPQ ProLiant RAID ADG
>> was there for a long time too...).
>> http://partners.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html
>>
>> And then some I have no idea who said what...
>> http://www.feedage.com/feeds/1625300/comments-on-unanswered-questions-about-netapp
>>
>
> Those documents discuss the Netapp checksum methods, but I don't get
> from them under what situations you would expect the ZFS and Netapp
> techniques would provide different levels of validation (or what
> conditions would cause one to fail but not the other).
>
> -- 
> Darren
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-05 Thread A Darren Dunham
On Sat, Jan 03, 2009 at 09:58:37PM -0500, JZ wrote:
>> Under what situations would you expect any differences between the ZFS
>> checksums and the Netapp checksums to appear?
>>
>> I have no evidence, but I suspect the only difference (modulo any bugs)
>> is how the software handles checksum failures.

> As I said, some NetApp folks I won't attack.
> http://andyleonard.com/2008/03/05/on-parity-lost/
>
> And some I don't really care. RAID-DP was cool (but CPQ ProLiant RAID ADG 
> was there for a long time too...).
> http://partners.netapp.com/go/techontap/matl/sample/0206tot_resiliency.html
>
> And then some I have no idea who said what...
> http://www.feedage.com/feeds/1625300/comments-on-unanswered-questions-about-netapp
>

Those documents discuss the Netapp checksum methods, but I don't get
from them under what situations you would expect the ZFS and Netapp
techniques would provide different levels of validation (or what
conditions would cause one to fail but not the other).

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-04 Thread Tim
On Sun, Jan 4, 2009 at 5:47 PM, Orvar Korvar  wrote:

> "ECC theory tells, that you need a minimum distance of 3
> to correct one error in a codeword, ergo neither RAID-5 or RAID-6
> are enough: you need RAID-2 (which nobody uses today)."
>
> What is "RAID-2"? Is it raidz2?
> --
>


Google is your friend ;)
http://www.pcguide.com/ref/hdd/perf/raid/levels/singleLevel2-c.html

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-04 Thread Orvar Korvar
"ECC theory tells, that you need a minimum distance of 3
to correct one error in a codeword, ergo neither RAID-5 or RAID-6
are enough: you need RAID-2 (which nobody uses today)."

What is "RAID-2"? Is it raidz2?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-03 Thread A Darren Dunham
On Wed, Dec 31, 2008 at 01:53:03PM -0500, Miles Nordin wrote:
> The thing I don't like about the checksums is that they trigger for
> things other than bad disks, like if your machine loses power during a
> resilver, or other corner cases and bugs.  I think the Netapp
> block-level RAID-layer checksums don't trigger for as many other
> reasons as the ZFS filesystem-level checksums, so chasing problems is
> easier.

Why does losing power during a resilver cause any issues for the
checksums in ZFS?  Admittedly, bugs can always cause problems, but
that's true for any software.  I'm not sure that I see a reason that the
integrated checksums and the separate checksums are more or less prone
to bugs. 

Under what situations would you expect any differences between the ZFS
checksums and the Netapp checksums to appear?

I have no evidence, but I suspect the only difference (modulo any bugs)
is how the software handles checksum failures.  

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread JZ
On second thought, let me further explain why I had the Linux link in the 
same post.

That was written a while ago, but I think the situation for the cheap RAID 
cards has not changed much, though the RAID ASICs in RAID enclosures are 
getting more and more robust, just not "open".

If you take risk management into consideration, that range of chance is just
too much to take, when the demand is not only for accessing data, but for
accessing the correct data.
We are talking about 0.001% of defined downtime headroom for a 4-9 SLA (that 
may be defined as "accessing the correct data").

You and me can wait half a day for network failures and the world can turn 
just as fine, but not for Joe Tucci.
Not to mention the additional solution ($$$) that must be implemented to
handle possible user operational errors for high-risk users. [Still not for
you and me, for the business case of this solution; not so sure about Mr.
Tucci though.  ;-) ]

best,
z




http://www.nber.org:80/sys-admin/linux-nas-raid.html

Let's repeat the reliability calculation with our new knowledge of the 
situation. In our experience perhaps half of drives have at least one 
unreadable sector in the first year. Again assume a 6 percent chance of a 
single failure. The chance of at least one of the remaining two drives 
having a bad sector is 75% (1-(1-.5)^2). So the RAID 5 failure rate is about 
4.5%/year, which is .5% MORE than the 4% failure rate one would expect from 
a two drive RAID 0 with the same capacity. Alternatively, if you just had 
two drives with a partition on each and no RAID of any kind, the chance of a 
failure would still be 4%/year but only half the data loss per incident, 
which is considerably better than the RAID 5 can even hope for under the 
current reconstruction policy even with the most expensive hardware.
We don't know what the reconstruction policy is for other raid controllers, 
drivers or NAS devices. None of the boxes we bought acknowledged this 
"gotcha" but none promised to avoid it either. We assume Netapp and ECCS 
have this under control, since we have had several single drive failures on 
those devices with no difficulty resyncing. We have not had a single drive 
failure yet in the MVD based boxes, so we really don't know what they will 
do. [Since that was written we have had such failures, and they were able to 
reconstruct the failed drive, but we don't know if they could always do so].

Some mitigation of the danger is possible. You could read and write the 
entire drive surface periodically, and replace any drives with even a single 
uncorrectable block visible. A daemon Smartd is available for Linux that 
will scan the disk in background for errors and report them. We had been 
running that, but ignored errors on unwritten sectors, because we were used 
to such errors disappearing when the sector was written (and the bad sector 
remapped).

Our current inclination is to shift to a recent 3ware controller, which we 
understand has a "continue on error" rebuild policy available as an option 
in the array setup. But we would really like to know more about just what 
that means. What do the apparently similar RAID controllers from Mylex, LSI 
Logic and Adaptec do about this? A look at their web sites reveals no 
information.
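
The arithmetic in the first quoted paragraph, spelled out (my restatement of
the article's numbers, not part of the NBER text):

    # Figures from the NBER article quoted above.
    p_whole_drive_failure = 0.06   # chance of one outright drive failure per year
    p_bad_sector = 0.5             # chance a drive has >= 1 unreadable sector

    # Chance at least one of the two surviving drives has a bad sector, which
    # (under the strict reconstruction policy described) kills the rebuild.
    p_rebuild_blocked = 1 - (1 - p_bad_sector) ** 2            # 0.75

    p_raid5_loss = p_whole_drive_failure * p_rebuild_blocked   # ~0.045/year
    p_raid0_loss = 0.04   # the article's figure for a two-drive stripe

    print(f"RAID 5 data-loss rate: {p_raid5_loss:.1%}/year")   # 4.5%/year
    print(f"RAID 0 data-loss rate: {p_raid0_loss:.0%}/year")   # 4%/year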


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread JZ
haha, this makes a cheerful new year start -- this kind of humor is only 
available at open storage.


BTW, I did not know the pyramids are crumbling now, since they were built
with love.
But the Great Wall was crumbling, since it was built with hate (until we
fixed part of that for tourist $$$).




(if this is a text-only email, see attached pic, hopefully the Solaris mail 
server does that conversion automatically like Lotus.)

-z

- Original Message - 
From: "Bob Friesenhahn" 

To: "JZ" 
Cc: 
Sent: Friday, January 02, 2009 10:31 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?



On Fri, 2 Jan 2009, JZ wrote:
> We are talking about 0.001% of defined downtime headroom for a 4-9 SLA
> (that may be defined as "accessing the correct data").

It seems that some people spend a lot of time analyzing their own hairy
navel and think that it must surely be the center of the universe due
to its resemblance to a black hole.

If you just turn the darn computer off, then you can claim that the reason
for the loss of availability is the computer and not the fault of data
storage.  A MTTDL of 10^14 or 10^16 is much larger than any number of
trailing 9s that people like to dream about.  Even the pyramids are
crumbling now.

> You and me can wait half a day for network failures and the world can
> turn just as fine, but not for Joe Tucci.

This guy only knows how to count beans.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

<>___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Bob Friesenhahn
On Fri, 2 Jan 2009, JZ wrote:
> We are talking about 0.001% of defined downtime headroom for a 4-9 SLA (that 
> may be defined as "accessing the correct data").

It seems that some people spend a lot of time analyzing their own 
hairy navel and think that it must surely be the center of the 
universe due to its resemblance to a black hole.

If you just turn the darn computer off, then you can claim that the 
reason for the loss of availability is the computer and not the fault 
of data storage.  A MTTDL of 10^14 or 10^16 is much larger than any 
number of trailing 9s that people like to dream about.  Even the 
pyramids are crumbling now.

> You and me can wait half a day for network failures and the world can turn 
> just as fine, but not for Joe Tucci.

This guy only knows how to count beans.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread JZ
Yes, agreed.


However, for enterprises with risk management as a key factor built into
their decision-making processes --

what if the integrity risk is reflected in Joe Tucci's personal network
data?
OMG, big impact on the SLA when the SLA is critical...
[right, Tim?]
;-)
-z

- Original Message - 
From: "Bob Friesenhahn" 
To: "JZ" 
Cc: 
Sent: Friday, January 02, 2009 8:21 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> On Fri, 2 Jan 2009, JZ wrote:
>>
>> I have not done a cost study on ZFS towards the 999s, but I guess we
>> can
>> do better with more system and I/O based assurance over just RAID
>> checksum,
>> so customers can get to more 9s with less redundant hardware and
>> software feature enablement fees.
>
> Even with a fairly trivial ZFS setup using hot-swap drive bays, the primary 
> factors impacting "availability" are non-disk-related factors such as 
> motherboard, interface cards, and operating system bugs. Unless you step 
> up to an exotic fault-tolerant system ($$$), an entry-level server will 
> offer as much availability as a mid-range server, and many "enterprise" 
> servers.  In fact, the simple entry-level server may offer more 
> availability due to being simpler. The charts on Richard Elling's blog 
> make that pretty obvious.
>
> It is best not to confuse "data integrity" with "availability".
>
> Bob
> ==
> Bob Friesenhahn
> bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
> 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Bob Friesenhahn
On Fri, 2 Jan 2009, JZ wrote:
>
> I have not done a cost study on ZFS towards the 999s, but I guess we can
> do better with more system and I/O based assurance over just RAID checksum,
> so customers can get to more 9s with less redundant hardware and
> software feature enablement fees.

Even with a fairly trivial ZFS setup using hot-swap drive bays, the 
primary factors impacting "availability" are non-disk-related factors 
such as motherboard, interface cards, and operating system bugs. 
Unless you step up to an exotic fault-tolerant system ($$$), an 
entry-level server will offer as much availability as a mid-range 
server, and many "enterprise" servers.  In fact, the simple 
entry-level server may offer more availability due to being simpler. 
The charts on Richard Elling's blog make that pretty obvious.

It is best not to confuse "data integrity" with "availability".

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread JZ
Folks' feedback on my spam communications was that I jump from point to
point too fast, am too lazy to explain, and am often somewhat misleading. ;-)

On the NetApp thing, please note they had their time talking about how SW
RAID can be as good as or better than HW RAID.  However, from a customer
point of view, the math is done in reverse.

Roughly,
for a 3-9 (99.9%) availability, the customer has about 9 hours of annual
downtime, and RAID could help;
for a 4-9 (99.99%) availability, the customer has about 53 minutes of annual
downtime, and RAID alone won't do; H/A clustering may be needed (without
clustering, a big iron box, such as ES7, can do 99.98%, but it is hard to
reach 99.99%, in our past field studies);
for a 5-9 (99.999%) availability, the customer has about 5 minutes of annual
downtime, and H/A clustering with automated stateful failover is a must.
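
For reference, these downtime budgets follow directly from the availability
percentages; a quick sketch of the arithmetic (not tied to any particular
vendor's SLA definition):

    MINUTES_PER_YEAR = 365.25 * 24 * 60

    for nines in (3, 4, 5):
        availability = 1 - 10 ** -nines          # 99.9%, 99.99%, 99.999%
        downtime_min = MINUTES_PER_YEAR * (1 - availability)
        print(f"{nines}-9 ({availability:.3%}): "
              f"{downtime_min:6.1f} minutes/year = {downtime_min / 60:.2f} hours/year")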

So, for every additional 9, the customer needs to learn additional pages in
the NetApp price book, which I think is the real issue with NetApp
(enterprise customers with the checkbooks may have absolutely no idea about
how RAID checksums would impact their SLO/SLA costs).

I have not done a cost study on ZFS towards the 999s, but I guess we can 
do better with more system and I/O based assurance over just RAID checksum, 
so customers can get to more 9s with less redundant hardware and 
software feature enablement fees.

Also note that the upcoming NetApp ONTAP/GX converged release will hopefully
improve the NetApp solution cost structure at some level, but I cannot
discuss that until it's officially released [beyond continuing to scream
"6920+ZFS"].
;-)

best,
z


- Original Message - 
From: "Richard Elling" 
To: "Tim" 
Cc: ; "Ulrich Graef" 
Sent: Friday, January 02, 2009 2:35 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> Tim wrote:
>>
>>
>>
>> The Netapp paper mentioned by JZ
>> (http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt)
>> talks about write verify.
>>
>> Would this feature make sense in a ZFS environment? I'm not sure if
>> there is any advantage. It seems quite unlikely, when data is
>> written in
>> a redundant way to two different disks, that both disks lose or
>> misdirect the same writes.
>>
>> Maybe ZFS could have an option to enable instant readback of written
>> blocks, if one wants to be absolutely sure, data is written
>> correctly to
>> disk.
>>
>>
>> Seems to me it would make a LOT of sense in a WORM type system.
>
> Since ZFS only deals with block devices, how would we guarantee
> that the subsequent read was satisfied from the media rather than a
> cache?  If the answer is that we just wait long enough for the caches
> to be emptied, then the existing scrub should work, no?
> -- richard
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Richard Elling
Tim wrote:
>
>
>
> The Netapp paper mentioned by JZ
> (http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt)
> talks about write verify.
>
> Would this feature make sense in a ZFS environment? I'm not sure if
> there is any advantage. It seems quite unlikely, when data is
> written in
> a redundant way to two different disks, that both disks lose or
> misdirect the same writes.
>
> Maybe ZFS could have an option to enable instant readback of written
> blocks, if one wants to be absolutely sure, data is written
> correctly to
> disk.
>
>
> Seems to me it would make a LOT of sense in a WORM type system.

Since ZFS only deals with block devices, how would we guarantee
that the subsequent read was satisfied from the media rather than a
cache?  If the answer is that we just wait long enough for the caches
to be emptied, then the existing scrub should work, no?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Tim
On Fri, Jan 2, 2009 at 10:47 AM, Mika Borner  wrote:

> Ulrich Graef wrote:
> > You need not to wade through your paper...
> > ECC theory tells, that you need a minimum distance of 3
> > to correct one error in a codeword, ergo neither RAID-5 or RAID-6
> > are enough: you need RAID-2 (which nobody uses today).
> >
> > Raid-Controllers today take advantage of the fact that they know,
> > which disk is returning the bad block, because this disk returns
> > a read error.
> >
> > ZFS is even able to correct, when an error in the data exist,
> > but no disk is reporting a read error,
> > because ZFS ensures the integrity from root-block to the data blocks
> > with a long checksum accompanying the block pointers.
> >
> >
>
> The Netapp paper mentioned by JZ
> (http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt)
> talks about write verify.
>
> Would this feature make sense in a ZFS environment? I'm not sure if
> there is any advantage. It seems quite unlikely, when data is written in
> a redundant way to two different disks, that both disks lose or
> misdirect the same writes.
>
> Maybe ZFS could have an option to enable instant readback of written
> blocks, if one wants to be absolutely sure, data is written correctly to
> disk.
>

Seems to me it would make a LOT of sense in a WORM type system.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Mika Borner
Ulrich Graef wrote:
> You need not to wade through your paper...
> ECC theory tells, that you need a minimum distance of 3
> to correct one error in a codeword, ergo neither RAID-5 or RAID-6
> are enough: you need RAID-2 (which nobody uses today).
>
> Raid-Controllers today take advantage of the fact that they know,
> which disk is returning the bad block, because this disk returns
> a read error.
>
> ZFS is even able to correct, when an error in the data exist,
> but no disk is reporting a read error,
> because ZFS ensures the integrity from root-block to the data blocks
> with a long checksum accompanying the block pointers.
>
>   

The Netapp paper mentioned by JZ 
(http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt) 
talks about write verify.

Would this feature make sense in a ZFS environment? I'm not sure if 
there is any advantage. It seems quite unlikely, when data is written in 
a redundant way to two different disks, that both disks lose or 
misdirect the same writes.

Maybe ZFS could have an option to enable instant readback of written 
blocks, if one wants to be absolutely sure, data is written correctly to 
disk.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread Ulrich Graef
Hi Carsten,

Carsten Aulbert wrote:
> Hi Marc,
> 
> Marc Bevand wrote:
>> Carsten Aulbert  aei.mpg.de> writes:
>>> In RAID6 you have redundant parity, thus the controller can find out
>>> if the parity was correct or not. At least I think that to be true
>>> for Areca controllers :)
>> Are you sure about that ? The latest research I know of [1] says that 
>> although an algorithm does exist to theoretically recover from
>> single-disk corruption in the case of RAID-6, it is *not* possible to
>> detect dual-disk corruption with 100% certainty. And blindly running
>> the said algorithm in such a case would even introduce corruption on a
>> third disk.
>>
> 
> Well, I probably need to wade through the paper (and recall Galois field
> theory) before answering this. We did a few tests in a 16 disk RAID6
> where we wrote data to the RAID, powered the system down, pulled out one
> disk, inserted it into another computer and changed the sector checksum
> of a few sectors (using hdparm's utility makebadsector).
 > ...

You need not wade through the paper...
ECC theory tells us that you need a minimum distance of 3
to correct one error in a codeword, ergo neither RAID-5 nor RAID-6
is enough: you need RAID-2 (which nobody uses today).

RAID controllers today take advantage of the fact that they know
which disk is returning the bad block, because this disk returns
a read error.

ZFS is even able to correct when an error in the data exists
but no disk is reporting a read error,
because ZFS ensures the integrity from root block to data blocks
with a long checksum accompanying the block pointers.

A disk can deliver bad data without returning a read error by
  - misdirected read (bad positioning of disk head before reading)
  - previously misdirected write (on writing this sector)
  - unfortunate sector error (data wrong, but checksum is ok)

These events can happen and are documented on disk vendors web pages:

  a) A bad head positioning is estimated at one per 10^8 to 10^9 head moves.
 => this is more than once in 8 weeks on a fully loaded disk

  b) Unrecoverable data error (bad data on disk)
 is around one sector per 10^16 Bytes read.
 => one unrecoverable error per 177 TByte read.

OK, these numbers seem pretty good, but when you have 1000 disks
in your datacenter, you will have at least one of these errors
each day...
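
To see why 1000 disks change the picture, a rough back-of-the-envelope
sketch in Python using the rates above (the seek rate and daily read volume
are illustrative assumptions of mine, not numbers from this post):

    # Rates from above: ~1 misdirected head positioning per 1e8-1e9 head moves,
    # ~1 unrecoverable error per 1e16 bytes read.
    # Assumptions (illustrative only): ~150 seeks/s and ~1 TB read per disk per day.
    DISKS = 1000
    SEEKS_PER_DISK_PER_DAY = 150 * 86_400
    BYTES_READ_PER_DISK_PER_DAY = 1e12

    misdirected_per_day = DISKS * SEEKS_PER_DISK_PER_DAY / 1e9   # optimistic 1e9 rate
    unrecoverable_per_day = DISKS * BYTES_READ_PER_DISK_PER_DAY / 1e16

    print(f"expected misdirected-head events per day, fleet-wide: {misdirected_per_day:.0f}")
    print(f"expected unrecoverable-read events per day, fleet-wide: {unrecoverable_per_day:.1f}")

Even at the optimistic end of the quoted rates, that works out to more than
one such event per day across the fleet.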

Therefore: use ZFS in a redundant configuration!

Regards,

Ulrich

-- 
| Ulrich Graef, Senior System Engineer, OS Ambassador\
|  Operating Systems, Performance \ Platform Technology   \
|   Mail: ulrich.gr...@sun.com \ Global Systems Enginering \
|Phone: +49 6103 752 359\ Sun Microsystems Inc  \

Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-02 Thread JZ
Nice...

More on sector checksum -- 

* anything prior to 2005 would be sort of out-of-date/fashion, because
http://www.patentstorm.us/patents/6952797/description.html

* the software RAID - NetApp view
http://pages.cs.wisc.edu/~krioukov/ParityLostAndParityRegained-FAST08.ppt

* the Linux view
http://www.nber.org/sys-admin/linux-nas-raid.html

best,
z

- Original Message - 
From: "Marc Bevand" 
To: 
Sent: Thursday, January 01, 2009 6:40 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> Mattias Pantzare  gmail.com> writes:
>> On Tue, Dec 30, 2008 at 11:30, Carsten Aulbert wrote:
>> > [...]
>> > where we wrote data to the RAID, powered the system down, pulled out 
>> > one
>> > disk, inserted it into another computer and changed the sector checksum
>> > of a few sectors (using hdparm's utility makebadsector).
>>
>> You are talking about diffrent types of errors. You tested errors that
>> the disk can detect. That is not a problem on any RAID, that is what
>> it is designed for.
>
> Mattias pointed out to me in a private email I missed Carsten's mention of
> hdparm --make-bad-sector. Duh!
>
> So Carsten: Mattias is right, you did not simulate a silent data 
> corruption
> error. hdparm --make-bad-sector just introduces a regular media error that
> *any* RAID level can detect and fix.
>
> -marc
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-01 Thread Carsten Aulbert
Hi Marc (and all the others),

Marc Bevand wrote:

> So Carsten: Mattias is right, you did not simulate a silent data corruption 
> error. hdparm --make-bad-sector just introduces a regular media error that 
> *any* RAID level can detect and fix.

OK, I'll need to go back to our tests performed months ago, but my
feeling now is that we didn't do it right in the first place. It will take
some time to retest that.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-01 Thread Marc Bevand
Mattias Pantzare  gmail.com> writes:
> On Tue, Dec 30, 2008 at 11:30, Carsten Aulbert wrote:
> > [...]
> > where we wrote data to the RAID, powered the system down, pulled out one
> > disk, inserted it into another computer and changed the sector checksum
> > of a few sectors (using hdparm's utility makebadsector).
> 
> You are talking about diffrent types of errors. You tested errors that
> the disk can detect. That is not a problem on any RAID, that is what
> it is designed for.

Mattias pointed out to me in a private email that I missed Carsten's mention 
of hdparm --make-bad-sector. Duh!

So Carsten: Mattias is right, you did not simulate a silent data corruption 
error. hdparm --make-bad-sector just introduces a regular media error that 
*any* RAID level can detect and fix.

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread JZ
Happy new year!
It's snowing here and my New Year party was cancelled. OK, let me do more
boring IT stuff then.

Orvar, sorry I misunderstood you.
Please feel free to explore the limitations of hardware RAID, and hopefully
one day you will come to the conclusion that it was invented to save CPU
juice from disk management in order to better fulfill application needs, and
that fundamental driver is weakening day by day.

NetApp argued that with today's CPU power and server technologies, software
RAID can be as efficient, or even better, if it is done right. And DataCore
went beyond NetApp by delivering software to customers, instead of an
integrated platform...

Anyway, if you are still into checking out HW RAID capabilities, I would
suggest doing that in a categorized fashion. As you can see, there are many,
many RAID cards at very, very different price points. It is not fair to make
a statement that covers all of them. (And I could go to China tomorrow and
burn any firmware into a RAID ASIC to challenge that statement...) Hence your
request was a bit too difficult -- if you tell the list which HW RAID adapter
you are focusing on, I am sure the list will knock that one off in no time.   ;-)
http://www.ciao.com/Sun_StorageTek_SAS_RAID_Host_Bus_Adapter__15537063

Best,
z, bored

  - Original Message - 
  From: Tim 
  To: Miles Nordin 
  Cc: zfs-discuss@opensolaris.org 
  Sent: Wednesday, December 31, 2008 3:20 PM
  Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?





  On Wed, Dec 31, 2008 at 12:58 PM, Miles Nordin  wrote:

>>>>> "db" == Dave Brown  writes:

   db> CRC/Checksum Error Detection In SANmelody and SANsymphony,
   db> enhanced error detection can be provided by enabling Cyclic
   db> Redundancy Check (CRC) [...] The CRC bits may
   db> be added to either Data Digest, Header Digest, or both.

Thanks for the plug, but that sounds like an iSCSI feature, between
storage controller and client, not between storage controller and
disk.  It sounds suspiciously like they're advertising something many
vendors do without bragging, but I'm not sure.  Anyway we're talking
about something different: writing to the disk in checksummed packets,
so the storage controller can tell when the disk has silently returned
bad data or another system has written to part of the disk, stuff like
that---checksums to protect data as time passes, not as it travels
through space.


  The CRC checking is at least standard on QLogic hardware HBA's.  I would 
imagine most vendors have it in their software stacks as well since it's part 
of the iSCSI standard.  It was more of a corner case for iSCSI to try to say 
"look, I'm as good as Fibre Channel" than anything else (IMO).  Although that 
opinion may very well be inaccurate :)  


  --Tim



--


  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread JZ
"The problem is that there is no such thing as "hardware RAID" there is
only "software RAID."  The "HW RAID" controllers are processors
running software and the features of the product are therefore limited by
the software developer and processor capabilities.  I goes without saying
that the processors are very limited, compared to the main system CPU
found on modern machines.  It also goes without saying that the software
(or firmware, if you prefer) is closed.  Good luck cracking that nut." --  
Richard

Yes, thx!
And beyond that, there are HW RAID adapters and HW RAID chips embedded into
disk enclosures; they are all HW RAID ASICs with closed software, not very
Open Storage.   ;-)

best,
z



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Tim
On Wed, Dec 31, 2008 at 12:58 PM, Miles Nordin  wrote:

> > "db" == Dave Brown  writes:
>
>db> CRC/Checksum Error Detection In SANmelody and SANsymphony,
>db> enhanced error detection can be provided by enabling Cyclic
>db> Redundancy Check (CRC) [...] The CRC bits may
>db> be added to either Data Digest, Header Digest, or both.
>
> Thanks for the plug, but that sounds like an iSCSI feature, between
> storage controller and client, not between storage controller and
> disk.  It sounds suspiciously like they're advertising something many
> vendors do without bragging, but I'm not sure.  Anyway we're talking
> about something different: writing to the disk in checksummed packets,
> so the storage controller can tell when the disk has silently returned
> bad data or another system has written to part of the disk, stuff like
> that---checksums to protect data as time passes, not as it travels
> through space.
>

The CRC checking is at least standard on QLogic hardware HBA's.  I would
imagine most vendors have it in their software stacks as well since it's
part of the iSCSI standard.  It was more of a corner case for iSCSI to try
to say "look, I'm as good as Fibre Channel" than anything else (IMO).
Although that opinion may very well be inaccurate :)


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Miles Nordin
> "db" == Dave Brown  writes:

db> CRC/Checksum Error Detection In SANmelody and SANsymphony,
db> enhanced error detection can be provided by enabling Cyclic
db> Redundancy Check (CRC) [...] The CRC bits may
db> be added to either Data Digest, Header Digest, or both.

Thanks for the plug, but that sounds like an iSCSI feature, between
storage controller and client, not between storage controller and
disk.  It sounds suspiciously like they're advertising something many
vendors do without bragging, but I'm not sure.  Anyway we're talking
about something different: writing to the disk in checksummed packets,
so the storage controller can tell when the disk has silently returned
bad data or another system has written to part of the disk, stuff like
that---checksums to protect data as time passes, not as it travels
through space.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Miles Nordin
> "ca" == Carsten Aulbert  writes:
> "ok" == Orvar Korvar  writes:

ca> (using hdparm's utility makebadsector)

I haven't used that before, but it sounds like what you did may give
the RAID layer some extra information.  If one of the disks reports
``read error---I have no idea what's stored in that sector,'' then
RAID5/6 knows which disk is wrong because the disk confessed.  If all
the disks successfully return data, but one returns the wrong data,
RAID5/6 has to determine the wrong disk by math, not by device driver
error returns.

I don't think RAID6 reads whole stripes, so even if the dual parity
has some theoretical/implemented ability to heal single-disk silent
corruption, it'd do this healing only during some scrub-like
procedure, not during normal read.  The benefit is better seek
bandwidth than raidz.  If the corruption is not silent (the disk
returns an error) then it could use the hypothetical magical
single-disk healing ability during normal read too.

ca> powered it up and ran a volume check and the controller did
ca> indeed find the corrupted sector

sooo... (1) make your corrupt sector with dd rather than hdparm, like
dd if=/dev/zero of=/dev/disk bs=512 count=1 seek=12345 conv=notrunc,
and (2) check for the corrupt sector by reading the disk
normally---either make sure the corrupt sector is inside a checksummed
file like a tar or gz and use tar t or gzip -t, or use dd
if=/dev/raidvol | md5sum before and after corrupting, something like
that, NOT a ``volume check''.  Make both 1, 2 changes and I think the
corruption will get through.  Make only the first change but not the
second, and you can look for this hypothetical math-based healing
ability you're saying RAID6 has from having more parity than it needs
for the situation.
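
A minimal harness for that procedure, sketched in Python against placeholder
device paths (the paths, the sector number, and the idea of running this
against a live array are assumptions of mine; treat it as pseudocode for the
test, not a tool):

    import hashlib

    RAID_VOLUME = "/dev/dsk/raid-volume"   # placeholder: the assembled array
    MEMBER_DISK = "/dev/dsk/member-disk"   # placeholder: one component disk
    SECTOR = 512
    OFFSET = 12345 * SECTOR                # same example sector as the dd line above

    def digest(path, bufsize=1 << 20):
        h = hashlib.md5()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    before = digest(RAID_VOLUME)

    # Step 1: silent corruption -- overwrite one sector on a member disk with
    # zeros; unlike hdparm --make-bad-sector, the disk reports no read error.
    with open(MEMBER_DISK, "r+b") as f:
        f.seek(OFFSET)
        f.write(b"\x00" * SECTOR)

    # Step 2: read the volume normally (not a "volume check"/scrub) and compare.
    after = digest(RAID_VOLUME)
    print("corruption reached the volume" if before != after
          else "volume read still returned the original data")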

ok> "upon writing data to disc, ZFS reads it back and compares to
ok> the data in RAM and corrects it otherwise".

I don't think it does read-after-write.  That'd be really slow.

The thing I don't like about the checksums is that they trigger for
things other than bad disks, like if your machine loses power during a
resilver, or other corner cases and bugs.  I think the Netapp
block-level RAID-layer checksums don't trigger for as many other
reasons as the ZFS filesystem-level checksums, so chasing problems is
easier.

The good thing is that they are probably helping survive the corner
cases and bugs, too.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Dave Brown
There is a company (DataCore Software) that has been making and shipping 
products for many years that I believe would help in this area.  I've 
used them before; they're very solid and have been leveraging commodity 
server and disk hardware to build massive storage arrays (FC & 
iSCSI), one of the same things ZFS is working to do.  I looked at some 
of the documentation for this topic of discussion and this is what I found:

CRC/Checksum Error Detection
In SANmelody and SANsymphony, enhanced error detection can be provided 
by enabling Cyclic Redundancy Check (CRC), a form of sophisticated 
redundancy check. When CRC/Checksum is enabled, the iSCSI driver adds a 
bit scheme to the iSCSI packet when it is transmitted. The iSCSI driver 
then verifies the bits in the packet when it is received to ensure data 
integrity. This error detection method provides a low probability of 
undetected errors compared to standard error checking performed by 
TCP/IP. The CRC bits may be added to either Data Digest, Header Digest, 
or both.

DataCore has been really good at implementing all the features of the 
'high end' arrays for the 'low end' price point.

Dave


Richard Elling wrote:
> Orvar Korvar wrote:
>   
>> Ive studied all links here. But I want information of the HW raid 
>> controllers. Not about ZFS, because I have plenty of ZFS information now. 
>> The closest thing I got was
>> www.baarf.org
>>   
>> 
>
> [one of my favorite sites ;-)]
> The problem is that there is no such thing as "hardware RAID" there is
> only "software RAID."  The "HW RAID" controllers are processors
> running software and the features of the product are therefore limited by
> the software developer and processor capabilities.  I goes without saying
> that the processors are very limited, compared to the main system CPU
> found on modern machines.  It also goes without saying that the software
> (or firmware, if you prefer) is closed.  Good luck cracking that nut.
>
>   
>> Where in one article he states that "raid5 never does parity check on 
>> reads". Ive wrote that to the Linux guys. And also "raid6 guesses when it 
>> tries to repair some errors with a chance of corrupting more". Thats hard 
>> facts. 
>>   
>> 
>
> The high-end RAID arrays have better, more expensive processors and
> a larger feature set. Some even add block-level checksumming, which has
> led to some fascinating studies on field failures.  But I think it is 
> safe to
> assume that those features will not exist on the low-end systems for some
> time.
>  -- richard
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
>   
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Richard Elling
Orvar Korvar wrote:
> Ive studied all links here. But I want information of the HW raid 
> controllers. Not about ZFS, because I have plenty of ZFS information now. The 
> closest thing I got was
> www.baarf.org
>   

[one of my favorite sites ;-)]
The problem is that there is no such thing as "hardware RAID"; there is
only "software RAID."  The "HW RAID" controllers are processors
running software, and the features of the product are therefore limited by
the software developer and processor capabilities.  It goes without saying
that the processors are very limited compared to the main system CPU
found on modern machines.  It also goes without saying that the software
(or firmware, if you prefer) is closed.  Good luck cracking that nut.

> Where in one article he states that "raid5 never does parity check on reads". 
> Ive wrote that to the Linux guys. And also "raid6 guesses when it tries to 
> repair some errors with a chance of corrupting more". Thats hard facts. 
>   

The high-end RAID arrays have better, more expensive processors and
a larger feature set. Some even add block-level checksumming, which has
led to some fascinating studies on field failures.  But I think it is 
safe to
assume that those features will not exist on the low-end systems for some
time.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Orvar Korvar
I've studied all the links here. But I want information about the HW RAID
controllers, not about ZFS, because I have plenty of ZFS information now. The
closest thing I got was
www.baarf.org
where in one article he states that "raid5 never does parity check on reads".
I've written that to the Linux guys. And also "raid6 guesses when it tries to
repair some errors with a chance of corrupting more". Those are hard facts.

Anymore?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-31 Thread Marc Bevand
Mattias Pantzare  gmail.com> writes:
> 
> He was talking about errors that the disk can't detect (errors
> introduced by other parts of the system, writes to the wrong sector or
> very bad luck). You can simulate that by writing diffrent data to the
> sector,

Well yes you can. Carsten and I are both talking about silent data corruption 
errors, and the way to simulate them is to do what Carsten did. However I 
pointed out that he may have tested only easy corruption cases (affecting the 
P or Q parity only) -- it is tricky to simulate hard-to-recover corruption 
errors...

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread JZ
Orvar, did you see my post on consistency checks and data integrity?
It does not matter what HW RAID has; the point is what HW RAID does not
have...

Please, out of respect for Bill, please study; here is more.

THE LAST WORD IN FILE SYSTEMS
http://www.sun.com/software/solaris/zfs_lc_preso.pdf

Best,
z

- Original Message - 
From: "Orvar Korvar" 
To: 
Sent: Tuesday, December 30, 2008 8:21 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> Que? So what can we deduce about HW raid? There are some controller cards 
> that do background concistency checks? And error detection of various 
> kind?
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread Orvar Korvar
Que? So what can we deduce about HW RAID? There are some controller cards
that do background consistency checks? And error detection of various kinds?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread Mattias Pantzare
On Tue, Dec 30, 2008 at 11:30, Carsten Aulbert
 wrote:
> Hi Marc,
>
> Marc Bevand wrote:
>> Carsten Aulbert  aei.mpg.de> writes:
>>> In RAID6 you have redundant parity, thus the controller can find out
>>> if the parity was correct or not. At least I think that to be true
>>> for Areca controllers :)
>>
>> Are you sure about that ? The latest research I know of [1] says that
>> although an algorithm does exist to theoretically recover from
>> single-disk corruption in the case of RAID-6, it is *not* possible to
>> detect dual-disk corruption with 100% certainty. And blindly running
>> the said algorithm in such a case would even introduce corruption on a
>> third disk.
>>
>
> Well, I probably need to wade through the paper (and recall Galois field
> theory) before answering this. We did a few tests in a 16 disk RAID6
> where we wrote data to the RAID, powered the system down, pulled out one
> disk, inserted it into another computer and changed the sector checksum
> of a few sectors (using hdparm's utility makebadsector). The we
> reinserted this into the original box, powered it up and ran a volume
> check and the controller did indeed find the corrupted sector and
> repaired the correct one without destroying data on another disk (as far
> as we know and tested).

You are talking about different types of errors. You tested errors that
the disk can detect. That is not a problem on any RAID; that is what
it is designed for.

He was talking about errors that the disk can't detect (errors
introduced by other parts of the system, writes to the wrong sector, or
very bad luck). You can simulate that by writing different data to the
sector.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread Marc Bevand
Carsten Aulbert  aei.mpg.de> writes:
>
> Well, I probably need to wade through the paper (and recall Galois field
> theory) before answering this. We did a few tests in a 16 disk RAID6
> where we wrote data to the RAID, powered the system down, pulled out one
> disk, inserted it into another computer and changed the sector checksum
> of a few sectors (using hdparm's --make-bad-sector option). Then we
> reinserted this into the original box, powered it up and ran a volume
> check and the controller did indeed find the corrupted sector and
> repaired it correctly without destroying data on another disk (as far
> as we know and tested).

Note that there are cases of single-disk corruption that are trivially
recoverable (for example if the corruption affects the P or Q parity 
block, as opposed to the data blocks). Maybe that's what you
inadvertently tested? Overwrite a number of contiguous sectors to
span 3 stripes on a single disk to be sure to correctly stress-test
the self-healing mechanism.
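
For anyone who wants to reproduce that kind of test, here is a minimal Python
sketch of the idea: overwrite a run of raw sectors on one array member with
random bytes, so the drive itself still reads the sectors back without error
and only their contents are wrong. The device path, sector size, start offset
and per-disk stripe size are made-up placeholders, not values for any real
controller, and running something like this of course destroys whatever it
points at:

import os

DEV         = "/dev/rdsk/c1t2d0"   # hypothetical raw device of ONE array member
SECTOR      = 512                  # bytes per sector (assumption)
STRIPE_SECT = 128                  # sectors per stripe on this disk (assumption)
START       = 1000000              # first sector to corrupt (assumption)
N_STRIPES   = 3                    # span several stripes, as suggested above

fd = os.open(DEV, os.O_WRONLY)
try:
    os.lseek(fd, START * SECTOR, os.SEEK_SET)
    for _ in range(N_STRIPES * STRIPE_SECT):
        # Random garbage: no I/O error is generated, so the RAID layer gets
        # no hint that anything is wrong until it re-reads and checks parity.
        os.write(fd, os.urandom(SECTOR))
finally:
    os.close(fd)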

> For the other point: dual-disk corruption can (to my understanding)
> never be healed by the controller since there is no redundant
> information available to check against. I don't recall if we performed
> some tests on that part as well, but maybe we should do that to learn
> how the controller will behave. As a matter of fact, at that point it
> should just start crying out loud and tell me that it cannot recover
> from that.

The paper explains that the best RAID-6 can do is use probabilistic 
methods to distinguish between single and dual-disk corruption, e.g. 
"there is a 95% chance it is single-disk corruption so I am going to
fix it assuming that, but there is a 5% chance I am going to actually
corrupt more data, I just can't tell". I wouldn't want to rely on a
RAID controller that takes gambles :-)
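
To make the gamble concrete, below is a toy per-byte Python sketch of the
locate-and-repair step the paper describes for RAID-6 (GF(2^8) arithmetic with
the usual 0x11d polynomial; the six data bytes and the error value are invented
for illustration, and this is not what any particular controller implements).
The locate formula assumes exactly one disk went bad; if two disks are silently
corrupted, the same syndromes can still satisfy the test for some innocent
third disk, and a blind "repair" would then corrupt it too.

def gf_mul(a, b):                   # multiply in GF(2^8), polynomial 0x11d
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xff
        if carry:
            a ^= 0x1d
        b >>= 1
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def pq(data):                       # data[i] = the byte stored on data disk i
    P, Q = 0, 0
    for i, d in enumerate(data):
        P ^= d
        Q ^= gf_mul(gf_pow(2, i), d)
    return P, Q

data = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]   # 6 data disks (made up)
P, Q = pq(data)                               # parity written with the stripe

bad = list(data)
bad[3] ^= 0x5a                      # silent corruption of disk 3: no I/O error

Ps = P ^ pq(bad)[0]                 # P syndrome = e, the error value
Qs = Q ^ pq(bad)[1]                 # Q syndrome = g^z * e, z = bad disk index

# Locate the bad disk by solving g^z * Ps == Qs (brute force over the disks),
# then add the error value back to repair it.
z = next(i for i in range(len(bad)) if gf_mul(gf_pow(2, i), Ps) == Qs)
bad[z] ^= Ps
assert z == 3 and bad == data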

-marc


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread Carsten Aulbert
Hi Marc,

Marc Bevand wrote:
> Carsten Aulbert  aei.mpg.de> writes:
>> In RAID6 you have redundant parity, thus the controller can find out
>> if the parity was correct or not. At least I think that to be true
>> for Areca controllers :)
> 
> Are you sure about that? The latest research I know of [1] says that 
> although an algorithm does exist to theoretically recover from
> single-disk corruption in the case of RAID-6, it is *not* possible to
> detect dual-disk corruption with 100% certainty. And blindly running
> the said algorithm in such a case would even introduce corruption on a
> third disk.
>

Well, I probably need to wade through the paper (and recall Galois field
theory) before answering this. We did a few tests in a 16 disk RAID6
where we wrote data to the RAID, powered the system down, pulled out one
disk, inserted it into another computer and changed the sector checksum
of a few sectors (using hdparm's --make-bad-sector option). Then we
reinserted this into the original box, powered it up and ran a volume
check and the controller did indeed find the corrupted sector and
repaired it correctly without destroying data on another disk (as far
as we know and tested).

For the other point: dual-disk corruption can (to my understanding)
never be healed by the controller since there is no redundant
information available to check against. I don't recall if we performed
some tests on that part as well, but maybe we should do that to learn
how the controller will behave. As a matter of fact at that point it
should just start crying out loud and tell me, that it cannot recover
for that. But the chance of this happening should be relatively small
unless the backplane/controller had a bad hiccup when writing that stripe.

> This is the reason why, AFAIK, no RAID-6 implementation actually
> attempts to recover from single-disk corruption (someone correct me if
> I am wrong).
> 

As I said, I know that our Areca 1261ML does detect and correct those
errors - if these are single-disk corruptions.

> The exception is ZFS of course, but it accomplishes single and
> dual-disk corruption self-healing by using its own checksum, which is
> one layer above RAID-6 (therefore unrelated to it).

Yes, very helpful and definitely desirable to have :)
> 
> [1] http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

Thanks for the pointer

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread Marc Bevand
Carsten Aulbert  aei.mpg.de> writes:
> 
> In RAID6 you have redundant parity, thus the controller can find out
> if the parity was correct or not. At least I think that to be true
> for Areca controllers :)

Are you sure about that? The latest research I know of [1] says that 
although an algorithm does exist to theoretically recover from
single-disk corruption in the case of RAID-6, it is *not* possible to
detect dual-disk corruption with 100% certainty. And blindly running
the said algorithm in such a case would even introduce corruption on a
third disk.

This is the reason why, AFAIK, no RAID-6 implementation actually
attempts to recover from single-disk corruption (someone correct me if
I am wrong).

The exception is ZFS of course, but it accomplishes single and
dual-disk corruption self-healing by using its own checksum, which is
one layer above RAID-6 (therefore unrelated to it).

[1] http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-29 Thread Vincent Fox
To answer the original post, the simple answer is this:

Almost all old RAID designs have holes in their logic where they are 
insufficiently paranoid on writes or reads, and sometimes both.  One example 
is the infamous RAID-5 write hole.
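
For readers who have not run into it, the write hole in one breath: the data
write and the parity write for a stripe land on different disks and are not
atomic, so a crash between the two leaves the stripe silently inconsistent and
a later rebuild hands back wrong data with no error. A toy Python sketch with
plain XOR parity (the values and the failure sequence are invented):

d0, d1, d2 = 0xAA, 0xBB, 0xCC
parity = d0 ^ d1 ^ d2            # stripe starts out consistent

d1 = 0x12                        # small write: the new data block hits its disk...
# ...and the power fails HERE, before the matching parity update is written.

# Later the disk holding d0 dies. The array "reconstructs" d0 from the
# now-stale parity and returns it without reporting any problem:
reconstructed_d0 = parity ^ d1 ^ d2
assert reconstructed_d0 != 0xAA  # silently wrong data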

Look at the simple example of mirrored SVM versus ZFS on pages 15 & 16 of this 
presentation:

http://opensolaris.org/os/community/zfs/docs/zfs_last.pdf

Critical metadata is triple-duped, and all metadata is at least double-duped, 
even on a single-disk configuration.  Almost all other filesystems are kludges 
with insufficient paranoia by default, and only become sufficiently paranoid by 
twiddling knobs & adding things, like EMC did.   After using ZFS for a while 
there is no other filesystem as good.   I haven't played with Linux BTRFS; 
maybe it has some good stuff, but last I heard it was still in alpha.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread JZ
BTW, the following text from another discussion may be helpful towards your 
concerns.
What to use for RAID is not a fixed answer, but using ZFS can be a good 
thing for many cases and reasons, such as the price/performance concern as 
Bob highlighted.

And note Bob said "client OSs". To me, that should read "host OSs", since, 
again, I am an enterprise guy, and my ideal way of using ZFS may differ from 
that of most folks today.

To me, I would take ZFS for SAN-based virtualization and as a file/IP-block 
services gateway to applications (and file services to clients is one of the 
"enterprise applications" by me.) For example, I would then use different 
implementations for CIFS and NFS serving, not using the ZFS native NAS 
support to clients, but the ZFS storage pooling and SAM-FS management 
features.
(I would use ZFS in a 6920 fashion, if you don't know what I am talking 
about --
http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1245572,00.html)

Sorry, I don't want to lead the discussion into file systems and NFV, but to 
me, ZFS is very close to the WAFL design point, and the file system's involvement 
in RAID, PiT, HSM/ILM, application/data security/protection and 
HA/BC functions is vital.
:-)
z

___

I do agree that when multiple client OSs are involved it is still useful if 
storage looks like a
legacy disk drive.  Luckily Solaris already offers iSCSI in Solaris 10
and OpenSolaris is now able to offer high-performance Fibre Channel
target and Fibre Channel over Ethernet layers on top of reliable ZFS.
The full benefit of ZFS is not provided, but the storage is
successfully divorced from the client with a higher degree of data
reliability and performance than is available from current firmware
based RAID arrays.

Bob
==
Bob Friesenhahn



- Original Message - 
From: "JZ" 
To: "Orvar Korvar" ; 

Sent: Sunday, December 28, 2008 7:55 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> The hyperlinks didn't work; here are the URLs --
>
> http://queue.acm.org/detail.cfm?id=1317400
>
> http://www.sun.com/bigadmin/features/articles/zfs_part1.scalable.jsp#integrity
>
> >>> This message posted from opensolaris.org
>>> ___
>>> zfs-discuss mailing list
>>> zfs-discuss@opensolaris.org
>>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread JZ
The hyperlinks didn't work; here are the URLs --

http://queue.acm.org/detail.cfm?id=1317400

http://www.sun.com/bigadmin/features/articles/zfs_part1.scalable.jsp#integrity


- Original Message - 
From: "JZ" 
To: "Orvar Korvar" ; 

Sent: Sunday, December 28, 2008 7:50 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> Nice discussion. Let me chip in with my old-timer view --
>
> Until a few years ago, the understanding of "HW RAID doesn't proactively
> check for consistency of data vs. parity unless required" was true.   But
> LSI has since added a background consistency check (it auto-starts 5 minutes 
> after the drive is created) on its RAID cards.  Since Sun is primarily selling 
> LSI HW RAID cards, I guess at that high level, both HW RAID and ZFS provide 
> some proactive consistency/integrity assurance.
>
> HOWEVER, I really think the ZFS way is much more advanced (PiT integrated)
> and can be used with other ZFS ECC/EDC features with memory-based data
> consistency/integrity assurance, to achieve an overall considerably better
> data availability and business continuity.   I guess I just like the
> "enterprise flavor" as such.   ;-)
>
>
>
>
> Below are some tech details. -- again, please, do not compare HW RAID with
> ZFS at the feature level. RAID was invented for both data protection and
> performance, and there are different ways to do those with ZFS, resulting 
> in
> very different solution architectures (according to the customer segments,
> and sometimes it could be beneficial to use HW RAID, e.g. when hetero HW
> RAID disks are deployed in a unified fashion and ZFS does not handle the
> enterprise-wide data protection..).
>
>
> ZFS does automatic error correction even when using a single hard drive,
> including by using end-to-end checksumming, separating the checksum from 
> the
> file, and using copy-on-write redundancy so it is always both verifying 
> the
> data and creating another copy (not overwriting) when writing a change to 
> a
> file.
> Sun Distinguished Engineer Bill Moore, who co-developed ZFS, put it this way:
>
>  "... one of the design principles we set for ZFS was: never, ever trust 
> the
> underlying hardware. As soon as an application generates data, we generate 
> a
> checksum for the data while we're still in the same fault domain where the
> application generated the data, running on the same CPU and the same 
> memory
> subsystem. Then we store the data and the checksum separately on disk so
> that a single failure cannot take them both out.
>
>  When we read the data back, we validate it against that checksum and see
> if it's indeed what we think we wrote out before. If it's not, we employ 
> all
> sorts of recovery mechanisms. Because of that, we can, on very cheap
> hardware, provide more reliable storage than you could get with the most
> reliable external storage. It doesn't matter how perfect your storage is, 
> if
> the data gets corrupted in flight - and we've actually seen many customer
> cases where this happens - then nothing you can do can recover from that.
> With ZFS, on the other hand, we can actually authenticate that we got the
> right answer back and, if not, enact a bunch of recovery scenarios. That's
> data integrity."
>
> See more details about ZFS Data Integrity and Security.
>
>
> Best,
> z
>
>
>
> - Original Message - 
> From: "Orvar Korvar" 
> To: 
> Sent: Sunday, December 28, 2008 4:16 PM
> Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?
>
>
>> This is good information, guys. Do we have some more facts and links about
>> HW raid and its data integrity, or lack of?
>> -- 
>> This message posted from opensolaris.org
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread JZ
Nice discussion. Let me chip in with my old-timer view --

Until a few years ago, the understanding of "HW RAID doesn't proactively 
check for consistency of data vs. parity unless required" was true.   But 
LSI has since added a background consistency check (it auto-starts 5 minutes 
after the drive is created) on its RAID cards.  Since Sun is primarily selling 
LSI HW RAID cards, I guess at that high level, both HW RAID and ZFS provide 
some proactive consistency/integrity assurance.

HOWEVER, I really think the ZFS way is much more advanced (PiT integrated) 
and can be used with other ZFS ECC/EDC features with memory-based data 
consistency/integrity assurance, to achieve an overall considerably better 
data availability and business continuity.   I guess I just like the 
"enterprise flavor" as such.   ;-)




Below are some tech details. -- again, please, do not compare HW RAID with 
ZFS at the feature level. RAID was invented for both data protection and 
performance, and there are different ways to do those with ZFS, resulting in 
very different solution architectures (according to the customer segments, 
and sometimes it could be beneficial to use HW RAID, e.g. when hetero HW 
RAID disks are deployed in a unified fashion and ZFS does not handle the 
enterprise-wide data protection..).


ZFS does automatic error correction even when using a single hard drive, 
including by using end-to-end checksumming, separating the checksum from the 
file, and using copy-on-write redundancy so it is always both verifying the 
data and creating another copy (not overwriting) when writing a change to a 
file.
Sun Distinguished Engineer Bill Moore, who co-developed ZFS, put it this way:

  "... one of the design principles we set for ZFS was: never, ever trust the 
underlying hardware. As soon as an application generates data, we generate a 
checksum for the data while we're still in the same fault domain where the 
application generated the data, running on the same CPU and the same memory 
subsystem. Then we store the data and the checksum separately on disk so 
that a single failure cannot take them both out.

  When we read the data back, we validate it against that checksum and see 
if it's indeed what we think we wrote out before. If it's not, we employ all 
sorts of recovery mechanisms. Because of that, we can, on very cheap 
hardware, provide more reliable storage than you could get with the most 
reliable external storage. It doesn't matter how perfect your storage is, if 
the data gets corrupted in flight - and we've actually seen many customer 
cases where this happens - then nothing you can do can recover from that. 
With ZFS, on the other hand, we can actually authenticate that we got the 
right answer back and, if not, enact a bunch of recovery scenarios. That's 
data integrity."

See more details about ZFS Data Integrity and Security.
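
As a rough sketch of the shape of that idea (not ZFS's actual code or on-disk
layout -- the class, the SHA-256 choice and the error handling are invented for
illustration): the checksum is computed in host memory at write time and kept
apart from the block it covers, roughly the way ZFS keeps it in the parent
block pointer, so one bad write cannot take out both, and every read is
verified end to end.

import hashlib

class Blob:
    """Toy block store that keeps the checksum apart from the data it covers."""
    def __init__(self, payload: bytes):
        # Checksum computed in the same fault domain that produced the data.
        self.checksum = hashlib.sha256(payload).digest()
        self.payload = bytearray(payload)      # stored "elsewhere" on disk

    def read(self) -> bytes:
        if hashlib.sha256(self.payload).digest() != self.checksum:
            # Real ZFS would now try redundant copies or parity; the toy
            # can only report that silent corruption was caught.
            raise IOError("checksum mismatch: silent corruption detected")
        return bytes(self.payload)

b = Blob(b"important data")
b.payload[3] ^= 0x01                           # bit flip somewhere below the host
try:
    b.read()
except IOError as e:
    print(e)                                   # the corruption is caught on read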


Best,
z



- Original Message - 
From: "Orvar Korvar" 
To: 
Sent: Sunday, December 28, 2008 4:16 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> This is good information, guys. Do we have some more facts and links about 
> HW raid and its data integrity, or lack of?
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Orvar Korvar
This is good information, guys. Do we have some more facts and links about HW 
raid and its data integrity, or lack of?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Carsten Aulbert
Hi Bob,

Bob Friesenhahn wrote:

>> AFAIK this is not done during the normal operation (unless a disk asked
>> for a sector cannot get this sector).
> 
> ZFS checksum validates all returned data.  Are you saying that this fact
> is incorrect?
> 

No, sorry, too long in front of a computer today I guess: I was referring
to hardware RAID controllers; AFAIK, these usually do not check the
validity of data unless a disk returns an error. My understanding of
ZFS is exactly that: data is checked in the CPU against the stored
checksum.

>> That's exactly what volume checking for standard HW controllers does as
>> well. Read all data and compare it with parity.
> 
> What if the data was corrupted prior to parity generation?
> 

Well, that is bad luck; the same is true if your ZFS box has faulty memory
and the computed checksum is right for the data on disk, but wrong in
the sense of the file under consideration.

Sorry for the confusion

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Bob Friesenhahn
On Sun, 28 Dec 2008, Carsten Aulbert wrote:
>> ZFS does check the data correctness (at the CPU) for each read while
>> HW raid depends on the hardware detecting a problem, and even if the
>> data is ok when read from disk, it may be corrupted by the time it
>> makes it to the CPU.
>
> AFAIK this is not done during normal operation (unless a disk that is
> asked for a sector cannot return it).

ZFS checksum validates all returned data.  Are you saying that this 
fact is incorrect?

> That's exactly what volume checking for standard HW controllers does as
> well. Read all data and compare it with parity.

What if the data was corrupted prior to parity generation?

> This is exactly the point why RAID6 should always be chosen over RAID5:
> in the event of a failed parity check, with RAID5 the controller
> can only say, oops, I have found a problem but cannot correct it - since
> it does not know whether the parity or any of the n data blocks is wrong. In
> RAID6 you have redundant parity, thus the controller can find out if the
> parity was correct or not. At least I think that to be true for Areca
> controllers :)

Good point.  Luckily, ZFS's raidz does not have this problem since it 
is able to tell if the "corrected" data is actually correct (within 
the checksum computation's margin for error). If applying parity does 
not result in the correct checksum, then it knows that the data is 
toast.
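
A sketch of what that buys you, with single XOR parity and a block checksum
standing in for raidz1 -- purely illustrative, not the real raidz code, and the
chunk contents are invented: when the assembled block fails its checksum, try
treating each disk in turn as the liar, rebuild it from parity, and accept only
a combination whose result checksums correctly.

import hashlib
from functools import reduce

def cksum(b):  return hashlib.sha256(b).digest()
def xor(a, b): return bytes(x ^ y for x, y in zip(a, b))

# A "stripe": 3 data chunks plus XOR parity, plus the checksum of the whole
# block, which ZFS keeps in the parent block pointer, away from this data.
chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = reduce(xor, chunks)
stored_cksum = cksum(b"".join(chunks))

chunks[1] = b"BBxB"        # silent corruption: parity alone cannot say which chunk

def read_block(chunks, parity, stored_cksum):
    if cksum(b"".join(chunks)) == stored_cksum:
        return b"".join(chunks)                     # fast path: everything checks out
    for bad in range(len(chunks)):                  # treat each chunk as the suspect
        others = [c for i, c in enumerate(chunks) if i != bad]
        rebuilt = reduce(xor, others, parity)       # reconstruct the suspect
        candidate = chunks[:bad] + [rebuilt] + chunks[bad + 1:]
        if cksum(b"".join(candidate)) == stored_cksum:
            return b"".join(candidate)              # the checksum arbitrates
    raise IOError("unrecoverable: no reconstruction matches the checksum")

assert read_block(chunks, parity, stored_cksum) == b"AAAABBBBCCCC"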

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Carsten Aulbert
Hi all,

Bob Friesenhahn wrote:
> My understanding is that ordinary HW raid does not check data 
> correctness.  If the hardware reports failure to successfully read a 
> block, then a simple algorithm is used to (hopefully) re-create the 
> lost data based on data from other disks.  The difference here is that 
> ZFS does check the data correctness (at the CPU) for each read while 
> HW raid depends on the hardware detecting a problem, and even if the 
> data is ok when read from disk, it may be corrupted by the time it 
> makes it to the CPU.

AFAIK this is not done during normal operation (unless a disk that is
asked for a sector cannot return it).

> 
> ZFS's scrub algorithm forces all of the written data to be read, with 
> validation against the stored checksum.  If a problem is found, then 
> an attempt to correct is made from redundant storage using traditional 
> RAID methods.

That's exactly what volume checking for standard HW controllers does as
well. Read all data and compare it with parity.

This is exactly the point why RAID6 should always be chosen over RAID5:
in the event of a failed parity check, with RAID5 the controller
can only say, oops, I have found a problem but cannot correct it - since
it does not know whether the parity or any of the n data blocks is wrong. In
RAID6 you have redundant parity, thus the controller can find out if the
parity was correct or not. At least I think that to be true for Areca
controllers :)

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Bob Friesenhahn
On Sun, 28 Dec 2008, Orvar Korvar wrote:

> On a Linux forum, I've spoken about ZFS end-to-end data integrity. I 
> wrote things like "upon writing data to disk, ZFS reads it back and 
> compares it to the data in RAM and corrects it otherwise". I also wrote 
> that ordinary HW raid doesn't do this check. After a heated 
> discussion, I now start to wonder if this is correct. Am I wrong?

You are somewhat wrong.  When ZFS writes the data, it also stores a 
checksum for the data.  When the data is read, it is checksummed again 
and the checksum is verified against the stored checksum.  It is not 
possible to compare with data in RAM since usually the RAM memory is 
too small to cache the entire disk, and it would not survive reboots.

> So, does ordinary HW raid check data correctness? The Linux guys want 
> to know this. For instance, Adaptec's HW raid controllers don't do a 
> check? Anyone know more on this?

My understanding is that ordinary HW raid does not check data 
correctness.  If the hardware reports failure to successfully read a 
block, then a simple algorithm is used to (hopefully) re-create the 
lost data based on data from other disks.  The difference here is that 
ZFS does check the data correctness (at the CPU) for each read while 
HW raid depends on the hardware detecting a problem, and even if the 
data is ok when read from disk, it may be corrupted by the time it 
makes it to the CPU.

ZFS's scrub algorithm forces all of the written data to be read, with 
validation against the stored checksum.  If a problem is found, then 
an attempt to correct is made from redundant storage using traditional 
RAID methods.
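
A cartoon of that scrub loop over a simple two-way mirror (the block contents
and structure are invented for illustration; this is not ZFS's scrub code):

import hashlib

def sha(b): return hashlib.sha256(b).digest()

# Toy mirror: every block has a copy on each "disk", plus a stored checksum.
diskA = {0: b"alpha", 1: b"beta", 2: b"gamma"}
diskB = {0: b"alpha", 1: b"beta", 2: b"gamma"}
checksums = {blk: sha(data) for blk, data in diskA.items()}

diskA[1] = b"bet@"                    # silent corruption on one side of the mirror

def scrub():
    repaired = 0
    for blk, want in checksums.items():       # force-read every written block
        copies = [diskA, diskB]
        good = [d for d in copies if sha(d[blk]) == want]
        if not good:
            print("block", blk, "is bad on both sides: unrecoverable")
            continue
        for d in copies:                       # self-heal any bad copy from a good one
            if sha(d[blk]) != want:
                d[blk] = good[0][blk]
                repaired += 1
    return repaired

assert scrub() == 1 and diskA[1] == b"beta"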

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss