Re: [zfs-discuss] bit-flipping in RAM...

2010-04-03 Thread Orvar Korvar
Haven't the ZFS data corruption researchers been in touch with Jeff Bonwick
and the ZFS team?


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Daniel Carosone
On Thu, Apr 01, 2010 at 12:38:29AM +0100, Robert Milkowski wrote:
> So I wasn't saying that it can work, or that it can work in all
> circumstances; rather, I was trying to say that it probably shouldn't
> be dismissed on the performance argument alone, as for some use cases

It would be of great utility even if considered only as a diagnostic
measure - i.e., for qualifying tests, or when something else raises
suspicion and you want to eliminate or confirm sources of problems.

With a suitable pointer in a FAQ/troubleshooting guide, it could reduce
the number, and improve the quality, of problem reports related to bad h/w.

--
Dan.




Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Xin LI

On 2010/03/31 05:13, Darren J Moffat wrote:
> On 31/03/2010 10:27, Erik Trimble wrote:
>> Orvar's post over in opensol-discuss has me thinking:
>>
>> After reading the paper and looking at design docs, I'm wondering if
>> there is some facility to allow for comparing data in the ARC to its
>> corresponding checksum. That is, if I've got the data I want in the ARC,
>> how can I be sure it's correct (and free of hardware memory errors)? I'd
>> assume the way is to also store absolutely all the checksums for all
>> blocks/metadata being read/written in the ARC (which, of course, means
>> that only so much RAM corruption can be compensated for), and do a
>> validation every time that block is used/written from the ARC.
>> You'd likely have to do constant metadata consistency checking, and
>> likely have to hold multiple copies of metadata in-ARC to compensate for
>> possible corruption. I'm assuming that this has at least been explored,
>> right?
> 
> A subset of this is already done. The ARC keeps its own in-memory
> checksum (because some buffers in the ARC are not yet on stable storage
> so don't have a block pointer checksum yet).
> 
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c
> 
> 
> arc_buf_freeze()
> arc_buf_thaw()
> arc_cksum_verify()
> arc_cksum_compute()
> 
> It isn't done on every access, but it can detect in-memory corruption -
> I've seen it happen on several occasions, though all due to errors in my
> code, not bad physical memory.
> 
> Doing it more frequently could cause a significant performance problem.

Agreed.

I think it's probably not a very good idea to check it everywhere.  It
would be great if we could do some checks occasionally, especially for
critical data structures; but if it's the memory we cannot trust, how
can we trust the checksum checker to behave correctly?

I had some questions about the FAST paper mentioned by Erik, which were
not answered during the conference, and which make me feel that the
paper, while it pointed out some interesting issues, failed to prove
that they are a real-world problem:

 - How probable is a bit flip on a non-ECC system?  Say, how many bits
would be flipped per terabyte processed, or per transaction, or by some
similar measure?
 - Among these flipped bits, how many would land in a file system
buffer?  What happens when, say, the application's own memory hits a
flipped bit while the file system's buffers are fine?
 - How large would the performance penalty be if we checked the
checksums every time the data is accessed?  How good would such a check
be compared to ECC in terms of correctness?
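
(For the first question, a back-of-envelope sketch.  The rate below is
an assumption borrowed from Schroeder et al., "DRAM Errors in the Wild"
(SIGMETRICS 2009), which measured correctable errors on ECC machines;
a non-ECC consumer box may behave quite differently, and the study found
errors heavily concentrated on a minority of machines, so the mean
overstates the typical case:)

/*
 * Rough expected bit-flip count for 4 GB of RAM over a year,
 * assuming 25,000 FIT/Mbit (failures per 10^9 device-hours per
 * Mbit), the low end of the Schroeder et al. field study.
 */
#include <stdio.h>

int
main(void)
{
	double fit_per_mbit = 25000.0;		/* assumed error rate */
	double mem_mbit = 4.0 * 1024 * 8;	/* 4 GB expressed in Mbit */
	double hours_per_year = 24.0 * 365;

	double flips_per_year =
	    fit_per_mbit * mem_mbit / 1e9 * hours_per_year;

	/* Prints roughly 7176 - the point is the order of magnitude. */
	printf("expected errors/year for 4 GB: %.0f\n", flips_per_year);
	return (0);
}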

Cheers,
-- 
Xin LI http://www.delphij.net/
FreeBSD - The Power to Serve!  Live free or die


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Robert Milkowski

On 31/03/2010 16:44, Bob Friesenhahn wrote:
> On Wed, 31 Mar 2010, Robert Milkowski wrote:
>> or there might be an extra zpool-level (or system-wide) property to
>> enable checking checksums on every access from ARC - there will be a
>> significant performance impact but then it might be acceptable for
>> really paranoid folks, especially with modern hardware.
>
> How would this checking take place for memory mapped files?

Well, and it wouldn't help if data were corrupted in an application
internal buffer after read() succeeded, or just before an application
does a write().


So I wasn't saying that it can work, or that it can work in all
circumstances; rather, I was trying to say that it probably shouldn't
be dismissed on the performance argument alone, as for some use cases
on modern HW the performance might still be acceptable while providing
better protection and a stronger data-correctness guarantee.

But even then, while the mmap() issue is probably solvable, the read()
and write() cases probably are not.
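
(To see why: once read() has copied the data out of the ARC into the
caller's private buffer, a later flip in that buffer is outside the
filesystem's reach.  A minimal sketch, assuming a hypothetical
/tank/file:)

/*
 * Why ARC-side verification cannot cover read(): after the copyout,
 * the data lives in a private user buffer that ZFS never sees again.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	char *buf = malloc(65536);
	int fd = open("/tank/file", O_RDONLY);	/* hypothetical path */

	if (buf == NULL || fd == -1)
		return (1);

	/* ZFS can verify the ARC buffer up to the moment of copyout... */
	(void) read(fd, buf, 65536);

	/*
	 * ...but a bit flip in 'buf' from here on is invisible to the
	 * filesystem; only end-to-end application checksums (or ECC)
	 * could catch it.  The same applies in reverse for write():
	 * a flip before the syscall gets checksummed as "good" data.
	 */
	(void) close(fd);
	free(buf);
	return (0);
}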


--
Robert Milkowski
http://milek.blogspot.com


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Bob Friesenhahn

On Wed, 31 Mar 2010, Robert Milkowski wrote:
> or there might be an extra zpool-level (or system-wide) property to enable
> checking checksums on every access from ARC - there will be a significant
> performance impact but then it might be acceptable for really paranoid
> folks, especially with modern hardware.

How would this checking take place for memory mapped files?
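
(The difficulty: a mapped page is read and written with plain loads and
stores, so there is no syscall boundary at which an ARC checksum could
be checked.  A minimal sketch, assuming a hypothetical /tank/file:)

/*
 * Why verify-on-access is hard for mmap(): the application touches
 * the mapped page directly, so no filesystem code runs on access.
 */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
	char *p;
	int fd = open("/tank/file", O_RDWR);	/* hypothetical path */

	if (fd == -1)
		return (1);
	p = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return (1);

	p[0] ^= 1;	/* a direct store: no filesystem code runs here */

	(void) munmap(p, 8192);
	(void) close(fd);
	return (0);
}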

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Robert Milkowski



Darren J Moffat wrote:
> On 31/03/2010 10:27, Erik Trimble wrote:
>> Orvar's post over in opensol-discuss has me thinking:
>>
>> After reading the paper and looking at design docs, I'm wondering if
>> there is some facility to allow for comparing data in the ARC to its
>> corresponding checksum. That is, if I've got the data I want in the ARC,
>> how can I be sure it's correct (and free of hardware memory errors)? I'd
>> assume the way is to also store absolutely all the checksums for all
>> blocks/metadata being read/written in the ARC (which, of course, means
>> that only so much RAM corruption can be compensated for), and do a
>> validation every time that block is used/written from the ARC.
>> You'd likely have to do constant metadata consistency checking, and
>> likely have to hold multiple copies of metadata in-ARC to compensate for
>> possible corruption. I'm assuming that this has at least been explored,
>> right?
>
> A subset of this is already done. The ARC keeps its own in-memory
> checksum (because some buffers in the ARC are not yet on stable
> storage so don't have a block pointer checksum yet).
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c
>
> arc_buf_freeze()
> arc_buf_thaw()
> arc_cksum_verify()
> arc_cksum_compute()
>
> It isn't done on every access, but it can detect in-memory corruption -
> I've seen it happen on several occasions, though all due to errors in my
> code, not bad physical memory.
>
> Doing it more frequently could cause a significant performance problem.

or there might be an extra zpool-level (or system-wide) property to
enable checking checksums on every access from ARC - there will be a
significant performance impact but then it might be acceptable for
really paranoid folks, especially with modern hardware.
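
(A sketch of how such a knob might hook into the ARC - purely
illustrative, in the context of arc.c: zfs_arc_verify_on_access is a
made-up tunable that does not exist, while arc_cksum_verify() is the
existing check Darren mentioned:)

/*
 * HYPOTHETICAL fragment: gate the existing ARC checksum check behind
 * a tunable, so that paranoid users can knowingly pay the cost of
 * verifying on every access.  Assumes arc.c's arc_buf_t and
 * arc_cksum_verify(); zfs_arc_verify_on_access is invented.
 */
int zfs_arc_verify_on_access = 0;	/* 0 preserves current behaviour */

static void
arc_access_verify(arc_buf_t *buf)
{
	if (zfs_arc_verify_on_access)
		arc_cksum_verify(buf);	/* existing debug-time check */
}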


--
Robert Milkowski
http://milek.blogspot.com


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Darren J Moffat



On 31/03/2010 10:27, Erik Trimble wrote:
> Orvar's post over in opensol-discuss has me thinking:
>
> After reading the paper and looking at design docs, I'm wondering if
> there is some facility to allow for comparing data in the ARC to its
> corresponding checksum. That is, if I've got the data I want in the ARC,
> how can I be sure it's correct (and free of hardware memory errors)? I'd
> assume the way is to also store absolutely all the checksums for all
> blocks/metadata being read/written in the ARC (which, of course, means
> that only so much RAM corruption can be compensated for), and do a
> validation every time that block is used/written from the ARC.
> You'd likely have to do constant metadata consistency checking, and
> likely have to hold multiple copies of metadata in-ARC to compensate for
> possible corruption. I'm assuming that this has at least been explored,
> right?

A subset of this is already done. The ARC keeps its own in-memory
checksum (because some buffers in the ARC are not yet on stable storage
so don't have a block pointer checksum yet).

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c

arc_buf_freeze()
arc_buf_thaw()
arc_cksum_verify()
arc_cksum_compute()

It isn't done on every access, but it can detect in-memory corruption -
I've seen it happen on several occasions, though all due to errors in my
code, not bad physical memory.

Doing it more frequently could cause a significant performance problem.
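
(For readers without the source handy, a simplified, self-contained
sketch of the idea - not the actual arc.c code, and the checksum below
is just a stand-in for the real algorithm:)

/*
 * Sketch of the freeze/thaw idea: while a buffer is "frozen" (no
 * writes are expected), a private checksum can catch corruption of
 * the in-memory copy.
 */
#include <stddef.h>
#include <stdint.h>

typedef struct buf_sketch {
	void		*b_data;	/* cached block contents */
	size_t		b_size;
	uint64_t	b_cksum;	/* checksum of frozen contents */
	int		b_frozen;
} buf_sketch_t;

static uint64_t
cksum(const void *data, size_t size)	/* stand-in for the real sum */
{
	const unsigned char *p = data;
	uint64_t sum = 0;

	while (size-- > 0)
		sum = sum * 31 + *p++;
	return (sum);
}

static void
buf_freeze(buf_sketch_t *buf)		/* modifications are complete */
{
	buf->b_cksum = cksum(buf->b_data, buf->b_size);
	buf->b_frozen = 1;
}

static void
buf_thaw(buf_sketch_t *buf)		/* about to modify: stop checking */
{
	buf->b_frozen = 0;
}

static int
buf_verify(buf_sketch_t *buf)		/* 0 = OK, -1 = memory changed */
{
	if (!buf->b_frozen)
		return (0);		/* nothing to compare against */
	return (cksum(buf->b_data, buf->b_size) == buf->b_cksum ? 0 : -1);
}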

--
Darren J Moffat


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Zhu Han
ECC-enabled RAM should become very cheap very quickly if the industry
embraces it in every computer. :-)

best regards,
hanzhu


On Wed, Mar 31, 2010 at 5:46 PM, Erik Trimble wrote:

> casper@sun.com wrote:
>
>>
>>
>>> I'm not saying that ZFS should consider doing this - doing a validation
>>> for in-memory data is non-trivially expensive in performance terms, and
>>> there's only so much you can do and still expect your machine to survive.  I
>>> mean, I've used the old NonStop stuff, and yes, you can shoot them with a
>>> .45 and it likely will still run, but whacking them with a bazooka is still
>>> guaranteed to make them, well, Non-NonStop.
>>>
>>>
>>
>> If we scrub the memory anyway, why not include the check of the ZFS
>> checksums which are already in memory?
>>
>> OTOH, zfs gets a lot of mileage out of cheap hardware and we know what the
>> limitations are when you don't use ECC; the industry must start to require
>> that all chipsets support ECC.
>>
>> Casper
>>
> Reading the paper was interesting, as it highlighted all the places where
> ZFS "skips" validation.  There are a lot of places. In many ways, fixing this
> would likely make ZFS similar to AppleTalk, whose notorious performance
> (relative to Ethernet) was caused by what many called the "Are You Sure?"
> design.  Double- and triple-checking absolutely everything has its costs.
>
> And, yes, we really should just force computer manufacturers to use ECC in
> more places (not just RAM) - as densities and data volumes increase, we are
> more likely to see errors, and without proper hardware checking, we're
> really going out on a limb here to be able to trust what the hardware says.
> And, let's face it - hardware error correction is /so/ much faster than
> doing it in software.
>
>
>
>
>
>
> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-123
> Phone:  x17195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)
>


Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Erik Trimble

casper@sun.com wrote:
>> I'm not saying that ZFS should consider doing this - doing a validation
>> for in-memory data is non-trivially expensive in performance terms, and
>> there's only so much you can do and still expect your machine to
>> survive.  I mean, I've used the old NonStop stuff, and yes, you can
>> shoot them with a .45 and it likely will still run, but whacking them
>> with a bazooka is still guaranteed to make them, well, Non-NonStop.
>
> If we scrub the memory anyway, why not include the check of the ZFS
> checksums which are already in memory?
>
> OTOH, zfs gets a lot of mileage out of cheap hardware and we know what the
> limitations are when you don't use ECC; the industry must start to require
> that all chipsets support ECC.
>
> Casper

Reading the paper was interesting, as it highlighted all the places
where ZFS "skips" validation.  There are a lot of places. In many ways,
fixing this would likely make ZFS similar to AppleTalk, whose notorious
performance (relative to Ethernet) was caused by what many called the
"Are You Sure?" design.  Double- and triple-checking absolutely
everything has its costs.

And, yes, we really should just force computer manufacturers to use ECC
in more places (not just RAM) - as densities and data volumes increase,
we are more likely to see errors, and without proper hardware checking,
we're really going out on a limb here to be able to trust what the
hardware says. And, let's face it - hardware error correction is /so/
much faster than doing it in software.






--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Casper . Dik


>I'm not saying that ZFS should consider doing this - doing a validation 
>for in-memory data is non-trivially expensive in performance terms, and 
>there's only so much you can do and still expect your machine to 
>survive.  I mean, I've used the old NonStop stuff, and yes, you can 
>shoot them with a .45 and it likely will still run, but whacking them 
>with a bazooka is still guaranteed to make them, well, Non-NonStop.

If we scrub the memory anyway, why not include the check of the ZFS 
checksums which are already in memory?

OTOH, zfs gets a lot of mileage out of cheap hardware and we know what the 
limitations are when you don't use ECC; the industry must start to require 
that all chipsets support ECC.
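
(One way to picture that - a hypothetical "software scrubber" thread
that sweeps ARC buffers at low priority and reuses the existing
arc_cksum_verify(); the arc_first_buf()/arc_next_buf() iterators are
invented for illustration:)

/*
 * HYPOTHETICAL fragment: periodically walk the ARC and verify buffer
 * checksums, analogous to hardware memory scrubbing.  Only
 * arc_cksum_verify() exists in the real arc.c; the iterator names
 * and this thread are invented.
 */
static void
arc_scrub_thread(void)
{
	arc_buf_t *buf;

	for (;;) {
		for (buf = arc_first_buf(); buf != NULL;
		    buf = arc_next_buf(buf)) {
			arc_cksum_verify(buf);	/* flags a mismatch */
			delay(1);		/* throttle: stay low-impact */
		}
	}
}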

Casper



[zfs-discuss] bit-flipping in RAM...

2010-03-31 Thread Erik Trimble

Orvar's post over in opensol-discuss has me thinking:

After reading the paper and looking at design docs, I'm wondering if 
there is some facility to allow for comparing data in the ARC to its 
corresponding checksum.  That is, if I've got the data I want in the 
ARC, how can I be sure it's correct (and free of hardware memory 
errors)?  I'd assume the way is to also store absolutely all the 
checksums for all blocks/metadata being read/written in the ARC (which, 
of course, means that only so much RAM corruption can be compensated 
for), and do a validation every time that block is used/written from 
the ARC.  You'd likely have to do constant metadata consistency 
checking, and likely have to hold multiple copies of metadata in-ARC to 
compensate for possible corruption.  I'm assuming that this has at 
least been explored, right?
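
(On the "multiple copies of metadata in-ARC" idea, a self-contained
sketch of what cross-checking copies might look like - purely
illustrative; ZFS does not actually keep replicated metadata buffers
in the ARC:)

/*
 * Illustrative only: keep NCOPIES copies of a critical metadata
 * block in memory and pick one that at least one other copy agrees
 * with, so a single bit flip in one copy is survivable.
 */
#include <string.h>

#define	NCOPIES	3

static int
pick_good_copy(void *copies[NCOPIES], size_t size)
{
	int i, j;

	/* Two independent copies agreeing makes a single flip harmless. */
	for (i = 0; i < NCOPIES; i++)
		for (j = i + 1; j < NCOPIES; j++)
			if (memcmp(copies[i], copies[j], size) == 0)
				return (i);
	return (-1);	/* no agreement: re-read from disk and re-verify */
}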


(The researchers used non-ECC RAM, so honestly, I think it's a bit 
unrealistic to expect that your car will win the Indy 500 if you put a 
Yugo engine in it.)  Normally, this problem is exactly what you have 
ECC and memory scrubbing for at the hardware level.


I'm not saying that ZFS should consider doing this - doing a validation 
for in-memory data is non-trivially expensive in performance terms, and 
there's only so much you can do and still expect your machine to 
survive.  I mean, I've used the old NonStop stuff, and yes, you can 
shoot them with a .45 and it likely will still run, but whacking them 
with a bazooka is still guaranteed to make them, well, Non-NonStop.


-Erik





 Original Message 
Subject:Re: [osol-discuss] Any news about 2010.3?
Date:   Wed, 31 Mar 2010 01:06:45 PDT
From:   Orvar Korvar 
To: opensolaris-disc...@opensolaris.org



If you value your data, you should reconsider. But if your data is not 
important, then skip ZFS.

File system data corruption test by researcher:
http://blogs.zdnet.com/storage/?p=169

ZFS data corruption test by researchers:
http://www.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
