Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread JZ
BTW, the following text from another discussion may be helpful with regard to your 
concerns.
What to use for RAID does not have a single fixed answer, but using ZFS can be a 
good choice in many cases and for many reasons, such as the price/performance 
concern Bob highlighted.

And note Bob said "client OSs". To me, that should read "host OSs"; then again, 
I am an enterprise guy, and my ideal way of using ZFS may differ from how most 
folks use it today.

Personally, I would use ZFS for SAN-based virtualization and as a file/IP-block 
services gateway to applications (and file service to clients is one of the 
"enterprise applications" in my book). For example, I would then use different 
implementations for CIFS and NFS serving -- not the native ZFS NAS support to 
clients, but the ZFS storage pooling and SAM-FS management features.
(In other words, I would use ZFS in a 6920 fashion; if you don't know what I am 
talking about, see
http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1245572,00.html)

Sorry, I don't want to lead the discussion into file systems and NFV, but to 
me, ZFS is very close to the WAFL design point, and the file system's involvement 
in RAID, PiT, HSM/ILM, application/data security/protection, and HA/BC functions 
is vital.
:-)
z

___

I do agree that when multiple client OSs are involved it is still useful if 
storage looks like a legacy disk drive.  Luckily, Solaris 10 already offers 
iSCSI, and OpenSolaris is now able to offer high-performance Fibre Channel 
target and Fibre Channel over Ethernet layers on top of reliable ZFS.  The 
full benefit of ZFS is not provided, but the storage is successfully divorced 
from the client, with a higher degree of data reliability and performance 
than is available from current firmware-based RAID arrays.

Bob
==
Bob Friesenhahn
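
As a concrete illustration of the "storage looks like a legacy disk drive" 
point, a ZFS volume can be exported as an iSCSI LUN in a couple of commands. 
A rough sketch with placeholder pool/volume names, using the older 
shareiscsi/iscsitadm path rather than the newer COMSTAR target Bob mentions:

zfs create -V 50g tank/vols/lun0      # carve a 50 GB zvol out of the pool
zfs set shareiscsi=on tank/vols/lun0  # export the zvol as an iSCSI target
iscsitadm list target                 # show the target name for the client to connect to

The client sees an ordinary disk, while checksumming, snapshots and pooling 
all happen underneath in ZFS.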



- Original Message - 
From: "JZ" 
To: "Orvar Korvar" ; 

Sent: Sunday, December 28, 2008 7:55 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> The hyper links didn't work, here are the urls --
>
> http://queue.acm.org/detail.cfm?id=1317400
>
> http://www.sun.com/bigadmin/features/articles/zfs_part1.scalable.jsp#integrity
>

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread JZ
The hyper links didn't work, here are the urls --

http://queue.acm.org/detail.cfm?id=1317400

http://www.sun.com/bigadmin/features/articles/zfs_part1.scalable.jsp#integrity


- Original Message - 
From: "JZ" 
To: "Orvar Korvar" ; 

Sent: Sunday, December 28, 2008 7:50 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread JZ
Nice discussion. Let me chip in with my old-timer view --

Until a few years ago, the understanding that "HW RAID doesn't proactively 
check the consistency of data vs. parity unless required" was true.  But 
LSI has since added a background consistency check (it auto-starts 5 minutes 
after the drive is created) on its RAID cards.  Since Sun primarily sells LSI 
HW RAID cards, I guess at that high level both HW RAID and ZFS provide some 
proactive consistency/integrity assurance.

HOWEVER, I really think the ZFS way is much more advanced (PiT-integrated) 
and can be combined with other ZFS ECC/EDC features and memory-based data 
consistency/integrity assurance to achieve considerably better overall data 
availability and business continuity.  I guess I just like the 
"enterprise flavor" as such.  ;-)




Below are some tech details -- again, please do not compare HW RAID with 
ZFS at the feature level. RAID was invented for both data protection and 
performance, and there are different ways to achieve those with ZFS, resulting 
in very different solution architectures (depending on the customer segment; 
sometimes it can even be beneficial to use HW RAID, e.g. when heterogeneous HW 
RAID disks are deployed in a unified fashion and ZFS does not handle the 
enterprise-wide data protection).


ZFS detects, and in many cases corrects, errors even when using a single hard 
drive, by using end-to-end checksumming, storing the checksum separately from 
the data it protects, and using copy-on-write so that writing a change to a 
file always creates a new copy rather than overwriting the old one.
Sun Distinguished Engineer Bill Moore, who co-led ZFS development, put it this way:

  "... one of the design principles we set for ZFS was: never, ever trust the 
underlying hardware. As soon as an application generates data, we generate a 
checksum for the data while we're still in the same fault domain where the 
application generated the data, running on the same CPU and the same memory 
subsystem. Then we store the data and the checksum separately on disk so 
that a single failure cannot take them both out.

  When we read the data back, we validate it against that checksum and see 
if it's indeed what we think we wrote out before. If it's not, we employ all 
sorts of recovery mechanisms. Because of that, we can, on very cheap 
hardware, provide more reliable storage than you could get with the most 
reliable external storage. It doesn't matter how perfect your storage is, if 
the data gets corrupted in flight - and we've actually seen many customer 
cases where this happens - then nothing you can do can recover from that. 
With ZFS, on the other hand, we can actually authenticate that we got the 
right answer back and, if not, enact a bunch of recovery scenarios. That's 
data integrity."

See more details about ZFS Data Integrity and Security.
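
If you want to push that single-disk story further, the relevant knobs look 
roughly like this ("tank/docs" is just a placeholder dataset, and copies=2 
only protects data written after the property is set):

zfs set checksum=sha256 tank/docs   # use a stronger checksum than the default fletcher
zfs set copies=2 tank/docs          # keep two copies of every block, even on one disk
zpool scrub tank                    # read every allocated block and verify its checksum
zpool status -v tank                # report any checksum errors (and the files affected)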


Best,
z



- Original Message - 
From: "Orvar Korvar" 
To: 
Sent: Sunday, December 28, 2008 4:16 PM
Subject: Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?


> This is good information, guys. Do we have some more facts and links about 
> HW RAID and its data integrity, or lack thereof?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] separate home "partition"?

2008-12-28 Thread scott
Thanks for the input. Since I have no interest in multibooting (VirtualBox will 
suit my needs), I created a 10 GB partition on my 500 GB drive for OpenSolaris 
and reserved the rest for files (130 GB worth).

After installing the OS and fdisking the rest of the space to Solaris2, I 
created a zpool called DOCUMENTS (good tip with the upper case), which I then 
mounted to Documents in my home folder.

The logic is, if I have to reinstall, I just export DOCUMENTS and re-import it 
into the reinstalled OS (or import -f in a worst-case scenario).
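
For the record, that recovery path is roughly the following (the mountpoint 
path is just an example for my setup):

zpool export DOCUMENTS      # cleanly detach the pool before the reinstall
zpool import DOCUMENTS      # after the new install, bring the pool back in
zpool import -f DOCUMENTS   # worst case: force the import if it was never exported
zfs set mountpoint=/export/home/scott/Documents DOCUMENTS   # hang it back under my home folder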

After having done all the setup, I partitioned drive 2 using identical cylinder 
locations and mirrored each slice into its respective pool (rpool and DOCUMENTS). 
Replacing drive 1 with drive 2 and starting back up, everything boots fine and I 
see all my data, so it worked.

Obviously I'm a noob, and yet even I find my own method a little suspicious. I 
look at the disk usage analyzer and see that / is 100% used. While I'm sure 
this is true in some kind of "virtual" sense, it leaves me with the feeling that 
I've done something goofy.

Comments about this last concern are greatly appreciated!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot manager service dependency error

2008-12-28 Thread Robert Bauer
I got this in the SMF log file system-filesystem-zfs-auto-snapshot:daily.log:
...
[ Dez 28 23:13:44 Enabled. ]
[ Dez 28 23:13:53 Executing start method ("/lib/svc/method/zfs-auto-snapshot 
start"). ]
Checking for non-recursive missed // snapshots  rpool
Checking for recursive missed // snapshots home rpool/firefox rpool/ROOT
Last snapshot for svc:/system/filesystem/zfs/auto-snapshot:daily taken on So 
Dez 14  0:36 2008
which was greater than the 1 days schedule. Taking snapshot now.
cannot create snapshot 
'home/export/home/s...@zfs-auto-snap:daily-2008-12-28-23:13': dataset is busy
no snapshots were created
Error: Unable to take recursive snapshots of 
h...@zfs-auto-snap:daily-2008-12-28-23:13.
Moving service svc:/system/filesystem/zfs/auto-snapshot:daily to maintenance 
mode.
cannot create snapshot 
'rpool/ROOT/opensola...@zfs-auto-snap:daily-2008-12-28-23:13': dataset is busy
no snapshots were created
Error: Unable to take recursive snapshots of 
rpool/r...@zfs-auto-snap:daily-2008-12-28-23:13.
Moving service svc:/system/filesystem/zfs/auto-snapshot:daily to maintenance 
mode.
[ Dez 28 23:13:59 Method "start" exited with status 0. ]
[ Dez 28 23:13:59 Stopping for maintenance due to administrative_request. ]
[ Dez 28 23:13:59 Executing stop method ("/lib/svc/method/zfs-auto-snapshot 
stop"). ]
[ Dez 28 23:13:59 Method "stop" exited with status 0. ]
[ Dez 28 23:13:59 Stopping for maintenance due to administrative_request. ]
[ Dez 28 23:13:59 Stopping for maintenance due to administrative_request. ]
[ Dez 28 23:13:59 Stopping for maintenance due to administrative_request. ]

What does "dataset is busy" mean? How is this possible?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Upgrade from UFS -> ZFS on a single disk?

2008-12-28 Thread Josh Rivel
So we have roughly 700 OpenSolaris snv_81 boxes out in the field.  We're 
looking to upgrade them all soon, probably to OpenSolaris 2008.11 or the latest 
snv_10x build.  Currently all boxes have a single 80 GB HD (these are small 
appliance-type devices, so we can't add a second hard drive).  What we'd like 
to do is figure out a way, via Live Upgrade, to upgrade to a newer SXCE release 
(or OpenSolaris 2008.11) AND migrate the filesystems from UFS to ZFS at the same 
time.

The reason for wanting to go to ZFS is the filesystem improvements and 
resiliency, as a lot of these boxes get power-cycled regularly.

Here is the current partition table:

/dev/dsk/c0d0s0 /  8gb
/dev/dsk/c0d0s1 swap 1.5gb
/dev/dsk/c0d0s3 /var 4gb
/dev/dsk/c0d0s4 /backup 2gb
/dev/dsk/c0d0s5 /luroot 8gb
/dev/dsk/c0d0s6 /luvar 4gb
/dev/dsk/c0d0s7 /export 45gb

/backup, /luroot and /luvar are not in use.
/export contains all the zones (each box has 3 non-global zones on it)

I was thinking about removing the /backup, /luroot, and /luvar partitions and 
using that space to create a ZFS rpool, then installing the new OS onto it, 
but /export needs to stay, with the zones on it.
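
If Live Upgrade cooperates, the rough shape of that would be the following 
(slice and BE names are just examples, and the Live Upgrade packages on the 
boxes would first need to be updated from the target release's media before 
lucreate can target a ZFS pool):

zpool create rpool c0d0s5              # build the root pool in the freed slice
lucreate -n zfsBE -p rpool             # copy the running UFS BE into a ZFS BE in rpool
luupgrade -u -n zfsBE -s /mnt/osimage  # upgrade that BE from the new release's image
luactivate zfsBE                       # mark the ZFS boot environment active
init 6                                 # reboot into the upgraded, ZFS-rooted system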

An alternative option is to just re-install the boxes from scratch, and devise 
a way to auto-configure them when they boot back up (they are all remotely 
located and there's no console access to any of them, so that's a bit tricky)

Anyway, I just wanted to toss the idea out and see if anyone has done a UFS -> ZFS 
in-place migration while doing a Live Upgrade (or post-Live-Upgrade is also an 
option of course, so long as it can be scripted).

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Degraded zpool without any kind of alert

2008-12-28 Thread Matt Harrison
Bob Friesenhahn wrote:
> On Sun, 28 Dec 2008, Robert Bauer wrote:
> 
>> It would be nice if gnome could notify me automatically when one of 
>> my zpools is degraded or any kind of ZFS error occurs.
> 
> Yes.  It is a weird failing of Solaris to have an advanced fault 
> detection system without a useful reporting mechanism.
> 
>> I would also accept a solution that at least automatically sends 
>> a mail when a zpool is degraded.
> 
> This is the script (run as root via crontab) that I use to have an email sent 
> to 'root' if a fault is detected.  It has already reported a fault:
> 
> #!/bin/sh
> REPORT=/tmp/faultreport.txt
> SYSTEM=$1
> rm -f $REPORT
> /usr/sbin/fmadm faulty 2>&1 > $REPORT
> if test -s $REPORT
> then
>/usr/ucb/Mail -s "$SYSTEM Fault Alert" root < $REPORT
> fi
> rm -f $REPORT

I do much the same thing, although I had to fiddle it a bit to exclude a 
certain report type. A while ago, a server here started to send out 
errors from:

Fault class : defect.sunos.eft.undiagnosable_problem

The guys on the fault-discuss list have been unable to enlighten me as 
to the problem, and I'm ashamed to say I have not taken any steps. 
Short of replacing the entire machine, I have no idea what to try.

I just wanted to note that although the fault detection is very good, it 
isn't always possible to work out what the fault really is.

Matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Degraded zpool without any kind of alert

2008-12-28 Thread Bob Friesenhahn
On Sun, 28 Dec 2008, Robert Bauer wrote:

> It would be nice if gnome could notify me automatically when one of 
> my zpools is degraded or any kind of ZFS error occurs.

Yes.  It is a weird failing of Solaris to have an advanced fault 
detection system without a useful reporting mechanism.

> I would also accept a solution that at least automatically sends 
> a mail when a zpool is degraded.

This is the script (run as root via crontab) that I use to have an email sent 
to 'root' if a fault is detected.  It has already reported a fault:

#!/bin/sh
# Mail the output of 'fmadm faulty' to root, but only if it is non-empty.
# The system name is passed as the first argument (e.g. from the crontab entry).
REPORT=/tmp/faultreport.txt
SYSTEM=$1
rm -f $REPORT
# Capture both stdout and stderr in the report file (redirect to the file
# first, then duplicate stderr onto it).
/usr/sbin/fmadm faulty > $REPORT 2>&1
# Only send mail if fmadm actually reported something.
if test -s $REPORT
then
   /usr/ucb/Mail -s "$SYSTEM Fault Alert" root < $REPORT
fi
rm -f $REPORT
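
For completeness, the matching crontab entry looks something like this (the 
script path and schedule are just examples; the host name is passed in as the 
script's argument):

0 8 * * * /root/check_faults.sh `hostname`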

==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Degraded zpool without any kind of alert

2008-12-28 Thread Robert Bauer
I just saw, by luck, that one of my zpools is degraded:

$ zpool list
NAME    SIZE   USED   AVAIL  CAP  HEALTH    ALTROOT
home   97,5G   773M  96,7G    0%  ONLINE    -
rpool  10,6G  7,78G  2,85G   73%  DEGRADED  -

It would be nice if gnome could notify me automatically when one of my zpools 
is degraded or any kind of ZFS error occurs.
It should be possible. I know that Ubuntu includes a mechanism to manually 
create notification messages on the desktop, like the "update notification" 
messages.

I would also accept a solution that at least automatically sends a mail when 
a zpool is degraded.

Could a ZFS monitoring or alert mechanism be implemented for the next release?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Orvar Korvar
This is good information, guys. Do we have some more facts and links about HW 
RAID and its data integrity, or lack thereof?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using zfs mirror as a simple backup mechanism for time-slider.

2008-12-28 Thread Tim
On Sat, Dec 27, 2008 at 3:24 PM, Miles Nordin  wrote:

> > "t" == Tim   writes:
>
> t> couldn't you simply do a detach before removing the disk, and
> t> do a re-attach everytime you wanted to re-mirror?
>
> no, for two reasons.  First, when you detach a disk, ZFS writes
> something to the disk that makes it unrecoverable.  The simple-UI
> wallpaper blocks your access to the detached disk, so you have no
> redundancy while detached.  In this thread is a workaround to disable
> the checks (AIUI they're explaining a more fundamental problem with a
> multi-vdev pool because you can't detach one-mirror-half of each vdev
> at exactly the same instant, but multi-vdev is not part of Niall's
> case):
>
>  http://opensolaris.org/jive/thread.jspa?threadID=58780


Gotcha, that's more than a bit ridiculous.  If I detach a disk, I guess I'd
expect to have to clear metadata if that's what I wanted, rather than it
automatically doing so.  I guess I almost feel there should either be a
secondary command, or some flags added for just such situations as this.
Personally I'd much rather have attach/detach commands than having to do a
zfs send.  Perhaps I'm alone in that feeling though.


>
> second, when you attach rather than online/clear/notice-its-back, ZFS
> will treat the newly-attached disk as empty and will resilver
> everything, not just your changes.  It's the difference between taking
> 5 minutes and taking all night.  and you don't have redundancy until
> the resilver finishes.
>

Odd, my experience was definitely not the same.  When I re-attached, it did
not sync the entire disk.  Good to know that the expected behavior is
different than what I saw.
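
For reference, the gentler route Miles describes (offline/online instead of 
detach/attach) would look roughly like this, with placeholder pool and device 
names:

zpool offline tank c1t1d0   # take the removable half of the mirror offline; its labels stay intact
# (pull the disk, store it elsewhere, reinsert it later)
zpool online tank c1t1d0    # resilver only the blocks that changed while it was away
zpool status tank           # watch the (short) resilver complete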
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Carsten Aulbert
Hi Bob,

Bob Friesenhahn wrote:

>> AFAIK this is not done during the normal operation (unless a disk asked
>> for a sector cannot get this sector).
> 
> ZFS checksum validates all returned data.  Are you saying that this fact
> is incorrect?
> 

No, sorry -- too long in front of a computer today, I guess. I was referring
to hardware RAID controllers; AFAIK these usually do not check the
validity of data unless a disc returns an error. My understanding of
ZFS is exactly that: data is checked in the CPU against the stored
checksum.

>> That's exactly what volume checking for standard HW controllers does as
>> well. Read all data and compare it with parity.
> 
> What if the data was corrupted prior to parity generation?
> 

Well, that is bad luck; the same is true if your ZFS box has faulty memory
and the computed checksum matches the data on disk, but that data is wrong
with respect to the file it is supposed to represent.

Sorry for the confusion

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Bob Friesenhahn
On Sun, 28 Dec 2008, Carsten Aulbert wrote:
>> ZFS does check the data correctness (at the CPU) for each read while
>> HW raid depends on the hardware detecting a problem, and even if the
>> data is ok when read from disk, it may be corrupted by the time it
>> makes it to the CPU.
>
> AFAIK this is not done during normal operation (unless a disk that is asked
> for a sector cannot return it).

ZFS checksum validates all returned data.  Are you saying that this 
fact is incorrect?

> That's exactly what volume checking for standard HW controllers does as
> well. Read all data and compare it with parity.

What if the data was corrupted prior to parity generation?

> This is exactly the point why RAID6 should always be chosen over RAID5,
> because in the event of a wrong parity check and RAID5 the controller
> can only say, oops, I have found a problem but cannot correct it - since
> it does not know if the parity is correct or any of the n data bits. In
> RAID6 you have redundant parity, thus the controller can find out if the
> parity was correct or not. At least I think that to be true for Areca
> controllers :)

Good point.  Luckily, ZFS's raidz does not have this problem since it 
is able to tell if the "corrected" data is actually correct (within 
the checksum computation's margin for error). If applying parity does 
not result in the correct checksum, then it knows that the data is 
toast.
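
For those following along, the double-parity layout Carsten describes maps to 
raidz2 in ZFS, with the checksum arbitration on top. A minimal sketch with 
placeholder device names:

zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0   # survives any two disk failures
zpool status tank                                             # shows the raidz2 vdev layout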

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Carsten Aulbert
Hi all,

Bob Friesenhahn wrote:
> My understanding is that ordinary HW raid does not check data 
> correctness.  If the hardware reports failure to successfully read a 
> block, then a simple algorithm is used to (hopefully) re-create the 
> lost data based on data from other disks.  The difference here is that 
> ZFS does check the data correctness (at the CPU) for each read while 
> HW raid depends on the hardware detecting a problem, and even if the 
> data is ok when read from disk, it may be corrupted by the time it 
> makes it to the CPU.

AFAIK this is not done during normal operation (unless a disk that is asked
for a sector cannot return it).

> 
> ZFS's scrub algorithm forces all of the written data to be read, with 
> validation against the stored checksum.  If a problem is found, then 
> an attempt to correct is made from redundant storage using traditional 
> RAID methods.

That's exactly what a volume check on standard HW controllers does as
well: read all the data and compare it with the parity.

This is exactly why RAID6 should always be chosen over RAID5: with RAID5,
when a parity check fails the controller can only say "oops, I have found a
problem" but cannot correct it, since it does not know whether the parity or
any of the n data blocks is wrong. With RAID6 you have redundant parity, so
the controller can work out whether the parity itself was correct. At least
I think that is true for Areca controllers :)

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Bob Friesenhahn
On Sun, 28 Dec 2008, Orvar Korvar wrote:

> On a Linux forum, I've spoken about ZFS end-to-end data integrity. I 
> wrote things like "upon writing data to disc, ZFS reads it back and 
> compares it to the data in RAM and corrects it otherwise". I also wrote 
> that ordinary HW RAID doesn't do this check. After a heated 
> discussion, I now start to wonder whether this is correct. Am I wrong?

You are somewhat wrong.  When ZFS writes the data, it also stores a 
checksum for the data.  When the data is read, it is checksummed again 
and the checksum is verified against the stored checksum.  It is not 
possible to compare with data in RAM since usually the RAM memory is 
too small to cache the entire disk, and it would not survive reboots.

> So, does ordinary HW RAID check data correctness? The Linux guys want 
> to know this. For instance, do Adaptec's HW RAID controllers not do such a 
> check? Anyone know more about this?

My understanding is that ordinary HW raid does not check data 
correctness.  If the hardware reports failure to successfully read a 
block, then a simple algorithm is used to (hopefully) re-create the 
lost data based on data from other disks.  The difference here is that 
ZFS does check the data correctness (at the CPU) for each read while 
HW raid depends on the hardware detecting a problem, and even if the 
data is ok when read from disk, it may be corrupted by the time it 
makes it to the CPU.

ZFS's scrub algorithm forces all of the written data to be read, with 
validation against the stored checksum.  If a problem is found, then 
an attempt to correct is made from redundant storage using traditional 
RAID methods.
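
Scrubs are typically run on a schedule; a weekly cron entry along these lines 
(the schedule and pool name are only examples) keeps all of the on-disk data 
exercised:

0 3 * * 0 /usr/sbin/zpool scrub tank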

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs mount hangs

2008-12-28 Thread Magnus Bergman
Hi,

System: Netra 1405, 4x450 MHz, 4 GB RAM, 2x146 GB (root pool) and 
2x146 GB (space pool). snv_98.

After a panic the system hangs on boot, and manual attempts to mount 
(at least) one dataset in single-user mode hang as well.

The Panic:

Dec 27 04:42:11 base ^Mpanic[cpu0]/thread=300021c1a20:
Dec 27 04:42:11 base unix: [ID 521688 kern.notice] [AFT1] errID  
0x00167f73.1c737868 UE Error(s)
Dec 27 04:42:11 base See previous message(s) for details
Dec 27 04:42:11 base unix: [ID 10 kern.notice]
Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 02a10433efc0  
SUNW,UltraSPARC-II:cpu_aflt_log+5b4 (3, 2a10433f208, 2a10433f2e0, 10,  
2a10433f207, 2a10433f208)
Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3:  
02a10433f0cb 000f 012ccc00 012cd000
Dec 27 04:42:11 base   %l4-7: 02a10433f208 0170  
012ccc00 0001
Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 02a10433f210  
SUNW,UltraSPARC-II:cpu_async_error+cdc (7fe0, 0, 18020, 40, 0,  
a0b7ff60)
Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3:  
 0180c000  02a10433f3d4
Dec 27 04:42:11 base   %l4-7: 012cc400 7e60  
012cc400 0001
Dec 27 04:42:11 base genunix: [ID 723222 kern.notice] 02a10433f410  
unix:ktl0+48 (2a10433fec0, 2a10433ff80, 180e580, 6, 180c000, 180)
Dec 27 04:42:11 base genunix: [ID 179002 kern.notice]   %l0-3:  
0002 1400 80001601 012c1578
Dec 27 04:42:11 base   %l4-7: 000ae394c629 060017a32260  
000b 02a10433f4c0
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 02a10433f560  
unix:resume+240 (300021c1a20, 180c000, 1835c40, 6001c1f20c8, 16,  
30001e4cc40)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:  
  0180048279c0 02a1035dbca0
Dec 27 04:42:12 base   %l4-7: 0001 01867800  
25be86dc 018bbc00
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 02a10433f610  
genunix:cv_wait+3c (3001365ba10, 3001365ba10, 1, 18d0c00, c44000, 0)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:  
00c44002 018d0e58 0001 00c44002
Dec 27 04:42:12 base   %l4-7:  0001  
0002 01326e5c
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 02a10433f6c0  
zfs:zio_wait+30 (3001365b778, 6001cdcf7e8, 3001365ba18, 3001365ba10,  
30034dc1f48, 1)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:  
06001cdcf7f0  0100 fc00
Dec 27 04:42:12 base   %l4-7: 018d7000 0c6eefd9  
0c6eefd8 0c6eefd8
Dec 27 04:42:12 base genunix: [ID 723222 kern.notice] 02a10433f770  
zfs:zil_commit_writer+2d0 (6001583be00, 4b0, 1b1a4d54, 42a03, cfc67, 0)
Dec 27 04:42:12 base genunix: [ID 179002 kern.notice]   %l0-3:  
060018b5d068  060010ce1040 06001583be88
Dec 27 04:42:12 base   %l4-7: 060013760380 00c0  
03002bf81138 03001365b778
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 02a10433f820  
zfs:zil_commit+68 (6001583be00, 1b1a5ae5, 38bc5, 6001583be7c,  
1b1a5ae5, 0)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:  
0001 0001 0600177fe080 06001c1f2ad8
Dec 27 04:42:13 base   %l4-7: 01c0 0001  
060010c78000 
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 02a10433f8d0  
zfs:zfs_fsync+f8 (18e5800, 0, 134fc00, 3001c2c4860, 134fc00, 134fc00)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:  
03001e94d948 0001 0180c008 0008
Dec 27 04:42:13 base   %l4-7: 060013760458   
0134fc00 018d2000
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 02a10433f980  
genunix:fop_fsync+40 (300131ed600, 10, 60011c08b68, 0, 60010c77200,  
30028320b40)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:  
060011f6c828 0007 06001c1f20c8 013409d8
Dec 27 04:42:13 base   %l4-7:  0001  
 018bcc00
Dec 27 04:42:13 base genunix: [ID 723222 kern.notice] 02a10433fa30  
genunix:fdsync+40 (7, 10, 0, 184, 10, 30007adda40)
Dec 27 04:42:13 base genunix: [ID 179002 kern.notice]   %l0-3:  
 f071 f071 f071
Dec 27 04:42:13 base   %l4-7: 0001 0180c000  
 
Dec 27 04:42:14 base unix: [ID 10 kern.notice]
Dec 27 04:42:14 base genunix: [ID 672855 kern.notice] syncing file  
systems...
Dec 27 04:42:14 base genunix: [ID 904073 kern.notice]  done
Dec 27 04:42:15 base SUNW,UltraSPARC-II: [ID 201454 kern.warning]  
WARNING: [AFT1] Uncorrectable M

[zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Orvar Korvar
On a Linux forum, I've spoken about ZFS end-to-end data integrity. I wrote 
things like "upon writing data to disc, ZFS reads it back and compares it to the 
data in RAM and corrects it otherwise". I also wrote that ordinary HW RAID 
doesn't do this check. After a heated discussion, I now start to wonder whether 
this is correct. Am I wrong?

So, does ordinary HW RAID check data correctness? The Linux guys want to know 
this. For instance, do Adaptec's HW RAID controllers not do such a check? Anyone 
know more about this?

Several Linux guys want to try out ZFS now. :o)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using zfs mirror as a simple backup mechanism for time-slider.

2008-12-28 Thread Kees Nuyt
On Sun, 28 Dec 2008 15:27:00 +0100, dick hoogendijk
 wrote:

>On Sat, 27 Dec 2008 14:29:58 PST
>Ross  wrote:
>
>> All of which sound like good reasons to use send/receive and a 2nd
>> zfs pool instead of mirroring.
>> 
>> Send/receive has the advantage that the receiving filesystem is
>> guaranteed to be in a stable state.
>
>Can send/receive be used on a multiuser running server system? 

Yes.

>Will this slow down the services on the server much?

Only if the server is busy, that is, has no idle CPU and I/O
capacity. It may help to "nice" the send process.

Sending a complete pool can be a considerable load for a
considerable time; an incremental send of a snapshot with
few changes relative to the previous one will be fast.

>Can the zfs receiving "end" be transformed into a normal file.bz2,
>or does it always have to be a zfs system as a result?

Send streams are version-dependent; it is advised to receive
them immediately.

If the receiving zfs pool uses a file as its block device,
you could export the pool and bzip that file.
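
Concretely, the two variants look something like this (dataset, snapshot and 
pool names are placeholders):

# receive into a second pool, keeping the copy a live zfs filesystem
# (assumes backup/home already holds the @mon snapshot from an earlier full send)
zfs send -i tank/home@mon tank/home@tue | zfs receive backup/home

# or dump the stream into a compressed file (version-dependent, so unpack it soon)
zfs send tank/home@tue | bzip2 -c > /backup/home-tue.zfs.bz2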
-- 
  (  Kees Nuyt
  )
c[_]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using zfs mirror as a simple backup mechanism for time-slider.

2008-12-28 Thread Volker A. Brandt
> > Send/receive has the advantage that the receiving filesystem is
> > guaranteed to be in a stable state.
> 
> Can send/receive be used on a multiuser running server system?

Yes.

> Will
> this slow down the services on the server much?

"Depends".  On a modern box with good disk layout it shouldn't.

> Can the zfs receiving "end" be transformed into a normal file.bz2

Yes.  However, you have to carefully match the sending and receiving
ZFS versions; not all versions can read all streams.  If you delay
receiving the stream, you may find that you cannot unpack it any more.


Regards -- Volker
-- 

Volker A. Brandt  Consulting and Support for Sun Solaris
Brandt & Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: v...@bb-c.de
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 45
Geschäftsführer: Rainer J. H. Brandt und Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using zfs mirror as a simple backup mechanism for time-slider.

2008-12-28 Thread dick hoogendijk
On Sat, 27 Dec 2008 14:29:58 PST
Ross  wrote:

> All of which sound like good reasons to use send/receive and a 2nd
> zfs pool instead of mirroring.
> 
> Send/receive has the advantage that the receiving filesystem is
> guaranteed to be in a stable state.

Can send/receive be used on a multiuser running server system? Will
this slow down the services on the server much?

Can the zfs receiving "end" be transformed into a normal file.bz2, or
does it always have to be a zfs system as a result?

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv104 ++
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss