[zfs-discuss] Is my 3-way mirror completely lost?

2008-07-29 Thread Emiel van de Laar
Hello list,

My ZFS pool has found its way into a bad state after a period of
neglect and now I'm having trouble recovering. The pool is a three-way
mirror of which two disks started showing errors, and thus the pool
was degraded.

I shut down the system and started at the lowest level by using ES Tool
(Samsung) to run a diagnostic. Sure enough, the two disks were showing bad
sectors. After a low-level format I attempted to reintroduce these disks
back into the mirror. However, when resilvering, the system would
hang/freeze at about 50% and I needed to reset the system.

My next attempt was to just leave the single good disk in the system
(detach mirrors) and attempt a scrub. Again the system hangs at 50%.

My final attempt was to simply copy the data to a new pool using
'cp -R'. It ran fine for quite a while but the copy never completed; it
hung just like the resilver and the scrub.

The good disk still comes through the Samsung (full) diagnostic with
no issues found.

I'm not sure what to do next. Is my final pool completely lost?

I'll try other hardware (power supply, memory) next...

FYI: I'm using OpenSolaris Nevada (76 I believe) but have also tried
the OpenSolaris 2008.05 Live CD.

Regards,

 - Emiel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Justin Vassallo
 Actually, my ideal setup would be:

Shuttle XPC w/ 2x PCI-e x8 or x16 lanes
2x PCI-e eSATA cards (each with 4 eSATA port multiplier ports)

Mike, may I ask which eSATA controllers you used? I searched the Solaris HCL
and found very few listed there

Thanks
justin


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread mike
I didn't use any.

That would be my -ideal- setup :)

I waited and waited, and there is still no eSATA/port-multiplier support out there, or 
what exists isn't stable enough. So I scrapped it.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Steve
I agree with mike503. If you create the awareness (of the instability of 
recorded information) there is a large potential market waiting for a little 
ZFS/NAS server!

The thin client idea is very nice. It would also be good to use the NAS server as a 
full server and access it remotely with a very thin client! (In this sense it can be 
larger ;-)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Errors in ZFS/NFSv4 ACL Documentation

2008-07-29 Thread Marc Bevand
I noticed some errors in ls(1), acl(5) and the ZFS Admin Guide about ZFS/NFSv4 
ACLs:

ls(1): read_acl (r)  Permission  to  read  the ACL of a file. The compact 
representation of read_acl is c, not r.

ls(1): -c | -v   The same as -l, and in addition displays the [...] The 
options are in fact -/ c or -/ v. 

ls(1): The display in verbose mode (/ v) uses full attribute [...].  This 
should read (-/ v).

acl(5): execute (X). The x should be lowercase: (x)

acl(5) does not document 3 ACEs: success access (S), failed access (F), 
inherited (I).

The ZFS Admin Guide does not document the same 3 ACEs.

The ZFS Admin Guide gives examples listing a compact representation of ACLs 
containing only 6 inheritance flags instead of 7. For example in the 
section Setting and Displaying ACLs on ZFS Files in Compact Format:
# ls -V file.1
-rw-r--r--   1 root     root      206663 Feb 16 11:00 file.1
owner@:--x-----------:------:deny
                      ^^^^^^
                      7th position for flag 'I' missing
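
For comparison, a minimal sketch of what a listing with the 7-flag field should 
look like once an inheritance flag is actually set (user name and file are 
hypothetical; flag order f/d/i/n/S/F/I as I understand the current compact format):

# chmod A+user:fred:read_data/write_data:file_inherit/dir_inherit:allow file.1
# ls -V file.1
-rw-r--r--   1 root     root      206663 Feb 16 11:00 file.1
user:fred:rw------------:fd-----:allow
[...]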

By the way, where can I find the latest version of the ls(1) manpage online ? 
I cannot find it, neither on src.opensolaris.org, nor in the manpage 
consolidation download center [1]. I'd like to check whether the errors I 
found in ls(1) are fixed before submitting a bug report.

[1] http://opensolaris.org/os/downloads/manpages/

-marc


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding slices from same drive to a pool

2008-07-29 Thread Darren J Moffat
Ian Collins wrote:
 I'd like to extend my ZFS root pool by adding the old swap and root slice 
 left over from the previous LU BE. 
 
 Are there any known issues with concatenating slices from the same drive? 

Having done this in the past (many builds ago) I found the performance 
wasn't good.  It is really bad if you try to mirror between different 
slices of the same drive (use copies=2 instead).

If the old swap and root slices are after the ZFS root pool then you 
should be able to use format to delete them and add them onto the end of 
the slice the pool is in.  If they are before it then I think you are 
out of luck.

If they are before the ZFS root pool, one possible thing you might be 
able to do would be to boot failsafe (which runs completely from RAM) 
and then, making sure the pool is NOT imported, use dd to shift the data 
(overlapping slices might help with this).  But I'd make sure you have a 
full and verified backup before trying that.  Note that I've not tried 
this myself, but I might be tempted to give it a go on my OpenSolaris 
2008.05 (upgraded to snv_93) system since it has swap as a separate slice 
at the start of the disk.
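
A very rough sketch of that dd shuffle, with hypothetical slice names (s4 holding 
the pool, s0 the freed space at the front of the disk), assuming a failsafe boot, 
the pool NOT imported, and a full verified backup:

# prtvtoc /dev/rdsk/c0t0d0s2
# dd if=/dev/rdsk/c0t0d0s4 of=/dev/rdsk/c0t0d0s0 bs=1024k

prtvtoc records the current slice starts and sizes; after the copy you would 
relabel with format so a single slice starts where s0 did, and then import the 
pool from its new location.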

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-29 Thread Ross Smith

A little more information today.  I had a feeling that ZFS would continue quite 
some time before giving an error, and today I've shown that you can carry on 
working with the filesystem for at least half an hour with the disk removed.
 
I suspect on a system with little load you could carry on working for several 
hours without any indication that there is a problem.  It looks to me like ZFS 
is caching reads & writes, and that provided requests can be fulfilled from the 
cache, it doesn't care whether the disk is present or not.
 
I would guess that ZFS is attempting to write to the disk in the background, 
and that this is silently failing.
 
Here's the log of the tests I did today.  After removing the drive, over a 
period of 30 minutes I copied folders to the filesystem, created an archive, 
set permissions, and checked properties.  I did this both on the command line 
and with the graphical file manager tool in Solaris.  Neither reported any 
errors, and all the data could be read & written fine, until the reboot, at 
which point all the data was lost, again without error.
 
If you're not interested in the detail, please skip to the end where I've got 
some thoughts on just how many problems there are here.
 
 
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        test        ONLINE       0     0     0
          c2t7d0    ONLINE       0     0     0
errors: No known data errors
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   243M   228G   242M  /test
# zpool list test
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   232G   243M   232G     0%  ONLINE  -
 
-- drive removed --
 
# cfgadm |grep sata1/7
sata1/7        sata-port    empty    unconfigured ok
 
 
-- cfgadm knows the drive is removed.  How come ZFS does not? --
 
# cp -r /rc-pool/copytest /test/copytest
# zpool list test
NAME  SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test  232G  73.4M   232G     0%  ONLINE  -
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   142K   228G    18K  /test
 
 
-- Yup, still up.  Let's start the clock --
 
# date
Tue Jul 29 09:31:33 BST 2008
# du -hs /test/copytest
 667K   /test/copytest
 
 
-- 5 minutes later, still going strong --
 
# date
Tue Jul 29 09:36:30 BST 2008
# zpool list test
NAME  SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test  232G  73.4M   232G     0%  ONLINE  -
# cp -r /rc-pool/copytest /test/copytest2
# ls /test
copytest   copytest2
# du -h -s /test
 1.3M   /test
# zpool list test
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   232G  73.4M   232G     0%  ONLINE  -
# find /test | wc -l
    2669
# find //test/copytest | wc -l
    1334
# find /rc-pool/copytest | wc -l
    1334
# du -h -s /rc-pool/copytest
 5.3M   /rc-pool/copytest
 
 
-- Not sure why the original pool has 5.3MB of data when I use du. --
-- File Manager reports that they both have the same size --
 
 
-- 15 minutes later it's still working.  I can read data fine --
# date
Tue Jul 29 09:43:04 BST 2008
# chmod 777 /test/*
# mkdir /rc-pool/test2
# cp -r /test/copytest2 /rc-pool/test2/copytest2
# find /rc-pool/test2/copytest2 | wc -l
    1334
# zpool list test
NAME  SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test  232G  73.4M   232G     0%  ONLINE  -
 
 
-- and yup, the drive is still offline --
 
# cfgadm | grep sata1/7
sata1/7        sata-port    empty    unconfigured ok


-- And finally, after 30 minutes the pool is still going strong --
 
# date
Tue Jul 29 09:59:56 BST 2008
# tar -cf /test/copytest.tar /test/copytest/*
# ls -l
total 3
drwxrwxrwx   3 root     root           3 Jul 29 09:30 copytest
-rwxrwxrwx   1 root     root     4626432 Jul 29 09:59 copytest.tar
drwxrwxrwx   3 root     root           3 Jul 29 09:39 copytest2
# zpool list test
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   232G  73.4M   232G     0%  ONLINE  -
 
After a full 30 minutes there's no indication whatsoever of any problem.  
Checking properties of the folder in File Browser reports 2665 items, totalling 
9.0MB.
 
At this point I tried 'zfs set sharesmb=on test'.  I didn't really expect it 
to work, and sure enough, that command hung.  'zpool status' also hung, so I had 
to reboot the server.
 
 
-- Rebooted server --
 
 
Now I found that not only are all the files I've written in the last 30 minutes 
missing, but in fact files that I had deleted several minutes prior to removing 
the drive have re-appeared.
 
 
-- /test mount point is still present, I'll probably have to remove that 
manually --
 
 
# cd /
# ls
bin         export      media       proc        system
boot        home        mnt         rc-pool     test
dev         kernel      net         rc-usb      tmp
devices     lib         opt         root        usr
etc         lost+found  platform    sbin        var
 
 
-- ZFS still has the pool mounted, but at least now it realises it's not 
working --
 
 
# zpool list
NAME  SIZE   

Re: [zfs-discuss] Is my 3-way mirror completely lost?

2008-07-29 Thread Bob Friesenhahn
On Tue, 29 Jul 2008, Emiel van de Laar wrote:

 I'm not sure what to do next. Is my final pool completely lost?

It sounds like your good disk has some serious problems and that 
formatting the two disks with bad sectors was the wrong thing to do. 
You might have been able to recover using the two failing disks by 
removing the good disk which was causing the hang.

Since diagnostics on the good disk succeeded, there may still be 
some hope by using a low-level tool like 'dd' to transfer the 
underlying data to more reliable storage.  If there is a successful 
transfer, then you have something to work with.
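
For example, a first pass at a dd rescue copy might look like this (device and 
destination paths are placeholders; conv=noerror,sync keeps dd going past any 
unreadable sectors and pads them with zeros so offsets stay aligned):

# dd if=/dev/rdsk/c1t0d0s2 of=/otherpool/gooddisk.img bs=128k conv=noerror,sync

The image could then be attached with lofiadm, or written out to a healthy disk 
of the same size, and a 'zpool import' attempted against that copy instead of 
the failing hardware.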

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Bob Friesenhahn
On Tue, 29 Jul 2008, Steve wrote:

 I agree with mike503. If you create the awareness (of the 
 instability of recorded information) there is a large potential 
 market waiting for a ZFS/NAS little server!

The big mistake in the posting was to assume that Sun should be in 
this market.  Sun has no experience in the consumer market and as 
far as I know, it has never tried to field a consumer product.

Anyone here is free to go into business selling ready-made NAS servers 
based on OpenSolaris.  Except for Adaptec SnapServer (which is 
pricey), almost all of the competition for small NAS servers is based 
on a special version of Microsoft Windows targeted at NAS service, which 
only offers CIFS.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correctl

2008-07-29 Thread Rob Clark
There may be some work being done to fix this:

zpool should support raidz of mirrors
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6485689

Discussed in this thread:
Mirrored Raidz ( Posted: Oct 19, 2006 9:02 PM )
http://opensolaris.org/jive/thread.jspa?threadID=15854&tstart=0
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Errors in ZFS/NFSv4 ACL Documentation

2008-07-29 Thread Cindy . Swearingen
Marc,

Thanks for your detailed review comments. I will check where the latest
man pages are online and get back to you.

In the meantime, I can file the bugs to get these issues fixed on your
behalf.

Thanks again,

Cindy

Marc Bevand wrote:
 I noticed some errors in ls(1), acl(5) and the ZFS Admin Guide about 
 ZFS/NFSv4 
 ACLs:
 
 ls(1): read_acl (r)  Permission  to  read  the ACL of a file. The compact 
 representation of read_acl is c, not r.
 
 ls(1): -c | -v   The same as -l, and in addition displays the [...] The 
 options are in fact -/ c or -/ v. 
 
 ls(1): The display in verbose mode (/ v) uses full attribute [...].  This 
 should read (-/ v).
 
 acl(5): execute (X). The x should be lowercase: (x)
 
 acl(5) does not document 3 ACEs: success access (S), failed access (F), 
 inherited (I).
 
 The ZFS Admin Guide does not document the same 3 ACEs.
 
 The ZFS Admin Guide gives examples listing a compact representation of ACLs 
 containing only 6 inheritance flags instead of 7. For example in the 
 section Setting and Displaying ACLs on ZFS Files in Compact Format:
 # ls -V file.1
 -rw-r--r--   1 root     root      206663 Feb 16 11:00 file.1
 owner@:--x-----------:------:deny
                       ^^^^^^
                       7th position for flag 'I' missing
 
 By the way, where can I find the latest version of the ls(1) manpage online ? 
 I cannot find it, neither on src.opensolaris.org, nor in the manpage 
 consolidation download center [1]. I'd like to check whether the errors I 
 found in ls(1) are fixed before submitting a bug report.
 
 [1] http://opensolaris.org/os/downloads/manpages/
 
 -marc
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Steve
waynel wrote:
 
 We have a couple of machines similar to your just
 spec'ed.  They have worked great.  The only problem
 is, the power management routine only works for K10
 and later.  We will move to Intel core 2 duo for
 future machines (mainly b/c power management
 considerations).
 

So is Intel better? Which motherboard could be a good choice? (microatx?)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is my 3-way mirror completely lost?

2008-07-29 Thread Richard Elling
Bob Friesenhahn wrote:
 On Tue, 29 Jul 2008, Emiel van de Laar wrote:
   
 I'm not sure what to do next. Is my final pool completely lost?
 

 It sounds like your good disk has some serious problems and that 
 formatting the two disks with bad sectors was the wrong thing to do. 
 You might have been able to recover using the two failing disks by 
 removing the good disk which was causing the hang.

 Since diagnostics on the good disk succeeded, there may still be 
 some hope by using a low-level tool like 'dd' to transfer the 
 underlying data to more reliable storage.  If there is a successful 
 transfer, then you have something to work with.
   

Good idea.  This eliminates ZFS from the equation and should verify
that the data is readable.

Also check for errors or faults discovered by FMA using fmdump.
If these are consumer-grade disks, they may not return a failure when
an unrecoverable read is attempted, but the request should time out
eventually.  You may be seeing serial timeouts, which should show
up in the FMA records.
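
For reference, a quick look is something like this: 'fmdump' lists the faults 
FMA has diagnosed, 'fmdump -eV' dumps the underlying error reports in full, and 
'fmadm faulty' shows anything currently marked faulty.

# fmdump
# fmdump -eV | more
# fmadm faulty

If the drive is timing out repeatedly you would expect a stream of ereports for 
that device in the fmdump -e output.
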
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Datto ZFS OpenSolaris NAS Product

2008-07-29 Thread Haik Aftandilian
I just read about this new NAS product based on OpenSolaris and ZFS. There are 
lots of questions on this forum about good hardware for a home NAS box so the 
hardware/software this company is using might be interesting. From their site, 
they are using a 1.5 GHz Low Voltage VIA C7 processor with 1 GB of RAM to 
serve up a ZFS mirror.

  http://www.dattobackup.com/zseries-tech.php

Haik
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-29 Thread Jonathan Loran

I think the important point here is that this makes the case for ZFS 
handling at least one layer of redundancy.  If the disk you pulled was 
part of a mirror or raidz, there wouldn't be data loss when your system 
was rebooted.  In fact, the zpool status commands would likely keep 
working, and a reboot wouldn't be necessary at all.  I think it's 
unreasonable to expect a system with any file system to recover from a 
single drive being pulled.  Of course, losing extra work because of the 
delayed notification is bad, but nonetheless, this is not a reasonable 
test.  Basically, always provide redundancy in your zpool config.
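
For example, redoing this test against even a simple two-way mirror (the second 
device name here is made up) should let ZFS keep serving data and flag the 
missing half rather than silently dropping the writes:

# zpool create test mirror c2t7d0 c2t8d0
# zpool status test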

Jon

Ross Smith wrote:
 A little more information today.  I had a feeling that ZFS would 
 continue quite some time before giving an error, and today I've shown 
 that you can carry on working with the filesystem for at least half an 
 hour with the disk removed.
  
 I suspect on a system with little load you could carry on working for 
 several hours without any indication that there is a problem.  It 
 looks to me like ZFS is caching reads & writes, and that provided 
 requests can be fulfilled from the cache, it doesn't care whether the 
 disk is present or not.
  
 I would guess that ZFS is attempting to write to the disk in the 
 background, and that this is silently failing.
  
 Here's the log of the tests I did today.  After removing the drive, 
 over a period of 30 minutes I copied folders to the filesystem, 
 created an archive, set permissions, and checked properties.  I did 
 this both in the command line and with the graphical file manager tool 
 in Solaris.  Neither reported any errors, and all the data could be 
 read & written fine.  Until the reboot, at which point all the data 
 was lost, again without error.
  
 If you're not interested in the detail, please skip to the end where 
 I've got some thoughts on just how many problems there are here.
  
  
 # zpool status test
   pool: test
  state: ONLINE
  scrub: none requested
 config:
 NAMESTATE READ WRITE CKSUM
 testONLINE   0 0 0
   c2t7d0ONLINE   0 0 0
 errors: No known data errors
 # zfs list test
 NAME   USED  AVAIL  REFER  MOUNTPOINT
 test   243M   228G   242M  /test
 # zpool list test
 NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test   232G   243M   232G 0%  ONLINE  -
  

 -- drive removed --
  

 # cfgadm |grep sata1/7
 sata1/7        sata-port    empty    unconfigured ok
  
  
 -- cfgadm knows the drive is removed.  How come ZFS does not? --
  

 # cp -r /rc-pool/copytest /test/copytest
 # zpool list test
 NAME  SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test  232G  73.4M   232G 0%  ONLINE  -
 # zfs list test
 NAME   USED  AVAIL  REFER  MOUNTPOINT
 test   142K   228G18K  /test
  
  
 -- Yup, still up.  Let's start the clock --
  

 # date
 Tue Jul 29 09:31:33 BST 2008
 # du -hs /test/copytest
  667K /test/copytest
  
  
 -- 5 minutes later, still going strong --
  

 # date
 Tue Jul 29 09:36:30 BST 2008
 # zpool list test
 NAME  SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test  232G  73.4M   232G 0%  ONLINE  -
 # cp -r /rc-pool/copytest /test/copytest2
 # ls /test
 copytest   copytest2
 # du -h -s /test
  1.3M /test
 # zpool list test
 NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test   232G  73.4M   232G 0%  ONLINE  -
 # find /test | wc -l
 2669
 # find //test/copytest | wc -l
 1334
 # find /rc-pool/copytest | wc -l
 1334
 # du -h -s /rc-pool/copytest
  5.3M /rc-pool/copytest
  
  
 -- Not sure why the original pool has 5.3MB of data when I use du. --
 -- File Manager reports that they both have the same size --
  
  
 -- 15 minutes later it's still working.  I can read data fine --

 # date
 Tue Jul 29 09:43:04 BST 2008
 # chmod 777 /test/*
 # mkdir /rc-pool/test2
 # cp -r /test/copytest2 /rc-pool/test2/copytest2
 # find /rc-pool/test2/copytest2 | wc -l
 1334
 # zpool list test
 NAME  SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test  232G  73.4M   232G 0%  ONLINE  -
  
  
 -- and yup, the drive is still offline --
  

 # cfgadm | grep sata1/7
 sata1/7        sata-port    empty    unconfigured ok


 -- And finally, after 30 minutes the pool is still going strong --
  

 # date
 Tue Jul 29 09:59:56 BST 2008
 # tar -cf /test/copytest.tar /test/copytest/*
 # ls -l
 total 3
 drwxrwxrwx   3 root root   3 Jul 29 09:30 copytest
 -rwxrwxrwx   1 root root 4626432 Jul 29 09:59 copytest.tar
 drwxrwxrwx   3 root root   3 Jul 29 09:39 copytest2
 # zpool list test
 NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test   232G  73.4M   232G 0%  ONLINE  -

  
 After a full 30 minutes there's no indication whatsoever of any 
 problem.  Checking properties of the folder in File Browser reports 
 2665 items, totalling 9.0MB.
  
 At this point I tried # zfs set sharesmb=on 

Re: [zfs-discuss] Supermicro AOC-SAT2-MV8 hang when drive removed

2008-07-29 Thread David Collier-Brown
  Just a side comment: this discussion shows all the classic symptoms of 
two groups of people with different basic assumptions, each wondering why 
the other said what they did.
  Getting these out in the open would be A Good Thing (;-))

--dave

Jonathan Loran wrote:
 I think the important point here is that this makes the case for ZFS 
 handling at least one layer of redundancy.  If the disk you pulled was 
 part of a mirror or raidz, there wouldn't be data loss when your system 
 was rebooted.  In fact, the zpool status commands would likely keep 
 working, and a reboot wouldn't be necessary at all.  I think it's 
 unreasonable to expect a system with any file system to recover from a 
 single drive being pulled.  Of course, losing extra work because of the 
 delayed notification is bad, but nonetheless, this is not a reasonable 
 test.  Basically, always provide redundancy in your zpool config.
 
 Jon
 
 Ross Smith wrote:
 
A little more information today.  I had a feeling that ZFS would 
continue quite some time before giving an error, and today I've shown 
that you can carry on working with the filesystem for at least half an 
hour with the disk removed.
 
I suspect on a system with little load you could carry on working for 
several hours without any indication that there is a problem.  It 
looks to me like ZFS is caching reads & writes, and that provided 
requests can be fulfilled from the cache, it doesn't care whether the 
disk is present or not.
 
I would guess that ZFS is attempting to write to the disk in the 
background, and that this is silently failing.
 
Here's the log of the tests I did today.  After removing the drive, 
over a period of 30 minutes I copied folders to the filesystem, 
created an archive, set permissions, and checked properties.  I did 
this both in the command line and with the graphical file manager tool 
in Solaris.  Neither reported any errors, and all the data could be 
read & written fine.  Until the reboot, at which point all the data 
was lost, again without error.
 
If you're not interested in the detail, please skip to the end where 
I've got some thoughts on just how many problems there are here.
 
 
# zpool status test
  pool: test
 state: ONLINE
 scrub: none requested
config:
NAMESTATE READ WRITE CKSUM
testONLINE   0 0 0
  c2t7d0ONLINE   0 0 0
errors: No known data errors
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   243M   228G   242M  /test
# zpool list test
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
test   232G   243M   232G 0%  ONLINE  -
 

-- drive removed --
 

# cfgadm |grep sata1/7
sata1/7        sata-port    empty    unconfigured ok
 
 
-- cfgadm knows the drive is removed.  How come ZFS does not? --
 

# cp -r /rc-pool/copytest /test/copytest
# zpool list test
NAME  SIZE   USED  AVAILCAP  HEALTH  ALTROOT
test  232G  73.4M   232G 0%  ONLINE  -
# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   142K   228G18K  /test
 
 
-- Yup, still up.  Let's start the clock --
 

# date
Tue Jul 29 09:31:33 BST 2008
# du -hs /test/copytest
 667K /test/copytest
 
 
-- 5 minutes later, still going strong --
 

# date
Tue Jul 29 09:36:30 BST 2008
# zpool list test
NAME  SIZE   USED  AVAILCAP  HEALTH  ALTROOT
test  232G  73.4M   232G 0%  ONLINE  -
# cp -r /rc-pool/copytest /test/copytest2
# ls /test
copytest   copytest2
# du -h -s /test
 1.3M /test
# zpool list test
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
test   232G  73.4M   232G 0%  ONLINE  -
# find /test | wc -l
2669
# find //test/copytest | wc -l
1334
# find /rc-pool/copytest | wc -l
1334
# du -h -s /rc-pool/copytest
 5.3M /rc-pool/copytest
 
 
-- Not sure why the original pool has 5.3MB of data when I use du. --
-- File Manager reports that they both have the same size --
 
 
-- 15 minutes later it's still working.  I can read data fine --

# date
Tue Jul 29 09:43:04 BST 2008
# chmod 777 /test/*
# mkdir /rc-pool/test2
# cp -r /test/copytest2 /rc-pool/test2/copytest2
# find /rc-pool/test2/copytest2 | wc -l
1334
# zpool list test
NAME  SIZE   USED  AVAILCAP  HEALTH  ALTROOT
test  232G  73.4M   232G 0%  ONLINE  -
 
 
-- and yup, the drive is still offline --
 

# cfgadm | grep sata1/7
sata1/7        sata-port    empty    unconfigured ok


-- And finally, after 30 minutes the pool is still going strong --
 

# date
Tue Jul 29 09:59:56 BST 2008
# tar -cf /test/copytest.tar /test/copytest/*
# ls -l
total 3
drwxrwxrwx   3 root root   3 Jul 29 09:30 copytest
-rwxrwxrwx   1 root root 4626432 Jul 29 09:59 copytest.tar
drwxrwxrwx   3 root root   3 Jul 29 09:39 copytest2
# zpool list test
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
test   232G  73.4M   232G 0%  ONLINE  -

 
After a full 30 minutes there's no indication whatsoever of 

Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Brandon High
On Tue, Jul 29, 2008 at 9:20 AM, Steve [EMAIL PROTECTED] wrote:
 So is Intel better? Which motherboard could be a good choice? (microatx?)

Inexpensive Intel motherboards do not support ECC memory, while all
current AMD CPUs do.

If ECC is important to you, Intel is not a good choice.

I'm disappointed that there is no support for power management on the
K8, which is a bit of a shock since Sun has been selling K8-based
systems for a few years now. The cost of an X3 ($125) and AM2+ mobo
($80) is about the same as an Intel chip ($80) and motherboard ($150)
that supports ECC.

-B

-- 
Brandon High [EMAIL PROTECTED]
The good is the enemy of the best. - Nietzsche
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread Stefano Pini

Hi guys,
we are proposing to a customer a couple of X4500s (24 TB) used as NAS 
(i.e. NFS servers).
Both servers will contain the same files and should be accessed by 
different clients at the same time (i.e. they should both be active).

So we need to guarantee that both X4500s contain the same files.
We could simply copy the contents onto both X4500s, which is an option 
because the new files are limited in number and rate, but we would 
really like to use the ZFS send & receive commands:


AFAIK the commands work fine, but generally speaking are there any 
known limitations?
And, in detail, it is not clear if the receiving ZFS file system can 
be used normally while it is in receive mode:
in plain words, is it possible to read, and export over NFS, files from a 
ZFS file system while it is receiving an update from another ZFS send?


Clearly, until the new updates are received and applied, the old copy 
would be used.


TIA
Stefano



Sun Microsystems Spa
Viale Fulvio testi 327
20162 Milano ITALY
me *STEFANO PINI*
Senior Technical Specialist at Sun Microsystems Italy 
http://www.sun.com/italy
contact | [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] | +39 02 
64152150



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread Chris Cosby
On Tue, Jul 29, 2008 at 5:13 PM, Stefano Pini [EMAIL PROTECTED] wrote:

 Hi guys,
 we are  proposing  a customer  a couple of X4500 (24 Tb) used as NAS (i.e.
 NFS server).
 Both server will contain the same files and should be accessed by different
 clients at the same time (i.e. they should be both active)
 So we need to guarantee that both x4500 contain the same files:
 We could simply copy the contents on both x4500 , which is an option
 because the new files are in a limited number and rate , but we would
 really like to use ZFS send & receive commands:

If they are truly limited, something like rsync or similar would do. There was a
script being thrown around a while back that was touted as the Best Backup
Script That Doesn't Do Backups, but I can't find it. In essence, it just
created a list of what changed since the last backup and allowed you to use
tar/cpio/cp - whatever - to do the backup.
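
If rsync is available on the X4500s, a one-way sync of the share could be as 
simple as this sketch (host name and path are placeholders):

# rsync -a --delete /export/data/ thumper2:/export/data/

run from cron at whatever interval matches how quickly the second head needs 
to catch up.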




 AFAIK the commands works fine but  generally speaking are there any known
 limitations ?
 And, in detail , it is not clear  if the receiving ZFS file system could be
 used regularly while it is in receiving mode:
 in poor words is it possible to read and export in nfs,   files from a  ZFS
 file system while it is receiving update from  another  ZFS send ?

First, the zfs send works only on a snapshot. -i sends incremental
snapshots, so you would think that would work. From the zfs man page, you'll
see that during a receive, the destination file system is unmounted and
cannot be accessed during the receive.

  If an incremental stream is received, then the  destina-
 tion file system must already exist, and its most recent
 snapshot must match the incremental stream's source. The
 destination  file  system and all of its child file sys-
 tems are unmounted and cannot  be  accessed  during  the
 receive operation.



 Clearly  until the new updates are received and applied the old copy would
 be used

 TIA
 Stefano



 Sun Microsystems Spa
 Viale Fulvio testi 327
 20162 Milano ITALY
 me *STEFANO PINI*
 Senior Technical Specialist at Sun Microsystems Italy 
 http://www.sun.com/italy
 contact | [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] | +39 02
 64152150



 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread mike
I'd say some good places to look are silentpcreview.com and mini-itx.com.

I found this tasty morsel on an ad at mini-itx...
http://www.american-portwell.com/product.php?productid=16133

6x onboard SATA. 4 gig support. core2duo support. which means 64 bit = yes, 4 
gig = yes, 6x sata is nice.

now if only I could find a chassis for this. AFAIK the Chenbro is the only 2-drive 
mini-itx chassis so far. I wish I knew metalworking so I could carve up my own 
:P
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread Richard Elling
Stefano Pini wrote:
 Hi guys,
 we are  proposing  a customer  a couple of X4500 (24 Tb) used as NAS 
 (i.e. NFS server).
 Both server will contain the same files and should be accessed by 
 different clients at the same time (i.e. they should be both active)

What exactly are they trying to do?

AVS can be used to keep two systems in sync, but for a simple design,
there should be two NFS file systems, one active for each X4500.
There have recently been some discussions on
[EMAIL PROTECTED]
about using AVS to keep the unshared storage in sync.
 -- richard

 So we need to guarantee that both x4500 contain the same files:
 We could simply copy the contents on both x4500 , which is an option 
 because the new files are in a limited number and rate , but we 
 would really like to use ZFS send & receive commands:

 AFAIK the commands works fine but  generally speaking are there any 
 known limitations ?
 And, in detail , it is not clear  if the receiving ZFS file system 
 could be used regularly while it is in receiving mode:
 in poor words is it possible to read and export in nfs,   files from 
 a  ZFS file system while it is receiving update from  another  ZFS send ?

 Clearly  until the new updates are received and applied the old copy 
 would be used

 TIA
 Stefano



 Sun Microsystems Spa
 Viale Fulvio testi 327
 20162 Milano ITALY
 me *STEFANO PINI*
 Senior Technical Specialist at Sun Microsystems Italy 
 http://www.sun.com/italy
 contact | [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] | +39 02 
 64152150


 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread eric kustarz

On Jul 29, 2008, at 2:24 PM, Chris Cosby wrote:



 On Tue, Jul 29, 2008 at 5:13 PM, Stefano Pini [EMAIL PROTECTED]  
 wrote:
 Hi guys,
 we are  proposing  a customer  a couple of X4500 (24 Tb) used as NAS  
 (i.e. NFS server).
 Both server will contain the same files and should be accessed by  
 different clients at the same time (i.e. they should be both active)
 So we need to guarantee that both x4500 contain the same files:
 We could simply copy the contents on both x4500 , which is an option  
 because the new files are in a limited number and rate , but we  
 would really like to use ZFS send & receive commands:
 If they are truly limited, something like an rsync or similar. There  
 was a script being thrown around a while back that was touted as the  
 Best Backup Script That Doesn't Do Backups, but I can't find it. In  
 essence, it just created a list of what changed since the last  
 backup and allowed you to use tar/cpio/cp - whatever to do the backup.

I think zfs send/recv would be a great way to go here - see below.





 AFAIK the commands works fine but  generally speaking are there any  
 known limitations ?
 And, in detail , it is not clear  if the receiving ZFS file system  
 could be used regularly while it is in receiving mode:
 in poor words is it possible to read and export in nfs,   files from  
 a  ZFS file system while it is receiving update from  another  ZFS  
 send ?
 First, the zfs send works only on a snapshot. -i sends incremental  
 snapshots, so you would think that would work. From the zfs man  
 page, you'll see that during a receive, the destination file system  
 is unmounted and cannot be accessed during the receive.

   If an incremental stream is received, then the  destina-
  tion file system must already exist, and its most recent
  snapshot must match the incremental stream's source. The
  destination  file  system and all of its child file sys-
  tems are unmounted and cannot  be  accessed  during  the
  receive operation.

Actually we don't unmount the file systems anymore for incremental  
send/recv, see:
6425096 want online 'zfs recv' (read only and read/write)

Available since November 2007 in OpenSolaris/Nevada.  Coming to a  
s10u6 near you.
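
As a sketch of that workflow (pool, file system, snapshot, and host names are 
made up):

# zfs snapshot tank/data@mon
# zfs send tank/data@mon | ssh thumper2 zfs recv -F tank/data
# zfs snapshot tank/data@tue
# zfs send -i tank/data@mon tank/data@tue | ssh thumper2 zfs recv tank/data

The first pair seeds the second box with a full stream; the later pair ships 
only the blocks that changed between the two snapshots.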

eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions about ZFS Send/Receive

2008-07-29 Thread Chris Cosby
Obviously, I should stop answering, as all I deal with and all that I will
deal with is GA Solaris. OpenSolaris might as well not exist as far as I'm
concerned. With that in mind, I'll just keep reading and appreciating all of
the good zfs info that comes along.

Peace out.

On Tue, Jul 29, 2008 at 5:54 PM, eric kustarz [EMAIL PROTECTED] wrote:


 On Jul 29, 2008, at 2:24 PM, Chris Cosby wrote:



 On Tue, Jul 29, 2008 at 5:13 PM, Stefano Pini [EMAIL PROTECTED]
 wrote:
 Hi guys,
 we are  proposing  a customer  a couple of X4500 (24 Tb) used as NAS (i.e.
 NFS server).
 Both server will contain the same files and should be accessed by
 different clients at the same time (i.e. they should be both active)
 So we need to guarantee that both x4500 contain the same files:
 We could simply copy the contents on both x4500 , which is an option
 because the new files are in a limited number and rate , but we would
 really like to use ZFS send & receive commands:
 If they are truly limited, something like an rsync or similar. There was a
 script being thrown around a while back that was touted as the Best Backup
 Script That Doesn't Do Backups, but I can't find it. In essence, it just
 created a list of what changed since the last backup and allowed you to use
 tar/cpio/cp - whatever to do the backup.


 I think zfs send/recv would be a great way to go here - see below.





 AFAIK the commands works fine but  generally speaking are there any known
 limitations ?
 And, in detail , it is not clear  if the receiving ZFS file system could
 be used regularly while it is in receiving mode:
 in poor words is it possible to read and export in nfs,   files from a
  ZFS file system while it is receiving update from  another  ZFS send ?
 First, the zfs send works only on a snapshot. -i sends incremental
 snapshots, so you would think that would work. From the zfs man page, you'll
 see that during a receive, the destination file system is unmounted and
 cannot be accessed during the receive.

  If an incremental stream is received, then the  destina-
 tion file system must already exist, and its most recent
 snapshot must match the incremental stream's source. The
 destination  file  system and all of its child file sys-
 tems are unmounted and cannot  be  accessed  during  the
 receive operation.


 Actually we don't unmount the file systems anymore for incremental
 send/recv, see:
 6425096 want online 'zfs recv' (read only and read/write)

 Available since November 2007 in OpenSolaris/Nevada.  Coming to a s10u6
 near you.

 eric




-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ideal Setup: RAID-5, Areca, etc!

2008-07-29 Thread Robert Milkowski
Hello Bob,

Friday, July 25, 2008, 9:00:41 PM, you wrote:

BF On Fri, 25 Jul 2008, Brandon High wrote:

 I am not sure if ZFS really has to wait for both sides of a mirror to
 finish, but even if it does, if there are enough VDEVs then ZFS can still
 proceed with writing.

 It would have to wait on an fsync() call, since that won't return
 until both halves of the mirror have completed. If the cards you're
 using have NVRAM, then they could return faster.

BF While it is possible that the ZFS implementation does actually wait 
BF for both drives to report that the data is written, it only has to 
BF know that the data is committed to one drive in order to satisfy the 
BF synchronous write expectation.  This is not the case for legacy 
BF mirrored pairs where the disks are absolutely required to contain the 
BF same content at the same logical locations.  ZFS does not require that
BF disks in a mirrored pair contain identical content at all times.

AFAIK zfs does require that all writes are committed to all devices to
satisfy the configured redundancy unless some of the devices were marked as
failed. Otherwise, especially in the sync case, you could lose data
because of a disk failure in a redundant configuration. Not to mention
other possible issues.

-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Steve
A little case modding may not be so difficult... there are examples (and 
instructions) like: 
http://www.mashie.org/casemods/udat2.html

But for sure there are more advanced ones like:
http://forums.bit-tech.net/showthread.php?t=76374&pp=20

And here you can see a full example of human ingenuity!!
http://www.darkroastedblend.com/2007/06/cool-computer-case-mods.html
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread mike
that mashie link might be exactly what i wanted...

that mini-itx board w/ 6 SATA. use CF maybe for boot (might need IDE to CF 
converter) - 5 drive holder (hotswap as a bonus) - you get 4 gig ram, 
core2-based chip (64-bit), onboard graphics, 5 SATA2 drives... that is cool.

however, I would need to hack it up (and I don't have any metal cutting stuff), 
and who knows how loud it is without any front on those drives. i'd want a 
small cover on top to help with noise.

looks like i might have to hang out over on the mashie site now too ;)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-29 Thread Robert Milkowski
Hello Bob,

Friday, July 25, 2008, 4:58:54 PM, you wrote:

BF On Fri, 25 Jul 2008, Robert Milkowski wrote:

 Both on 2540 and 6540 if you do not disable it your performance will 
 be very bad especially for synchronous IOs as ZIL will force your 
 array to flush its cache every time. If you are not using ZFS on any 
 other storage than 2540 on your servers then put set 
 zfs:zfs_nocacheflush=1 in /etc/system and do a reboot. If you 
 haven't done so it should help you considerably.

BF This does not seem wise since then data (records of trades) may be 
BF lost if the system crashes or loses power.  It is much better to apply
BF the firmware tweaks so that the 2540 reports that the data is written 
BF as soon as it is safely in its NVRAM rather than waiting for it to be 
BF on disk.  ZFS should then perform rather well with low latency. 

Both cases are basically the same.
Please notice I'm not talking about disabling the ZIL; I'm talking about
disabling cache flushes in ZFS. ZFS will still wait for the array to
confirm that it did receive the data (into NVRAM).

If you lose power the behavior will be the same - no difference here.
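
For completeness, this is the tunable quoted above; a sketch of checking and 
flipping it (the /etc/system line takes effect at the next boot, the mdb write 
changes the running kernel - only do either when every pool sits behind 
battery-backed NVRAM):

# grep zfs_nocacheflush /etc/system
set zfs:zfs_nocacheflush = 1
# echo zfs_nocacheflush/D | mdb -k
# echo zfs_nocacheflush/W0t1 | mdb -kw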




-- 
Best regards,
 Robert Milkowskimailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread Steve
If I understood properly there is just one piece that has to be modified: a 
flat aluminium board with a square hole in the center, which any fine mechanic 
around your city should be able to make very easily...

The problem in this tight case, more than the noise, might be the temperature!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-29 Thread Bob Friesenhahn
On Wed, 30 Jul 2008, Robert Milkowski wrote:

 Both cases are basically the same.
 Please notice I'm not talking about disabling ZIL, I'm talking about
 disabling cache flushes in ZFS. ZFS will still wait for the array to
 confirm that it did receive data (nvram).

So it seems that in your opinion, the periodic burp in system call 
completion time is due to ZFS's periodic cache flush.  That is 
certainly quite possible.

Testing will prove it, but the testing can be on someone else's system 
rather than my own. :)
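
One cheap way to check on a live system (pool name is a placeholder) is to 
watch the write pattern at one-second intervals and see whether the latency 
burps line up with the periodic bursts of writes going to disk:

# zpool iostat tank 1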

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] My first 'unrecoverable error', what to do?

2008-07-29 Thread Sam
I've had my 10x500 ZFS+ pool running for probably 6 months now and had thought it 
was scrubbing occasionally (wrong), so I started a scrub this morning. It's 
almost done now and I got this:

errors: No known data errors
# zpool status
  pool: pile
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress, 97.93% done, 0h5m to go
config:

NAMESTATE READ WRITE CKSUM
pileONLINE   0 0 0
  raidz2ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 1
c5t5d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 1
c5t7d0  ONLINE   0 0 0
c3d0ONLINE   0 0 1
c4d0ONLINE   0 0 0


So it says it's a minor error but still one to be concerned about. I thought 
resilvering takes care of checksum errors, does it not?  Should I be running out 
to buy 3 new 500GB drives?

Thanks,
Sam
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] My first 'unrecoverable error', what to do?

2008-07-29 Thread Sam
Could this somehow be related to this rather large (100GB) difference that 'zfs 
list' and 'zpool list' report:

NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
pile  4.53T  4.31T   223G95%  ONLINE  -
# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
pile  3.44T   120G  3.44T  /pile

I know there should be a 1TB difference in SIZE but the difference in AVAIL 
makes no sense.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] My first 'unrecoverable error', what to do?

2008-07-29 Thread Arne Schwabe
Sam wrote:
 I've had my 10x500 ZFS+ running for probably 6 months now and had thought it 
 was scrubbing occasionally (wrong) so I started a scrub this morning, its 
 almost done now and I got this:

 errors: No known data errors
 # zpool status
   pool: pile
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: scrub in progress, 97.93% done, 0h5m to go
 config:

 NAMESTATE READ WRITE CKSUM
 pileONLINE   0 0 0
   raidz2ONLINE   0 0 0
 c5t0d0  ONLINE   0 0 0
 c5t1d0  ONLINE   0 0 0
 c5t2d0  ONLINE   0 0 0
 c5t3d0  ONLINE   0 0 0
 c5t4d0  ONLINE   0 0 1
 c5t5d0  ONLINE   0 0 0
 c5t6d0  ONLINE   0 0 1
 c5t7d0  ONLINE   0 0 0
 c3d0ONLINE   0 0 1
 c4d0ONLINE   0 0 0


 So it says its a minor error but still one to be concerned about, I thought 
 resilvering takes care of checksum errors, does it not?  Should I be running 
 to buy 3 new 500GB drives?

   
Failures can have different causes. Maybe a cable is defective. Also, 
occasional defective sectors are normal and are handled quite well by the 
drive's defect management. You can use 'zpool clear' to reset the 
counters to 0.
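
A minimal sketch of that cleanup, using Sam's pool name:

# zpool clear pile
# zpool status -v pile

and, once the current scrub finishes, keep an eye on whether the checksum 
counters stay at zero.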

Arne

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-07-29 Thread mike
exactly.

that's why i'm trying to get an account on that site (looks like open 
registration for the forums is disabled) so i can shoot the breeze and talk 
about all this stuff too.

zfs would be perfect for this, as most of these guys are trying to find hardware 
raid cards that will fit, etc... with mini-itx boards coming with 4 and now 6 
ports, that isn't an issue; as long as onboard SATA2+ZFS is fast enough, 
everyone wins.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] My first 'unrecoverable error', what to do?

2008-07-29 Thread Bob Friesenhahn
On Tue, 29 Jul 2008, Sam wrote:
 So it says its a minor error but still one to be concerned about, I 
 thought resilvering takes care of checksum errors, does it not? 
 Should I be running to buy 3 new 500GB drives?

Presumably these are SATA drives.  Studies show that typical SATA 
drives tend to produce recurring data errors during their lifetime so 
a few data errors are likely nothing to be alarmed about.  If you see 
many tens or hundreds then there would be cause for concern. 
Enterprise SCSI drives produce very few such errors and evidence 
suggests that data errors may portend doom.

I have yet to see an error here.  Knock on wood!

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss