Re: [zfs-discuss] Zpool import not working - I broke my pool...

2008-08-06 Thread Ross Smith

Hmm... got a bit more information for you to add to that bug I think.
 
Zpool import also doesn't work if you have mirrored log devices and either one 
of them is offline.
 
I created two ramdisks with:
# ramdiskadm -a rc-pool-zil-1 256m
# ramdiskadm -a rc-pool-zil-2 256m
 
And added them to the pool with:
# zpool add rc-pool log mirror /dev/ramdisk/rc-pool-zil-1 
/dev/ramdisk/rc-pool-zil-2
 
I can reboot fine, the pool imports ok without the ZIL and I have a script that
recreates the ramdisks and adds them back to the pool:
#!/sbin/sh
state=$1
case $state in
'start')
   echo 'Starting Ramdisks'
   /usr/sbin/ramdiskadm -a rc-pool-zil-1 256m
   /usr/sbin/ramdiskadm -a rc-pool-zil-2 256m
   echo 'Attaching to ZFS ZIL'
   /usr/sbin/zpool replace test /dev/ramdisk/rc-pool-zil-1
   /usr/sbin/zpool replace test /dev/ramdisk/rc-pool-zil-2
   ;;
'stop')
   ;;
esac
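
(For anyone wanting to run something like that automatically at boot, the
usual legacy rc hookup would be along these lines - the file name and run
level are just examples:)

# cp rc-pool-zil.sh /etc/init.d/rc-pool-zil
# chmod 755 /etc/init.d/rc-pool-zil
# ln -s /etc/init.d/rc-pool-zil /etc/rc3.d/S99rc-pool-zil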
 
However, if I export the pool, and delete one ramdisk to check that the 
mirroring works fine, the import fails:
# zpool export rc-pool
# ramdiskadm -d rc-pool-zil-1
# zpool import rc-pool
cannot import 'rc-pool': one or more devices is currently unavailable
 
Ross
   Date: Mon, 4 Aug 2008 10:42:43 -0600
   From: [EMAIL PROTECTED]
   Subject: Re: [zfs-discuss] Zpool import not working - I broke my pool...
   To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
   CC: zfs-discuss@opensolaris.org

   Richard Elling wrote:
    Ross wrote:
    I'm trying to import a pool I just exported but I can't, even -f
    doesn't help. Every time I try I'm getting an error:
    cannot import 'rc-pool': one or more devices is currently unavailable

    Now I suspect the reason it's not happy is that the pool used to
    have a ZIL :)

    Correct. What you want is CR 6707530, log device failure needs some work
    http://bugs.opensolaris.org/view_bug.do?bug_id=6707530
    which Neil has been working on, scheduled for b96.

   Actually no. That CR mentioned the problem and talks about splitting out
   the bug, as it's really a separate problem. I've just done that and here's
   the new CR which probably won't be visible immediately to you:

   6733267 Allow a pool to be imported with a missing slog

   Here's the Description:

   ---
   This CR is being broken out from 6707530 log device failure needs some work

   When Separate Intent logs (slogs) were designed they were given equal status
   in the pool device tree. This was because they can contain committed changes
   to the pool. So if one is missing it is assumed to be important to the
   integrity of the application(s) that wanted the data committed synchronously,
   and thus a pool cannot be imported with a missing slog.
   However, we do allow a pool to be missing a slog on boot up if it's in the
   /etc/zfs/zpool.cache file. So this sends a mixed message.

   We should allow a pool to be imported without a slog if -f is used
   and to not import without -f but perhaps with a better error message.

   It's the guidsum check that actually rejects imports with missing devices.
   We could have a separate guidsum for the main pool devices (non slog/cache).
   ---
_
Get Hotmail on your mobile from Vodafone 
http://clk.atdmt.com/UKM/go/107571435/direct/01/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] more ZFS recovery

2008-08-06 Thread Tom Bird
Hi,

Have a problem with a ZFS on a single device, this device is 48 1T SATA
drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
a ZFS on it as a single device.

There was a problem with the SAS bus which caused various errors
including the inevitable kernel panic, the thing came back up with 3 out
of 4 zfs mounted.

I've tried reading the partition table with format, works fine, also can
dd the first 100G from the device quite happily so the communication
issue appears resolved however the device just won't mount.  Googling
around I see that ZFS does have features designed to reduce the impact
of corruption at a particular point, multiple meta data copies and so
on, however commands to help me tidy up a zfs will only run once the
thing has been mounted.

Would be grateful for any ideas, relevant output here:

[EMAIL PROTECTED]:~# zpool import
  pool: content
id: 14205780542041739352
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
config:

        content     FAULTED   corrupted data
          c2t9d0    ONLINE

[EMAIL PROTECTED]:~# zpool import content
cannot import 'content': pool may be in use from other system
use '-f' to import anyway

[EMAIL PROTECTED]:~# zpool import -f content
cannot import 'content': I/O error

[EMAIL PROTECTED]:~# uname -a
SunOS cs3.kw 5.10 Generic_127127-11 sun4v sparc SUNW,Sun-Fire-T200


Thanks
-- 
Tom

// www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris+ZFS+RAIDZ+VirtualBox - ready for production systems?

2008-08-06 Thread Orvar Korvar
I use an Intel Q9450 + P45 mobo + ATI 4850 + ZFS + VirtualBox.

I have installed WinXP. It works well and is stable. There are features not 
implemented yet, though. For instance, USB.

I suggest you try VB yourself. It is ~20MB and installs quickly. I used it on a 
1GB RAM P4 machine and it worked fine. If you have the memory you can copy the 
install CD to /tmp. Installing from RAM is quite quick.
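
e.g. (paths are examples only):

# cp /path/to/winxp.iso /tmp/winxp.iso

and then point the VM's CD/DVD drive at /tmp/winxp.iso.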
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs status -v tries too hard?

2008-08-06 Thread James Litchfield
After some errors were logged as to a problem with a ZFS file system,
I ran zpool status followed by zpool status -v...

# zpool status
  pool: ehome
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ehome       ONLINE   6.28K 2.84M     0
          c2t0d0p0  ONLINE   6.28K 2.84M     0

errors: 796332 data errors, use '-v' for a list

[ elided ]

# zpool status -v
  pool: ehome
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A

 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ehome       ONLINE   3.03K 2.09M     0
          c2t0d0p0  ONLINE   3.03K 2.09M     0

 HANGS HERE

From another window, do a truss of zpool status -v...

ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)Err#12 ENOMEM
ioctl(3, ZFS_IOC_ERROR_LOG, 0x08041DE0)Err#12 ENOMEM

One would think it would get the message...

After a reboot, a move of the drive to another USB port on the laptop,
a zpool export of ehome and a zpool import of ehome, it is back online
with zero errors.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool upgrade wrecked GRUB

2008-08-06 Thread Timothy Noronha
Almost.  I did exactly the same thing to my system -- upgrading ZFS.

The 2008.11 development snapshot CD I found is based on snv_93 and doesn't yet 
support ZFS v.11, so it refuses to import the pool.  My system doesn't have a DVD 
drive, so I cannot boot the SXCE snv_94 DVD.  I guess I have to track down or 
wait for a >= snv_94 based development snapshot live CD.  Should be any day 
now, right?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool import not working - I broke my pool...

2008-08-06 Thread Neil Perrin
Ross,

Thanks, I have updated the bug with this info.

Neil.

Ross Smith wrote:
 Hmm... got a bit more information for you to add to that bug I think.
  
 Zpool import also doesn't work if you have mirrored log devices and 
 either one of them is offline.
  
 I created two ramdisks with:
 # ramdiskadm -a rc-pool-zil-1 256m
 # ramdiskadm -a rc-pool-zil-2 256m
  
 And added them to the pool with:
 # zpool add rc-pool log mirror /dev/ramdisk/rc-pool-zil-1 
 /dev/ramdisk/rc-pool-zil-2
  
 I can reboot fine, the pool imports ok without the ZIL and I have a 
 script that recreates the ramdisks and adds them back to the pool:
 #!/sbin/sh
 state=$1
 case $state in
 'start')
echo 'Starting Ramdisks'
/usr/sbin/ramdiskadm -a rc-pool-zil-1 256m
/usr/sbin/ramdiskadm -a rc-pool-zil-2 256m
echo 'Attaching to ZFS ZIL'
/usr/sbin/zpool replace test /dev/ramdisk/rc-pool-zil-1
/usr/sbin/zpool replace test /dev/ramdisk/rc-pool-zil-2
;;
 'stop')
;;
 esac
  
 However, if I export the pool, and delete one ramdisk to check that the 
 mirroring works fine, the import fails:
 # zpool export rc-pool
 # ramdiskadm -d rc-pool-zil-1
 # zpool import rc-pool
 cannot import 'rc-pool': one or more devices is currently unavailable
  
 Ross
 
 
   Date: Mon, 4 Aug 2008 10:42:43 -0600
   From: [EMAIL PROTECTED]
   Subject: Re: [zfs-discuss] Zpool import not working - I broke my pool...
   To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
   CC: zfs-discuss@opensolaris.org
  
  
  
   Richard Elling wrote:
Ross wrote:
I'm trying to import a pool I just exported but I can't, even -f 
 doesn't help. Every time I try I'm getting an error:
cannot import 'rc-pool': one or more devices is currently 
 unavailable
   
Now I suspect the reason it's not happy is that the pool used to 
 have a ZIL :)
   
   
Correct. What you want is CR 6707530, log device failure needs some 
 work
http://bugs.opensolaris.org/view_bug.do?bug_id=6707530
which Neil has been working on, scheduled for b96.
  
   Actually no. That CR mentioned the problem and talks about splitting out
   the bug, as it's really a separate problem. I've just done that and 
 here's
   the new CR which probably won't be visible immediately to you:
  
   6733267 Allow a pool to be imported with a missing slog
  
   Here's the Description:
  
   ---
   This CR is being broken out from 6707530 log device failure needs 
 some work
  
   When Separate Intent logs (slogs) were designed they were given equal 
 status in the pool device tree.
   This was because they can contain committed changes to the pool.
   So if one is missing it is assumed to be important to the integrity 
 of the
   application(s) that wanted the data committed synchronously, and thus
   a pool cannot be imported with a missing slog.
   However, we do allow a pool to be missing a slog on boot up if
   it's in the /etc/zfs/zpool.cache file. So this sends a mixed message.
  
   We should allow a pool to be imported without a slog if -f is used
   and to not import without -f but perhaps with a better error message.
  
   It's the guidsum check that actually rejects imports with missing 
 devices.
   We could have a separate guidsum for the main pool devices (non 
 slog/cache).
   ---
  
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Richard Elling
Tom Bird wrote:
 Hi,

 Have a problem with a ZFS on a single device, this device is 48 1T SATA
 drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
 a ZFS on it as a single device.

 There was a problem with the SAS bus which caused various errors
 including the inevitable kernel panic, the thing came back up with 3 out
 of 4 zfs mounted.

 I've tried reading the partition table with format, works fine, also can
 dd the first 100G from the device quite happily so the communication
 issue appears resolved however the device just won't mount.  Googling
 around I see that ZFS does have features designed to reduce the impact
 of corruption at a particular point, multiple meta data copies and so
 on, however commands to help me tidy up a zfs will only run once the
 thing has been mounted.
   

You should also check the end of the LUN.  ZFS stores its configuration data
at the beginning and end of the LUN.  An I/O error is a fairly generic error,
but it can also be an indicator of a catastrophic condition.  You should also
check the system log in /var/adm/messages as well as any faults reported by
fmdump.
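
For example, something along these lines (the device name is taken from your
zpool output below; append the slice if needed):

# zdb -l /dev/rdsk/c2t9d0s0     # dumps the four ZFS labels; two live at the end of the device
# fmdump -eV | tail             # error telemetry logged by the fault manager
# tail -100 /var/adm/messages   # transport/driver errors around the time of the panic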

In general, ZFS can only repair conditions for which it owns data redundancy.
In this case, ZFS does not own the redundancy function, so you are susceptible
to faults of this sort.
 -- richard

 Would be grateful for any ideas, relevant output here:

 [EMAIL PROTECTED]:~# zpool import
   pool: content
 id: 14205780542041739352
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
see: http://www.sun.com/msg/ZFS-8000-72
 config:

         content     FAULTED   corrupted data
           c2t9d0    ONLINE

 [EMAIL PROTECTED]:~# zpool import content
 cannot import 'content': pool may be in use from other system
 use '-f' to import anyway

 [EMAIL PROTECTED]:~# zpool import -f content
 cannot import 'content': I/O error

 [EMAIL PROTECTED]:~# uname -a
 SunOS cs3.kw 5.10 Generic_127127-11 sun4v sparc SUNW,Sun-Fire-T200


 Thanks
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Bryan Allen

Good afternoon,

I have a ~600GB zpool living on older Xeons. The system has 8GB of RAM. The
pool is hanging off two LSI Logic SAS3041X-Rs (no RAID configured).

When I put a moderate amount of load on the zpool (like, say, copying many
files locally, or deleting a large number of ZFS fs), the system hangs and
becomes completely unresponsive, requiring a reboot.

The ARC never gets over ~40MB.

The system is running Sol10u4.

Are there any suggested tunables for running big zpools on 32bit?

Cheers.
-- 
bda
Cyberpunk is dead.  Long live cyberpunk.
http://mirrorshades.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Strange burstiness in write speed with a mirror

2008-08-06 Thread Will Murnane
I've got a pool which I'm currently syncing a few hundred gigabytes to
using rsync.  The source machine is pretty slow, so it only goes at
about 20 MB/s.  Watching zpool iostat -v local-space 10, I see a
pattern like this (trimmed to take up less space):
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
------------  ----  -----  -----  -----  -----  -----
local-space   251G   405G      0    143     51  17.1M
  mirror      251G   405G      0    143     51  17.1M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    137      0  17.1M
local-space   252G   404G      1    163  2.55K  17.6M
  mirror      252G   404G      1    163  2.55K  17.6M
    c1d0s6       -      -      0    145  6.39K  16.7M
    c0d0s6       -      -      0    150  38.4K  17.6M
local-space   253G   403G      0    159    511  16.9M
  mirror      253G   403G      0    159    511  16.9M
    c1d0s6       -      -      0    340      0  41.0M
    c0d0s6       -      -      0    145  12.8K  16.9M
local-space   253G   403G      0    135    511  16.2M
  mirror      253G   403G      0    135    511  16.2M
    c1d0s6       -      -      0    484      0  60.4M
    c0d0s6       -      -      0    130      0  16.2M
local-space   253G   403G      0    125      0  15.4M
  mirror      253G   403G      0    125      0  15.4M
    c1d0s6       -      -      0    471      0  59.0M
    c0d0s6       -      -      0    123      0  15.4M
local-space   253G   403G      0    139      0  16.2M
  mirror      253G   403G      0    139      0  16.2M
    c1d0s6       -      -      0    474      0  59.3M
    c0d0s6       -      -      0    129      0  16.2M
local-space   253G   403G      0    139     51  17.1M
  mirror      253G   403G      0    139     51  17.1M
    c1d0s6       -      -      0      3  6.39K   476K
    c0d0s6       -      -      0    137      0  17.1M
local-space   253G   403G      0    144      0  18.1M
  mirror      253G   403G      0    144      0  18.1M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    144      0  18.1M
local-space   253G   403G      0    146      0  18.1M
  mirror      253G   403G      0    146      0  18.1M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    144      0  18.1M
local-space   253G   403G      0    156      0  19.3M
  mirror      253G   403G      0    156      0  19.3M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    154      0  19.3M
local-space   253G   403G      0    152      0  19.1M
  mirror      253G   403G      0    152      0  19.1M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    152      0  19.1M
local-space   253G   403G      0    158      0  19.1M
  mirror      253G   403G      0    158      0  19.1M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    152      0  19.1M
local-space   253G   403G      0    150      0  18.5M
  mirror      253G   403G      0    150      0  18.5M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    147      0  18.5M
local-space   253G   403G      0    155      0  19.4M
  mirror      253G   403G      0    155      0  19.4M
    c1d0s6       -      -      0      0      0      0
    c0d0s6       -      -      0    155      0  19.4M

The interesting part of this (as far as I can tell) is the rightmost
column; the write speeds of the second disk stay constant at about 20
MB/s, and the first disk fluctuates between zero and 60 MB/s.  Is this
normal behavior?  Could it indicate a failing disk?  There's nothing
in `fmadm faulty', `dmesg', or `/var/adm/messages' that would indicate
an impending disk failure, but this behavior is strange.  I'm using
rsync-3.0.1 (yes, security fix is on the way) on both ends, and no NFS
involved in this.  rsync is writing 256k blocks.  `iostat -xl 2' shows
a similar kind of fluctuation going on.

Any suggestions what's going on?  Any other diagnostics you'd like to
see?  I'd be happy to provide them.

Thanks!
Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Will Murnane
On Wed, Aug 6, 2008 at 13:31, Bryan Allen [EMAIL PROTECTED] wrote:
 I have a ~600GB zpool living on older Xeons. The system has 8GB of RAM. The
 pool is hanging off two LSI Logic SAS3041X-Rs (no RAID configured).
You might try taking out 4GB of the RAM (!).  Some 32-bit drivers have
problems doing DMA above 4GB, so limiting yourself to that much might at
least eliminate that source of problems.
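
Rather than pulling DIMMs, you could also cap what the kernel will use with an
/etc/system entry and a reboot (the value below is just an example for ~4GB):

* limit usable physical memory to 1048576 4K pages (~4GB) for testing
set physmem=1048576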

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
 re == Richard Elling [EMAIL PROTECTED] writes:
 tb == Tom Bird [EMAIL PROTECTED] writes:

tb There was a problem with the SAS bus which caused various
tb errors including the inevitable kernel panic, the thing came
tb back up with 3 out of 4 zfs mounted.

re In general, ZFS can only repair conditions for which it owns
re data redundancy.

If that's really the excuse for this situation, then ZFS is not
``always consistent on the disk'' for single-VDEV pools.

There was no loss of data here, just an interruption in the connection
to the target, like power loss or any other unplanned shutdown.
Corruption in this scenario is a significant regression w.r.t. UFS:

  http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

How about the scenario where you lose power suddenly, but only half of
a mirrored VDEV is available when power is restored?  Is ZFS
vulnerable to this type of unfixable corruption in that scenario,
too?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Will Murnane
On Wed, Aug 6, 2008 at 13:57, Miles Nordin [EMAIL PROTECTED] wrote:
 re == Richard Elling [EMAIL PROTECTED] writes:
 tb == Tom Bird [EMAIL PROTECTED] writes:

tb There was a problem with the SAS bus which caused various
tb errors including the inevitable kernel panic, the thing came
tb back up with 3 out of 4 zfs mounted.

re In general, ZFS can only repair conditions for which it owns
re data redundancy.

 If that's really the excuse for this situation, then ZFS is not
 ``always consistent on the disk'' for single-VDEV pools.
Well, yes.  If data is sent, but corruption somewhere (the SAS bus,
apparently, here) causes bad data to be written, ZFS can generally
detect but not fix that.  It might be nice to have a verifywrites
mode or something similar to make sure that good data has ended up on
disk (at least at the time it checks), but failing that there's not
much ZFS (or any filesystem) can do.  Using a pool with some level of
redundancy (mirroring, raidz) at least gives zfs a chance to read the
missing pieces from the redundancy that it's kept.

 How about the scenario where you lose power suddenly, but only half of
 a mirrored VDEV is available when power is restored?  Is ZFS
 vulnerable to this type of unfixable corruption in that scenario,
 too?
Every filesystem is vulnerable to corruption, all the time.  I'm
willing to dispute any claims otherwise.  Some are just more likely
than others to hit their error conditions.  I've personally run into
UFS' problems more often than ZFS... but that doesn't mean I think I'm
safe.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Richard Elling
Miles Nordin wrote:
 re == Richard Elling [EMAIL PROTECTED] writes:
 tb == Tom Bird [EMAIL PROTECTED] writes:
 

 tb There was a problem with the SAS bus which caused various
 tb errors including the inevitable kernel panic, the thing came
 tb back up with 3 out of 4 zfs mounted.

 re In general, ZFS can only repair conditions for which it owns
 re data redundancy.

 If that's really the excuse for this situation, then ZFS is not
 ``always consistent on the disk'' for single-VDEV pools.
   

I disagree with your assessment.  The on-disk format (any on-disk format)
necessarily assumes no faults on the media.  The difference between ZFS
on-disk format and most other file systems is that the metadata will be
consistent to some point in time because it is COW.  With UFS, for instance,
the metadata is overwritten, which is why it cannot be considered always
consistent (and why fsck exists).

 There was no loss of data here, just an interruption in the connection
 to the target, like power loss or any other unplanned shutdown.
 Corruption in this scenario is a significant regression w.r.t. UFS:
   

I see no evidence that the data is or is not correct.  What we know is that
ZFS is attempting to read something and the device driver is returning EIO.
Unfortunately, EIO is a catch-all error code, so more digging to find the
root cause is needed.

However, I will bet a steak dinner that if this device was mirrored to another,
the pool will import just fine, with the affected device in a faulted or
degraded state.

   http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html
   

I have no idea what Eric is referring to, and it does not match my experience.
Unfortunately, he didn't reference any CRs either :-(.  "Your baby is ugly"
posts aren't very useful.

That said, we are constantly improving the resiliency of ZFS (more good
stuff coming in b96), so it might be worth trying to recover with a later
version.  For example, boot SXCE b94 and try to import the pool.

 How about the scenario where you lose power suddenly, but only half of
 a mirrored VDEV is available when power is restored?  Is ZFS
 vulnerable to this type of unfixable corruption in that scenario,
 too?
   

No, this works just fine as long as one side works.  But that is a very 
different case.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
 re == Richard Elling [EMAIL PROTECTED] writes:

 c  If that's really the excuse for this situation, then ZFS is
 c not ``always consistent on the disk'' for single-VDEV pools.

re I disagree with your assessment.  The on-disk format (any
re on-disk format) necessarily assumes no faults on the media.

The media never failed, only the connection to the media.  We've every
good reason to believe that every CDB that the storage controller
acknowledged as complete, was completed and is still there---and that
is the only statement which must be true of unfaulty media.  We've no
strong reason to doubt it.

re I see no evidence that the data is or is not correct.

the ``evidence'' is that it was on a SAN, and the storage itself never
failed, only the connection between ZFS and the storage.  Remember:

 this device is 48 1T SATA drives presented as a 42T LUN via hardware
 RAID 6 on a SAS bus which had a ZFS on it as a single device.

This sort of SAN-outage happens all the time, so it's not straining my
belief to suggest that probably nothing else happened other than
disruption of the connection between ZFS and the storage.  It's not
like a controller randomly ``acted up'' or something, so that I would
suspect a bad disk.

 c http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

re I have no idea what Eric is referring to, and it does not
re match my experience.

unfortunately it's very easy to match the experience of ``nothing
happened'' and hard to match the experience ``exactly the same thing
happened to me.''  Have you been provoking ZFS in exactly the way Eric
described, a single-vdev pool on FC where the FC SAN often has outages
or where the storage is rebooted while ZFS is still running?  If not,
obviously it doesn't match your experience because you have none with
this situation.  OTOH if you've been doing that a lot, your not
running into this problem means something.  Otherwise, it's another
case of the home-user defense: ``I can't tell you how close to zero the number
of problems I've had with it is.  It's so close to zero, it is zero,
so there's virtually 0% chance what you're saying happened to you
really did happen to you.  and also to this other guy.''

When I say ``doesn't match my experience'' I meant I _do_ see Mac OS X
pinwheels and for me it's ``usually'' traceable back to VM pressure or a
dead NFS server, not some random application-level user-interface
modal-wait as others claimed: I'm selecting for the same situation you
are, and getting a different result.

that said, yeah, a CR would be nice.  For such a serious problem, I'd
like to think someone's collected an image of the corrupt filesystem
and is trying to figure out wtf happened.

I care about how safe is my data, not how pretty is your baby.  I want
its relative safety accurately represented based on the experience
available to us.

 c How about the scenario where you lose power suddenly, but only
 c half of a mirrored VDEV is available when power is restored?
 c Is ZFS vulnerable to this type of unfixable corruption in that
 c scenario, too?

re No, this works just fine as long as one side works.  But that
re is a very different case.  -- richard

Why do you regard this case as very different from a single vdev?  I
don't have confidence that it's clearly different w.r.t. whatever
hypothetical bug Eric and Tom have run into.

wm If data is sent, but corruption somewhere (the SAS bus,
wm apparently, here) causes bad data to be written, ZFS can
wm generally detect but not fix that.

Why would there be bad data written?  The SAS bus has checksums.  The
problem AIUI was that the bus went away, not that it started
scribbling random data all over the place.  Am I wrong?  Remember what
Tom's SAS bus is connected to.

wm verifywrites

The verification is the storage array returning success to the command
it was issued.  ZFS is supposed to, for example, delay returning from
fsync() until this has happened.  The same mechanism is used to write
batches of things in a well-defined order to supposedly achieve the
``always-consistent''.  It depends on the drive/array's ability to
accurately report when data is committed to stable storage, not on
rereading what was written, and this is the correct dependency because
ZFS leaves write caches on, so the drive could satisfy a read from the
small on-disk cache RAM even though that data would be lost if you
pulled the disk's power cord.

The system contains all the tools needed to keep the consistency
promises even if you go around yanking SAS cables.

And this is a data-loss issue, not just an availability issue like we
were discussing before w.r.t. pulling drives.

wm Every filesystem is vulnerable to corruption, all the time.

Every filesystem in recent history makes rigorous guarantees about
what will survive if you pull the connection to the disk array, or the
host's power, at any time you wish.  The 

[zfs-discuss] zfs crash CR6727355 marked incomplete

2008-08-06 Thread Michael Hale
A bug report I've submitted for a zfs-related kernel crash has been  
marked incomplete and I've been asked to provide more information.

This CR has been marked as incomplete by User 1-5Q-2508
for the reason Need More Info.  Please update the CR
providing the information requested in the Evaluation and/or Comments  
field.

However, when I pull up 6727355 in the bugs.opensolaris.org, it  
doesn't allow me to make any edits, nor do I see an evaluation or  
comments field - am I doing something wrong?


--
Michael Hale                            [EMAIL PROTECTED]
Manager of Engineering Support          Enterprise Engineering Group
Transcom Enhanced Services              http://www.transcomus.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs crash CR6727355 marked incomplete

2008-08-06 Thread Neil Perrin


Michael Hale wrote:
 A bug report I've submitted for a zfs-related kernel crash has been  
 marked incomplete and I've been asked to provide more information.
 
 This CR has been marked as incomplete by User 1-5Q-2508
 for the reason Need More Info.  Please update the CR
 providing the information requested in the Evaluation and/or Comments  
 field.
 
 However, when I pull up 6727355 in the bugs.opensolaris.org, it  
 doesn't allow me to make any edits, nor do I see an evaluation or  
 comments field - am I doing something wrong?

1. The Comments field asks that the core dump be made readable by our
   zfs group, and the CR was made incomplete until the person who
   saved the core does this.
2. You do not see this because the Comments is not readable outside
   of Sun as it is used to contain customer information.
3. Finally there is no Evaluation yet.

Bottom line is that you can ignore the Need more info - it wasn't
directed at you. Sorry about the confusion. I guess the kinks in the
system aren't ironed out yet. Usually if we need more info we
will email you directly.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strange burstiness in write speed with a mirror

2008-08-06 Thread Bob Friesenhahn
On Wed, 6 Aug 2008, Will Murnane wrote:

 I've got a pool which I'm currently syncing a few hundred gigabytes to
 using rsync.  The source machine is pretty slow, so it only goes at
 about 20 MB/s.  Watching zpool iostat -v local-space 10, I see a
 pattern like this (trimmed to take up less space):

The pattern is indeed strange.  You have a fairly low data rate load 
shared across a large number of mirrors.  Maybe the I/Os complete very 
quickly and are often split within the 10 second duration, with no 
I/Os at all sometimes.  If you change to 60 seconds or 120 seconds 
does the reported data rate even out?  It looks like the mirrors are 
split across two controllers, which is very good, but could expose a 
performance issue with one of the controllers.
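
e.g.:

# zpool iostat -v local-space 60   # 60-second averages instead of 10
# iostat -xn 60                    # per-device service times over the same window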

 Any suggestions what's going on?  Any other diagnostics you'd like to
 see?  I'd be happy to provide them.

Search for Jeff Bonwick's diskqual.sh script posted earlier to this 
forum.  It was posted on Mon, 14 Apr 2008 15:49:41 -0700.  It is quite 
handy for seeing if your disks are performing at a reasonably uniform 
rate.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Thomas Garner
For what it's worth I see this as well on 32-bit Xeons, 1GB ram, and
dual AOC-SAT2-MV8 (large amounts of io sometimes resulting in lockup
requiring a reboot --- though my setup is Nexenta b85). Nothing in the
logging, nor loadavg increasing significantly.  It could be the
regular Marvell driver issues, but is definitely not cool when it
happens.

Thomas

On Wed, Aug 6, 2008 at 1:31 PM, Bryan Allen [EMAIL PROTECTED] wrote:

 Good afternoon,

 I have a ~600GB zpool living on older Xeons. The system has 8GB of RAM. The
 pool is hanging off two LSI Logic SAS3041X-Rs (no RAID configured).

 When I put a moderate amount of load on the zpool (like, say, copying many
 files locally, or deleting a large number of ZFS fs), the system hangs and
 becomes completely unresponsive, requiring a reboot.

 The ARC never gets over ~40MB.

 The system is running Sol10u4.

 Are there any suggested tunables for running big zpools on 32bit?

 Cheers.
 --
 bda
 Cyberpunk is dead.  Long live cyberpunk.
 http://mirrorshades.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Brian D. Horn
In the most recent code base (both OpenSolaris/Nevada and S10Ux with patches)
all the known marvell88sx problems have long ago been dealt with.

However, I've said this before.  Solaris on 32-bit platforms has problems and
is not to be trusted.  There are far, far too many places in the source
code where a 64-bit object is either loaded or stored without any atomic
locking occurring which could result in any number of wrong and bad behaviors.
ZFS has some problems of this sort, but so does some of the low level 32-bit
x86 code.  The problem was reported long ago, but to the best of my knowledge
the issues have not been addressed.  Looking below it appears that nothing
has been done for about 9 months.

Here is the top of the bug report:

Bug ID                  6634371
Synopsis                Solaris ON is broken w.r.t. 64-bit operations on
                        32-bit processors
State                   1-Dispatched (Default State)
Category:Subcategory    kernel:other
Keywords                32-bit | 64-bit | atomic
Reported Against
Duplicate Of
Introduced In
Commit to Fix
Fixed In
Release Fixed
Related Bugs
Submit Date             27-NOV-2007
Last Update Date        28-NOV-2007
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread James C. McPherson
Brian D. Horn wrote:
 In the most recent code base (both OpenSolaris/Nevada and S10Ux with patches)
 all the known marvell88sx problems have long ago been dealt with.
 
 However, I've said this before.  Solaris on 32-bit platforms has problems and
 is not to be trusted.  There are far, far too many places in the source
 code where a 64-bit object is either loaded or stored without any atomic
 locking occurring which could result in any number of wrong and bad behaviors.
 ZFS has some problems of this sort, but so does some of the low level 32-bit
 x86 code.  The problem was reported long ago, but to the best of my knowledge
 the issues have not been addressed.  Looking below it appears that nothing
 has been done for about 9 months.
 
 Here is the top of the bug report:
 
 Bug ID 6634371
 Synopsis  Solaris ON is broken w.r.t. 64-bit operations on 32-bit 
 processors
 State 1-Dispatched (Default State)
 Category:Subcategory  kernel:other

I believe you misfiled that bug. I've redirected it to

solaris / kernel / arch-x86

which appears to me to be more appropriate.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OpenSolaris+ZFS+RAIDZ+VirtualBox - ready for production systems?

2008-08-06 Thread Evert Meulie
Oh, I have 'played' with them all: VirtualBox, VMware, KVM...
But now I need to set up a production system for various Linux and Windows 
guests. And none of the 3 mentioned are 100% perfect, so the choice is 
difficult...
My first choice would be KVM+RAIDZ, but since KVM only works on Linux, and 
RAIDZ doesn't work all that well yet on Linux, this is not an option...
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-auto-snapshot 0.11 work (was Re: zfs-auto-snapshot with at scheduling )

2008-08-06 Thread Rob
 The other changes that will appear in 0.11 (which is
 nearly done) are:

Still looking forward to seeing .11 :)
Think we can expect a release soon? (or at least svn access so that others can 
check out the trunk?)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Al Hopper
On Wed, Aug 6, 2008 at 8:20 AM, Tom Bird [EMAIL PROTECTED] wrote:
 Hi,

 Have a problem with a ZFS on a single device, this device is 48 1T SATA
 drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
 a ZFS on it as a single device.

 There was a problem with the SAS bus which caused various errors
 including the inevitable kernel panic, the thing came back up with 3 out
 of 4 zfs mounted.

Hi Tom,

After reading this and the followups to date, this could be due to
anything ... and we (on the list) don't know the history of the system
or the RAID device.  You could have a bad SAS controller, bad system
memory, a bad cable or a RAID controller with a firmware bug.
The first step would be to form a ZFS pool with 2 mirrors, beat up on
it and gain some confidence in the overall system components.   Write
lots of data to it, run zpool scrub etc. and verify that it's 100%
rock solid before you zpool destroy it and then test with a
larger pool.  In every case where someone has initially posted an
opening story like yours, the problem has almost always turned out to
be outside of ZFS.   As others have explained, if ZFS does not have a
config with data redundancy - there is not much that can be learned -
except that it just broke.
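
Something like this, for example (device names are made up - substitute your
own, and note the dd writes ~100GB of throwaway data):

# zpool create testpool mirror c2t1d0 c2t2d0 mirror c2t3d0 c2t4d0
# dd if=/dev/urandom of=/testpool/junk bs=1024k count=100000
# zpool scrub testpool
# zpool status -v testpool     # READ/WRITE/CKSUM should all stay at zero
# zpool destroy testpool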

Keep testing and report back.  Also, any additional data on the
hardware and software config would be useful; let us know whether this
is a new system or whether the hardware has already been in service,
and what its reliability track record has been.

 I've tried reading the partition table with format, works fine, also can
 dd the first 100G from the device quite happily so the communication
 issue appears resolved however the device just won't mount.  Googling
 around I see that ZFS does have features designed to reduce the impact
 of corruption at a particular point, multiple meta data copies and so
 on, however commands to help me tidy up a zfs will only run once the
 thing has been mounted.

 Would be grateful for any ideas, relevant output here:

 [EMAIL PROTECTED]:~# zpool import
  pool: content
id: 14205780542041739352
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
         The pool may be active on another system, but can be imported using
         the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-72
 config:

         content     FAULTED   corrupted data
           c2t9d0    ONLINE

 [EMAIL PROTECTED]:~# zpool import content
 cannot import 'content': pool may be in use from other system
 use '-f' to import anyway

 [EMAIL PROTECTED]:~# zpool import -f content
 cannot import 'content': I/O error

 [EMAIL PROTECTED]:~# uname -a
 SunOS cs3.kw 5.10 Generic_127127-11 sun4v sparc SUNW,Sun-Fire-T200


 Thanks
 --
 Tom


Regards,

-- 
Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
 Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Carson Gaspar
Brian D. Horn wrote:
 In the most recent code base (both OpenSolaris/Nevada and S10Ux with patches)
 all the known marvell88sx problems have long ago been dealt with.

Not true. The working marvell patches still have not been released for 
Solaris. They're still just IDRs. Unless you know something I (and my 
Sun support reps) don't, in which case please provide patch numbers.

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Brian D. Horn
As far as I can tell from the patch web pages:

For Solaris 10 x86, 138053-01 should have the fixes (it does
depend on other earlier patches though).  I find it very difficult
to tell what the story is with patches as the patch numbers
seem to have very little in them to correlate them to
code changes.

For Solaris Nevada/OpenSolaris it would seem that the fixes
went back on Feb 11, 2008 (though there have been additional
changes to the sata module since then).

Pretty much if you have a version of the driver that still spews informational
messages with marvell88sx in them, you are running old stuff.  If those
messages have been suppressed, odds are that you have new stuff.
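
A couple of quick checks, for example:

# modinfo | grep -i marvell            # version string of the loaded marvell88sx module
# grep marvell88sx /var/adm/messages   # older drivers log a lot of informational noise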
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Mike Gerdts
On Wed, Aug 6, 2008 at 6:22 PM, Carson Gaspar [EMAIL PROTECTED] wrote:
 Brian D. Horn wrote:
 In the most recent code base (both OpenSolaris/Nevada and S10Ux with patches)
 all the known marvell88sx problems have long ago been dealt with.

 Not true. The working marvell patches still have not been released for
 Solaris. They're still just IDRs. Unless you know something I (and my
 Sun support reps) don't, in which case please provide patch numbers.

I was able to get a Tpatch this week with encouraging words about a
likely release of 138053-02 this week.  In a separate thread last week
(?) Enda said that it should be out within a couple weeks.

Mike

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Peter Bortas
On Thu, Aug 7, 2008 at 5:32 AM, Peter Bortas [EMAIL PROTECTED] wrote:
 On Wed, Aug 6, 2008 at 7:31 PM, Bryan Allen [EMAIL PROTECTED] wrote:

 Good afternoon,

 I have a ~600GB zpool living on older Xeons. The system has 8GB of RAM. The
 pool is hanging off two LSI Logic SAS3041X-Rs (no RAID configured).

 When I put a moderate amount of load on the zpool (like, say, copying many
 files locally, or deleting a large number of ZFS fs), the system hangs and
 becomes completely unresponsive, requiring a reboot.

 I have the same problem with 32bit, 2GiB RAM and 6 disks in a 2.7T
 raidz on snv_81. Slightly unbalanced one might say, but it shouldn't
 lock up regardless.

Forgot to mention I run with different controllers: 2 x Sil3114 PCI cards.

-- 
Peter Bortas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Marc Bevand
Bryan, Thomas: these hangs of 32-bit Solaris under heavy (fs, I/O) loads are a 
well known problem. They are caused by memory contention in the kernel heap. 
Check 'kstat vmem::heap'. The usual recommendation is to change the 
kernelbase. It worked for me. See:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-March/046710.html
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-March/046715.html
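
For example (the kernelbase value is only an illustration - see the threads
above for what suits your setup):

# kstat -p vmem::heap:mem_inuse vmem::heap:mem_total   # how full the kernel heap is
# eeprom kernelbase=0x80000000                         # then reboot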

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
 From the ZFS Administration Guide, Chapter 11, Data Repair section:
 Given that the fsck utility is designed to repair known pathologies
 specific to individual file systems, writing such a utility for a file
 system with no known pathologies is impossible.

That's a fallacy (and is incorrect even for the UFS fsck; refer to the 
McKusick/Kowalski paper and the distinction they make between 'expected' 
corruptions and other inconsistencies).

First, there are two types of utilities which might be useful in the situation 
where a ZFS pool has become corrupted. The first is a file system checking 
utility (call it zfsck); the second is a data recovery utility. The difference 
between those is that the first tries to bring the pool (or file system) back 
to a usable state, while the second simply tries to recover the files to a new 
location.

What does a file system check do?  It verifies that a file system is internally 
consistent, and makes it consistent if it is not.  If ZFS were always 
consistent on disk, then only a verification would be needed.  Since we have 
evidence that it is not always consistent in the face of hardware failures, at 
least, repair may also be needed.  This doesn't need to be that hard.  For 
instance, the space maps can be reconstructed by walking the various block 
trees; the uberblock effectively has several backups (though it might be better 
in some cases if an older backup were retained); and the ZFS checksums make it 
easy to identify block types and detect bad pointers. Files can be marked as 
damaged if they contain pointers to bad data; directories can be repaired if 
their hash structures are damaged (as long as the names and pointers can be 
salvaged); etc.  Much more complex file systems than ZFS have file system 
checking utilities, because journaling, COW, etc. don't help you in the
  face of software bugs or certain classes of hardware failures.

A recovery tool is even simpler, because all it needs to do is find a tree root 
and then walk the file system, discovering directories and files, verifying 
that each of them is readable by using the checksums to check intermediate and 
leaf blocks, and extracting the data.  The tricky bit with ZFS is simply 
identifying a relatively new root, so that the newest copy of the data can be 
identified.

Almost every file system starts out without an fsck utility, and implements one 
once it becomes obvious that "sorry, you have to reinitialize the file system" 
-- or worse, "sorry, we lost all of your data" -- is unacceptable to a certain 
proportion of customers.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
 As others have explained, if ZFS does not have a
 config with data redundancy - there is not much that
 can be learned - except that it just broke.

Plenty can be learned by just looking at the pool.
Unfortunately ZFS currently doesn't have tools which
make that easy; as I understand it, zdb doesn't work
(in a useful way) on a pool which won't import, so
dumping out the raw data structures and looking at
them by hand is the only way to determine what
ZFS doesn't like and deduce what went wrong (and
how to fix it).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Nicolas Williams
On Wed, Aug 06, 2008 at 02:23:44PM -0400, Will Murnane wrote:
 On Wed, Aug 6, 2008 at 13:57, Miles Nordin [EMAIL PROTECTED] wrote:
  If that's really the excuse for this situation, then ZFS is not
  ``always consistent on the disk'' for single-VDEV pools.
 Well, yes.  If data is sent, but corruption somewhere (the SAS bus,
 apparently, here) causes bad data to be written, ZFS can generally
 detect but not fix that.  It might be nice to have a verifywrites
 mode or something similar to make sure that good data has ended up on
 disk (at least at the time it checks), but failing that there's not
 much ZFS (or any filesystem) can do.  Using a pool with some level of
 redundancy (mirroring, raidz) at least gives zfs a chance to read the
 missing pieces from the redundancy that it's kept.

There's also ditto blocks.  So even on a one-vdev pool ZFS can
recover from random corruption unless you're really unlucky.

Of course, this is a feature.  Without ZFS the OP would have had silent,
undetected (by the OS that is) data corruption.

Basically you don't want to have one-vdev pools.  If you'll use HW RAID
then you should also do mirroring at the ZFS layer.
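
For example (device names are examples only), either mirror two LUNs exported
by the array, or at least keep extra ditto copies of the data on a single LUN:

# zpool create content mirror c2t9d0 c3t9d0   # ZFS-level mirror across two HW RAID LUNs
# zfs set copies=2 content                    # or: extra data ditto blocks on one LUN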

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Nicolas Williams
On Wed, Aug 06, 2008 at 03:44:08PM -0400, Miles Nordin wrote:
  re == Richard Elling [EMAIL PROTECTED] writes:
 
  c  If that's really the excuse for this situation, then ZFS is
  c not ``always consistent on the disk'' for single-VDEV pools.
 
 re I disagree with your assessment.  The on-disk format (any
 re on-disk format) necessarily assumes no faults on the media.
 
 The media never failed, only the connection to the media.  We've every
 good reason to believe that every CDB that the storage controller
 acknowledged as complete, was completed and is still there---and that
 is the only statement which must be true of unfaulty media.  We've no
 strong reason to doubt it.

zdb should be able to pinpoint the problem, no?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on 32bit.

2008-08-06 Thread Brian D. Horn
Yes, there have been bugs with heavy I/O and ZFS running the system
out of memory.  However, there was a contention in the thread
about it possibly being due to marvell88sx driver bugs (most likely not).
Further, my mention of 32-bit Solaris being unsafe at any speed is still
true.  Without analysis of a specific hang it is very hard to say what caused 
it.
It could be driver, memory exhaustion, file system error, VM error, broken
hardware, or any number of other things.  My points were
1) The marvell88sx driver should be pretty solid at this point in time (yes,
earlier releases had problems, most of which were related to bad block
handling), and
2) There are systemic issues in Solaris on 32-bit architectures (of which only
x86 is supported).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool upgrade wrecked GRUB

2008-08-06 Thread andrew
 so finally, I gathered up some courage and
 installgrub /boot/grub/stage1 /boot/grub/stage2
 /dev/rdsk/c2d0s0 seemed to write out what I assume
 is a new MBR. 

Not the MBR - the stage1 and 2 files are written to the boot area of the 
Solaris FDISK partition.

 tried to also installgrub on the other
 disk in the mirror c3d0 and failed over several
 permuationscannot open/stat /dev/rdsk/c3d0s2 was
 the error msg.

This is because installgrub needs the overlap slice to be present as slice 2 
for some reason. The overlap slice, also called the backup slice, covers the 
whole of the Solaris FDISK partition. If you don't have one on your second 
disk, just create one.

 
 however a reboot from dsk/c2dos0 gave me a healthy
 and unchanged grub stage2 menu and functioning system
 again . whew
 
 Although I cannot prove causality here, I still think
 that the zpool upgrade ver.10 - ver.11 borked the
 MBR. indeed, probably the stage2 sectors, i guess. 

No - upgrading a ZFS pool doesn't touch the MBR or the stage2. The problem is 
that the grub ZFS filesystem reader needs to be updated to understand the 
version 11 pool. This doesn't (yet) happen automatically.

 
 I also seem to also only have single MBR between the
  two disks in the mirror. is this normal?

Not really normal, but at present manually creating a ZFS boot mirror in this 
way does not set the 2nd disk up correctly, as you've discovered. To write a 
new Solaris grub MBR to the second disk, do this:

installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3d0s0

The -m flag tells installgrub to put the grub stage1 into the MBR.

Cheers

Andrew.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
 re == Richard Elling [EMAIL PROTECTED] writes:

re If your pool is not redundant, the chance that data
re corruption can render some or all of your data inaccessible is
re always present.

1. data corruption != unclean shutdown

2. other filesystems do not need a mirror to recover from unclean
   shutdown.  They only need it for when disks fail, or for when disks
   misremember their contents (silent corruption, as in NetApp paper).

   I would call data corruption and silent corruption the same thing:
   what the CKSUM column was _supposed_ to count, though not in fact
   the only thing it counts.

3. saying ZFS needs a mirror to recover from unclean shutdown does not
   agree with the claim ``always consistent on the disk''

4. I'm not sure exactly your position.  Before you were saying what
   Erik warned about doesn't happen, because there's no CR, and Tom
   must be confused too.  Now you're saying of course it happens,
   ZFS's claims of ``always consistent on disk'' count for nothing
   unless you have pool redundancy.


And that is exactly what I said to start with:

re In general, ZFS can only repair conditions for which it owns
re data redundancy.

 c If that's really the excuse for this situation, then ZFS is
 c not ``always consistent on the disk'' for single-VDEV pools.

Is that the take-home message?

If so, it still leaves me with the concern, what if the breaking of
one component in a mirrored vdev takes my system down uncleanly?  This
seems like a really plausible failure mode (as Tom said, ``the
inevitable kernel panic'').

In that case, I no longer have any redundancy when the system boots
back up.  If ZFS calls the inconsistent states through which it
apparently sometimes transitions pools ``data corruption'' and depends
on redundancy to recover from them, then isn't it extremely dangerous
to remove power or SAN connectivity from any DEGRADED pool?  The pool
should be rebuilt onto a hot spare IMMEDIATELY so that it's ONLINE as
soon as possible, because if ZFS loses power with a DEGRADED pool all
bets are off.

If this DEGRADED-pool unclean shutdown is, as you say, a completely
different scenario from single-vdev pools that isn't dangerous and has
no trouble with ZFS corruption, then no one should ever run a
single-vdev pool.  We should instead run mirrored vdevs that are
always DEGRADED, since this configuration looks identical to
everything outside ZFS but supposedly magically avoids the issue.  If
only we had some way to attach to vdevs fake mirror components that
immediately get marked FAULTED then we can avoid the corruption risk.
But, that's clearly absurd!

so, let's say ZFS's requirement is, as we seem to be describing it:
might lose the whole pool if your kernel panics or you pull the power
cord in a situation without redundancy.  Then I think this is an
extremely serious issue, even for redundant pools.  It is very
plausible that a machine will panic or lose power during a resilver.

And if, on the other hand, ZFS doesn't transition disks through
inconsistent states and then excuse itself calling what it did ``data
corruption'' when it bites you after an unclean shutdown, then what
happened to Erik and Tom?  

It seems to me it is ZFS's fault and can't be punted off to the
administrator's ``asking for it.''


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
 nw == Nicolas Williams [EMAIL PROTECTED] writes:

nw  Without ZFS the OP would have had silent, undetected (by the
nw OS that is) data corruption.

It sounds to me more like the system would have panicked as soon as he
pulled the cord, and when it rebooted, it would have rolled the UFS
log and mounted, without even an fsck, with no corruption at all,
silent or otherwise.

Note that the storage controller never even lost power, and does not
appear to be faulty.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss