[zfs-discuss] Re: zfs exported a live filesystem

2006-12-12 Thread Jim Hranicky
For the record, this happened with a new filesystem. I didn't
muck about with an old filesystem while it was still mounted;
I created a new one, mounted it, and then accidentally exported
it.

  Except that it doesn't:
  
  # mount /dev/dsk/c1t1d0s0 /mnt
  # share /mnt
  # umount /mnt
  umount: /mnt busy
  # unshare /mnt
  # umount /mnt
 
 If you umount -f it will though!

Well, sure, but I was still surprised that it happened anyway.

 The system is working as designed, the NFS client did what it was
 supposed to do.  If you brought the pool back in again with zpool
 import things should have picked up where they left off.

Yep -- an import/shareall made the FS available again.
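For anyone else who trips over this, the recovery amounted to something like
the following (zmir is my test pool; the zfs mount step is probably redundant
since import normally remounts everything):

zpool import zmir    # bring the exported pool back in
zfs mount -a         # make sure its filesystems are mounted again (usually automatic)
shareall             # re-share them so the NFS clients recover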

 What's more, you were probably running as root when you did that, so
 you got what you asked for - there is only so much protection we can
 give without being annoying!

Sure, but there are still safeguards in place even when running things
as root, such as requiring umount -f as above, or warning you
when running format on a disk with mounted partitions.

Since this appeared to be an operation that may warrant such a
safeguard, I thought I'd check whether this was to be expected or
whether a safeguard should be put in.

Annoying isn't always bad :-)

 Now having said that, I personally wouldn't have expected that zpool
 export should have worked as easily as that while there were shared
 filesystems.  I would have expected that exporting the pool should
 have attempted to unmount all the ZFS filesystems first - which would
 have failed without a -f flag because they were shared.
 
 So IMO it is a bug or at least an RFE.

Ok, where should I file an RFE?
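In the meantime I'll just be careful to do things by hand in the right
order -- roughly the sequence below, which is what the safeguard would
effectively enforce (zmir/test is just a placeholder dataset name):

zfs unshare zmir/test    # stop serving the filesystem over NFS first
zfs unmount zmir/test    # then unmount it; a busy filesystem complains here
zpool export zmir        # only then export the pool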

Jim
 
 


[zfs-discuss] Kickstart hot spare attachment

2006-12-12 Thread Jim Hranicky
For my latest test I set up a stripe of two mirrors with one hot spare
like so:

zpool create -f -m /export/zmir zmir mirror c0t0d0 c3t2d0 \
    mirror c3t3d0 c3t4d0 spare c3t1d0

I spun down c3t2d0 and c3t4d0 simultaneously, and while the system kept 
running (my tar over NFS barely hiccuped), the zpool command hung again.

I rebooted the machine with -dnq, and although the system didn't come up
the first time, it did after a fsck and a second reboot. 

However, once again the hot spare isn't getting used:

# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: resilver completed with 0 errors on Tue Dec 12 09:15:49 2006
config:

        NAME        STATE     READ WRITE CKSUM
        zmir        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c0t0d0  ONLINE       0     0     0
            c3t2d0  UNAVAIL      0     0     0  cannot open
          mirror    DEGRADED     0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  UNAVAIL      0     0     0  cannot open
        spares
          c3t1d0    AVAIL

A few questions:

- I know I can attach it by hand via the zpool commands (see the sketch
after these questions), but is there a way to kickstart the attachment
process if it fails to happen automatically upon disk failure?

- In this instance the spare is twice as big as the other
drives -- does that make a difference? 

- Is there something inherent to an old SCSI bus that causes spun-
down drives to hang the system in some way, even if it's just hanging
the zpool/zfs system calls? Would a thumper be more resilient to this?
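For reference, the manual attach I mean in the first question is roughly
this (device names taken from the status output above -- a sketch, not a
tested recipe):

zpool replace zmir c3t2d0 c3t1d0   # press the hot spare into service by hand
zpool status -v zmir               # the spare should now show up under the degraded mirror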

Jim
 
 


[zfs-discuss] Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
Ok, so I'm planning on wiping my test pool that seems to have problems 
with non-spare disks being marked as spares, but I can't destroy it:

# zpool destroy -f zmir
cannot iterate filesystems: I/O error

Anyone know how I can nuke this for good?

Jim
 
 


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
BTW, I'm also unable to export the pool -- same error.

Jim
 
 


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
Nevermind:

# zfs destroy [EMAIL PROTECTED]:28
cannot open '[EMAIL PROTECTED]:28': I/O error

Jim
 
 


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
 You are likely hitting:
 
 6397052 unmounting datasets should process /etc/mnttab instead of
 traverse DSL
 
 Which was fixed in build 46 of Nevada.  In the meantime, you can
 remove /etc/zfs/zpool.cache manually and reboot, which will remove
 all your pools (which you can then re-import on an individual basis).

I'm running b51, but I'll try deleting the cache.
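For the archives, my understanding of the suggested workaround is roughly
this (clearly not something to run casually on a box with pools you care
about):

rm /etc/zfs/zpool.cache    # make the system forget all pools on the next boot
reboot
zpool import               # after rebooting, list the pools available for import
zpool import zmir          # then pull in a specific pool by name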

Jim
 
 


[zfs-discuss] Re: Can't destroy corrupted pool

2006-12-11 Thread Jim Hranicky
This worked. 

I've restarted my testing, but I've been fdisking each drive before I
add it to the pool, and so far the system is behaving as expected
when I spin a drive down, i.e., the hot spare gets used automatically.
This makes me wonder if it's possible to ensure that the forced
addition of a drive to a pool wipes the drive of any previous data,
especially any ZFS metadata.
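For reference, the kind of pre-wipe I mean is roughly the following (the
device name is just an example, and zeroing only the front of the disk is
imperfect since ZFS also keeps labels at the end of the device):

dd if=/dev/zero of=/dev/rdsk/c3t1d0p0 bs=1024k count=64   # clobber old labels/metadata at the front
fdisk -B /dev/rdsk/c3t1d0p0                               # lay down a fresh default fdisk partition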

I'll keep the list posted as I continue my tests.

Jim
 
 


[zfs-discuss] zfs exported a live filesystem

2006-12-11 Thread Jim Hranicky
By mistake, I just exported my test filesystem while it was up
and being served via NFS, causing my tar over NFS to start
throwing stale file handle errors. 

Should I file this as a bug, or should I just not do that :-)

Jim
 
 


[zfs-discuss] Re: Managed to corrupt my pool

2006-12-05 Thread Jim Hranicky
 So the questions are:
 
 - is this fixable? I don't see an inum I could run find on to remove,
   and I can't even do a zfs volinit anyway:
 
   nextest-01# zfs volinit
   cannot iterate filesystems: I/O error
 
 - would not enabling zil_disable have prevented this?
 
 - Should I have been doing a 3-way mirror?
 
 - Is there a more optimum configuration to help prevent this kind of
   corruption?

Anyone have any thoughts on this? I'd really like to be 
able to build a nice ZFS box for file service but if a 
hardware failure can corrupt a disk pool I'll have to 
try to find another solution, I'm afraid.
 
 


[zfs-discuss] Re: Managed to corrupt my pool

2006-12-05 Thread Jim Hranicky
 Anyone have any thoughts on this? I'd really like to be able to build
 a nice ZFS box for file service but if a hardware failure can corrupt
 a disk pool I'll have to try to find another solution, I'm afraid.

Sorry, I worded this poorly -- if the loss of a disk in a mirror
can corrupt the pool it's going to give me pause in implementing
a ZFS solution. 

Jim
 
 


[zfs-discuss] Managed to corrupt my pool

2006-11-30 Thread Jim Hranicky
Platform:

  - old Dell workstation with an Andataco Gigaraid enclosure
    plugged into an Adaptec 39160
  - Nevada b51

Current zpool config:

  - one two-disk mirror with two hot spares

In my ferocious pounding of ZFS I've managed to corrupt my data
pool. This is what I've been doing to test it:

   - set zil_disable to 1 in /etc/system
   - continually untar a couple of files into the filesystem
   - manually spin down a drive in the mirror by holding down
     the button on the enclosure
   - for any system hangs, reboot with a nasty

       reboot -dnq
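The zil_disable step above is just a tunable in /etc/system plus a reboot;
a minimal sketch, assuming the zfs:zil_disable name from this Nevada
vintage:

echo 'set zfs:zil_disable = 1' >> /etc/system   # disable the ZFS intent log (testing only!)
reboot                                          # the tunable takes effect on the next boot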

I've gotten different results after the spindown:

   - works properly: short or no hang, hot spare successfully
     added to the mirror
   - system hangs, and after a reboot the spare is not added
   - tar hangs, but after running zpool status the hot
     spare is added properly and tar continues
   - tar continues, but hangs on zpool status

The last is what happened just prior to the corruption. Here's the output
of zpool status:

nextest-01# zpool status -v
  pool: zmir
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed with 1 errors on Thu Nov 30 11:37:21 2006
config:

        NAME        STATE     READ WRITE CKSUM
        zmir        DEGRADED     8     0     4
          mirror    DEGRADED     8     0     4
            c3t3d0  ONLINE       0     0    24
            c3t4d0  UNAVAIL      0     0     0  cannot open
        spares
          c0t0d0    AVAIL
          c3t1d0    AVAIL

errors: The following persistent errors have been detected:

          DATASET  OBJECT  RANGE
          15       0       lvl=4294967295 blkid=0

So the questions are:

   - is this fixable? I don't see an inum I could run find on to remove,
     and I can't even do a zfs volinit anyway:

       nextest-01# zfs volinit
       cannot iterate filesystems: I/O error

   - would not enabling zil_disable have prevented this?

   - Should I have been doing a 3-way mirror?

   - Is there a more optimum configuration to help prevent this
     kind of corruption?

Ultimately, I want to build a ZFS server with performance and reliability
comparable to, say, a NetApp, but the fact that I appear to have been
able to nuke my pool by simulating a hardware error gives me pause.

I'd love to know if I'm off-base in my worries.
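Incidentally, the 3-way mirror option from the questions above would be
built along these lines (device names reused from the status output;
whether it would have survived this particular failure is exactly what
I'm asking):

zpool create -f zmir mirror c3t3d0 c3t4d0 c3t1d0 spare c0t0d0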

Jim
 
 


[zfs-discuss] Re: zfs hot spare not automatically getting used

2006-11-29 Thread Jim Hranicky
I know this isn't necessarily ZFS-specific, but after I reboot I spin the
drives back up, and nothing I do (devfsadm, disks, etc.) can get them seen
again until the next reboot.

I've got some older SCSI drives in an old Andataco Gigaraid enclosure which
I thought supported hot swap, but I seem unable to hot swap them in. The PC
has an Adaptec 39160 card in it and I'm running Nevada b51. Is this not a
setup that can support hot swap? Or is there something I have to do other
than devfsadm to get the SCSI bus rescanned?
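For concreteness, the kind of rescan I've been attempting looks roughly
like this -- devfsadm is what I've actually tried; the cfgadm part is from
memory and the controller number is just a guess for this box:

devfsadm -c disk        # rebuild the /dev entries for disks
devfsadm -C             # clean up any stale links while we're at it
cfgadm -al              # list attachment points; the spun-up drives should show here
cfgadm -c configure c3  # try to (re)configure the controller they hang off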
 
 


[zfs-discuss] Re: zfs hot spare not automatically getting used

2006-11-28 Thread Jim Hranicky
So is there a command to make the spare get used, or do I have to
remove it as a spare and re-add it if it doesn't get used
automatically?

Is this a bug to be fixed, or will this always be the case when
the disks aren't exactly the same size?
 
 