Re: [zfs-discuss] What does dataset is busy actually mean?

2008-01-12 Thread Reid Spencer
Yes, it seems that mounting and unmounting it with the zfs command clears
the condition and allows the dataset to be destroyed. This seems to be a bug
in zfs, or at least an annoyance. I verified with fuser that no processes
were using the file system.
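
For the record, the workaround looks roughly like this (a sketch only; the
pool/dataset name is a placeholder):

fuser -c /tank/fs     # confirm nothing has files open under the mountpoint
zfs mount tank/fs     # mount it with the zfs command...
zfs unmount tank/fs   # ...then unmount it; this cycle seems to clear the busy state
zfs destroy tank/fs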

Now, what I'd really like to know is what causes a dataset to get into this
state?
 
 


Re: [zfs-discuss] What does dataset is busy actually mean?

2008-01-12 Thread Reid Spencer
Hmm, actually, no.

I just ran into a dataset where the mount/unmount trick doesn't clear the
condition. I still get "dataset is busy" when attempting to destroy it.
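
A few things that might be worth checking when the mount/unmount cycle doesn't
help (a sketch; the dataset name is a placeholder):

fuser -c /tank/fs                            # local processes holding the mountpoint?
zfs get mounted,mountpoint,sharenfs tank/fs  # still mounted, or shared over NFS?
zfs list -r -t snapshot tank/fs              # snapshots underneath it (destroy needs -r/-R for those)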
 
 


Re: [zfs-discuss] What does dataset is busy actually mean? [creating snap]

2008-01-12 Thread Rob Logan

> what causes a dataset to get into this state?

While I'm not exactly sure, I do have the steps leading up to when
I saw it while trying to create a snapshot, i.e.:

10 % zfs snapshot z/b80nd/[EMAIL PROTECTED]
cannot create snapshot 'z/b80nd/[EMAIL PROTECTED]': dataset is busy
13 % mount -F zfs z/b80nd/var /z/b80nd/var
mount: Mount point /z/b80nd/var does not exist.
14 % mount -F zfs z/b80nd/var /mnt
15 % zfs snapshot -r z/[EMAIL PROTECTED]
16 % zfs list | grep 0107
root/0107nd             455M   107G  6.03G  legacy
root/[EMAIL PROTECTED] 50.5M  -  6.02G  -
z/[EMAIL PROTECTED]0  -   243M  -
z/b80nd/[EMAIL PROTECTED]0  -  1.18G  -
z/b80nd/[EMAIL PROTECTED]0  -  2.25G  -
z/b80nd/[EMAIL PROTECTED]0  -  56.3M  -

running 64-bit opensol-20080107 on Intel

to get there I was walking through this cookbook:

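# step 1: snapshot and clone the current root into a new boot environment
# (root/0107nd), give the clone its own vfstab, copy /usr /var /opt into it
# from snapshots, and make it the pool's bootfs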
zfs snapshot root/[EMAIL PROTECTED]
zfs clone root/[EMAIL PROTECTED] root/0107nd
cat /etc/vfstab | sed s/^root/#root/ | sed s/^z/#z/ > /root/0107nd/etc/vfstab
echo root/0107nd - / zfs - no - >> /root/0107nd/etc/vfstab
cat /root/0107nd/etc/vfstab
zfs snapshot -r z/[EMAIL PROTECTED]
rsync -a --del --verbose /usr/.zfs/snapshot/dump/ /root/0107nd/usr
rsync -a --del --verbose /opt/.zfs/snapshot/dump/ /root/0107nd/opt
rsync -a --del --verbose /var/.zfs/snapshot/dump/ /root/0107nd/var
zfs set mountpoint=legacy root/0107nd
zpool set bootfs=root/0107nd root
reboot

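# step 2: install the onbld tools, BFU the new boot environment to the
# 20080107 ON archives, run acr, and make sure zpool.cache ends up in the
# boot archive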
mkdir -p /z/tmp/bfu ; cd /z/tmp/bfu
wget http://dlc.sun.com/osol/on/downloads/20080107/SUNWonbld.i386.tar.bz2
bzip2 -d -c SUNWonbld.i386.tar.bz2 | tar -xvf -
pkgadd -d onbld
wget http://dlc.sun.com/osol/on/downloads/20080107/on-bfu-nightly-osol-nd.i386.tar.bz2
bzip2 -d -c on-bfu-nightly-osol-nd.i386.tar.bz2 | tar -xvf -
setenv FASTFS /opt/onbld/bin/i386/fastfs
setenv BFULD /opt/onbld/bin/i386/bfuld
setenv GZIPBIN /usr/bin/gzip
/opt/onbld/bin/bfu /z/tmp/bfu/archives-nightly-osol-nd/i386
/opt/onbld/bin/acr
echo etc/zfs/zpool.cache >> /boot/solaris/filelist.ramdisk ; echo bug in bfu
reboot

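# step 3: clean up the BFU leftovers, snapshot again, clone /usr /var /opt
# onto the compressed z pool, and switch /etc/vfstab over to the new clones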
rm -rf /bfu* /.make* /.bfu*
zfs snapshot root/[EMAIL PROTECTED]
mount -F zfs z/b80nd/var /mnt  ; echo bug in zfs
zfs snapshot -r z/[EMAIL PROTECTED]
zfs clone z/[EMAIL PROTECTED] z/0107nd
zfs set compression=lzjb z/0107nd
zfs clone z/b80nd/[EMAIL PROTECTED] z/0107nd/usr
zfs clone z/b80nd/[EMAIL PROTECTED] z/0107nd/var
zfs clone z/b80nd/[EMAIL PROTECTED] z/0107nd/opt
rsync -a --del --verbose /.zfs/snapshot/dump/ /z/0107nd
zfs set mountpoint=legacy z/0107nd/usr
zfs set mountpoint=legacy z/0107nd/opt
zfs set mountpoint=legacy z/0107nd/var
echo z/0107nd/usr - /usr zfs - yes - >> /etc/vfstab
echo z/0107nd/var - /var zfs - yes - >> /etc/vfstab
echo z/0107nd/opt - /opt zfs - yes - >> /etc/vfstab
reboot

Heh heh, booting from a clone of a clone... wasted space under
root/`uname -v`/usr for a few libs needed at boot, but having
/usr, /var, and /opt on the compressed pool with two raidz vdevs boots
to login in 45 seconds rather than 52 seconds on the single-vdev root pool.



[zfs-discuss] Phenom support in b78

2008-01-12 Thread Alan Romeril
Hello All,
In a moment of insanity I've upgraded my ZFS server from a 5200+ to a
Phenom 9600, and I've had a lot of problems with hard hangs when accessing
the pool.
The motherboard is an Asus M2N32-WS, which has had the latest available BIOS 
upgrade installed to support the Phenom.

bash-3.2# psrinfo -pv
The physical processor has 4 virtual processors (0-3)
  x86 (AuthenticAMD 100F22 family 16 model 2 step 2 clock 2310 MHz)
AMD Phenom(tm) 9600 Quad-Core Processor

The pool is spread across 12 disks (3 x 4-disk raidz groups) attached to
both the motherboard and a Supermicro AOC-SAT2-MV8 in a PCI-X slot (marvell88sx
driver).  The hangs occur during large writes to the pool, i.e. a 10 GB mkfile,
usually just after the physical disk accesses start, and the file is not created
in the directory on the pool at all.  The system hard hangs at this point; even
when booting under kmdb there's no panic string, and after setting snooping=1 in
/etc/system there's no crash dump created after it reboots.  Doing the same
operation to a single UFS disk attached to the motherboard's ATA133 interface
doesn't cause a problem, and neither does writing to a raidz pool created from 4
files on that ATA disk.  If I use psradm and disable any 2 cores on the Phenom
there's no problem with the mkfile either, but turn a third one on and it'll hang.
This is with the virtualization and PowerNow extensions disabled in the BIOS.
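
For reference, the core-offlining test was roughly the following (a sketch; the
processor IDs are examples taken from psrinfo output):

psrinfo          # list processor IDs and their current state
psradm -f 2 3    # take two of the four cores offline -- the big mkfile then completes
psradm -n 2      # bring a third core back online -- the same mkfile hangs the box again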

So, before I go and shout at the motherboard manufacturer: are there any
components in b78 that might not be expecting a quad-core AMD CPU?  Possibly in
the marvell88sx driver?  Or is there anything more I can do to track this issue
down?

Thanks,
Alan
 
 


[zfs-discuss] Disk array problems - any suggestions?

2008-01-12 Thread Michael Stalnaker
All;

I have a 24-disk SATA array attached to an HP DL160 with an LSI 3801E as the
controller. We've been seeing errors that look like:

WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL 
PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0);
Disconnected command timeout for Target 23
WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL 
PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0);
Disconnected command timeout for Target 23

   SCSI transport failed: reason 'reset': giving up


WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL 
PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0);
Disconnected command timeout for Target 23
WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL 
PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0);
Disconnected command timeout for Target 23

When these occur, the system hangs on any access to the array and never
recovers. After some discussions with some folks at Sun, I rebuilt the
system from Solaris 10 x86 Update 4 to run OpenSolaris. It's currently on
Solaris Express (Nevada) build 78, and these errors are continuing. The
drives are 750 GB Hitachis, and after a power cycle and reboot the error
does not persist on any one drive. Each of the drives is in a carrier with
some active electronics to adapt the SATA drives for SAS use. My fear at the
moment is that there's some sort of problem with the 24-drive enclosure
itself, as the drives appear to be fine and I cannot believe we're seeing an
intermittent failure across a number of drives.
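
In case it helps, the per-drive error counters and the FMA error log should at
least show whether the timeouts follow one particular disk or slot (a sketch;
nothing here is specific to this enclosure):

iostat -En    # per-device soft/hard/transport error counts plus vendor/model/serial
fmdump -eV    # FMA ereports (e.g. transport resets) with timestamps and device paths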

Any suggestions would be appreciated.

--Mike Stalnaker



[zfs-discuss] Panic on Zpool Import (Urgent)

2008-01-12 Thread Ben Rockwood
Today, suddenly, without any apparent reason that I can find, I'm
getting panics during zpool import.  The system panicked earlier today
and has been suffering since.  This is snv_43 on a Thumper.  Here's the
stack:

panic[cpu0]/thread=99adbac0: assertion failed: ss != NULL, file: 
../../common/fs/zfs/space_map.c, line: 145

fe8000a240a0 genunix:assfail+83 ()
fe8000a24130 zfs:space_map_remove+1d6 ()
fe8000a24180 zfs:space_map_claim+49 ()
fe8000a241e0 zfs:metaslab_claim_dva+130 ()
fe8000a24240 zfs:metaslab_claim+94 ()
fe8000a24270 zfs:zio_dva_claim+27 ()
fe8000a24290 zfs:zio_next_stage+6b ()
fe8000a242b0 zfs:zio_gang_pipeline+33 ()
fe8000a242d0 zfs:zio_next_stage+6b ()
fe8000a24320 zfs:zio_wait_for_children+67 ()
fe8000a24340 zfs:zio_wait_children_ready+22 ()
fe8000a24360 zfs:zio_next_stage_async+c9 ()
fe8000a243a0 zfs:zio_wait+33 ()
fe8000a243f0 zfs:zil_claim_log_block+69 ()
fe8000a24520 zfs:zil_parse+ec ()
fe8000a24570 zfs:zil_claim+9a ()
fe8000a24750 zfs:dmu_objset_find+2cc ()
fe8000a24930 zfs:dmu_objset_find+fc ()
fe8000a24b10 zfs:dmu_objset_find+fc ()
fe8000a24bb0 zfs:spa_load+67b ()
fe8000a24c20 zfs:spa_import+a0 ()
fe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fe8000a24d20 genunix:cdev_ioctl+55 ()
fe8000a24d60 specfs:spec_ioctl+99 ()
fe8000a24dc0 genunix:fop_ioctl+3b ()
fe8000a24ec0 genunix:ioctl+180 ()
fe8000a24f10 unix:sys_syscall32+101 ()

syncing file systems... done

This is almost identical to a post to this list over a year ago titled
"ZFS Panic".  There was follow-up on it, but the results didn't make it
back to the list.

I spent time doing a full sweep for any hardware failures, pulled 2
drives that I suspected were problematic but weren't flagged as such, etc.,
etc., etc.  Nothing helps.

Bill suggested a 'zpool import -o ro' on the other post, but that's not
working either.

I _can_ use 'zpool import' to see the pool, but I have to force the 
import.  A simple 'zpool import' returns output in about a minute.  
'zpool import -f poolname' takes almost exactly 10 minutes every single 
time, like it hits some timeout and then panics.

I did notice that while the 'zpool import' is running, 'iostat' is
useless; it just hangs.  I still want to believe this is some device
misbehaving, but I have no evidence to support that theory.
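
One more avenue that might be worth a try is looking at the pool with zdb
without importing it at all; a sketch (I'm assuming the zdb in snv_43 already
accepts -e for pools that aren't imported, and the pool name is a placeholder):

zdb -e poolname       # walk the pool config and dataset metadata read-only
zdb -e -bb poolname   # block/space accounting pass (can take a very long time)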

Any and all suggestions are greatly appreciated.  I've put around 8 
hours into this so far and I'm getting absolutely nowhere.

Thanks

benr.