Re: [zfs-discuss] What does dataset is busy actually mean?
Yes, it seems that mounting and then unmounting the dataset with the zfs command clears the condition and allows the dataset to be destroyed. This looks like a bug in ZFS, or at least an annoyance. I verified with fuser that no processes were using the file system. Now, what I'd really like to know is: what causes a dataset to get into this state?

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
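The workaround described above can be sketched as a small script. Everything here is a hedged illustration, not taken from the post: the `tank/test` dataset name is made up, and the `RUN=echo` guard makes it a dry run that only prints the commands (clear `RUN` to actually execute them).

```shell
# Sketch of the mount/unmount workaround for a "dataset is busy" destroy.
# Dataset name and the RUN=echo dry-run guard are illustrative assumptions.
clear_busy() {
    ds=$1
    ${RUN:-echo} fuser -c "/$ds"     # anything still holding the mountpoint?
    ${RUN:-echo} zfs mount "$ds"     # cycle the mount; on some builds this
    ${RUN:-echo} zfs unmount "$ds"   # clears the stale busy state
    ${RUN:-echo} zfs destroy "$ds"   # then retry the destroy
}

clear_busy tank/test    # dry run: prints the four commands
```

As the follow-up below shows, the cycle does not clear the condition for every dataset, so this is a workaround to try, not a fix.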
Re: [zfs-discuss] What does dataset is busy actually mean?
Hmm, actually, no. I just ran into a dataset where the mount/unmount cycle doesn't clear the condition. I still get "dataset is busy" when attempting to destroy it.
Re: [zfs-discuss] What does dataset is busy actually mean? [creating snap]
> what causes a dataset to get into this state?

While I'm not exactly sure, I do have the steps leading up to when I saw it, trying to create a snapshot. I.e.:

10 % zfs snapshot z/b80nd/[EMAIL PROTECTED]
cannot create snapshot 'z/b80nd/[EMAIL PROTECTED]': dataset is busy
13 % mount -F zfs z/b80nd/var /z/b80nd/var
mount: Mount point /z/b80nd/var does not exist.
14 % mount -F zfs z/b80nd/var /mnt
15 % zfs snapshot -r z/[EMAIL PROTECTED]
16 % zfs list | grep 0107
root/0107nd                455M  107G  6.03G  legacy
root/[EMAIL PROTECTED]    50.5M     -  6.02G  -
z/[EMAIL PROTECTED]           0     -   243M  -
z/b80nd/[EMAIL PROTECTED]     0     -  1.18G  -
z/b80nd/[EMAIL PROTECTED]     0     -  2.25G  -
z/b80nd/[EMAIL PROTECTED]     0     -  56.3M  -

This is running 64-bit opensol-20080107 on Intel. To get there I was walking through this cookbook:

zfs snapshot root/[EMAIL PROTECTED]
zfs clone root/[EMAIL PROTECTED] root/0107nd
cat /etc/vfstab | sed s/^root/#root/ | sed s/^z/#z/ > /root/0107nd/etc/vfstab
echo root/0107nd - / zfs - no - >> /root/0107nd/etc/vfstab
cat /root/0107nd/etc/vfstab
zfs snapshot -r z/[EMAIL PROTECTED]
rsync -a --del --verbose /usr/.zfs/snapshot/dump/ /root/0107nd/usr
rsync -a --del --verbose /opt/.zfs/snapshot/dump/ /root/0107nd/opt
rsync -a --del --verbose /var/.zfs/snapshot/dump/ /root/0107nd/var
zfs set mountpoint=legacy root/0107nd
zpool set bootfs=root/0107nd root
reboot
mkdir -p /z/tmp/bfu ; cd /z/tmp/bfu
wget http://dlc.sun.com/osol/on/downloads/20080107/SUNWonbld.i386.tar.bz2
bzip2 -d -c SUNWonbld.i386.tar.bz2 | tar -xvf -
pkgadd -d onbld
wget http://dlc.sun.com/osol/on/downloads/20080107/on-bfu-nightly-osol-nd.i386.tar.bz2
bzip2 -d -c on-bfu-nightly-osol-nd.i386.tar.bz2 | tar -xvf -
setenv FASTFS /opt/onbld/bin/i386/fastfs
setenv BFULD /opt/onbld/bin/i386/bfuld
setenv GZIPBIN /usr/bin/gzip
/opt/onbld/bin/bfu /z/tmp/bfu/archives-nightly-osol-nd/i386
/opt/onbld/bin/acr
echo etc/zfs/zpool.cache >> /boot/solaris/filelist.ramdisk ; echo bug in bfu
reboot
rm -rf /bfu* /.make* /.bfu*
zfs snapshot root/[EMAIL PROTECTED]
mount -F zfs z/b80nd/var /mnt ; echo bug in zfs
zfs snapshot -r z/[EMAIL PROTECTED]
zfs clone z/[EMAIL PROTECTED] z/0107nd
zfs set compression=lzjb z/0107nd
zfs clone z/b80nd/[EMAIL PROTECTED] z/0107nd/usr
zfs clone z/b80nd/[EMAIL PROTECTED] z/0107nd/var
zfs clone z/b80nd/[EMAIL PROTECTED] z/0107nd/opt
rsync -a --del --verbose /.zfs/snapshot/dump/ /z/0107nd
zfs set mountpoint=legacy z/0107nd/usr
zfs set mountpoint=legacy z/0107nd/opt
zfs set mountpoint=legacy z/0107nd/var
echo z/0107nd/usr - /usr zfs - yes - >> /etc/vfstab
echo z/0107nd/var - /var zfs - yes - >> /etc/vfstab
echo z/0107nd/opt - /opt zfs - yes - >> /etc/vfstab
reboot

Heh heh, booting from a clone of a clone... Wasted space under root/`uname -v`/usr for a few libs needed at boot, but having /usr, /var, and /opt on the compressed pool with two raidz vdevs boots to login in 45 seconds rather than 52 seconds on the single-vdev root pool.
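The vfstab step in the cookbook above (comment out the old root/* and z/* entries, then append a root entry for the clone) can be sketched as a small self-contained function. The two sample vfstab lines are made up for illustration:

```shell
# Sketch of the cookbook's vfstab rewrite: comment out existing root/* and
# z/* entries read from stdin and append a root entry for the clone.
# The dataset names in the example are illustrative, not from the post.
rewrite_vfstab() {
    clone=$1
    sed -e 's/^root/#root/' -e 's/^z/#z/'   # neutralize old entries
    echo "$clone - / zfs - no -"            # clone becomes the new root
}

# Example with a made-up two-line vfstab:
printf 'root/b80nd - / zfs - no -\nz/b80nd/usr - /usr zfs - yes -\n' \
    | rewrite_vfstab root/0107nd
```

Run against a real system this would be `rewrite_vfstab root/0107nd < /etc/vfstab > /root/0107nd/etc/vfstab`; writing to a copy under the clone, never in place, is what lets the old boot environment keep working.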
[zfs-discuss] Phenom support in b78
Hello all,

In a moment of insanity I've upgraded from a 5200+ to a Phenom 9600 on my ZFS server, and I've had a lot of problems with hard hangs when accessing the pool. The motherboard is an Asus M2N32-WS, which has had the latest available BIOS upgrade installed to support the Phenom.

bash-3.2# psrinfo -pv
The physical processor has 4 virtual processors (0-3)
  x86 (AuthenticAMD 100F22 family 16 model 2 step 2 clock 2310 MHz)
  AMD Phenom(tm) 9600 Quad-Core Processor

The pool is spread across 12 disks (3 x 4-disk raidz groups) attached to both the motherboard and a Supermicro AOC-SAT2-MV8 in a PCI-X slot (marvell88sx driver).

The hangs occur during large writes to the pool, e.g. a 10 GB mkfile, usually just after the physical disk accesses start, and the file is never created in the directory on the pool at all. The system hard-hangs at this point: even when booting under kmdb there's no panic string, and after setting snooping=1 in /etc/system there's no crash dump created after it reboots.

Doing the same operation to a single UFS disk attached to the motherboard's ATA133 interface doesn't cause a problem, and neither does writing to a raidz pool created from 4 files on that ATA disk. If I use psradm and disable any 2 cores on the Phenom there's no problem with the mkfile either, but turn a third back on and it'll hang. This is with the virtualization and PowerNow! extensions disabled in the BIOS.

So, before I go and shout at the motherboard manufacturer: are there any components in b78 that might not be expecting a quad-core AMD CPU? Possibly in the marvell88sx driver? Or is there anything more I can do to track this issue down?

Thanks,
Alan
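The core-bisection experiment described above (disable cores with psradm, run the big write, re-enable cores one at a time until the hang reproduces) can be sketched like this. The processor IDs, pool path, and `RUN=echo` dry-run guard are illustrative assumptions, not from the report:

```shell
# Sketch of the psradm bisection: start with two cores offline (where the
# report says mkfile survives), then re-enable one core at a time.
# Processor IDs and /pool/testfile are made up; RUN=echo keeps this a dry
# run that only prints the commands.
bisect_cores() {
    run=${RUN:-echo}
    $run psradm -f 2 3                  # take cores 2 and 3 offline
    $run mkfile 10g /pool/testfile      # large write; expected to survive
    $run rm /pool/testfile
    for cpu in 2 3; do
        $run psradm -n "$cpu"           # bring one core back online
        $run mkfile 10g /pool/testfile  # hang reported once a 3rd core is up
        $run rm /pool/testfile
    done
}

bisect_cores    # dry run: prints the command sequence
```

If the hang really tracks the number of online cores rather than which cores are online, that points at a concurrency bug (scheduler, driver interrupt handling) rather than a bad CPU core.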
[zfs-discuss] Disk array problems - any suggestions?
All:

I have a 24-disk SATA array attached to an HP DL160 with an LSI 3801E for the controller. We've been seeing errors that look like:

WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0):
    Disconnected command timeout for Target 23
WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0):
    Disconnected command timeout for Target 23
SCSI transport failed: reason 'reset': giving up
WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0):
    Disconnected command timeout for Target 23
WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],3/pci1000,[EMAIL PROTECTED] (mpt0):
    Disconnected command timeout for Target 23

When these occur, the system hangs on any access to the array and never recovers. After some discussions with some folks at Sun, I rebuilt the system from Solaris 10 x86 Update 4 to run OpenSolaris. It's currently on Solaris Express (Nevada) build 78, and these errors are continuing.

The drives are the 750 GB Hitachis, and after a power cycle and reboot the error does not stay with any one drive. Each of the drives is in a carrier with some active electronics to adapt the SATA drives for SAS use.

My fear at the moment is that there's some sort of problem with the 24-drive enclosure itself, as the drives appear to be fine and I cannot believe we're seeing an intermittent failure across a number of drives.

Any suggestions would be appreciated.

--Mike Stalnaker
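One way to test whether the timeouts follow a single target (bad drive/slot) or wander across the array (enclosure, cabling, or controller) is to tally the warnings per target number out of the messages file. This is a sketch; the sample log excerpt below is paraphrased, not real output:

```shell
# Sketch: count "Disconnected command timeout" warnings per SCSI target.
# If one target dominates, suspect that drive/slot; if the counts spread
# across many targets, suspect the enclosure, interposers, or controller.
count_timeouts() {
    grep 'Disconnected command timeout' "$1" \
        | sed 's/.*Target //' \
        | sort | uniq -c | sort -rn     # highest-count target first
}

# Example against a made-up excerpt (real input would be /var/adm/messages):
cat > /tmp/messages.sample <<'EOF'
WARNING: ... (mpt0): Disconnected command timeout for Target 23
WARNING: ... (mpt0): Disconnected command timeout for Target 23
WARNING: ... (mpt0): Disconnected command timeout for Target 7
EOF
count_timeouts /tmp/messages.sample     # 2 hits on target 23, 1 on target 7
```

Given that each drive sits behind an active SATA-to-SAS interposer, a spread of targets would also be consistent with a marginal interposer power rail or a flaky expander rather than the disks themselves.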
[zfs-discuss] Panic on Zpool Import (Urgent)
Today, suddenly, without any apparent reason that I can find, I'm getting panics during zpool import. The system panicked earlier today and has been suffering since. This is snv_43 on a Thumper. Here's the stack:

panic[cpu0]/thread=99adbac0:
assertion failed: ss != NULL, file: ../../common/fs/zfs/space_map.c, line: 145

fe8000a240a0 genunix:assfail+83 ()
fe8000a24130 zfs:space_map_remove+1d6 ()
fe8000a24180 zfs:space_map_claim+49 ()
fe8000a241e0 zfs:metaslab_claim_dva+130 ()
fe8000a24240 zfs:metaslab_claim+94 ()
fe8000a24270 zfs:zio_dva_claim+27 ()
fe8000a24290 zfs:zio_next_stage+6b ()
fe8000a242b0 zfs:zio_gang_pipeline+33 ()
fe8000a242d0 zfs:zio_next_stage+6b ()
fe8000a24320 zfs:zio_wait_for_children+67 ()
fe8000a24340 zfs:zio_wait_children_ready+22 ()
fe8000a24360 zfs:zio_next_stage_async+c9 ()
fe8000a243a0 zfs:zio_wait+33 ()
fe8000a243f0 zfs:zil_claim_log_block+69 ()
fe8000a24520 zfs:zil_parse+ec ()
fe8000a24570 zfs:zil_claim+9a ()
fe8000a24750 zfs:dmu_objset_find+2cc ()
fe8000a24930 zfs:dmu_objset_find+fc ()
fe8000a24b10 zfs:dmu_objset_find+fc ()
fe8000a24bb0 zfs:spa_load+67b ()
fe8000a24c20 zfs:spa_import+a0 ()
fe8000a24c60 zfs:zfs_ioc_pool_import+79 ()
fe8000a24ce0 zfs:zfsdev_ioctl+135 ()
fe8000a24d20 genunix:cdev_ioctl+55 ()
fe8000a24d60 specfs:spec_ioctl+99 ()
fe8000a24dc0 genunix:fop_ioctl+3b ()
fe8000a24ec0 genunix:ioctl+180 ()
fe8000a24f10 unix:sys_syscall32+101 ()

syncing file systems... done

This is almost identical to a post to this list over a year ago titled "ZFS Panic". There was follow-up on it, but the results didn't make it back to the list. I spent time doing a full sweep for any hardware failures, pulled 2 drives that I suspected as problematic but that weren't flagged as such, etc., etc. Nothing helps.

Bill suggested a 'zpool import -o ro' on the other post, but that's not working either. I _can_ use 'zpool import' to see the pool, but I have to force the import. A simple 'zpool import' returns output in about a minute; 'zpool import -f poolname' takes almost exactly 10 minutes every single time, as if it hits some timeout and then panics. I did notice that while the 'zpool import' is running, 'iostat' is useless; it just hangs.

I still want to believe this is some device misbehaving, but I have no evidence to support that theory. Any and all suggestions are greatly appreciated. I've put around 8 hours into this so far and I'm getting absolutely nowhere.

Thanks,
benr.
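One way to gather evidence for (or against) the misbehaving-device theory before the next forced import is to read each pool disk raw and see whether any of them stall, since iostat is useless once the import is underway. This is a hedged sketch, not from the thread: the device names are made up, and the `RUN=echo` guard makes it a dry run that only prints the commands.

```shell
# Sketch: issue a short raw read against each suspected pool disk.
# A healthy disk returns quickly; a disk causing the 10-minute stall
# would hang the dd for that device.  Device names are illustrative;
# RUN=echo keeps this a dry run.
check_disks() {
    run=${RUN:-echo}
    for disk in "$@"; do
        $run dd if="/dev/rdsk/${disk}s0" of=/dev/null bs=128k count=100
    done
}

check_disks c0t0d0 c0t1d0    # dry run with made-up device names
```

Timing each dd (e.g. under `time`, or just watching which one never returns) would identify a hung device without touching the pool metadata at all, which matters given that the panic fires inside space map claim during import.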