Hi Carsten,

This was supposed to be fixed in build 164 of Nevada (6742788). If you are still seeing this issue in S11, I think you should raise a bug with the relevant details. As Paul has suggested, this could also be due to an incomplete snapshot.

I have seen interrupted zfs recv's cause weird bugs.
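
One quick way to check for that (the pool name "tank" below is just a placeholder, not your actual pool) is to compare the snapshot lists on both boxes; anything present on only one side points at an interrupted send/recv:

# on the primary:
zfs list -H -t snapshot -o name -r tank | sort > /tmp/snaps.primary
# on the mirror (with the pool imported read-only):
zfs list -H -t snapshot -o name -r tank | sort > /tmp/snaps.mirror
# differences are snapshots that never arrived, or only partially arrived:
diff /tmp/snaps.primary /tmp/snaps.mirror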

Thanks,
Deepak.

On 03/27/12 12:44 PM, Carsten John wrote:
Hello everybody,

I have a Solaris 11 box here (Sun X4270) that crashes with a kernel panic 
during the import of a zpool (some 30 TB) containing ~500 zfs filesystems after 
a reboot. This results in a reboot loop until I boot single user and remove 
/etc/zfs/zpool.cache.
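
(In case anyone hits the same loop: roughly the sequence I mean is the one below. Moving the cache file aside is equivalent to removing it, and the boot flag shown is the usual one on a stock Solaris 11 x86 install.)

# boot single user (e.g. append -s to the kernel line in GRUB), then:
mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad    # stops the automatic import on the next boot
reboot
# after the normal boot, the pool can be imported by hand (see the read-only import below)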


From /var/adm/messages:

savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) 
rp=ffffff002f9cec50 addr=20 occurred in module "zfs" due to a NULL pointer 
dereference
savecore: [ID 882351 auth.error] Saving compressed system crash dump in 
/var/crash/vmdump.2

This is what mdb tells me:

mdb unix.2 vmcore.2
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp 
scsi_vhci zfs mpt sd ip hook neti arp usba uhci sockfs qlc fctl s1394 kssl lofs 
random fcp idm sata fcip cpc crypto ufs logindmux ptm sppp ]
$c
zap_leaf_lookup_closest+0x45(ffffff0700ca2a98, 0, 0, ffffff002f9cedb0)
fzap_cursor_retrieve+0xcd(ffffff0700ca2a98, ffffff002f9ceed0, ffffff002f9cef10)
zap_cursor_retrieve+0x195(ffffff002f9ceed0, ffffff002f9cef10)
zfs_purgedir+0x4d(ffffff0721d32c20)
zfs_rmnode+0x57(ffffff0721d32c20)
zfs_zinactive+0xb4(ffffff0721d32c20)
zfs_inactive+0x1a3(ffffff0721d3a700, ffffff07149dc1a0, 0)
fop_inactive+0xb1(ffffff0721d3a700, ffffff07149dc1a0, 0)
vn_rele+0x58(ffffff0721d3a700)
zfs_unlinked_drain+0xa7(ffffff07022dab40)
zfsvfs_setup+0xf1(ffffff07022dab40, 1)
zfs_domount+0x152(ffffff07223e3c70, ffffff0717830080)
zfs_mount+0x4e3(ffffff07223e3c70, ffffff07223e5900, ffffff002f9cfe20, 
ffffff07149dc1a0)
fsop_mount+0x22(ffffff07223e3c70, ffffff07223e5900, ffffff002f9cfe20, 
ffffff07149dc1a0)
domount+0xd2f(0, ffffff002f9cfe20, ffffff07223e5900, ffffff07149dc1a0, 
ffffff002f9cfe18)
mount+0xc0(ffffff0713612c78, ffffff002f9cfe98)
syscall_ap+0x92()
_sys_sysenter_post_swapgs+0x149()
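
(For anyone digging further into the dump: in the same mdb session, ::status and ::msgbuf normally show the panic string and the console messages leading up to it.)

::status
::msgbuf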


I can import the pool readonly.
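
(For reference, the read-only import is along the lines of the following, with "tank" again standing in for the real pool name:)

zpool import -o readonly=on tank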

The server is a mirror for our primary file server and is synced via zfs 
send/receive.
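
(For illustration only, since the exact script doesn't matter here: the sync follows the usual incremental send/receive pattern, something like the lines below. All pool, filesystem, snapshot and host names are placeholders.)

# on the primary: take a new snapshot, then send the delta since the previous one
zfs snapshot -r tank/export@2012-03-27
zfs send -R -i @2012-03-26 tank/export@2012-03-27 | \
    ssh mirrorhost zfs receive -F -d tank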

I saw a similar effect some time ago on an OpenSolaris box (build 111b). That 
time my final solution was to copy the read-only mounted data over to a newly 
created pool. As this is the second time this failure has occurred (on different 
machines), I'm really concerned about overall reliability...



Any suggestions?


thx

Carsten
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
