[zfs-discuss] ZFS panic on blade BL465c G1
Hello list,

I got a c7000 with BL465c G1 blades to play with and have been trying to get some form of Solaris to work on it. However, this is the state:

  OpenSolaris 134:  installs with ZFS, but no bnx NIC drivers.
  OpenIndiana 147:  panics on zpool create every time, even from console. Has no UFS option, has NICs.
  Solaris 10 u9:    panics on zpool create, but has a UFS option, has NICs.

One option would be to get 147's NIC drivers onto 134. But for now, the ZFS panic happens on create. The blade has an HP Smart Array E200i card in it, with both HDDs set up as single-HDD logical volumes:

# format
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 DEFAULT cyl 17841 alt 2 hd 255 sec 63
          /p...@2,0/pci1166,1...@11/pci1166,1...@0/pci103c,3...@8/s...@0,0
       1. c1t1d0 DEFAULT cyl 17841 alt 2 hd 255 sec 63
          /p...@2,0/pci1166,1...@11/pci1166,1...@0/pci103c,3...@8/s...@1,0

(format input: 1; p; p)

Total disk cylinders available: 17841 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders       Size       Blocks
  0       root    wm      1 - 17840     136.66GB     (17840/0/0) 286599600
  1 unassigned    wu      0               0          (0/0/0)             0
  2     backup    wm      0 - 17840     136.67GB     (17841/0/0) 286615665
  3 unassigned    wm      0               0          (0/0/0)             0
  4 unassigned    wm      0               0          (0/0/0)             0
  5 unassigned    wm      0               0          (0/0/0)             0
  6 unassigned    wm      0               0          (0/0/0)             0
  7 unassigned    wm      0               0          (0/0/0)             0
  8       boot    wu      0 -     0       7.84MB     (1/0/0)         16065

# zpool create -f zboot c1t1d0s0

panic[cpu2]/thread=fe80011a2c60: BAD TRAP: type=e (#pf Page fault) rp=fe80011a2940 addr=278 occurred in module unix due to a NULL pointer dereference

sched: #pf Page fault
Bad kernel fault at addr=0x278
pid=0, pc=0xfb8406fb, sp=0xfe80011a2a38, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe>  cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 278  cr3: 1161f000  cr8: c
        rdi:      278  rsi:        4  rdx: fe80011a2c60
        rcx:       14   r8:        0   r9:            0
        rax:        0  rbx:      278  rbp: fe80011a2a60
        r10:        0  r11:        1  r12:           10
        r13:        0  r14:        4  r15:     9bb02ef0
        fsb:        0  gsb: 8ac7a800   ds:           43
         es:       43   fs:        0   gs:          1c3
        trp:        e  err:        2  rip:     fb8406fb
         cs:       28  rfl:    10246  rsp: fe80011a2a38
         ss:       30

fe80011a2850 unix:die+da ()
fe80011a2930 unix:trap+5e6 ()
fe80011a2940 unix:cmntrap+140 ()
fe80011a2a60 unix:mutex_enter+b ()
fe80011a2a70 zfs:zio_buf_alloc+1d ()
fe80011a2aa0 zfs:zio_vdev_io_start+120 ()
fe80011a2ad0 zfs:zio_execute+7b ()
fe80011a2af0 zfs:zio_nowait+1a ()
fe80011a2b60 zfs:vdev_probe+f0 ()
fe80011a2ba0 zfs:vdev_open+2b1 ()
fe80011a2bc0 zfs:vdev_open_child+21 ()
fe80011a2c40 genunix:taskq_thread+295 ()
fe80011a2c50 unix:thread_start+8 ()

syncing file systems...
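Since the panic is reproducible, the crash dump is the most useful artifact here. A minimal sketch of pulling the basics out of it, assuming savecore wrote the dump to the default /var/crash/<hostname> directory; all three dcmds are standard mdb:

# cd /var/crash/`uname -n`
# mdb -k unix.0 vmcore.0
> ::status
> ::stack
> ::msgbuf

::status prints the panic string and dump details, ::stack the panicking thread's stack (it should match the trace above), and ::msgbuf the console messages leading up to the trap.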
Re: [zfs-discuss] ZFS panic on blade BL465c G1
On Oct 3, 2010, at 7:22 PM, Jorgen Lundman lund...@lundman.net wrote:

> One option would be to get 147 NIC drivers for 134.

IIRC, the bnx drivers are closed source and obtained from Broadcom. No need to upgrade the OS just for a NIC driver.
 -- richard
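Broadcom ships its Solaris bnx driver as an installable package, so on 134 the install would look roughly like the sketch below. The package name BRCMbnx and the PCI ID are assumptions here; check the README in the Broadcom download for the actual names:

# pkgadd -d . BRCMbnx                      (package name is an assumption)
# update_drv -a -i '"pci14e4,164c"' bnx    (only if your device's PCI ID is
                                            missing from the driver's aliases)
# ifconfig bnx0 plumb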
Re: [zfs-discuss] ZFS panic
On 04/02/10 10:25 AM, Ian Collins wrote:
> Is this callstack familiar to anyone? It just happened on a Solaris 10
> update 8 box:

I've seen a couple more of these; they look very similar (same stack, slightly different offsets) to 6886691 and 6885428. Both of those are closed as not reproducible. I guess I'd better open a new case...

-- Ian.
[zfs-discuss] ZFS panic
Is this callstack familiar to anyone? It just happened on a Solaris 10 update 8 box:

genunix: [ID 655072 kern.notice] fe8000d1b830 unix:real_mode_end+7f81 ()
genunix: [ID 655072 kern.notice] fe8000d1b910 unix:trap+5e6 ()
genunix: [ID 655072 kern.notice] fe8000d1b920 unix:_cmntrap+140 ()
genunix: [ID 655072 kern.notice] fe8000d1ba40 zfs:zfs_space_delta_cb+46 ()
genunix: [ID 655072 kern.notice] fe8000d1ba80 zfs:dmu_objset_do_userquota_callbacks+b9 ()
genunix: [ID 655072 kern.notice] fe8000d1bae0 zfs:dsl_pool_sync+df ()
genunix: [ID 655072 kern.notice] fe8000d1bb90 zfs:spa_sync+29d ()
genunix: [ID 655072 kern.notice] fe8000d1bc40 zfs:txg_sync_thread+1f0 ()
genunix: [ID 655072 kern.notice] fe8000d1bc50 unix:thread_start+8 ()

-- Ian.
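If this does become a new support case, the saved dump carries most of what will be asked for. A minimal sketch using standard mdb dcmds, assuming the dump was saved by savecore:

# mdb -k unix.0 vmcore.0
> ::panicinfo
> $C
> ::msgbuf

::panicinfo gives the panic string, trap type and registers; $C walks the stack with frame pointers and arguments; ::msgbuf shows the console output preceding the panic.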
Re: [zfs-discuss] ZFS Panic
Andre van Eyssen an...@purplecow.org wrote:
> On Fri, 10 Apr 2009, Rince wrote:
>> FWIW, I strongly expect live ripping of a SATA device to not panic the
>> disk layer.
>
> Ripping a SATA device out runs a goodly chance of confusing the
> controller. If you'd had this problem with fibre channel or even SCSI,
> I'd find it a far bigger concern. IME, IDE and SATA just don't hold up
> to the abuses we'd like to level at them.

PATA (IDE) does not support hot-plug; SATA does, and SATA uses the same interface as SAS does. I would expect that there is no difference between unplugging a SATA drive and unplugging a SAS drive.

Jörg

-- 
 EMail: jo...@schily.isdn.cs.tu-berlin.de   (home)  Jörg Schilling  D-13353 Berlin
        j...@cs.tu-berlin.de                (uni)
        joerg.schill...@fokus.fraunhofer.de (work)
 Blog:  http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/private/  ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] ZFS Panic
>>>>> "r" == Rince rincebr...@gmail.com writes:

     r> *ZFS* shouldn't panic under those conditions. The disk layer,
     r> perhaps, but not ZFS.

well, yes, but panicking brings down the whole box anyway, so there is no practical difference, just a difference in blame.

I would rather say: the fact that redundant ZFS ought to be the best-practice, proper way to configure ~all filesystems in the future means that disk drivers in the future ought to expect to have ZFS above them. So panicking when enough drives are still available to keep the pool up isn't okay; it's also not okay to let problems with one drive interrupt access to other drives; and finally, we've still no reasonably practicable consensus on how to deal with timeout problems, like vanishing iSCSI targets, and ATA targets that remain present but take 1000x longer to respond to each command, as ATA disks often do when they're failing and as is, I suspect, well handled by all the serious hardware RAID storage vendors.

With some chips, writing a good driver has proven (on Linux) to be impossible, or beyond the skill of the person who adopted the chip, or beyond the effort warranted by the chip's interestingness. Well, fine, but these things are certainly important enough to document, and on Linux they ARE documented:

  http://ata.wiki.kernel.org/index.php/SATA_hardware_features

It's kind of best-effort, but still it's a lot better than ``all those problems on X4500 were fixed AGES ago, just upgrade'' / ``still having problems'' / ``ok they are all fixed now'' / ``no they're not, still can't hotplug, still no NCQ'' / ``well they are much more stable now.'' / ``can I hotplug? is NCQ working?'' / ...

Note the LSI 1068 IT-mode cards, driven by the proprietary 'mpt' driver here, are supported by a GPL driver on Linux, and smartctl works on these cards; but they don't appear on the wiki above, so Linux's list of chip features isn't complete either. Still, it's a start.

     r> As far as it should be concerned, it's equivalent to ejecting
     r> a disk via cfgadm without telling ZFS first, which *IS* a
     r> supported operation.

an interesting point! Either way, though, we're responsible for the whole system. ``Our new handsets have microkernels, which is excellent for reliability! In the future, when there's a bug, it won't crash the whole celfone. It'll just crash the, ahh, the Phone Application.'' Right, sure, but SO WHAT?!
Re: [zfs-discuss] ZFS Panic
Grant,

Didn't see a response so I'll give it a go. Ripping a disk away and silently inserting a new one is asking for trouble, IMHO. I am not sure what you were trying to accomplish, but generally replacing a drive/LUN would entail commands like:

# zpool offline tank c1t3d0
# cfgadm | grep c1t3d0
sata1/3::dsk/c1t3d0    disk    connected    configured    ok
# cfgadm -c unconfigure sata1/3
Unconfigure the device at: /devices/p...@0,0/pci1022,7...@2/pci11ab,1...@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes
# cfgadm | grep sata1/3
sata1/3    disk    connected    unconfigured    ok
Replace the physical disk c1t3d0
# cfgadm -c configure sata1/3

Taken from this page:
http://docs.sun.com/app/docs/doc/819-5461/gbbzy?a=view

..Remco

Grant Lowe wrote:
> What we did to cause this is we pulled a LUN from zfs, and replaced it
> with a new LUN. We then tried to shut down the box, but it wouldn't go
> down. We had to send a break to the box and reboot. This is an Oracle
> sandbox, so we're not really concerned. Ideas?
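For completeness: the documented procedure finishes by telling ZFS about the new device. A minimal sketch, with the pool and device names following Remco's example above; on some releases an explicit zpool online is also needed after the earlier zpool offline:

# zpool replace tank c1t3d0     (same-slot replacement, so old and new names match)
# zpool status tank             (watch the resilver complete)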
Re: [zfs-discuss] ZFS Panic
Hi Remco.

Yes, I realize that was asking for trouble. It wasn't supposed to be a test of yanking a LUN. We needed a LUN for a VxVM/VxFS system and that LUN was available. I was just surprised at the panic, since the system was quiesced at the time. But a time is coming when we will be doing this. Thanks for the feedback. I appreciate it.

- Original Message -
From: Remco Lengers re...@lengers.com
To: Grant Lowe gl...@sbcglobal.net
Cc: zfs-discuss@opensolaris.org
Sent: Thursday, April 9, 2009 5:31:42 AM
Subject: Re: [zfs-discuss] ZFS Panic
Re: [zfs-discuss] ZFS Panic
On Fri, 10 Apr 2009, Rince wrote:

> FWIW, I strongly expect live ripping of a SATA device to not panic the
> disk layer. It explicitly shouldn't panic the ZFS layer, as ZFS is
> supposed to be fault-tolerant and a drive dropping away at any time is a
> rather expected scenario.

Ripping a SATA device out runs a goodly chance of confusing the controller. If you'd had this problem with fibre channel or even SCSI, I'd find it a far bigger concern. IME, IDE and SATA just don't hold up to the abuses we'd like to level at them. Of course, this boils down to controller and enclosure and a lot of other random chances for disaster.

In addition, where there is a procedure to gently remove the device, use it. We don't just yank disks from the FC-AL backplanes on V880s, because there is a procedure for handling this even for failed disks. The five minutes to do it properly is a good investment compared to much longer downtime from a fault condition arising from careless manhandling of hardware.

-- 
Andre van Eyssen.
mail: an...@purplecow.org           jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses  http://www2.purplecow.org
purplecow.org: PCOWpix              http://pix.purplecow.org
Re: [zfs-discuss] ZFS Panic
On Fri, Apr 10, 2009 at 12:43 AM, Andre van Eyssen an...@purplecow.org wrote:

> Ripping a SATA device out runs a goodly chance of confusing the
> controller. If you'd had this problem with fibre channel or even SCSI,
> I'd find it a far bigger concern. IME, IDE and SATA just don't hold up
> to the abuses we'd like to level at them.

IDE isn't supposed to do this, but SATA explicitly has hotplug as a feature. (I think this might be SATA 2, so any SATA 1 controllers out there are hedging your bets, but...)

I'm not advising this as a recommended procedure, but the failure of the controller isn't my point. *ZFS* shouldn't panic under those conditions. The disk layer, perhaps, but not ZFS. As far as it should be concerned, it's equivalent to ejecting a disk via cfgadm without telling ZFS first, which *IS* a supported operation.

- Rich

-- 
Procrastination means never having to say you're sorry.
[zfs-discuss] ZFS Panic
Hi All,

Don't know if this is worth reporting, as it's human error. Anyway, I had a panic on my zfs box. Here's the error:

marksburg /usr2/glowe grep panic /var/log/syslog
Apr  8 06:57:17 marksburg savecore: [ID 570001 auth.error] reboot after panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
Apr  8 07:15:10 marksburg savecore: [ID 570001 auth.error] reboot after panic: assertion failed: 0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp), file: ../../common/fs/zfs/dmu.c, line: 580
marksburg /usr2/glowe

What we did to cause this is we pulled a LUN from zfs, and replaced it with a new LUN. We then tried to shut down the box, but it wouldn't go down. We had to send a break to the box and reboot. This is an Oracle sandbox, so we're not really concerned. Ideas?
[zfs-discuss] ZFS panic in build 108?
I upgraded my 280R system to yesterday's nightly build, and when I rebooted, this happened:

Boot device: /p...@8,60/SUNW,q...@4/f...@0,0/d...@w212037e9abe4,0:a  File and args:
SunOS Release 5.11 Version snv_108 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

panic[cpu0]/thread=300045b6780: BAD TRAP: type=31 rp=2a100734df0 addr=28 mmu_fsr=0 occurred in module zfs due to a NULL pointer dereference

zfs: trap type = 0x31
addr=0x28
pid=55, pc=0x12f7e68, sp=0x2a100734691, tstate=0x4480001600, context=0x57
g1-g7: 2c19c6, 0, 0, 0, 0, 57, 300045b6780

02a100734b00 unix:die+74 (10bd400, 2a100734df0, 28, 0, 8, 2a100734bc0)
  %l0-3: 012f7e68 0010 0010 0100
  %l4-7: 2000 010bd770 010bd400 000b
02a100734be0 unix:trap+9e8 (2a100734df0, 0, 31, 0, 0, 1c00)
  %l0-3: 02a100734ce0 0005 03000428ac28
  %l4-7: 0001 0001 0028
02a100734d40 unix:ktl0+48 (0, 18ebfb8, 3514, 180c000, 3255d18, 2a100734f38)
  %l0-3: 0001 1400 004480001600 0101b128
  %l4-7: 03317840 018ec000 02a100734df0
02a100734e90 zfs:vdev_readable+4 (0, 1, fffd, 0, 0, 3)
  %l0-3: 013324a0 0006 0003 0050bd73
  %l4-7: 0007 0300030be080 0001 fff8
02a100734f40 zfs:vdev_mirror_child_select+74 (300070ef120, 0, 300070ef110, 300070ef110, 6, 97)
  %l0-3: 0300070ef120 0050bd73 0001
  %l4-7: 002c19c6 0003 7fff 0006
02a100734ff0 zfs:vdev_mirror_io_start+d8 (300027756e0, 1, 12fb53c, 300070ef110, 8, 1)
  %l0-3: fc00 0001 0001 4000
  %l4-7: 0300045b6780 032f0008 032f0058 0001
02a1007350d0 zfs:zio_execute+8c (300027756e0, 130c45c, 58, b, 18e2c00, 400)
  %l0-3: 018e2f60 0001 fc04 0001
  %l4-7: 0800 0800 0001
02a100735180 zfs:zio_wait+c (300027756e0, 1, 1, 30002775998, 0, 12c7ebc)
  %l0-3: 00074458 018e65a8 0753 030002632000
  %l4-7: 018e4000 0001 fc04 fc00
02a100735230 zfs:arc_read_nolock+83c (0, 32f, 2, 12c7c00, 0, 400)
  %l0-3: 0001 0001 02a100735408 018e4000
  %l4-7: 030006ce1bc0 02a10073542c 018e8450 030007677998
02a100735340 zfs:dmu_objset_open_impl+b0 (32f, 0, 30006bcd970, 2a1007354f8, 18e2c00, 3000664a540)
  %l0-3: 02a100735408 0001 012c7c00 03000664a550
  %l4-7: 0300027756e0 0300030be080 030006bcd940 03000664a570
02a100735430 zfs:dsl_pool_open+30 (32f, 50bd74, 32f01a8, , 0, 30006bcdbc8)
  %l0-3: 0001 0132d400 030006bcd940 0300054acd40
  %l4-7: 0300027756e0 0300030be080 0001 0001
02a100735500 zfs:spa_load+960 (32f, 32f0320, 1, 32f03c0, bab10c, 1815600)
  %l0-3: 013324a0 0006 0003 0050bd73
  %l4-7: 0007 0300030be080 0001 fff8
02a1007355f0 zfs:spa_open_common+80 (3000617e000, 2a100735758, 132d4b5, 2a100735818, 0, 18f9000)
  %l0-3: 0005 018f90f0 032f 0003
  %l4-7: 018f9000 0180c0e8 0180c0e8
02a1007356a0 zfs:spa_get_stats+18 (3000617e000, 2a100735818, 3000617e400, 800, 7a, 132d400)
  %l0-3: 01839ea0 0183fe30 0180c000
  %l4-7: 0001
02a100735760 zfs:zfs_ioc_pool_stats+10 (3000617e000, 0, 0, 7a, 3000617e000, 3000428ac28)
  %l0-3: 03000617e002 03000617e000 0019
  %l4-7: 005a 007a 01336400 01336400
02a100735820 zfs:zfsdev_ioctl+124 (18e39f0, 10005, ffbfecd8, 1000, 32eb178, 3000617e000)
  %l0-3: 018e3a00 0078 000f 0005
  %l4-7: 0014 018e3800 0001 013185d8
02a1007358d0 genunix:fop_ioctl+58 (300049fcc80, 5a05, ffbfecd8, 13, 0, 2a100735adc)
  %l0-3: 030004a63800 012b6d18 030001a60c00 018c5930
  %l4-7: 01877c00 0300030bfbf8 0001 018c6800
02a100735990 genunix:ioctl+164 (3, 5a05, ffbfecd8, 7a7000, 30003215928, 800)
  %l0-3: 0013 0003
  %l4-7: 0003
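The panic is in spa_load underneath the pool-stats ioctl, so the box dies as soon as anything consults the cached pool configuration at boot. A hedged sketch of the usual way to get such a machine bootable again (the same recipe Gavin describes later in this digest), assuming the cache file is reachable from a writable root (you may need to remount / read-write first):

ok boot -m milestone=none
# mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bad
# reboot

The pool can then be examined offline with zdb -e <pool>, or imported explicitly once the underlying problem is understood.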
Re: [zfs-discuss] zfs panic
Looks like a corrupted pool -- you appear to have a mirror block pointer with no valid children. From the dump, you could probably determine which file is bad, but I doubt you could delete it; you might need to recreate your pool.
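Mapping a bad block pointer back to an object is usually a zdb exercise. A hedged sketch with placeholder pool, dataset and object names; both invocations are standard zdb usage:

# zdb -bb tank                (block-accounting pass over the whole pool;
                               leaked or double-allocated blocks show up here)
# zdb -dddd tank/fs 12345     (dump dnode 12345, including its block pointers,
                               to match against the addresses in the dump)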
Re: [zfs-discuss] zfs panic
I'm sorry about the problems. We try to be responsive to fixing bugs and implementing new features that people are requesting for ZFS. It's not always possible to get it right. In this instance I don't think the bug was reproducible, and perhaps that's why it hasn't received the attention it deserves. As far as I know, yours is the second reported instance. It may be that the problem has been fixed, and that's why we haven't seen it in-house. However, that's just speculation, and some serious investigation is needed.

Neil.

On 01/13/09 06:39, Krzys wrote:
> To be honest I am quite surprised, as this bug you are referring to was
> submitted early in 2008 and last updated over the summer. Quite surprised
> that Sun did not come up with a fix for it so far. ZFS is certainly
> gaining some popularity at my workplace, and we were thinking of using it
> instead of Veritas, but I am not sure what to do with it now. What if we
> have systems that we quite depend on and we get a similar issue? How
> could we solve it? Is calling Sun support going to help me in such a
> case? This particular system is my playground and I do not care about it
> to that extent, but if I had another system of much greater importance
> and got into such a situation, it's quite scary... :(
>
> On Mon, 12 Jan 2009, Neil Perrin wrote:
>> This is a known bug:
>>   6678070 Panic from vdev_mirror_map_alloc()
>>   http://bugs.opensolaris.org/view_bug.do?bug_id=6678070
>>
>> On 01/12/09 21:12, Krzys wrote:
>>> any idea what could cause my system to panic? I get my system rebooted
>>> daily at various times. very strange, but it's pointing to zfs. I have
>>> U6 with all the latest patches.
>>>
>>> Jan 12 05:47:12 chrysek unix: [ID 836849 kern.notice]
>>> Jan 12 05:47:12 chrysek ^Mpanic[cpu1]/thread=30002c8d4e0:
>>> Jan 12 05:47:12 chrysek unix: [ID 799565 kern.notice] BAD TRAP: type=28 rp=2a10285c790 addr=7b76a0a8 mmu_fsr=0
>>> ... ... ...
>>> 374706 pages dumped, compression ratio 3.50,
>>> Jan 12 05:48:51 chrysek genunix: [ID 851671 kern.notice] dump succeeded
>>> Jan 12 05:49:40 chrysek genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_13-02 64-bit
Re: [zfs-discuss] zfs panic
On Tue, January 13, 2009 09:51, Neil Perrin wrote:
> In this instance I don't think the bug was reproducible, and perhaps
> that's why it hasn't received the attention it deserves. As far as I
> know yours is the second reported instance.

Mine (first mentioned on this mailing list last night) may not be the same thing; but it's a ZFS null-pointer crash, so it may be. And I think it's scrub-related. I'm currently waiting to see if anybody wants the details from the log, or the dump file, or if there's stuff I should look at.

Meanwhile, I'm annoyed my home fileserver is down -- but I'm already getting far more than I'm paying for, and I think far more than I'd get from Microsoft if I reported such a problem (and I paid money to them for the systems I'm running their software on).

> It may be that the problem has been fixed and that's why we haven't
> seen it in-house. However, that's just speculation, and some serious
> investigation is needed.

My problem (which may not be the same problem) is in 2008.11; I believe that's nv101 code (or 101b?). What's my next step?

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
[zfs-discuss] zfs panic
any idea what could cause my system to panic? I get my system rebooted daily at various times. Very strange, but it's pointing to zfs. I have U6 with all the latest patches.

Jan 12 05:47:12 chrysek unix: [ID 836849 kern.notice]
Jan 12 05:47:12 chrysek ^Mpanic[cpu1]/thread=30002c8d4e0:
Jan 12 05:47:12 chrysek unix: [ID 799565 kern.notice] BAD TRAP: type=28 rp=2a10285c790 addr=7b76a0a8 mmu_fsr=0
Jan 12 05:47:12 chrysek unix: [ID 10 kern.notice]
Jan 12 05:47:12 chrysek unix: [ID 839527 kern.notice] zfs:
Jan 12 05:47:12 chrysek unix: [ID 983713 kern.notice] integer divide zero trap:
Jan 12 05:47:12 chrysek unix: [ID 381800 kern.notice] addr=0x7b76a0a8
Jan 12 05:47:12 chrysek unix: [ID 101969 kern.notice] pid=18941, pc=0x7b76a0a8, sp=0x2a10285c031, tstate=0x4480001606, context=0x1
Jan 12 05:47:12 chrysek unix: [ID 743441 kern.notice] g1-g7: 7b76a07c, 1, 0, 0, 241b2a, 16, 30002c8d4e0
Jan 12 05:47:12 chrysek unix: [ID 10 kern.notice]
Jan 12 05:47:12 chrysek genunix: [ID 723222 kern.notice] 02a10285c4b0 unix:die+9c (28, 2a10285c790, 7b76a0a8, 0, 2a10285c570, 1)
Jan 12 05:47:12 chrysek genunix: [ID 179002 kern.notice] %l0-3: 000a 0028 000a 0801
Jan 12 05:47:12 chrysek %l4-7: 02a10285cd18 02a10285cd3c 0006 0109a000
Jan 12 05:47:13 chrysek genunix: [ID 723222 kern.notice] 02a10285c590 unix:trap+644 (2a10285c790, 1, 0, 0, 180c000, 30002c8d4e0)
Jan 12 05:47:13 chrysek genunix: [ID 179002 kern.notice] %l0-3: 06002c5b9130 0028 0600118fa088
Jan 12 05:47:13 chrysek %l4-7: 00db 004480001606 00010200
Jan 12 05:47:13 chrysek genunix: [ID 723222 kern.notice] 02a10285c6e0 unix:ktl0+48 (0, 70021d50, 349981, 180c000, 10394e8, 2a10285c8e8)
Jan 12 05:47:13 chrysek genunix: [ID 179002 kern.notice] %l0-3: 0007 1400 004480001606 0101bedc
Jan 12 05:47:13 chrysek %l4-7: 0600110bd630 0600110be400 02a10285c790
Jan 12 05:47:13 chrysek genunix: [ID 723222 kern.notice] 02a10285c830 zfs:spa_get_random+c (0, 0, d15c4746ef9ddd65, 0, , 8)
Jan 12 05:47:13 chrysek genunix: [ID 179002 kern.notice] %l0-3: 01ff 7b772a00 000e
Jan 12 05:47:13 chrysek %l4-7: 00020801 ee00 060031b23680
Jan 12 05:47:13 chrysek genunix: [ID 723222 kern.notice] 02a10285c8f0 zfs:vdev_mirror_map_alloc+b8 (60012ec20e0, 30006a9a3c8, 1, 30006a9a370, 0, ff)
Jan 12 05:47:13 chrysek genunix: [ID 179002 kern.notice] %l0-3:
Jan 12 05:47:13 chrysek %l4-7: 0600112cc080
Jan 12 05:47:14 chrysek genunix: [ID 723222 kern.notice] 02a10285c9a0 zfs:vdev_mirror_io_start+4 (30006a9a370, 0, 0, 30006a9a3c8, 0, 7b772bc4)
Jan 12 05:47:14 chrysek genunix: [ID 179002 kern.notice] %l0-3: 0001 7b7a4688
Jan 12 05:47:14 chrysek %l4-7: 7b7a4400
Jan 12 05:47:14 chrysek genunix: [ID 723222 kern.notice] 02a10285ca80 zfs:zio_execute+74 (30006a9a370, 7b783f70, 78, f, 1, 70496c00)
Jan 12 05:47:14 chrysek genunix: [ID 179002 kern.notice] %l0-3: 030083edb728 00c44002 00038000 70496d88
Jan 12 05:47:14 chrysek %l4-7: 00efc006 0801 8000
Jan 12 05:47:14 chrysek genunix: [ID 723222 kern.notice] 02a10285cb30 zfs:arc_read+724 (1, 600112cc080, 30075baba00, 200, 0, 300680b9288)
Jan 12 05:47:14 chrysek genunix: [ID 179002 kern.notice] %l0-3: 0001 70496060 0006 0801
Jan 12 05:47:14 chrysek %l4-7: 02a10285cd18 030083edb728 02a10285cd3c
Jan 12 05:47:14 chrysek genunix: [ID 723222 kern.notice] 02a10285cc40 zfs:dbuf_prefetch+13c (60035ce1050, 70496c00, 30075baba00, 0, 0, 3007578b0a0)
Jan 12 05:47:14 chrysek genunix: [ID 179002 kern.notice] %l0-3: 000a 0028 000a 0801
Jan 12 05:47:14 chrysek %l4-7: 02a10285cd18 02a10285cd3c 0006
Jan 12 05:47:15 chrysek genunix: [ID 723222 kern.notice] 02a10285cd50 zfs:dmu_zfetch_fetch+2c (60035ce1050, 8b67, 100, 100, cd, 8c34)
Jan 12 05:47:15 chrysek genunix: [ID 179002 kern.notice] %l0-3: 7049d098 4000 7049d000 7049d188
Jan 12 05:47:15 chrysek %l4-7: 06d8 00db 7049d178 7049d0f8
Jan 12 05:47:15 chrysek genunix: [ID 723222 kern.notice] 02a10285ce00 zfs:dmu_zfetch_dofetch+b8 (60035ce12a0, 6002f87c260, 8b67, 8a67, 8b68, 0)
Jan 12 05:47:15 chrysek genunix: [ID 179002 kern.notice] %l0-3: 0001
Re: [zfs-discuss] zfs panic
This is a known bug:

  6678070 Panic from vdev_mirror_map_alloc()
  http://bugs.opensolaris.org/view_bug.do?bug_id=6678070

Neil.

On 01/12/09 21:12, Krzys wrote:
> any idea what could cause my system to panic? I get my system rebooted
> daily at various times. very strange, but it's pointing to zfs. I have
> U6 with all the latest patches.
Re: [zfs-discuss] zfs panic on boot
I'm seeing this too. Nothing unusual happened before the panic, just a shutdown (init 5) and later startup. I have the crash dump and a copy of the problem zpool (on swan). Here's the stack trace:

> $C
ff0004463680 vpanic()
ff00044636b0 vcmn_err+0x28(3, f792ecf0, ff0004463778)
ff00044637a0 zfs_panic_recover+0xb6()
ff0004463830 space_map_add+0xdb(ff014c1a21b8, 472785000, 1000)
ff00044638e0 space_map_load+0x1fc(ff014c1a21b8, fbd52568, 1, ff014c1a1e88, ff0149c88c30)
ff0004463920 metaslab_activate+0x66(ff014c1a1e80, 4000)
ff00044639e0 metaslab_group_alloc+0x24e(ff014bdeb000, 4000, 3a6734, 1435b, ff014baa9840, 2)
ff0004463ab0 metaslab_alloc_dva+0x1da(ff01477880c0, ff014beefa70, 4000, ff014baa9840, 2, 0, 3a6734, 0)
ff0004463b50 metaslab_alloc+0x82(ff01477880c0, ff014beefa70, 4000, ff014baa9840, 3, 3a6734, 0, 0)
ff0004463ba0 zio_dva_allocate+0x62(ff014934c458)
ff0004463bd0 zio_execute+0x7f(ff014934c458)
ff0004463c60 taskq_thread+0x1a7(ff014bfb77a0)
ff0004463c70 thread_start+8()

This is on a Ferrari laptop (AMD x64) running snv79. I'd love to rescue my zpool. Any suggestions?

Thanks,
Gordon
Re: [zfs-discuss] zfs panic on boot
> space_map_add+0xdb(ff014c1a21b8, 472785000, 1000)
> space_map_load+0x1fc(ff014c1a21b8, fbd52568, 1, ff014c1a1e88, ff0149c88c30)
> running snv79.

hmm.. did you spend any time in snv_74 or snv_75 that might have gotten you
http://bugs.opensolaris.org/view_bug.do?bug_id=6603147 ?

zdb -e name_of_pool_that_crashes_on_import

would be interesting, but the damage might have been done.

Rob
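zdb runs in userland against the unimported pool, so walking the damaged space maps there fails in a process instead of panicking the kernel. A hedged sketch with a placeholder pool name; -e (examine an exported/uncached pool) and -bb (block accounting) are standard zdb options:

# zdb -e tank          (read-only walk of the on-disk state)
# zdb -e -bb tank      (adds a full block-accounting pass; a double-allocated
                        segment should surface here)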
Re: [zfs-discuss] zfs panic on boot
Hi,

On 09/29/07 22:00, Gavin Maltby wrote:
> Our zfs nfs build server running snv_73 (pool created back before zfs
> integrated to ON) panicked, I guess from zfs, the first time, and now
> panics on attempted boot every time as below. Is this a known issue
> and, more importantly (2TB of data in the pool), are there any
> suggestions on how to recover (other than from backup)?
>
> panic[cpu0]/thread=ff003cc8dc80:
> zfs: allocating allocated segment(offset=24872013824 size=4096)

So in desperation I set 'zfs_recover', which just produced an assertion failure moments after the original panic location; but also setting 'aok' to blast through assertions has allowed me to import the pool again (I had booted -m milestone=none and blown away /etc/zfs/zpool.cache to be able to boot at all). Luckily just the single corruption is apparent at the moment, i.e. just a single assertion caught after running for half a day like this:

Sep 30 17:01:53 tb3 genunix: [ID 415322 kern.warning] WARNING: zfs: allocating allocated segment(offset=24872013824 size=4096)
Sep 30 17:01:53 tb3 genunix: [ID 411747 kern.notice] ASSERTION CAUGHT: sm->sm_space == space (0xc4896c00 == 0xc4897c00), file: ../../common/fs/zfs/space_map.c, line: 355

What I'd really like to know is whether/how I can map from that assertion at the pool level back down to a single filesystem or even file using this segment -- perhaps I can recycle that file to free the segment and set the world straight again?

A scrub is only 20% complete, but has found no errors thus far. I checked the T3 pair and no complaints there either -- I did reboot them just for luck (last reboot was 2 years ago, apparently!).

Gavin
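For anyone following the same recovery path, a minimal sketch of how those two tunables are normally set, via /etc/system. Both are blunt, unsupported instruments ('aok' downgrades every failed kernel assertion to a warning, not just ZFS ones), so they belong in a recovery session only and should be removed afterwards:

set aok = 1
set zfs:zfs_recover = 1

They can also be poked on a live system with mdb -kw (echo 'aok/W 1' | mdb -kw), but /etc/system is the common route.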
Re: [zfs-discuss] zfs panic on boot
T3 comment below...

Gavin Maltby wrote:
> A scrub is only 20% complete, but has found no errors thus far. I
> checked the T3 pair and no complaints there either -- I did reboot them
> just for luck (last reboot was 2 years ago, apparently!).

Living on the edge... The T3 has a 2-year battery life (time is counted). When it decides the batteries are too old, it will shut down the nonvolatile write cache. You'll want to make sure you have fresh batteries soon.
 -- richard
Re: [zfs-discuss] zfs panic on boot
On 10/01/07 17:01, Richard Elling wrote:
> Living on the edge... The T3 has a 2-year battery life (time is
> counted). When it decides the batteries are too old, it will shut down
> the nonvolatile write cache. You'll want to make sure you have fresh
> batteries soon.

Thanks -- we have replaced batteries in that time; there is no need to shut down during battery replacement.

Gavin
[zfs-discuss] zfs panic on boot
Hi,

Our zfs nfs build server running snv_73 (pool created back before zfs integrated to ON) panicked, I guess from zfs, the first time, and now panics on attempted boot every time as below. Is this a known issue and, more importantly (2TB of data in the pool), are there any suggestions on how to recover (other than from backup)?

panic[cpu0]/thread=ff003cc8dc80: zfs: allocating allocated segment(offset=24872013824 size=4096)

ff003cc8d3c0 genunix:vcmn_err+28 ()
ff003cc8d4b0 zfs:zfs_panic_recover+b6 ()
ff003cc8d540 zfs:space_map_add+db ()
ff003cc8d5e0 zfs:space_map_load+1f4 ()
ff003cc8d620 zfs:metaslab_activate+66 ()
ff003cc8d6e0 zfs:metaslab_group_alloc+24e ()
ff003cc8d7b0 zfs:metaslab_alloc_dva+192 ()
ff003cc8d850 zfs:metaslab_alloc+82 ()
ff003cc8d8a0 zfs:zio_dva_allocate+68 ()
ff003cc8d8c0 zfs:zio_next_stage+b3 ()
ff003cc8d8f0 zfs:zio_checksum_generate+6e ()
ff003cc8d910 zfs:zio_next_stage+b3 ()
ff003cc8d980 zfs:zio_write_compress+239 ()
ff003cc8d9a0 zfs:zio_next_stage+b3 ()
ff003cc8d9f0 zfs:zio_wait_for_children+5d ()
ff003cc8da10 zfs:zio_wait_children_ready+20 ()
ff003cc8da30 zfs:zio_next_stage_async+bb ()
ff003cc8da50 zfs:zio_nowait+11 ()
ff003cc8dad0 zfs:dmu_objset_sync+172 ()
ff003cc8db40 zfs:dsl_pool_sync+199 ()
ff003cc8dbd0 zfs:spa_sync+1c5 ()
ff003cc8dc60 zfs:txg_sync_thread+19a ()
ff003cc8dc70 unix:thread_start+8 ()

In case it matters, this is an X4600 M2. There is about 1.5TB in use out of a 2TB pool. The IO devices are nothing exciting but adequate for building -- 2 x T3b. The pool was created under SPARC on the old NFS server.

Thanks
Gavin
Re: [zfs-discuss] ZFS panic when trying to import pool
Ok, I found the problem with 0x06: one disk was missing. But now I have all my disks and I get 0x05:

Sep 21 10:25:53 unknown ^Mpanic[cpu0]/thread=ff0001e12c80:
Sep 21 10:25:53 unknown genunix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339
Sep 21 10:25:53 unknown unix: [ID 10 kern.notice]
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e124f0 genunix:assfail3+b9 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12590 zfs:space_map_load+2ef ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e125d0 zfs:metaslab_activate+66 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12690 zfs:metaslab_group_alloc+24e ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12760 zfs:metaslab_alloc_dva+192 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12800 zfs:metaslab_alloc+82 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12850 zfs:zio_dva_allocate+68 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12870 zfs:zio_next_stage+b3 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e128a0 zfs:zio_checksum_generate+6e ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e128c0 zfs:zio_next_stage+b3 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12930 zfs:zio_write_compress+239 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12950 zfs:zio_next_stage+b3 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e129a0 zfs:zio_wait_for_children+5d ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e129c0 zfs:zio_wait_children_ready+20 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e129e0 zfs:zio_next_stage_async+bb ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12a00 zfs:zio_nowait+11 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12a80 zfs:dmu_objset_sync+196 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12ad0 zfs:dsl_dataset_sync+5d ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12b40 zfs:dsl_pool_sync+b5 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12bd0 zfs:spa_sync+1c5 ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12c60 zfs:txg_sync_thread+19a ()
Sep 21 10:25:53 unknown genunix: [ID 655072 kern.notice] ff0001e12c70 unix:thread_start+8 ()

There are no SCSI errors on the disks, because those are virtual disks. Also, for anyone who is interested, I wrote a little program to show the properties on the vdev:

http://www.projectvolcano.org/zfs/list_vdev.c

Here is a sample output:

bash-3.00# ./list_vdev -d /dev/dsk/c1t12d0s0
Vdev properties for /dev/dsk/c1t12d0s0:
    version: 0x0003
    name: share02
    state: 0x0001
    txg: 0x003fd0e4
    pool_guid: 0x88f93fc54c215cfa
    top_guid: 0x65400f2e7db0c2a5
    guid: 0xfc3b9af2d3b6fd46
    vdev_tree:
        type: raidz
        id: 0x
        guid: 0x65400f2e7db0c2a5
        nparity: 0x0001
        metaslab_array: 0x000d
        metaslab_shift: 0x001e
        ashift: 0x0009
        asize: 0x00196e0c
        children: [
            [0]
                type: disk
                id: 0x
                guid: 0xfc3b9af2d3b6fd46
                path: /dev/dsk/c1t12d0s0
                devid: id1,[EMAIL PROTECTED]/a
                whole_disk: 0x0001
                DTL: 0x004e
            [1]
                type: disk
                id: 0x0001
                guid: 0x377cc1a2beb3c985
                path: /dev/dsk/c1t13d0s0
                devid: id1,[EMAIL PROTECTED]/a
                whole_disk: 0x0001
                DTL: 0x004d
            [2]
                type: disk
                id: 0x0002
                guid: 0xe97db62ad7fe325d
                path: /dev/dsk/c1t14d0s0
                devid: id1,[EMAIL PROTECTED]/a
                whole_disk: 0x0001
                DTL: 0x0091
        ]

So my question: is there a way to really know why I got EIO (0x05)? Is there a way to know in the debugger? How can I access it?
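One way to see where the EIO comes from without a full debugger session is to trace failing I/Os while reproducing the import. A hedged DTrace sketch: fbt entry probes on the zfs module are standard, and the zio_t members io_error and io_offset match the OpenSolaris source of that era, but treat both as assumptions:

# dtrace -n '
  fbt:zfs:zio_done:entry
  /args[0]->io_error != 0/
  {
      printf("zio error %d at offset %u", args[0]->io_error,
          args[0]->io_offset);
      stack();
  }'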
Re: [zfs-discuss] ZFS panic in space_map.c line 125
Tomas Ögren wrote:
> On 18 September, 2007 - Gino sent me these 0,3K bytes:
>> Hello, upgrade to snv_60 or later if you care about your data :)
>
> If there are known serious data-loss bug fixes that have gone into
> snv_60+, but not into s10u4, then I would like to tell Sun to backport
> those into s10u4 if they care about keeping customers. Any specific bug
> fixes you know about that one really wants? (So we can poke support.)

I think it is bug 6458218 "assertion failed: ss == NULL", which is fixed in Solaris 10 8/07.

Hth,
victor
[zfs-discuss] ZFS panic when trying to import pool
I have a raid-z zfs filesystem with 3 disks. The disks were starting to have read and write errors, so bad that I started to get trans_err. The server locked up and was reset. Now, when trying to import the pool, the system panics. I installed the latest Recommended patch cluster on my Solaris 10 U3 system and also installed the latest kernel patch (120011-14), but it still panics on zpool import. I also dd'd the disks and tested on another server with OpenSolaris b72, and still the same thing. Here is the panic backtrace:

Stack Backtrace
-----------------
vpanic()
assfail3+0xb9(f7dde5f0, 6, f7dde840, 0, f7dde820, 153)
space_map_load+0x2ef(ff008f1290b8, c00fc5b0, 1, ff008f128d88, ff008dd58ab0)
metaslab_activate+0x66(ff008f128d80, 8000)
metaslab_group_alloc+0x24e(ff008f46bcc0, 400, 3fd0f1, 32dc18000, ff008fbeaa80, 0)
metaslab_alloc_dva+0x192(ff008f2d1a80, ff008f235730, 200, ff008fbeaa80, 0, 0)
metaslab_alloc+0x82(ff008f2d1a80, ff008f235730, 200, ff008fbeaa80, 2, 3fd0f1)
zio_dva_allocate+0x68(ff008f722790)
zio_next_stage+0xb3(ff008f722790)
zio_checksum_generate+0x6e(ff008f722790)
zio_next_stage+0xb3(ff008f722790)
zio_write_compress+0x239(ff008f722790)
zio_next_stage+0xb3(ff008f722790)
zio_wait_for_children+0x5d(ff008f722790, 1, ff008f7229e0)
zio_wait_children_ready+0x20(ff008f722790)
zio_next_stage_async+0xbb(ff008f722790)
zio_nowait+0x11(ff008f722790)
dmu_objset_sync+0x196(ff008e4e5000, ff008f722a10, ff008f260a80)
dsl_dataset_sync+0x5d(ff008df47e00, ff008f722a10, ff008f260a80)
dsl_pool_sync+0xb5(ff00882fb800, 3fd0f1)
spa_sync+0x1c5(ff008f2d1a80, 3fd0f1)
txg_sync_thread+0x19a(ff00882fb800)
thread_start+8()

And here is the panic message buffer:

panic[cpu0]/thread=ff0001ba2c80:
assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x6 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 339

ff0001ba24f0 genunix:assfail3+b9 ()
ff0001ba2590 zfs:space_map_load+2ef ()
ff0001ba25d0 zfs:metaslab_activate+66 ()
ff0001ba2690 zfs:metaslab_group_alloc+24e ()
ff0001ba2760 zfs:metaslab_alloc_dva+192 ()
ff0001ba2800 zfs:metaslab_alloc+82 ()
ff0001ba2850 zfs:zio_dva_allocate+68 ()
ff0001ba2870 zfs:zio_next_stage+b3 ()
ff0001ba28a0 zfs:zio_checksum_generate+6e ()
ff0001ba28c0 zfs:zio_next_stage+b3 ()
ff0001ba2930 zfs:zio_write_compress+239 ()
ff0001ba2950 zfs:zio_next_stage+b3 ()
ff0001ba29a0 zfs:zio_wait_for_children+5d ()
ff0001ba29c0 zfs:zio_wait_children_ready+20 ()
ff0001ba29e0 zfs:zio_next_stage_async+bb ()
ff0001ba2a00 zfs:zio_nowait+11 ()
ff0001ba2a80 zfs:dmu_objset_sync+196 ()
ff0001ba2ad0 zfs:dsl_dataset_sync+5d ()
ff0001ba2b40 zfs:dsl_pool_sync+b5 ()
ff0001ba2bd0 zfs:spa_sync+1c5 ()
ff0001ba2c60 zfs:txg_sync_thread+19a ()
ff0001ba2c70 unix:thread_start+8 ()
syncing file systems...

Is there a way to restore the data? Is there a way to fsck the zpool and correct the error manually?
Re: [zfs-discuss] ZFS panic when trying to import pool
Basically, it is complaining that there aren't enough disks to read the pool metadata. This would suggest that in your 3-disk RAID-Z config, either two disks are missing, or one disk is missing *and* another disk is damaged -- due to prior failed writes, perhaps. (I know there's at least one disk missing because the failure mode is errno 6, which is ENXIO.) Can you tell from /var/adm/messages or fmdump whether there were write errors to multiple disks, or to just one?

Jeff

On Tue, Sep 18, 2007 at 05:26:16PM -0700, Geoffroy Doucet wrote:
> I have a raid-z zfs filesystem with 3 disks. The disks were starting to
> have read and write errors, so bad that I started to get trans_err. The
> server locked up and was reset, and now the system panics when trying
> to import the pool. Is there a way to restore the data? Is there a way
> to fsck the zpool and correct the error manually?
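A hedged sketch of the checks Jeff suggests; fmdump and its -e/-V options are standard FMA tooling, and the date is a placeholder for the time of the crash:

# fmdump                        (list fault diagnoses)
# fmdump -e -t 13Sep07          (raw error telemetry since the crash date)
# fmdump -eV | grep -i vdev     (full ereport detail; look for device paths
                                 and errno values)
# grep -i trans_err /var/adm/messages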
Re: [zfs-discuss] ZFS panic when trying to import pool
actually here is the first panic message:

Sep 13 23:33:22 netra2 unix: [ID 603766 kern.notice] assertion failed: dmu_read(os, smo->smo_object, offset, size, entry_map) == 0 (0x5 == 0x0), file: ../../common/fs/zfs/space_map.c, line: 307
Sep 13 23:33:22 netra2 unix: [ID 10 kern.notice]
Sep 13 23:33:22 netra2 genunix: [ID 723222 kern.notice] 02a103e6b000 genunix:assfail3+94 (7b7706d0, 5, 7b770710, 0, 7b770718, 133)
Sep 13 23:33:22 netra2 genunix: [ID 179002 kern.notice] %l0-3: 2000 0133 0186f800
Sep 13 23:33:22 netra2 %l4-7: 0183d400 011eb400
Sep 13 23:33:22 netra2 genunix: [ID 723222 kern.notice] 02a103e6b0c0 zfs:space_map_load+1a4 (30007cc2c38, 70450058, 1000, 30007cc2908, 38000, 1)
Sep 13 23:33:22 netra2 genunix: [ID 179002 kern.notice] %l0-3: 1a60 03000ce3b000 7b73ead0
Sep 13 23:33:22 netra2 %l4-7: 7b73e86c 7fff 7fff 1000
Sep 13 23:33:22 netra2 genunix: [ID 723222 kern.notice] 02a103e6b190 zfs:metaslab_activate+3c (30007cc2900, 8000, c000, e75efe6c, 30007cc2900, c000)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 02a103e6b308 0003 0002 006dd004
Sep 13 23:33:23 netra2 %l4-7: 7045 030010834940 0300080eba40 0300106c9748
Sep 13 23:33:23 netra2 genunix: [ID 723222 kern.notice] 02a103e6b240 zfs:metaslab_group_alloc+1bc (3fff, 400, 8000, 32dc18000, 30003387d88, )
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0300106c9750 0001 030007cc2900
Sep 13 23:33:23 netra2 %l4-7: 8000 000196e0c000 4000
Sep 13 23:33:23 netra2 genunix: [ID 723222 kern.notice] 02a103e6b320 zfs:metaslab_alloc_dva+114 (0, 32dc18000, 30003387d88, 400, 300080eba40, 3fd0f1)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0001 0003 030011c068e0
Sep 13 23:33:23 netra2 %l4-7: 0300106c9748 0300106c9748
Sep 13 23:33:23 netra2 genunix: [ID 723222 kern.notice] 02a103e6b3f0 zfs:metaslab_alloc+2c (30010834940, 200, 30003387d88, 3, 3fd0f1, 0)
Sep 13 23:33:23 netra2 genunix: [ID 179002 kern.notice] %l0-3: 030003387de8 0300139e1800 704506a0
Sep 13 23:33:23 netra2 %l4-7: 030013fca7be 030010834940 0001
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 02a103e6b4a0 zfs:zio_dva_allocate+4c (30010eafcc0, 7b7515a8, 30003387d88, 70450508, 70450400, 20001)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 70450400 07030001 07030001
Sep 13 23:33:24 netra2 %l4-7: 018a5c00 0003 0007
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 02a103e6b550 zfs:zio_write_compress+1ec (30010eafcc0, 23e20b, 23e000, 10001, 3, 30003387d88)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0001 0200
Sep 13 23:33:24 netra2 %l4-7: 0001 fc00 0001
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 02a103e6b620 zfs:zio_wait+c (30010eafcc0, 30010834940, 7, 30010eaff20, 3, 3fd0f1)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 7b7297d0 030003387d40 03000be9edf8
Sep 13 23:33:24 netra2 %l4-7: 02a103e6b7c0 0002 0002 03000a799920
Sep 13 23:33:24 netra2 genunix: [ID 723222 kern.notice] 02a103e6b6d0 zfs:dmu_objset_sync+12c (30003387d40, 3000a762c80, 1, 1, 3000be9edf8, 0)
Sep 13 23:33:24 netra2 genunix: [ID 179002 kern.notice] %l0-3: 030003387d88 0002 003be93a
Sep 13 23:33:24 netra2 %l4-7: 030003387e40 0020 030003387e20 030003387ea0
Sep 13 23:33:25 netra2 genunix: [ID 723222 kern.notice] 02a103e6b7e0 zfs:dsl_dataset_sync+c (30007609480, 3000a762c80, 30007609510, 30005c475b8, 30005c475b8, 30007609480)
Sep 13 23:33:25 netra2 genunix: [ID 179002 kern.notice] %l0-3: 0001 0007 030005c47638 0001
Sep 13 23:33:25 netra2 %l4-7: 030007609508 030005c4caa8
Sep 13 23:33:25 netra2 genunix: [ID 723222 kern.notice] 02a103e6b890 zfs:dsl_pool_sync+64 (30005c47500, 3fd0f1, 30007609480, 3000f904380, 300032bb7c0, 300032bb7e8)
Sep 13 23:33:25 netra2 genunix: [ID 179002 kern.notice] %l0-3: 030010834d00 03000a762c80 030005c47698
Sep 13 23:33:25 netra2 %l4-7: 030005c47668 030005c47638
[zfs-discuss] ZFS panic in space_map.c line 125
One of our Solaris 10 update 3 servers panicked today with the following error:

Sep 18 00:34:53 m2000ef savecore: [ID 570001 auth.error] reboot after panic: assertion failed: ss != NULL, file: ../../common/fs/zfs/space_map.c, line: 125

The server saved a core file, and the resulting backtrace is listed below:

$ mdb unix.0 vmcore.0
> $c
vpanic()
0xfb9b49f3()
space_map_remove+0x239()
space_map_load+0x17d()
metaslab_activate+0x6f()
metaslab_group_alloc+0x187()
metaslab_alloc_dva+0xab()
metaslab_alloc+0x51()
zio_dva_allocate+0x3f()
zio_next_stage+0x72()
zio_checksum_generate+0x5f()
zio_next_stage+0x72()
zio_write_compress+0x136()
zio_next_stage+0x72()
zio_wait_for_children+0x49()
zio_wait_children_ready+0x15()
zio_next_stage_async+0xae()
zio_wait+0x2d()
arc_write+0xcc()
dmu_objset_sync+0x141()
dsl_dataset_sync+0x23()
dsl_pool_sync+0x7b()
spa_sync+0x116()
txg_sync_thread+0x115()
thread_start+8()

It appears ZFS is still able to read the labels from the drive:

$ zdb -lv /dev/rdsk/c3t50002AC00039040Bd0p0
--------------------------------------------
LABEL 0
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,[EMAIL PROTECTED]/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 1
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,[EMAIL PROTECTED]/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 2
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,[EMAIL PROTECTED]/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464
--------------------------------------------
LABEL 3
--------------------------------------------
    version=3
    name='fpool0'
    state=0
    txg=4
    pool_guid=10406529929620343615
    top_guid=3365726235666077346
    guid=3365726235666077346
    vdev_tree
        type='disk'
        id=0
        guid=3365726235666077346
        path='/dev/dsk/c3t50002AC00039040Bd0p0'
        devid='id1,[EMAIL PROTECTED]/q'
        whole_disk=0
        metaslab_array=13
        metaslab_shift=31
        ashift=9
        asize=322117566464

But for some reason it is unable to open the pool:

$ zdb -c fpool0
zdb: can't open fpool0: error 2

I saw several bugs related to space_map.c, but the stack traces listed in the bug reports were different from the one listed above. Has anyone seen this bug before? Is there any way to recover from it?

Thanks for any insight,
- Ryan
--
UNIX Administrator
http://prefetch.net
Re: [zfs-discuss] ZFS panic in space_map.c line 125
Hi Matty,

From the stack, this looks like 6454482, but that defect has been marked 'Not reproducible'. I have no idea how to recover from it, but it looks like newer updates will not hit this issue.

Matty wrote:
> One of our Solaris 10 update 3 servers panicked today with the following error:
> Sep 18 00:34:53 m2000ef savecore: [ID 570001 auth.error] reboot after panic: assertion failed: ss != NULL, file: ../../common/fs/zfs/space_map.c, line: 125
> [...]

--
Regards,

Robin Guo, Xue-Bin Guo
Solaris Kernel and Data Service QE,
Sun China Engineering and Research Institute
Phone: +86 10 82618200 +82296
Email: [EMAIL PROTECTED]
[zfs-discuss] ZFS panic on 32bit x86
hi all, I was extracting an 8 GB tar and encountered this panic. The system was just installed last week with Solaris 10 update 3 and the latest recommended patches as of June 26. I can provide more output from mdb, or the crash dump itself, if it would be of any use. Any ideas what's going on here?

# uname -a
SunOS fang 5.10 Generic_125101-09 i86pc i386 i86pc

# mdb -k unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp usba fcp fctl nca md lofs zfs random nfs sppp crypto ptm fcip cpc logindmux ]
::status
debugging crash dump vmcore.0 (32-bit) from fang
operating system: 5.10 Generic_125101-09 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=d4ab09d4 addr=20 occurred in module zfs due to a NULL pointer dereference
dump content: kernel pages only
*panic_thread::findstack -v
stack pointer for thread d490d600: d4ab08d0
d4ab09d4 0xd4ab08f4()
d4ab0a30 zap_leaf_lookup+0x25(ebe7bc80, d4ab0bf0, 0, ea7a6950, d4ab0a60, d4ab0bf0)
d4ab0a9c fzap_lookup+0x88(ebe7bc80, d4ab0bf0, 8, 0, 1, 0)
d4ab0ad0 zap_lookup+0xb0(d4e02f48, 209ce, 0, d4ab0bf0, 8, 0)
d4ab0b28 zfs_dirent_lock+0x23e(d4ab0b58, d82d8cd0, d4ab0bf0, d4ab0b54, 6)
d4ab0b5c zfs_dirlook+0x9b(d82d8cd0, d4ab0bf0, d4ab0d30)
d4ab0b84 zfs_lookup+0x6f(e219fd80, d4ab0bf0, d4ab0d30, d4ab0da0, 1, d3a65c00)
d4ab0bc0 fop_lookup+0x2c(e219fd80, d4ab0bf0, d4ab0d30, d4ab0da0, 1, d3a65c00)
d4ab0d38 lookuppnvp+0x295(d4ab0da0, 0, 0, d4ab0e50, 0, d3a65c00)
d4ab0d70 lookuppnat+0xe8(d4ab0da0, 0, 0, d4ab0e50, 0, 0)
d4ab0e58 vn_createat+0x9f(8090840, 0, d4ab0e98, 1, 80, d4ab0f00)
d4ab0f0c vn_openat+0x323(8090840, 0, 2502, 180, d4ab0f68, 0)
d4ab0f6c copen+0x24f()
d4ab0f84 open64+0x1d()
d4ab0fac sys_sysenter+0x100()
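For what it's worth, a trap address of 0x20 on a NULL pointer dereference usually means a structure member at offset 0x20 was read through a NULL base pointer somewhere in zap_leaf_lookup(). A userland illustration of the arithmetic only (the structure and member name below are invented, not ZFS code):

    #include <stdio.h>

    /* Invented layout: faulting address = base pointer + member offset,
     * so NULL base + 0x20 member offset traps at addr=20. */
    struct leaf { char pad[0x20]; long member_at_0x20; };

    int main(void) {
        struct leaf *l = NULL;
        printf("fault address would be %p\n", (void *)&l->member_at_0x20);
        /* long v = l->member_at_0x20;  -- uncommenting faults (SIGSEGV) */
        return 0;
    }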
[zfs-discuss] ZFS panic caused by an exported zpool??
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	offline or reservation conflict
Apr 23 02:02:21 SERVER144 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/[EMAIL PROTECTED] (sd82):
Apr 23 02:02:21 SERVER144 	i/o to invalid geometry
Apr 23 02:02:22 SERVER144 unix: [ID 836849 kern.notice]
Apr 23 02:02:22 SERVER144 ^Mpanic[cpu1]/thread=ff0017fa1c80:
Apr 23 02:02:22 SERVER144 genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on unknown off 0: zio 9a5d4cc0 [L0 bplist] 4000L/4000P DVA[0]=0:770b24000:4000 DVA[1]=0:dfa984000:4000 fletcher4 uncompressed LE contiguous birth=260276 fill=1 cksum=1:1000:800800:2ab2ab000): error 5
Apr 23 02:02:22 SERVER144 unix: [ID 10 kern.notice]
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1a40 zfs:zio_done+17c ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1a60 zfs:zio_next_stage+b3 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1ab0 zfs:zio_wait_for_children+5d ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1ad0 zfs:zio_wait_children_done+20 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1af0 zfs:zio_next_stage+b3 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1b40 zfs:zio_vdev_io_assess+129 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1b60 zfs:zio_next_stage+b3 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1bb0 zfs:vdev_mirror_io_done+2af ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1bd0 zfs:zio_vdev_io_done+26 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1c60 genunix:taskq_thread+1a7 ()
Apr 23 02:02:23 SERVER144 genunix: [ID 655072 kern.notice] ff0017fa1c70 unix:thread_start+8 ()
Apr 23 02:02:23 SERVER144 unix: [ID 10 kern.notice]
Apr 23 02:02:23 SERVER144 genunix: [ID 672855 kern.notice] syncing file systems...
Apr 23 02:02:23 SERVER144 genunix: [ID 433738 kern.notice] [1]
Apr 23 02:02:53 SERVER144 last message repeated 20 times
Apr 23 02:02:54 SERVER144 genunix: [ID 622722 kern.notice] done (not all i/o completed)
Apr 23 02:02:55 SERVER144 genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c2t0d0s3, offset 1677983744, content: kernel
Apr 23 02:06:43 SERVER144 genunix: [ID 409368 kern.notice] ^M100% done: 1875291 pages dumped, compression ratio 3.34,
Apr 23 02:06:43 SERVER144 genunix: [ID 851671 kern.notice] dump succeeded

sd82 is a LUN used by a zpool that was exported two days ago ...

gino
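A note on why a stray write panics the whole box: this build predates the pool failmode property, so an uncorrectable write error in the zio pipeline is fatal by design rather than returned as EIO. A rough paraphrase of the era's completion path (not verbatim zio.c; vdev_desc and off stand in for the real arguments, and the vdev description printed as "unknown" above):

    /* Paraphrase of the pre-failmode zio completion path: a failed
     * write that was not marked ZIO_FLAG_CANFAIL ends in panic(). */
    if (zio->io_error != 0 && !(zio->io_flags & ZIO_FLAG_CANFAIL))
            panic("ZFS: I/O failure (write on %s off %llu: zio %p): error %d",
                vdev_desc, off, (void *)zio, zio->io_error);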
[zfs-discuss] zfs panic installing a brandz zone
Folks, before I start delving too deeply into this crash dump, has anyone seen anything like it? The background is that I'm running a non-debug open build of b49 and was in the process of running 'zoneadm -z redlx install'. After a bit, the machine panics; initially looking at the crash dump, I'm down to 88 MB free (out of a gig) and see the following stack:

fe8000de7800 page_unlock+0x3b(180218720)
fe8000de78d0 zfs_getpage+0x236(89b84d80, 12000, 2000, fe8000de7a1c, fe8000de79b8, 2000, fbc29b20, fe808180a000, 1, 80826dc8)
fe8000de7950 fop_getpage+0x52(89b84d80, 12000, 2000, fe8000de7a1c, fe8000de79b8, 2000, fbc29b20, fe8081818000, 1, 80826dc8)
fe8000de7a50 segmap_fault+0x1d6(801a6f38, fbc29b20, fe8081818000, 2000, 0, 1)
fe8000de7b30 segmap_getmapflt+0x67a(fbc29b20, 89b84d80, 12000, 2000, 1, 1)
fe8000de7bd0 lofi_strategy_task+0x14b(959d2400)
fe8000de7c60 taskq_thread+0x1a7(84453da8)
fe8000de7c70 thread_start+8()

%rax = 0x                  %r9 = 0x0300430e
%rbx = 0x000e              %r10 = 0x1000
%rcx = 0xfe8081819000      %r11 = 0x113709b0
%rdx = 0xfe8000de7c80      %r12 = 0x000180218720
%rsi = 0x00013000          %r13 = 0xfbc52160 pse_mutex+0x200
%rdi = 0xfbc52160 pse_mutex+0x200    %r14 = 0x4000
%r8 = 0x0200               %r15 = 0xfe8000de79d8
%rip = 0xfb8474fb page_unlock+0x3b
%rbp = 0xfe8000de7800
%rsp = 0xfe8000de77e0
%rflags = 0x00010246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=1 nt=0 iopl=0x0
  status=<of,df,IF,tf,sf,ZF,af,PF,cf>
%cs = 0x0028    %ds = 0x0043    %es = 0x0043
%trapno = 0xe   %fs = 0x        fsbase = 0x8000
%err = 0x0      %gs = 0x01c3    gsbase = 0xfbc27b70

While the panic string says NULL pointer dereference, it appears that 0x180218720 is not mapped. The dereference looks like the first dereference in page_unlock(), which looks at pp->p_selock. I can spend a little time looking at it, but was wondering if anyone had seen this kind of panic previously? I have two identical crash dumps created in exactly the same way.

alan.
--
Alan Hargreaves - http://blogs.sun.com/tpenta
Staff Engineer (Kernel/VOSJEC/Performance)
Systems Technical Service Center
Sun Microsystems
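To illustrate the failure mode Alan describes: page_unlock()'s first use of its argument touches the page structure (p_selock, plus the hashed pse_mutex visible in %r13/%rdi above), so a pp that points at an unmapped address like 0x180218720 faults essentially on entry. A userland illustration only; the page_t layout below is invented, not the kernel's:

    #include <stdio.h>

    /* Invented stand-in for the kernel's page_t; the real structure
     * differs, but the mechanism is the same: any field read through an
     * unmapped base pointer faults. */
    typedef struct page { long p_selock; } page_t;

    int main(void) {
        page_t *pp = (page_t *)0x180218720L;   /* the bad pointer from the stack */
        printf("page_unlock would read %p\n", (void *)&pp->p_selock);
        /* long v = pp->p_selock;  -- uncommenting faults: address not mapped */
        return 0;
    }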
[zfs-discuss] zfs panic: assertion failed: zp_error != 0 || dzp_error != 0
I made some powernow experiments on a dual-core amd64 box, running the 64-bit debug on-20060828 kernel. At some point the kernel seemed to make no more progress (probably a bug in the multiprocessor powernow code); the GUI was stuck, so I typed (blind) F1-A + $<systemdump. Writing the crash dump worked. When the system rebooted, mounting the zfs filesystems panicked the box with the following failed assertion; rebooting the debug kernel (both 64-bit and 32-bit) resulted in the same assertion-failed panic:

::status
debugging crash dump vmcore.12 (32-bit) from moritz
operating system: 5.11 wos_b48_2_debug (i86pc)
panic message: assertion failed: zp_error != 0 || dzp_error != 0, file: ../../common/fs/zfs/zfs_acl.c, line: 1537
dump content: kernel pages only
::stack
vpanic(fea6bfa4, f85a5984, f85a5964, 601)
assfail+0x5a(f85a5984, f85a5964, 601)
zfs_zaccess_delete+0x13a(ca45a790, ca45a670, cd1f1e78)
zfs_remove+0x88(ca453340, caa5b028, cd1f1e78)
fop_remove+0x1e(ca453340, caa5b028, cd1f1e78)
zfs_replay_remove+0x57(c94172c0, caa5b000, 0)
zil_replay_log_record+0x256(c9da24c0, ca9e2708, ca6ded54, 70cf, 0)
zil_parse+0x374(c9da24c0, 0, f8568188, ca6ded54, 70cf, 0)
zil_replay+0xba(ca454f88, c94172c0, c94172e4, c50b2c6c, f8572ba0)
zfs_domount+0x24a(c58cbf00, c79e89c0, cd1f1588)
zfs_mount+0x109(c58cbf00, ca6e4c00, ca6def84, cd1f1588)
fsop_mount+0x1a(c58cbf00, ca6e4c00, ca6def84, cd1f1588)
domount+0x8ad(0, ca6def84, ca6e4c00, cd1f1588, ca6def48)
mount+0x6f(ca6def84, ca6def68)
syscall_ap+0x4d()
sys_sysenter+0x1a2()

Now I see that exactly this panic, with an identical stack backtrace, has already been filed as bug 6466374, http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6466374

But why is bug 6466374 closed as a duplicate of bug 6413510? I'd say that bugs 6466374 and 6413510 are about completely different issues...

When looking at the code in zfs_zaccess_delete() and zfs_zaccess_common(), it seems that when replaying a remove from the log, a debug version of the zfs filesystem module always panics with the "zp_error != 0 || dzp_error != 0" failed assertion.
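To make the poster's reading concrete, here is a rough sketch of the shape of the code being described (a paraphrase of zfs_zaccess_delete(), not verbatim zfs_acl.c; the check_privs arguments and exact signatures are assumed from context):

    /* Sketch: during ZIL replay of a remove, both ACL checks can come
     * back with no error, which trips the debug-only assertion. */
    dzp_error = zfs_zaccess_common(dzp, ACE_DELETE_CHILD, &dzpcheck_privs, cr);
    zp_error = zfs_zaccess_common(zp, ACE_DELETE, &zpcheck_privs, cr);

    ASSERT(zp_error != 0 || dzp_error != 0);   /* line 1537: fires when both are 0 */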