Re: [zfs-discuss] Metadata corrupted
> Were you able to fix this problem in the end?

Unfortunately, no. I believe Matthew Ahrens took a look at it and couldn't find the cause or a way to fix it. We had to destroy the pool and re-create it from scratch. Fortunately this was during our ZFS testing period and no critically important data was lost, but I am still a bit shaken by the incident.

Since then we did eventually adopt ZFS, and it has been running well without further problems of this kind for over a year now. That leads me to believe it was either a software bug, or a hardware failure that triggered a fatal condition in software that isn't resilient to such errors even in a redundant configuration. I am sincerely hoping this has been fixed, on purpose or by accident.

Cheers,
Siegfried
Re: [zfs-discuss] NFS and Tar/Star Performance
On 12-Jun-07, at 9:02 AM, eric kustarz wrote:

> Comparing a ZFS pool made out of a single disk to a single UFS
> filesystem would be a fair comparison. What does your storage look like?

The storage looks like:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0

All disks are local SATA/300 drives on a Marvell card using the SATA framework. The drives are consumer models with 16MB of cache.

I agree it's not a fair comparison, especially with raidz over 6 drives. However, a performance difference of 10x is fairly large. I don't have a single spare drive available right now to test ZFS against UFS, but I have done similar tests in the past with one ZFS drive (without write cache, etc.) versus a UFS drive of the same brand and size. The ZFS drive was still on the order of 10x slower over NFS.

What could cause such a large difference? Is there a way to measure NFS COMMIT latency?

Cheers,
Siegfried
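PS. To partly answer my own question about measuring COMMIT latency: something along these lines might work on the server with DTrace. It is only a sketch - I am assuming the NFSv3 COMMIT handler is rfs3_commit and that the synchronous flushes go through zil_commit, so check the probe names against your build before trusting the numbers.

dtrace -n '
fbt::rfs3_commit:entry,
fbt::zil_commit:entry
{
        /* remember when this thread entered the function */
        self->ts[probefunc] = timestamp;
}
fbt::rfs3_commit:return,
fbt::zil_commit:return
/self->ts[probefunc]/
{
        /* nanosecond latency histogram per function */
        @lat[probefunc] = quantize(timestamp - self->ts[probefunc]);
        self->ts[probefunc] = 0;
}'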
[zfs-discuss] NFS and Tar/Star Performance
This is an old topic, discussed many times at length. However, I still wonder if there are any workarounds to this issue other than disabling the ZIL, since it makes ZFS over NFS almost unusable (a whole order of magnitude slower). My understanding is that the ball is in NFS's court, given ZFS's design.

The testing results are below: a Solaris 10u3 AMD64 server with a Mac client over gigabit ethernet. The filesystem is on a 6-disk raidz1 pool, and the test is untarring (with bzip2) the Linux 2.6.21 source code. The archive is stored locally and extracted remotely.

Locally
-------
tar xfvj linux-2.6.21.tar.bz2
        real 4m4.094s,   user 0m44.732s, sys 0m26.047s
star xfv linux-2.6.21.tar.bz2
        real 1m47.502s,  user 0m38.573s, sys 0m22.671s

Over NFS
--------
tar xfvj linux-2.6.21.tar.bz2
        real 48m22.685s, user 0m45.703s, sys 0m59.264s
star xfv linux-2.6.21.tar.bz2
        real 49m13.574s, user 0m38.996s, sys 0m35.215s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
        real 49m32.127s, user 0m38.454s, sys 0m36.197s

The performance seems pretty bad; let's see how other protocols fare.

Over Samba
----------
tar xfvj linux-2.6.21.tar.bz2
        real 4m34.952s,  user 0m44.325s, sys 0m27.404s
star xfv linux-2.6.21.tar.bz2
        real 4m2.998s,   user 0m44.121s, sys 0m29.214s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
        real 4m13.352s,  user 0m44.239s, sys 0m29.547s

Over AFP
--------
tar xfvj linux-2.6.21.tar.bz2
        real 3m58.405s,  user 0m43.132s, sys 0m40.847s
star xfv linux-2.6.21.tar.bz2
        real 19m44.212s, user 0m38.535s, sys 0m38.866s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
        real 3m21.976s,  user 0m42.529s, sys 0m39.529s

Samba and AFP are much faster, except the fsync'ed star run over AFP. Is this a ZFS issue or an NFS issue?

Over NFS to a non-ZFS drive
---------------------------
tar xfvj linux-2.6.21.tar.bz2
        real 5m0.211s,   user 0m45.330s, sys 0m50.118s
star xfv linux-2.6.21.tar.bz2
        real 3m26.053s,  user 0m43.069s, sys 0m33.726s
star -no-fsync -x -v -f linux-2.6.21.tar.bz2
        real 3m55.522s,  user 0m42.749s, sys 0m35.294s

It looks like ZFS is the culprit here. The untarring is much faster over NFS to a single 80 GB UFS drive than to the 6-disk raidz array.

Cheers,
Siegfried

PS. Getting netatalk to compile on amd64 Solaris required some changes, since i386 wasn't being defined anymore and some linking steps somehow thought the architecture was sparc64.
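PPS. The only workaround I know of is the one mentioned above - turning the ZIL off entirely - and that gives up exactly the guarantee that NFS COMMIT is asking for, so a server crash can silently lose writes the clients believe are on disk. On this vintage of Solaris it is a kernel tunable rather than a per-filesystem property; roughly (a sketch, not a recommendation):

# in /etc/system, takes effect on the next boot:
set zfs:zil_disable = 1

# or on the live kernel, reverts at reboot:
echo zil_disable/W0t1 | mdb -kw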
Re: [zfs-discuss] Saving scrub results before scrub completes
> [...] a system hang.

I have this other strange problem where if I send certain files over the network (CIFS or NFS), the machine slows to a crawl until it hangs. This is reproducible every time with the same files, but it does not happen locally, only over the network. I already posted about this in network-discuss and am currently investigating the issue.

> Additionally, you can look at the corefile using mdb and take a look
> at the vdev error stats. Here's an example (hopefully the formatting
> doesn't get messed up):

Excellent information, thanks! It looks like there are no read/write/cksum errors. I now at least have a way of checking the scrub results until the panic is fixed (hopefully someday).

Siegfried

> > ::spa -v
> ADDR                 STATE     NAME
> 060004473680         ACTIVE    test
>
>     ADDR             STATE     AUX          DESCRIPTION
>     060004bcb500     HEALTHY   -            root
>     060004bcafc0     HEALTHY   -            /dev/dsk/c0t2d0s0
>
> > 060004bcb500::vdev -re
> ADDR             STATE     AUX          DESCRIPTION
> 060004bcb500     HEALTHY   -            root
>
>         READ      WRITE     FREE      CLAIM     IOCTL
> OPS     0         0         0         0         0
> BYTES   0         0         0         0         0
> EREAD   0
> EWRITE  0
> ECKSUM  0
>
> 060004bcafc0     HEALTHY   -            /dev/dsk/c0t2d0s0
>
>         READ      WRITE     FREE      CLAIM     IOCTL
> OPS     0x17      0x1d2     0         0         0
> BYTES   0x19c000  0x11da000 0         0         0
> EREAD   0
> EWRITE  0
> ECKSUM  0
>
> This will show you any read/write/cksum errors.
>
> Thanks,
> George
>
> Siegfried Nikolaivich wrote:
> > Hello All,
> >
> > I am wondering if there is a way to save the scrub results right
> > before the scrub is complete. After upgrading to Solaris 10U3 I
> > still have ZFS panicking right as the scrub completes. The scrub
> > results seem to be cleared when the system boots back up, so I
> > never get a chance to see them. Does anyone know of a simple way?
[zfs-discuss] Saving scrub results before scrub completes
Hello All,

I am wondering if there is a way to save the scrub results right before the scrub is complete. After upgrading to Solaris 10U3 I still have ZFS panicking right as the scrub completes. The scrub results seem to be cleared when the system boots back up, so I never get a chance to see them.

Does anyone know of a simple way?
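The best I can think of is something crude like polling the pool status and keeping the latest copy somewhere outside the pool being scrubbed, for example (pool name and interval are just examples):

while true; do
        zpool status -v tank > /var/tmp/scrub-status.txt
        sleep 30
done

That at least leaves the last report written before the panic, assuming /var/tmp survives the reboot.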
[zfs-discuss] Panic while scrubbing
Hello,

I am not sure if I am posting in the correct forum, but it seems somewhat ZFS-related, so I thought I'd share it.

While the machine was idle, I started a scrub. Around the time the scrub was supposed to finish, the machine panicked. This might be related to the "metadata corruption" that happened to me earlier.

Here is the log, any ideas?

Oct 24 20:13:51 FServe unix: [ID 836849 kern.notice]
Oct 24 20:13:51 FServe ^Mpanic[cpu0]/thread=fe8000311c80:
Oct 24 20:13:51 FServe genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=fe80003119c0 addr=fe00e24c6218
Oct 24 20:13:51 FServe unix: [ID 10 kern.notice]
Oct 24 20:13:51 FServe unix: [ID 839527 kern.notice] sched:
Oct 24 20:13:51 FServe unix: [ID 753105 kern.notice] #pf Page fault
Oct 24 20:13:51 FServe unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xfe00e24c6218
Oct 24 20:13:51 FServe unix: [ID 243837 kern.notice] pid=0, pc=0xfb92c360, sp=0xfe8000311ab0, eflags=0x10282
Oct 24 20:13:51 FServe unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f0<xmme,fxsr,pge,mce,pae,pse>
Oct 24 20:13:51 FServe unix: [ID 354241 kern.notice] cr2: fe00e24c6218 cr3: a22b000 cr8: c
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] rdi: 84233e88 rsi: fe00e24c6208 rdx: 3f8038931883
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] rcx: 0 r8: 1 r9:
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] rax: 2 rbx: fe80eb90f7c0 rbp: fe8000311ab0
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] r10: a5de7488 r11: 1 r12: 84233e88
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] r13: 2 r14: fe80eb90f7c0 r15: 84233dd8
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] fsb: 8000 gsb: fbc24060 ds: 43
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] es: 43 fs: 0 gs: 1c3
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] trp: e err: 0 rip: fb92c360
Oct 24 20:13:51 FServe unix: [ID 592667 kern.notice] cs: 28 rfl: 10282 rsp: fe8000311ab0
Oct 24 20:13:51 FServe unix: [ID 266532 kern.notice] ss: 30
Oct 24 20:13:51 FServe unix: [ID 10 kern.notice]
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe80003118d0 unix:real_mode_end+58d1 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe80003119b0 unix:trap+d77 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe80003119c0 unix:_cmntrap+13f ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311ab0 genunix:avl_insert+60 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311ae0 genunix:avl_add+33 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311b60 zfs:vdev_queue_io_to_issue+1ec ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311ba0 zfs:zfsctl_ops_root+33c6e7a1 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311bc0 zfs:vdev_disk_io_done+11 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311bd0 zfs:vdev_io_done+12 ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311be0 zfs:zio_vdev_io_done+1b ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311c60 genunix:taskq_thread+bc ()
Oct 24 20:13:51 FServe genunix: [ID 655072 kern.notice] fe8000311c70 unix:thread_start+8 ()
Oct 24 20:13:51 FServe unix: [ID 10 kern.notice]
Oct 24 20:13:51 FServe genunix: [ID 672855 kern.notice] syncing file systems...
Oct 24 20:13:51 FServe genunix: [ID 904073 kern.notice] done
Oct 24 20:13:52 FServe genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0t3d0s1, offset 860356608, content: kernel
Oct 24 20:13:52 FServe marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx0: error on port 3:
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] device disconnected
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] device connected
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] SError interrupt
Oct 24 20:13:52 FServe marvell88sx: [ID 131198 kern.info] SErrors:
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] Recovered communication error
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] PHY ready change
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] 10-bit to 8-bit decode error
Oct 24 20:13:52 FServe marvell88sx: [ID 517869 kern.info] Disparity error
Oct 24 20:13:57 FServe genunix: [ID 409368 kern.notice] ^M100% done: 150751 pages dumped, compression ratio 4.23,
Oct 24 20:13:57 FServe genunix: [ID 851671 kern.notice] dump succeeded

Thanks,
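PS. If anyone wants to poke at this themselves, the crash dump was saved by savecore, so a rough mdb session would look something like the following (default savecore paths; the dcmd names are from memory, so double-check them):

cd /var/crash/FServe
mdb unix.0 vmcore.0
        ::status        - summary of the dump (panic string, OS release)
        ::panicinfo     - registers and trap details at panic time
        $C              - stack backtrace of the panicking thread
        ::msgbuf        - console messages leading up to the panic
        $q              - quit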
Re: [zfs-discuss] Panic while scrubbing
On 24-Oct-06, at 9:11 PM, James McPherson wrote:

> this error from the marvell88sx driver is of concern. The 10b8b decode
> and disparity error messages make me think that you have a bad piece of
> hardware. I hope it's not your controller but I can't tell without more
> data. You should have a look at the iostat -En output for the device on
> marvell88sx instance #0, attached as port 3. If there are any error
> counts above 0 then - after checking /var/adm/messages for medium
> errors - you should probably replace the disk.

I have just tried another 'zpool scrub' and got the same result - a panic right when the scrub finishes (no errors found during or after it). So this problem is reproducible (and might not be an intermittent hardware malfunction).

It is funny that I get the marvell88sx driver error for port 3, as that is the Solaris UFS drive; the rest of the ports are set up for ZFS. Since the scrub seems to be causing the panic, I don't see why an error on the root drive would be the root cause. Note that this error appears in the log after the system is already trying to write the panic dump:

genunix: [ID 111219 kern.notice] dumping to /dev/dsk/c0t3d0s1, offset 860356608, content: kernel

By the way, this is what iostat -En shows for port 3:

c0t3d0 Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 24 Predictive Failure Analysis: 0

And this is shown for the rest of the ports:

c0t?d0 Soft Errors: 6 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 6 Predictive Failure Analysis: 0

Thanks,
Siegfried
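PS. In case it helps anyone follow along, this is roughly how I am going through the error counters and logs (the grep patterns are just what I thought to look for, nothing authoritative):

iostat -En          # per-device soft/hard/transport error counters
egrep -i 'medium error|fatal|retryable|marvell88sx' /var/adm/messages*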
Re: [zfs-discuss] Panic while scrubbing
On 24-Oct-06, at 9:47 PM, James McPherson wrote:

> On 10/25/06, Siegfried Nikolaivich <[EMAIL PROTECTED]> wrote:
> > And this is shown on the rest of the ports:
> >
> > c0t?d0 Soft Errors: 6 Hard Errors: 0 Transport Errors: 0
> > Vendor: ATA Product: ST3320620AS Revision: C Serial No:
> > Size: 320.07GB <320072932864 bytes>
> > Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
> > Illegal Request: 6 Predictive Failure Analysis: 0
>
> Hmm. All your disks attached to the same controller and showing entries
> in the Illegal Request field... what's the common component between
> them - the cable?

I guess the common component between them is the power supply. Each drive has its own SATA cable connected directly to the controller.

> Could you look through your msgbuf and/or /var/adm/messages and find
> the full text of when these Illegal Request errors were logged. That
> will give an idea of where to look next.

That is the part I can't figure out. Nowhere does it say "Illegal Request" except when I run iostat -nE.

I found out that the Illegal Request count on the ZFS drives can be incremented by starting a scrub. For example:

# iostat -nE
...
c0t2d0 Soft Errors: 8 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 8 Predictive Failure Analysis: 0
c0t3d0 Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 24 Predictive Failure Analysis: 0
...

# zpool scrub tank
# iostat -nE
...
c0t2d0 Soft Errors: 9 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 9 Predictive Failure Analysis: 0
c0t3d0 Soft Errors: 24 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: ST3320620AS Revision: C Serial No:
Size: 320.07GB <320072932864 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 24 Predictive Failure Analysis: 0
...

# zpool scrub -s tank
(no panic at this point)

Happens every time.

Thanks,
Siegfried
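PS. For anyone who wants to reproduce the counter bump, this is essentially what I did, scripted (pool name and sleep interval are just examples):

iostat -En > /tmp/iostat.before
zpool scrub tank
sleep 10
zpool scrub -s tank
iostat -En > /tmp/iostat.after
diff /tmp/iostat.before /tmp/iostat.after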
[zfs-discuss] Re: Re: Re: Metadata corrupted
On Mon, Oct 09, 2006 at 11:08:14PM -0700, Matthew Ahrens wrote:

> You may also want to try 'fmdump -eV' to get an idea of what those
> faults were.

I am not sure how to interpret the results; maybe you can help me. It looks like the following, with many more similar pages after it:

% fmdump -eV
TIME                           CLASS
Oct 07 2006 17:28:48.265102839 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0x933872163a1
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xbe23c6961def3450
                vdev = 0x46f50fe03a3fd818
        (end detector)

        pool = tank
        pool_guid = 0xbe23c6961def3450
        pool_context = 0
        vdev_guid = 0x46f50fe03a3fd818
        vdev_type = disk
        vdev_path = /dev/dsk/c0t1d0s0
        parent_guid = 0x3bb6ede3be1cf975
        parent_type = raidz
        zio_err = 0
        zio_offset = 0x1c3644ae00
        zio_size = 0xac00
        zio_objset = 0x20
        zio_object = 0x78
        zio_level = 0
        zio_blkid = 0xafaf
        __ttl = 0x1
        __tod = 0x45284640 0xfcd25f7

Oct 07 2006 17:31:24.616729701 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xb7a0bad55900401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xbe23c6961def3450
                vdev = 0xa543197df30d1460
        (end detector)

        pool = tank
        pool_guid = 0xbe23c6961def3450
        pool_context = 0
        vdev_guid = 0xa543197df30d1460
        vdev_type = disk
        vdev_path = /dev/dsk/c0t2d0s0
        parent_guid = 0x3bb6ede3be1cf975
        parent_type = raidz
        zio_err = 0
        zio_offset = 0x30d218e00
        zio_size = 0xac00
        zio_objset = 0x20
        zio_object = 0xea
        zio_level = 0
        zio_blkid = 0x7577
        __ttl = 0x1
        __tod = 0x452846dc 0x24c28c65

Oct 07 2006 17:31:24.903968466 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xb7b1da39251
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xbe23c6961def3450
                vdev = 0x46f50fe03a3fd818
        (end detector)

        pool = tank
        pool_guid = 0xbe23c6961def3450
        pool_context = 0
        vdev_guid = 0x46f50fe03a3fd818
        vdev_type = disk
        vdev_path = /dev/dsk/c0t1d0s0
        parent_guid = 0x3bb6ede3be1cf975
        parent_type = raidz
        zio_err = 0
        zio_offset = 0x30e558800
        zio_size = 0xac00
        zio_objset = 0x20
        zio_object = 0xea
        zio_level = 0
        zio_blkid = 0x7724
        __ttl = 0x1
        __tod = 0x452846dc 0x35e176d2

Oct 07 2006 17:31:52.178481693 ereport.fs.zfs.checksum
nvlist version: 0
        class = ereport.fs.zfs.checksum
        ena = 0xbe0bb6f3b11
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0xbe23c6961def3450
                vdev = 0xa543197df30d1460
        (end detector)

        pool = tank
        pool_guid = 0xbe23c6961def3450
        pool_context = 0
        vdev_guid = 0xa543197df30d1460
        vdev_type = disk
        vdev_path = /dev/dsk/c0t2d0s0
        parent_guid = 0x3bb6ede3be1cf975
        parent_type = raidz
        zio_err = 0
        zio_offset = 0x375e12800
        zio_size = 0xac00
        zio_objset = 0x20
        zio_object = 0xec
        zio_level = 0
        zio_blkid = 0x7788
        __ttl = 0x1
        __tod = 0x452846f8 0xaa36a1d

Cheers,
Albert
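PS. Since the full -eV output is pretty overwhelming, a quick way to boil it down is to count the checksum ereports per device, something like (field name taken from the output above):

fmdump -eV | grep vdev_path | sort | uniq -c

In this case it should show how many checksum errors were reported against c0t1d0s0 versus c0t2d0s0.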
[zfs-discuss] Re: Re: Re: Metadata corrupted
> Yeah, good catch. So this means that it seems to be able to read the
> label off of each device OK, and the labels look good. I'm not sure
> what else would cause us to be unable to open the pool... Can you try
> running 'zpool status -v'?

The command seems to return the same thing:

% zpool status -v
  pool: tank
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-CS
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     6  corrupted data
          raidz     ONLINE       0     0     6
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0

I can provide you with SSH access if you want.

Thanks,
Siegfried
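PS. For anyone else hitting this: the "reading the label off each device" check mentioned above can be done with zdb, roughly like this (the device name is just an example - run it against each member of the pool):

zdb -l /dev/dsk/c0t0d0s0

It prints the four ZFS labels on the device; if they are readable and consistent, the problem is further up in the pool metadata rather than in the labels themselves.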
[zfs-discuss] Re: Metadata corrupted
> status: The pool metadata is corrupted and the pool cannot be opened.

Is there at least a way to determine what caused this error? Is it a hardware issue? Is it a possible defect in ZFS? I don't think it's a hardware issue, because the hardware seems to still be working fine, and has been for months.

It's important to have this information so that I/we can prevent it from happening next time.

Thanks,
Siegfried