Not a really good subject, I know, but that's kind of what happened.

I'm trying to build a backup server: Windows users use OSCAR (which uses 
rsync) to sync their files to a folder, and when the sync completes a snapshot 
is taken. It worked before, but then I turned on the -R switch to rsync, and 
when I then removed the folder with rm -rf the system crashed. I didn't save 
the error that the shell gave me, but I know it mentioned something about a 
colon (:).
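
For readers unfamiliar with the setup, the snapshot-after-sync step can be 
sketched roughly as below. This is only an illustration under assumptions: the 
dataset name backup/users, the timestamp naming scheme, and the helper names 
are hypothetical and not taken from the real configuration. Only the pool name 
"backup" and the use of a snapshot after each sync come from the description 
above.

```python
import subprocess
from datetime import datetime


def snapshot_name(dataset, when):
    """Build a snapshot name such as 'pool/fs@YYYYMMDD-HHMMSS' (naming is an assumption)."""
    return f"{dataset}@{when.strftime('%Y%m%d-%H%M%S')}"


def snapshot_command(dataset, when):
    """Return the argv list for 'zfs snapshot <dataset>@<stamp>' without running it."""
    return ["zfs", "snapshot", snapshot_name(dataset, when)]


def take_snapshot(dataset, when=None):
    """Run the snapshot command; meant to be called once the rsync transfer has finished."""
    cmd = snapshot_command(dataset, when or datetime.now())
    subprocess.run(cmd, check=True)  # raises CalledProcessError if zfs fails
    return cmd[-1]
```

Calling take_snapshot("backup/users") on the server after a client finishes 
its sync would create one snapshot per completed run.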

I have no idea how to debug the Solaris kernel (I am both a ZFS and a Solaris 
newbie), but I've managed to gather some information of the kind I've seen 
others paste:

bash-3.00# mdb 0
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp scsi_vhci ufs 
ip sctp arp usba uhci nca lofs zfs random sppp ptm nfs crypto ipc md cpc fcip 
fctl fcp logindmux ]
> ::status
debugging crash dump vmcore.0 (32-bit) from unknown
operating system: 5.11 snv_48 (i86pc)
panic message:
ZFS: bad checksum (read on <unknown> off 0: zio dbbf8880 [L1 ZFS plain file] 
4000L/400P DVA[0]=<0:6aa000:800> DVA[1]=<0:b80a7000:800> fletcher4 lzjb LE 
contiguous birth=12316 fill=2 cksum=601eba85e8:
dump content: kernel pages only
> ::spa
ADDR         STATE NAME
da9f5ac0    ACTIVE backup
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      41647               162   32%
Anon                        56673               221   44%
Exec and libs               11331                44    9%
Page cache                   2963                11    2%
Free (cachelist)            11554                45    9%
Free (freelist)              4742                18    4%

Total                      128910               503
Physical                   128909               503
> $C
db95bae4 vpanic(f9f95858, d7a91388, d7a913b8, f9f94660, 0, 0)
db95bc70 zio_done+0x122(dbbf8880)
db95bc8c zio_next_stage+0x66(dbbf8880)
db95bcac zio_wait_for_children+0x46(dbbf8880, 11, dbbf8a70)
db95bcc0 zio_wait_children_done+0x18(dbbf8880)
db95bcd8 zio_next_stage+0x66(dbbf8880)
db95bd10 zio_vdev_io_assess+0x11a(dbbf8880)
db95bd24 zio_next_stage+0x66(dbbf8880)
db95bd64 vdev_mirror_io_done+0x289(dbbf8880)
db95bd78 zio_vdev_io_done+0x25()
db95bdc8 taskq_thread+0x176(d430a290, 0)
db95bdd8 thread_start+8()

bash-3.00# zpool status -v
  pool: backup
 state: FAULTED
status: One or more devices could not be used because the the label is missing
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backup      UNAVAIL      0     0     0  insufficient replicas
          raidz1    UNAVAIL      0     0     0  insufficient replicas
            c0d1    FAULTED      0     0     0  corrupted data
            c1d0    FAULTED      0     0     0  corrupted data
            c1d1    ONLINE       0     0     0

The most important stuff from /var/adm/messages:

Oct  5 14:02:39 unknown sshd[5421]: [ID 800047 auth.crit] fatal: Read from 
socket failed: Connection reset by peer
Oct  5 14:04:40 unknown unix: [ID 836849 kern.notice]
Oct  5 14:04:40 unknown ^Mpanic[cpu0]/thread=db95bde0:
Oct  5 14:04:40 unknown genunix: [ID 809409 kern.notice] ZFS: bad checksum 
(read on <unknown> off 0: zio dbbf8880 [L1 ZFS plain file] 4000L/400P 
DVA[0]=<0:6aa000:800> DVA[1]=<0:b80a7000:800> fletcher4 lzjb LE contiguous 
birth=12316 fill=2 
cksum=601eba85e8:408ce39c2ee8:172f0d844a20fa:5dfa8a1d4d38722): error 50
Oct  5 14:04:40 unknown unix: [ID 100000 kern.notice]
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bc70 
zfs:zio_done+122 (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bc8c 
zfs:zio_next_stage+66 (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bcac 
zfs:zio_wait_for_children+46 (dbbf8880, 11, dbbf8)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bcc0 
zfs:zio_wait_children_done+18 (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bcd8 
zfs:zio_next_stage+66 (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bd10 
zfs:zio_vdev_io_assess+11a (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bd24 
zfs:zio_next_stage+66 (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bd64 
zfs:vdev_mirror_io_done+289 (dbbf8880)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bd78 
zfs:zio_vdev_io_done+25 (dbbf8880, 0, 0, 0, )
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bdc8 
genunix:taskq_thread+176 (d430a290, 0)
Oct  5 14:04:40 unknown genunix: [ID 353471 kern.notice] db95bdd8 
unix:thread_start+8 ()
Oct  5 14:04:40 unknown unix: [ID 100000 kern.notice]
Oct  5 14:04:40 unknown genunix: [ID 672855 kern.notice] syncing file 
systems...Oct  5 14:04:40 unknown genunix: [ID 733762 kern.notice]  2
Oct  5 14:04:41 unknown genunix: [ID 733762 kern.notice]  1
Oct  5 14:04:42 unknown genunix: [ID 904073 kern.notice]  done
Oct  5 14:04:43 unknown genunix: [ID 111219 kern.notice] dumping to 
/dev/dsk/c0d0s3, offset 144769024, content: kernel
Oct  5 14:04:50 unknown genunix: [ID 409368 kern.notice] ^M100% done: 44041 
pages dumped, compression ratio 2.85,
Oct  5 14:04:50 unknown genunix: [ID 851671 kern.notice] dump succeeded
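
The panic line above packs the failing block pointer into one string. The 
sketch below splits it into fields so they are easier to read. It assumes that 
the sizes (4000L/400P), the DVA offsets and allocated sizes, and the checksum 
words are hexadecimal, and that the vdev id, birth, and fill values are 
decimal. The function name parse_panic is made up for this example. As far as I 
can tell, error 50 is EBADE on Solaris, which ZFS reuses as its checksum error 
code.

```python
import re

# Panic string copied from the /var/adm/messages excerpt, re-joined onto one line.
PANIC = (
    "ZFS: bad checksum (read on <unknown> off 0: zio dbbf8880 "
    "[L1 ZFS plain file] 4000L/400P DVA[0]=<0:6aa000:800> "
    "DVA[1]=<0:b80a7000:800> fletcher4 lzjb LE contiguous "
    "birth=12316 fill=2 "
    "cksum=601eba85e8:408ce39c2ee8:172f0d844a20fa:5dfa8a1d4d38722): error 50"
)


def parse_panic(msg):
    """Split a ZFS 'bad checksum' panic string into named fields.

    Assumption: sizes, DVA offsets/asizes and checksum words are hex;
    vdev ids, birth and fill are decimal.
    """
    level, obj_type = re.search(r"\[L(\d+) ([^\]]+)\]", msg).groups()
    lsize, psize = re.search(r"\b([0-9a-f]+)L/([0-9a-f]+)P\b", msg).groups()
    dvas = [
        {"index": int(i), "vdev": int(v),
         "offset": int(off, 16), "asize": int(asz, 16)}
        for i, v, off, asz in re.findall(
            r"DVA\[(\d+)\]=<(\d+):([0-9a-f]+):([0-9a-f]+)>", msg)
    ]
    return {
        "level": int(level),
        "type": obj_type,
        "lsize": int(lsize, 16),   # logical (uncompressed) size in bytes
        "psize": int(psize, 16),   # physical (compressed) size in bytes
        "dvas": dvas,
        "birth": int(re.search(r"birth=(\d+)", msg).group(1)),
        "fill": int(re.search(r"fill=(\d+)", msg).group(1)),
        "cksum": re.search(r"cksum=([0-9a-f:]+)", msg).group(1).split(":"),
        "errno": int(re.search(r"error (\d+)", msg).group(1)),
    }


info = parse_panic(PANIC)
print(f"L{info['level']} {info['type']}: "
      f"{info['lsize']} bytes logical, {info['psize']} bytes on disk")
for d in info["dvas"]:
    print(f"  copy {d['index']}: vdev {d['vdev']}, "
          f"offset {d['offset']:#x}, {d['asize']} bytes allocated")
```

Read this way, the block that failed its checksum is a level-1 indirect block 
of a plain file, and it has two recorded locations, both on top-level vdev 0 
(the raidz1).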

The machine is an Intel P4 laptop (1GB of RAM, of which Solaris is using 512MB) 
running Windows SP2, with Solaris 10 Nevada b48 inside VMware Player. I noticed 
that almost right away there were some (5-10) CKSUM errors. Could those have 
been caused by my pausing the machine, turning off the computer, and starting 
over the next day?
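
As a sanity check on the 512MB figure, the ::memstat page counts from the dump 
can be converted to megabytes. The sketch below assumes a 4096-byte page size 
(the usual x86 base page) and that mdb truncates rather than rounds the MB 
column. The dictionary and function names are just for illustration.

```python
PAGE_SIZE = 4096  # bytes per page; assumed for 32-bit x86

# Page counts and MB values copied from the ::memstat output.
MEMSTAT_PAGES = {
    "Kernel": 41647,
    "Anon": 56673,
    "Exec and libs": 11331,
    "Page cache": 2963,
    "Free (cachelist)": 11554,
    "Free (freelist)": 4742,
}
REPORTED_MB = {
    "Kernel": 162,
    "Anon": 221,
    "Exec and libs": 44,
    "Page cache": 11,
    "Free (cachelist)": 45,
    "Free (freelist)": 18,
}


def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Convert a page count to whole megabytes, truncating the fraction."""
    return pages * page_size // (1024 * 1024)


total_pages = sum(MEMSTAT_PAGES.values())
for name, pages in MEMSTAT_PAGES.items():
    print(f"{name:18s} {pages:7d} pages  {pages_to_mb(pages):4d} MB "
          f"(mdb reported {REPORTED_MB[name]})")
print(f"{'Total':18s} {total_pages:7d} pages  {pages_to_mb(total_pages):4d} MB")
```

The categories add up to the Total line in the dump, and the total lands a 
little under 512MB, which fits a guest that was given 512MB of RAM.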

I'm not on the mailing list, though I am subscribed to the RSS feed.

Thanks,

- Simon
 
 
This message posted from opensolaris.org
