On 10/21/2014 08:54 AM, Nick via smartos-discuss wrote:
> Server running SmartOS 20140904T175324Z. Rebooted last night with a system
> panic -- bad trap error. Here is some mdb info:
>
> mdb unix.2 vmcore.2
> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
> scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd lofs idm
> mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs sata ]
>> ::status
> debugging crash dump vmcore.2 (64-bit) from xxxxx
> operating system: 5.11 joyent_20140904T175324Z (i86pc)
> image uuid: (not set)
> panic message: BAD TRAP: type=d (#gp General protection)
> rp=ffffff0021b8c1c0 addr=ffffff0021b8c3a8
> dump content: kernel pages only
>> $C
> ffffff0021b8c360 mutex_enter+0xb()
> ffffff0021b8c390 dnode_hold+0x28(ffffff04f9530040, 846ce, fffffffff7e07041,
> ffffff0021b8c3a8)
> ffffff0021b8c3f0 dmu_bonus_hold+0x37(ffffff04f9530040, 846ce, 0,
> ffffff0021b8c468)
> ffffff0021b8c420 sa_buf_hold+0x1d(ffffff04f9530040, 846ce, 0,
> ffffff0021b8c468)
> ffffff0021b8c4c0 zfs_zget+0x64(ffffff04e9e6d800, 846ce, ffffff0021b8c5f0)
> ffffff0021b8c5a0 zfs_dirent_lock+0x516(ffffff0021b8c5f8, ffffff04eeea5010,
> ffffff0021b8c9d0, ffffff0021b8c5f0, 6, 0, 0)
> ffffff0021b8c660 zfs_dirlook+0x94(ffffff04eeea5010, ffffff0021b8c9d0,
> ffffff0021b8c808, 0, 0, 0)
> ffffff0021b8c700 zfs_lookup+0x3da(ffffff0570bd2e80, ffffff0021b8c9d0,
> ffffff0021b8c808, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8c7b0 fop_lookup+0xa2(ffffff0570bd2e80, ffffff0021b8c9d0,
> ffffff0021b8c808, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8c870 lo_lookup+0xbc(ffffff0554aa1240, ffffff0021b8c9d0,
> ffffff0021b8cb18, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8c920 fop_lookup+0xa2(ffffff0554aa1240, ffffff0021b8c9d0,
> ffffff0021b8cb18, ffffff0021b8cca0, 0, ffffff04f3598780, ffffff061b420ee0,
> 0, 0, 0)
> ffffff0021b8cb80 lookuppnvp+0x1f6(ffffff0021b8cca0, 0, 0, 0,
> ffffff0021b8ce48, ffffff04f3598780, ffffff05fae1d980, ffffff061b420ee0)
> ffffff0021b8cc20 lookuppnatcred+0x15e(ffffff0021b8cca0, 0, 0, 0,
> ffffff0021b8ce48, 0, ffffff061b420ee0)
> ffffff0021b8cd20 lookupnameatcred+0xe9(fffffd7fffdf4e50, 0, 0, 0,
> ffffff0021b8ce48, 0, ffffff061b420ee0)
> ffffff0021b8cd70 lookupnameat+0x39(fffffd7fffdf4e50, 0, 0, 0,
> ffffff0021b8ce48, 0)
> ffffff0021b8ce10 cstatat_getvp+0x107(ffd19553, fffffd7fffdf4e50, 0,
> ffffff0021b8ce48, ffffff0021b8ce40)
> ffffff0021b8ceb0 cstatat+0x6f(ffd19553, fffffd7fffdf4e50, fffffd7fffdf4dd0,
> 1000, 0)
> ffffff0021b8cee0 fstatat+0x42(ffd19553, fffffd7fffdf4e50, fffffd7fffdf4dd0,
> 1000)
> ffffff0021b8cf00 lstat+0x25(fffffd7fffdf4e50, fffffd7fffdf4dd0)
> ffffff0021b8cf10 sys_syscall+0x17a()
>>
>> ::panicinfo
> cpu 6
> thread ffffff050125c180
> message BAD TRAP: type=d (#gp General protection)
> rp=ffffff0021b8c1c0 addr=ffffff0021b8c3a8
> rdi fffbff05e2a52250
> rsi e
> rdx ffffff050125c180
> rcx 0
> r8 200db1e5970aea
> r9 150
> rax 0
> rbx 846ce
> rbp ffffff0021b8c360
> r10 fffffffffb8542b8
> r11 1
> r12 0
> r13 fffbff05e2a52250
> r14 1
> r15 ffffff0021b8c3a8
> fsbase fffffd7fff122a40
> gsbase ffffff04e6561500
> ds 4b
> es 4b
> fs 0
> gs 0
> trapno d
> err 0
> rip fffffffffb85ef5b
> cs 30
> rflags 10246
> rsp ffffff0021b8c2b8
> ss 38
> gdt_hi 0
> gdt_lo 2000ffff
> idt_hi 0
> idt_lo 1000ffff
> ldt 0
> task 70
> cr0 80050033
> cr2 4ab8000
> cr3 17bbf6000
> cr4 426f8
>> ffffff050125c180::thread -p
> ADDR PROC LWP CRED
> ffffff050125c180 ffffff04f6eae090 ffffff0509c15040 ffffff061b420ee0
>> ffffff04f6eae090::ps -ft
> S PID PPID PGID SID UID FLAGS ADDR NAME
> R 13538 6531 6531 6531 0 0x42000000 ffffff04f6eae090
> /opt/local/bin/rsync --daemon --config /opt/local/etc/rsync/rsyncd.conf
> T 0xffffff050125c180 <TS_ONPROC>
>> ffffff04f6eae090::ptree
> fffffffffbc30440 sched
> ffffff04e66d0010 init
> ffffff0502a1d058 rsync
> ffffff04f6eae090 rsync
> ffffff05f1581088 rsync
>
> So it looks like the crash happened in this rsync daemon, which is running
> within an OS zone. It looks like rsync was actively syncing, so there was
> high I/O going on at the time. How can a crash in rsync within an OS zone
> take down the entire server? zpool scrub reports clean, all memory has been
> stress tested fine (and is ECC). Is there anything else I can try in mdb to
> debug this further? Thanks,

The problem that you encountered is a crash in ZFS, which is why the whole
machine panicked. If you could make the dump available, then we can start
taking a look at it and get folks in the ZFS community involved and helping
hunt it down.

Robert
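
One thing that can be checked from the posted output alone: the #gp trap was raised in mutex_enter, and rdi (the first integer argument under the AMD64 calling convention, so presumably the address of the mutex) holds fffbff05e2a52250. The sketch below, written in Python rather than as mdb commands, tests whether that value is a canonical x86-64 address and lists the canonical addresses that differ from it by a single bit. It assumes 48-bit virtual addressing, and the helper names are illustrative only; they are not part of mdb or illumos.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all equal.

    Assumption: 48-bit virtual addresses (4-level paging).
    """
    top = addr >> (va_bits - 1)                 # the upper 17 bits for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def single_bit_neighbors(addr: int, va_bits: int = 48) -> list[int]:
    """Return every canonical address that differs from addr in exactly one bit."""
    candidates = (addr ^ (1 << bit) for bit in range(64))
    return [c for c in candidates if is_canonical(c, va_bits)]


# Register values copied from the ::panicinfo output quoted above.
rdi = 0xFFFBFF05E2A52250
r15 = 0xFFFFFF0021B8C3A8

print("rdi canonical:", is_canonical(rdi))
print("r15 canonical:", is_canonical(r15))
print("one-bit neighbors of rdi:", [hex(a) for a in single_bit_neighbors(rdi)])
```

A non-canonical pointer would be consistent with a general protection fault rather than a page fault. A one-bit difference from an ordinary-looking kernel address only hints that the pointer was corrupted; it does not say whether the cause is hardware or software, so the dump itself is still what is needed to track this down.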
