On Thu, Apr 22, 2021 at 09:03:22AM +0200, Hrvoje Popovski wrote:
> something like this:
>
> x3550m4# pappnaiannc:iicc :p:o ppoolo_oolcla__ddcohoe__gg_eiettt::e m
> _mmcbmualg2fkpilc2_:: chppeaag
> gceke: ee mmmbppttuyfyp

This was without my kernel lock around ARP bandage, right?

> ddb{9}> mach ddbcpu 0xa
> Stopped at x86_ipi_db+0x12: leave
> x86_ipi_db(ffff800021a2aff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> pool_get(ffffffff8221e568,2) at pool_get+0x43
> m_gethdr(2,1) at m_gethdr+0x3f
> rtm_msg1(e,ffff800026e3cf70) at rtm_msg1+0x4c
> rtm_ifchg(ffff8000005b3800) at rtm_ifchg+0x61
> if_down(ffff8000005b3800) at if_down+0xa0
> if_downall() at if_downall+0x5b
> boot(104) at boot+0x99
> reboot(104) at reboot+0x5b
> panic(ffffffff81df855b) at panic+0x132
> pool_do_get(ffffffff8221ebc8,2,ffff800026e3d294) at pool_do_get+0x309
> pool_get(ffffffff8221ebc8,2) at pool_get+0x95
> end trace frame: 0xffff800026e3d340, count: 0
>
> ddb{10}> mach ddbcpu 0xb
> Stopped at db_enter+0x10: popq %rbp
> db_enter() at db_enter+0x10
> panic(ffffffff81df855b) at panic+0x12a
> pool_do_get(ffffffff8221e568,2,ffff800026e43294) at pool_do_get+0x309
> pool_get(ffffffff8221e568,2) at pool_get+0x95
> m_clget(0,2,802) at m_clget+0xdd
> ixgbe_get_buf(ffff80000015c0e8,e) at ixgbe_get_buf+0xa3
> ixgbe_rxfill(ffff80000015c0e8) at ixgbe_rxfill+0xae
> ixgbe_queue_intr(ffff80000011ac40) at ixgbe_queue_intr+0x4f
> intr_handler(ffff800026e434b0,ffff8000000cd700) at intr_handler+0x6e
> Xintr_ioapic_edge4_untramp() at Xintr_ioapic_edge4_untramp+0x18f
> acpicpu_idle() at acpicpu_idle+0x1ea
> sched_idle(ffff800021a33ff0) at sched_idle+0x27e
> end trace frame: 0x0, count: 3

Two processors, 10 and 11, are in pool_get().  CPU 10 does pool_get,
panic, boot, pool_get again.  CPU 11 was the one that originally
stopped in ddb.

Did you enter boot reboot before doing mach ddbcpu 0xa?  Or how did
we get the boot sequence in this trace?

Can it be that both CPUs panicked simultaneously?  The mangled message
indicates this.  Then CPU 10 saw that CPU 11 had already panicked to
ddb and tried to reboot.  There it panicked again and got stuck in a
recursive call to pool_get().
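
To make the suspected race concrete, here is a minimal userland sketch
of how a panic path could tell the first panicking CPU apart from a
second one arriving at the same time.  It is not the actual OpenBSD
panic() code; panic_claim(), panic_owner and the enum are hypothetical
names, and C11 atomics stand in for whatever primitive the kernel
would use.

```c
#include <stdatomic.h>

#define PANIC_NONE (-1)

/* Hypothetical classification of a CPU entering the panic path. */
enum panic_role {
	PANIC_FIRST,		/* this CPU panicked first */
	PANIC_RECURSIVE,	/* same CPU panicked again */
	PANIC_OTHER_CPU		/* another CPU already owns the panic */
};

/* CPU number that claimed the panic, or PANIC_NONE. */
static atomic_int panic_owner = PANIC_NONE;

enum panic_role
panic_claim(int cpu)
{
	int expected = PANIC_NONE;

	/* Only one CPU can swap PANIC_NONE for its own number. */
	if (atomic_compare_exchange_strong(&panic_owner, &expected, cpu))
		return PANIC_FIRST;

	/* On failure, expected holds the current owner. */
	if (expected == cpu)
		return PANIC_RECURSIVE;
	return PANIC_OTHER_CPU;
}
```

If CPU 11 claims first, panic_claim(11) returns PANIC_FIRST, a later
panic_claim(10) returns PANIC_OTHER_CPU, and a nested panic on CPU 11
returns PANIC_RECURSIVE.  A caller could use the second case to park
the losing CPU instead of letting it proceed to reboot.
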
The if (db_panic) in the panic() function was not written with
simultaneous panics on multiple CPUs in mind.

bluhm