On Thu, Apr 22, 2021 at 09:03:22AM +0200, Hrvoje Popovski wrote:
> something like this:
> 
> x3550m4# pappnaiannc:iicc :p:o  ppoolo_oolcla__ddcohoe__gg_eiettt::e m
> _mmcbmualg2fkpilc2_::  chppeaag
> gceke:  ee mmmbppttuyfyp

This was without my kernel lock around ARP bandage, right?

> ddb{9}> mach ddbcpu 0xa
> Stopped at      x86_ipi_db+0x12:        leave
> x86_ipi_db(ffff800021a2aff0) at x86_ipi_db+0x12
> x86_ipi_handler() at x86_ipi_handler+0x80
> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
> pool_get(ffffffff8221e568,2) at pool_get+0x43
> m_gethdr(2,1) at m_gethdr+0x3f
> rtm_msg1(e,ffff800026e3cf70) at rtm_msg1+0x4c
> rtm_ifchg(ffff8000005b3800) at rtm_ifchg+0x61
> if_down(ffff8000005b3800) at if_down+0xa0
> if_downall() at if_downall+0x5b
> boot(104) at boot+0x99
> reboot(104) at reboot+0x5b
> panic(ffffffff81df855b) at panic+0x132
> pool_do_get(ffffffff8221ebc8,2,ffff800026e3d294) at pool_do_get+0x309
> pool_get(ffffffff8221ebc8,2) at pool_get+0x95
> end trace frame: 0xffff800026e3d340, count: 0
> 
> ddb{10}> mach ddbcpu 0xb
> Stopped at      db_enter+0x10:  popq    %rbp
> db_enter() at db_enter+0x10
> panic(ffffffff81df855b) at panic+0x12a
> pool_do_get(ffffffff8221e568,2,ffff800026e43294) at pool_do_get+0x309
> pool_get(ffffffff8221e568,2) at pool_get+0x95
> m_clget(0,2,802) at m_clget+0xdd
> ixgbe_get_buf(ffff80000015c0e8,e) at ixgbe_get_buf+0xa3
> ixgbe_rxfill(ffff80000015c0e8) at ixgbe_rxfill+0xae
> ixgbe_queue_intr(ffff80000011ac40) at ixgbe_queue_intr+0x4f
> intr_handler(ffff800026e434b0,ffff8000000cd700) at intr_handler+0x6e
> Xintr_ioapic_edge4_untramp() at Xintr_ioapic_edge4_untramp+0x18f
> acpicpu_idle() at acpicpu_idle+0x1ea
> sched_idle(ffff800021a33ff0) at sched_idle+0x27e
> end trace frame: 0x0, count: 3

Two processors, 10 and 11, are in pool_get().

CPU 10 does pool_get, panic, boot, pool_get again.
CPU 11 was the one that originally stopped in ddb.

Did you enter boot reboot before doing mach ddbcpu 0xa?
Or how did we get the boot sequence in this trace?

Can it be that both CPUs panicked simultaneously?  The mangled message
indicates this.  Then cpu 10 saw that cpu 11 had already panicked to ddb
and tried to reboot.  There it panicked again and got stuck in a
recursive call to pool_get().

The if (db_panic) in the panic() function was not written with
simultaneous panics on multiple CPUs in mind.
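
A minimal userland sketch of one way such a race could be serialized,
assuming C11 atomics.  This is not the kernel's panic() code;
panic_owner, panic_claim() and the role names are made up for
illustration.  The first CPU to claim the panic wins, and any other
CPU can tell that it lost instead of taking the reboot path.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_CPU	(-1)

/* Hypothetical: id of the CPU that owns the panic, or NO_CPU. */
static atomic_int panic_owner = NO_CPU;

enum panic_role {
	PANIC_FIRST,		/* this CPU claimed the panic */
	PANIC_RECURSIVE,	/* this CPU already owns the panic */
	PANIC_LOSER		/* another CPU panicked first */
};

/*
 * Try to become the panic owner with a single compare-and-swap.
 * On failure, `expected` holds the id of the CPU that won.
 */
static enum panic_role
panic_claim(int cpu)
{
	int expected = NO_CPU;

	assert(cpu != NO_CPU);
	if (atomic_compare_exchange_strong(&panic_owner, &expected, cpu))
		return PANIC_FIRST;
	return (expected == cpu) ? PANIC_RECURSIVE : PANIC_LOSER;
}
```

With something like this, cpu 11 would get PANIC_FIRST and enter ddb,
while cpu 10 would get PANIC_LOSER and could spin instead of going on
to boot().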

bluhm
