On 22.4.2021. 11:02, Alexander Bluhm wrote:
> On Thu, Apr 22, 2021 at 09:03:22AM +0200, Hrvoje Popovski wrote:
>> something like this:
>>
>> x3550m4# pappnaiannc:iicc :p:o  ppoolo_oolcla__ddcohoe__gg_eiettt::e m
>> _mmcbmualg2fkpilc2_::  chppeaag
>> gceke:  ee mmmbppttuyfyp
> 
> This was without my kernel lock around ARP bandage, right?


yes, yes ...


> 
>> ddb{9}> mach ddbcpu 0xa
>> Stopped at      x86_ipi_db+0x12:        leave
>> x86_ipi_db(ffff800021a2aff0) at x86_ipi_db+0x12
>> x86_ipi_handler() at x86_ipi_handler+0x80
>> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x23
>> pool_get(ffffffff8221e568,2) at pool_get+0x43
>> m_gethdr(2,1) at m_gethdr+0x3f
>> rtm_msg1(e,ffff800026e3cf70) at rtm_msg1+0x4c
>> rtm_ifchg(ffff8000005b3800) at rtm_ifchg+0x61
>> if_down(ffff8000005b3800) at if_down+0xa0
>> if_downall() at if_downall+0x5b
>> boot(104) at boot+0x99
>> reboot(104) at reboot+0x5b
>> panic(ffffffff81df855b) at panic+0x132
>> pool_do_get(ffffffff8221ebc8,2,ffff800026e3d294) at pool_do_get+0x309
>> pool_get(ffffffff8221ebc8,2) at pool_get+0x95
>> end trace frame: 0xffff800026e3d340, count: 0
>>
>> ddb{10}> mach ddbcpu 0xb
>> Stopped at      db_enter+0x10:  popq    %rbp
>> db_enter() at db_enter+0x10
>> panic(ffffffff81df855b) at panic+0x12a
>> pool_do_get(ffffffff8221e568,2,ffff800026e43294) at pool_do_get+0x309
>> pool_get(ffffffff8221e568,2) at pool_get+0x95
>> m_clget(0,2,802) at m_clget+0xdd
>> ixgbe_get_buf(ffff80000015c0e8,e) at ixgbe_get_buf+0xa3
>> ixgbe_rxfill(ffff80000015c0e8) at ixgbe_rxfill+0xae
>> ixgbe_queue_intr(ffff80000011ac40) at ixgbe_queue_intr+0x4f
>> intr_handler(ffff800026e434b0,ffff8000000cd700) at intr_handler+0x6e
>> Xintr_ioapic_edge4_untramp() at Xintr_ioapic_edge4_untramp+0x18f
>> acpicpu_idle() at acpicpu_idle+0x1ea
>> sched_idle(ffff800021a33ff0) at sched_idle+0x27e
>> end trace frame: 0x0, count: 3
> 
> Two processors 10 and 11 in pool get.
> 
> CPU 10 does pool_get, panic, boot, pool_get again.
> CPU 11 was the one that originally stopped in ddb.
> 
> Did you enter boot reboot before doing mach ddbcpu 0xa?

nope... is doing that ever useful?


> Or how did we get the boot sequence in this trace?
> 
> Can it be that both CPUs panicked simultaneously?  The mangled message
> indicates this.  Then cpu 10 saw that cpu 11 had already panicked to ddb
> and tried to reboot.  There it panicked again and got stuck in a
> recursive call to pool_get().
> 
> The if (db_panic) in the panic() function was not written with
> simultaneous panics on multiple CPUs in mind.


if you want i'll try to reproduce it on other boxes..
maybe i can trigger it here easily because of 2 sockets ?
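
For illustration only, the simultaneous-panic problem described in the
quoted text could in principle be guarded with a "first CPU wins" claim.
The following is a minimal userland sketch in C11, not the kernel's
actual code: the names panic_owner, panic_claim and NO_PANIC_CPU are
hypothetical and do not come from the OpenBSD source.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel: no CPU has claimed the panic yet. */
#define NO_PANIC_CPU	(-1)

/* Hypothetical global recording which CPU owns the panic path. */
static atomic_int panic_owner = NO_PANIC_CPU;

/*
 * Return true if the calling CPU is the first one to panic and may
 * continue into the debugger or reboot path.  Return false if a CPU
 * (another one, or this one re-entering) has already claimed it, in
 * which case the caller would be expected to park instead of
 * proceeding to reboot.
 */
static bool
panic_claim(int cpu)
{
	int expected = NO_PANIC_CPU;

	/* Succeeds only for the first caller; later callers see the owner. */
	return atomic_compare_exchange_strong(&panic_owner, &expected, cpu);
}
```

With such a guard, if cpu 11 claims the panic first, a later call from
cpu 10 returns false, so cpu 10 would not walk into boot() and
if_downall() on top of an already inconsistent pool.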
