On Tue, Nov 23, 2021 at 06:54:59AM +0100, Hrvoje Popovski wrote:
> after 24 hours hitting sasyncd setup one box panic

Thanks for testing.

I have reduced my iked lifetime to about 10 seconds and got the
same panic on my new 8 core test machine.

ddb{2}> trace
db_enter() at db_enter+0x10
panic(ffffffff81eaa8e3) at panic+0xbf
pool_do_get(ffffffff821e64d8,9,ffff8000238b0524) at pool_do_get+0x35c
pool_get(ffffffff821e64d8,9) at pool_get+0x93
tdb_alloc(0) at tdb_alloc+0x62
reserve_spi(0,100,ffffffff,ffff800000d41254,ffff800000d41238,32,cbd2b00c6d3d3ecd) at reserve_spi+0xfc
pfkeyv2_send(fffffd8739174900,ffff800001b3ba80,50) at pfkeyv2_send+0x19c6
pfkeyv2_output(fffffd80948cea00,fffffd8739174900,0,0) at pfkeyv2_output+0x8a
pfkeyv2_usrreq(fffffd8739174900,9,fffffd80948cea00,0,0,ffff8000238857b0) at pfkeyv2_usrreq+0x1b0
sosend(fffffd8739174900,0,ffff8000238b0b60,0,0,0) at sosend+0x3a9
dofilewritev(ffff8000238857b0,3,ffff8000238b0b60,0,ffff8000238b0c60) at dofilewritev+0x14d
sys_writev(ffff8000238857b0,ffff8000238b0c00,ffff8000238b0c60) at sys_writev+0xd2
syscall(ffff8000238b0cd0) at syscall+0x3a9
Xsyscall() at Xsyscall+0x128

> ddb{3}> show tdb

You have to add the pool item addr to this command.

I additionally have a refcount tracing diff on my machine.  With that
I see this result:

ddb{2}> show panic
*cpu2: pool_do_get: tdb free list modified: page 0xffff800008010000; item addr 0xffff80000801c998; offset 0x28=0xdeadbeee

ddb{2}> show tdb /f 0xffff80000801c998
tdb at 0xffff80000801c998
             hnext: 0x4c38c8f8ffb0cab5
             dnext: 0xff2c2a5ac7964242
             snext: 0xdeadbeefdeadbeef
...
     tdb_trace[78]: 350309838: refs 5 -1 cpu2 ipsec_forward_check:1081
     tdb_trace[79]: 350309839: refs 4 +1 cpu2 gettdb_dir:358
     tdb_trace[80]: 350309840: refs 5 -1 cpu2 ipsec_common_input:355
     tdb_trace[81]: 350309841: refs 4 +1 cpu2 gettdb_dir:358
     tdb_trace[82]: 350309842: refs 5 -1 cpu2 ipsec_forward_check:1081
     tdb_trace[83]: 350310888: refs 4 -1 cpu2 ipsp_spd_lookup:529
     tdb_trace[84]: 350816099: refs 3 -1 cpu0 tdb_soft_timeout:726
     tdb_trace[85]: 351266117: refs 2 +1 cpu2 gettdb_dir:358
     tdb_trace[86]: 351266118: refs 3 +0 cpu2 pfkeyv2_send:1599
     tdb_trace[87]: 351266119: refs 3 -1 cpu2 tdb_delete0:997
     tdb_trace[88]: 351271898: refs 2 -1 cpu2 pfkeyv2_send:2143
     tdb_trace[89]: 351300368: refs 1 +0 cpu0 tdb_timeout:688
     tdb_trace[90]: 351300369: refs 1 -1 cpu0 tdb_delete0:997
     tdb_trace[91]: 351300370: refs 3735928559 -1 cpu0 tdb_timeout:691

I will try mvs@'s IPL_NET fix and think a bit more about the problem.

bluhm
