On Tue, Nov 23, 2021 at 06:54:59AM +0100, Hrvoje Popovski wrote:
> after 24 hours hitting sasyncd setup one box panic
Thanks for testing.  I have reduced my iked lifetime to about 10
seconds and got the same panic on my new 8 core test machine.

ddb{2}> trace
db_enter() at db_enter+0x10
panic(ffffffff81eaa8e3) at panic+0xbf
pool_do_get(ffffffff821e64d8,9,ffff8000238b0524) at pool_do_get+0x35c
pool_get(ffffffff821e64d8,9) at pool_get+0x93
tdb_alloc(0) at tdb_alloc+0x62
reserve_spi(0,100,ffffffff,ffff800000d41254,ffff800000d41238,32,cbd2b00c6d3d3ecd) at reserve_spi+0xfc
pfkeyv2_send(fffffd8739174900,ffff800001b3ba80,50) at pfkeyv2_send+0x19c6
pfkeyv2_output(fffffd80948cea00,fffffd8739174900,0,0) at pfkeyv2_output+0x8a
pfkeyv2_usrreq(fffffd8739174900,9,fffffd80948cea00,0,0,ffff8000238857b0) at pfkeyv2_usrreq+0x1b0
sosend(fffffd8739174900,0,ffff8000238b0b60,0,0,0) at sosend+0x3a9
dofilewritev(ffff8000238857b0,3,ffff8000238b0b60,0,ffff8000238b0c60) at dofilewritev+0x14d
sys_writev(ffff8000238857b0,ffff8000238b0c00,ffff8000238b0c60) at sys_writev+0xd2
syscall(ffff8000238b0cd0) at syscall+0x3a9
Xsyscall() at Xsyscall+0x128

> ddb{3}> show tdb

You have to add the pool item addr to this command.

I additionally have a refcount tracing diff on my machine.  With that
I see this result:

ddb{2}> show panic
*cpu2: pool_do_get: tdb free list modified: page 0xffff800008010000; item addr 0xffff80000801c998; offset 0x28=0xdeadbeee
ddb{2}> show tdb /f 0xffff80000801c998
tdb at 0xffff80000801c998
hnext: 0x4c38c8f8ffb0cab5
dnext: 0xff2c2a5ac7964242
snext: 0xdeadbeefdeadbeef
...
tdb_trace[78]: 350309838: refs 5 -1 cpu2 ipsec_forward_check:1081
tdb_trace[79]: 350309839: refs 4 +1 cpu2 gettdb_dir:358
tdb_trace[80]: 350309840: refs 5 -1 cpu2 ipsec_common_input:355
tdb_trace[81]: 350309841: refs 4 +1 cpu2 gettdb_dir:358
tdb_trace[82]: 350309842: refs 5 -1 cpu2 ipsec_forward_check:1081
tdb_trace[83]: 350310888: refs 4 -1 cpu2 ipsp_spd_lookup:529
tdb_trace[84]: 350816099: refs 3 -1 cpu0 tdb_soft_timeout:726
tdb_trace[85]: 351266117: refs 2 +1 cpu2 gettdb_dir:358
tdb_trace[86]: 351266118: refs 3 +0 cpu2 pfkeyv2_send:1599
tdb_trace[87]: 351266119: refs 3 -1 cpu2 tdb_delete0:997
tdb_trace[88]: 351271898: refs 2 -1 cpu2 pfkeyv2_send:2143
tdb_trace[89]: 351300368: refs 1 +0 cpu0 tdb_timeout:688
tdb_trace[90]: 351300369: refs 1 -1 cpu0 tdb_delete0:997
tdb_trace[91]: 351300370: refs 3735928559 -1 cpu0 tdb_timeout:691

I will try mvs@'s IPL_NET fix and think a bit more about the problem.

bluhm
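
A note on reading the refcount trace: the sketch below replays the
deltas from tdb_trace[78] to tdb_trace[90] and then applies the final
-1 from tdb_trace[91].  It assumes each entry logs the count before
the operation followed by the delta, which is an interpretation of
the output and not something the tracing diff documents here.  The
names replay_refs() and drop_ref() are hypothetical helpers, not
kernel functions.  Under that reading the count reaches 0 at
tdb_delete0:997, and the value 3735928559 logged afterwards is
0xdeadbeef, the same pattern seen in snext.  Decrementing it gives
0xdeadbeee, the value the pool check reports at offset 0x28, which
would be consistent with tdb_timeout:691 dropping a reference on an
already freed tdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Deltas copied from tdb_trace[78] .. tdb_trace[90] in the trace. */
static const int trace_deltas[] = {
	-1, +1, -1, +1, -1, -1, -1, +1, 0, -1, -1, 0, -1
};
#define TRACE_LEN (sizeof(trace_deltas) / sizeof(trace_deltas[0]))

/*
 * Hypothetical helper: replay the first n deltas, starting from the
 * count logged at tdb_trace[78].  Assumes "refs N D" means the count
 * was N before delta D was applied.
 */
static uint32_t
replay_refs(uint32_t start, size_t n)
{
	uint32_t refs = start;
	size_t i;

	assert(n <= TRACE_LEN);
	for (i = 0; i < n; i++)
		refs += (uint32_t)trace_deltas[i];	/* unsigned wrap handles -1 */
	return refs;
}

/* Hypothetical helper: one more -1 on whatever the refcount slot holds. */
static uint32_t
drop_ref(uint32_t stored)
{
	return stored - 1;
}
```

Calling replay_refs(5, TRACE_LEN) gives the count after entry 90, and
drop_ref(3735928559u) gives the value left in memory after entry 91.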