Hi Rune,

What kind of timeout do you use in your subscription? Is it 0 or a low value (< 10 ms)?

Apart from the fixes Jon has suggested, we still have issues, as low timeouts are very racy.
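To make clear which value I am asking about: it is the timeout field of the subscription request sent to the topology server. Below is a minimal sketch of how that request is laid out, assuming the `struct tipc_subscr` layout from `<linux/tipc.h>` (three 32-bit sequence fields, a timeout in milliseconds, a filter, and an 8-byte user handle). The helper name `build_subscription`, the service type 1000 and the range 0-99 are made up for illustration and are not taken from your application.

```python
import struct

# Constants mirrored from <linux/tipc.h>; defined locally so the sketch
# does not depend on a TIPC-enabled Python build.
TIPC_SUB_PORTS = 0x01
TIPC_SUB_SERVICE = 0x02
TIPC_WAIT_FOREVER = 0xFFFFFFFF  # ~0: the subscription never times out

# struct tipc_subscr: seq.type, seq.lower, seq.upper, timeout, filter,
# usr_handle[8], packed in native byte order with no padding.
SUBSCR_FMT = "=5I8s"


def build_subscription(service_type, lower, upper, timeout_ms,
                       filter_flags=TIPC_SUB_SERVICE, usr_handle=b""):
    """Hypothetical helper: pack one subscription request (28 bytes)."""
    return struct.pack(SUBSCR_FMT, service_type, lower, upper,
                       timeout_ms, filter_flags, usr_handle)


# A short-lived subscription of the kind I am asking about (5 ms) ...
short_lived = build_subscription(1000, 0, 99, 5)
# ... versus one that stays until the connection is closed.
persistent = build_subscription(1000, 0, 99, TIPC_WAIT_FOREVER)
```

If your application uses the first form with a timeout of 0 or a few milliseconds, the timer can fire while the subscription is still being set up, which would match the trace you posted.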
/Partha

On Wed, Nov 29, 2017 at 3:05 PM, Rune Torgersen <[email protected]> wrote:
> (Resending as I think it got lost somewhere).
>
> A bug that I thought had been fixed is rearing its ugly head again in
> latest Ubuntu 16.04 LTS kernel (4.4.0-97)
> It is happening to me quite frequently (2-3 times a week).
>
> The application where this happens uses lots of short lived sockets, and
> also lots of short-lived connections to the topology server.
>
> [151611.149711] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> [151611.149946] IP: [<ffffffffc046a0a2>] tipc_nametbl_unsubscribe+0x72/0x100 [tipc]
> [151611.150069] PGD 0
> [151611.150104] Oops: 0002 [#1] SMP
> [151611.150160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel intel_powerclamp coretemp kvm_intel gpio_ich input_leds joydev kvm irqbypass i7core_edac edac_core serio_raw lpc_ich shpchp hpilo 8250_fintek ipmi_ssif acpi_power_meter mac_hid lp parport ipmi_watchdog ipmi_si ipmi_devintf ipmi_msghandler autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear amdkfd amd_iommu_v2 radeon hid_generic i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect psmouse sysimgblt fb_sys_fops usbhid drm hid pata_acpi bnx2 hpsa netxen_nic scsi_transport_sas fjes
> [151611.151291] CPU: 1 PID: 14873 Comm: kworker/u64:2 Tainted: G I 4.4.0-97-generic #120-Ubuntu
> [151611.151429] Hardware name: HP ProLiant DL360 G6, BIOS P64 05/15/2010
> [151611.151547] Workqueue: tipc_rcv tipc_recv_work [tipc]
> [151611.151631] task: ffff880213c98cc0 ti: ffff8802131b8000 task.ti: ffff8802131b8000
> [151611.151740] RIP: 0010:[<ffffffffc046a0a2>] [<ffffffffc046a0a2>] tipc_nametbl_unsubscribe+0x72/0x100 [tipc]
> [151611.151889] RSP: 0018:ffff88021f443e10 EFLAGS: 00010246
> [151611.151967] RAX: ffff880213d87f80 RBX: ffff880213d87f00 RCX: 0000000000000020
> [151611.152071] RDX: 000000000000000e RSI: 0000000000000067 RDI: ffff8802101a9638
> [151611.152176] RBP: ffff88021f443e30 R08: ffff88021f45a0c0 R09: ffff880217003b00
> [151611.152280] R10: ffff8800da043f40 R11: ffff880213c98d20 R12: ffff8802101a9600
> [151611.152385] R13: ffff8800d9fa9120 R14: ffff8802101a9638 R15: ffff880213d87f00
> [151611.152490] FS: 0000000000000000(0000) GS:ffff88021f440000(0000) knlGS:0000000000000000
> [151611.152631] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [151611.152730] CR2: 0000000000000028 CR3: 0000000001e0a000 CR4: 00000000000006e0
> [151611.152835] Stack:
> [151611.152865] ffff880213d87f00 ffff8800d9fa8000 ffff880213c499c8 ffffffffc0468ed0
> [151611.152989] ffff88021f443e50 ffffffffc04688bf ffff880213d87f00 ffff880213c499c0
> [151611.153113] ffff88021f443e78 ffffffffc0468f15 ffff88021f44ddc0 ffff880213d87f30
> [151611.153237] Call Trace:
> [151611.153275] <IRQ>
> [151611.153311] [<ffffffffc0468ed0>] ? tipc_subscrb_shutdown_cb+0xc0/0xc0 [tipc]
> [151611.153422] [<ffffffffc04688bf>] tipc_subscrp_delete+0x2f/0x80 [tipc]
> [151611.153523] [<ffffffffc0468f15>] tipc_subscrp_timeout+0x45/0x70 [tipc]
> [151611.153624] [<ffffffff810ecfc5>] call_timer_fn+0x35/0x120
> [151611.153735] [<ffffffffc0468ed0>] ? tipc_subscrb_shutdown_cb+0xc0/0xc0 [tipc]
> [151611.153846] [<ffffffff810ed97a>] run_timer_softirq+0x23a/0x2f0
> [151611.153936] [<ffffffff81085dc1>] __do_softirq+0x101/0x290
> [151611.154017] [<ffffffff810860c3>] irq_exit+0xa3/0xb0
> [151611.154091] [<ffffffff818462a2>] smp_apic_timer_interrupt+0x42/0x50
> [151611.154185] [<ffffffff81844562>] apic_timer_interrupt+0x82/0x90
> [151611.154272] <EOI>
> [151611.154305] [<ffffffff81843225>] ? _raw_spin_unlock_irqrestore+0x15/0x20
> [151611.154407] [<ffffffff810eefef>] mod_timer+0x10f/0x240
> [151611.154489] [<ffffffffc0468be0>] tipc_subscrb_rcv_cb+0x1c0/0x390 [tipc]
> [151611.154591] [<ffffffffc04755e2>] tipc_receive_from_sock+0xc2/0x120 [tipc]
> [151611.154695] [<ffffffffc047526b>] tipc_recv_work+0x2b/0x60 [tipc]
> [151611.154809] [<ffffffff8109a635>] process_one_work+0x165/0x480
> [151611.159008] [<ffffffff8109a99b>] worker_thread+0x4b/0x4c0
> [151611.163372] [<ffffffff8109a950>] ? process_one_work+0x480/0x480
> [151611.167622] [<ffffffff810a0c75>] kthread+0xe5/0x100
> [151611.171755] [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> [151611.175858] [<ffffffff81843b8f>] ret_from_fork+0x3f/0x70
> [151611.179999] [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
> [151611.184004] Code: ff ff 48 85 c0 74 56 4c 8d 70 38 49 89 c4 4c 89 f7 e8 43 92 3d c1 48 8b 8b 80 00 00 00 48 8b 93 88 00 00 00 48 8d 83 80 00 00 00 <48> 89 51 08 48 89 0a 48 89 83 80 00 00 00 48 89 83 88 00 00 00
> [151611.192678] RIP [<ffffffffc046a0a2>] tipc_nametbl_unsubscribe+0x72/0x100 [tipc]
> [151611.196733] RSP <ffff88021f443e10>
> [151611.200739] CR2: 0000000000000028
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> tipc-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
