When I last saw these issues (in March), I changed how I was doing things.
When I open a socket to the topology server, I now subscribe with no timeout
(TIPC_WAIT_FOREVER) and use a poll() call with a timeout of 1 ms.
The calling order tends to be:
  1. Open a socket to the topology server.
  2. Get all open servers for the specified range.
  3. Close the topology socket.
  4. For each entry in the list of sockets, send the message to its address
     (using an already open socket).
Yes, we're using that instead of broadcast. Historical reasons: the first TIPC
driver we used (10+ years ago) did not support broadcast very well on our
platform at the time.
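
For reference, the calling order above could be sketched roughly like this in C.
This is only an illustration and not our actual code: the function names
(make_subscription, list_open_servers, send_to_server) are made up for the
example, and it assumes the subscription uses TIPC_SUB_PORTS and that the data
socket is a connectionless TIPC socket that sends to the port ID reported in
each event.

```c
#include <linux/tipc.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: subscription for [lower, upper] of a service type,
 * with no kernel-side timeout (TIPC_WAIT_FOREVER). */
void make_subscription(struct tipc_subscr *sub, unsigned int type,
                       unsigned int lower, unsigned int upper)
{
    memset(sub, 0, sizeof(*sub));
    sub->seq.type = type;
    sub->seq.lower = lower;
    sub->seq.upper = upper;
    sub->timeout = TIPC_WAIT_FOREVER;
    sub->filter = TIPC_SUB_PORTS;   /* assumption: one event per socket */
}

/* Steps 1-3: open topology socket, collect published servers, close.
 * Returns the number of events stored in out[], or -1 on error. */
int list_open_servers(unsigned int type, unsigned int lower,
                      unsigned int upper, struct tipc_event *out, int max)
{
    struct sockaddr_tipc topsrv;
    struct tipc_subscr sub;
    struct pollfd pfd;
    int n = 0;
    int sd = socket(AF_TIPC, SOCK_SEQPACKET, 0);

    if (sd < 0)
        return -1;

    memset(&topsrv, 0, sizeof(topsrv));
    topsrv.family = AF_TIPC;
    topsrv.addrtype = TIPC_ADDR_NAME;
    topsrv.addr.name.name.type = TIPC_TOP_SRV;
    topsrv.addr.name.name.instance = TIPC_TOP_SRV;

    make_subscription(&sub, type, lower, upper);
    if (connect(sd, (struct sockaddr *)&topsrv, sizeof(topsrv)) < 0 ||
        send(sd, &sub, sizeof(sub), 0) != (ssize_t)sizeof(sub)) {
        close(sd);
        return -1;
    }

    pfd.fd = sd;
    pfd.events = POLLIN;
    /* Keep reading events until none arrives within 1 ms. */
    while (n < max && poll(&pfd, 1, 1) > 0) {
        struct tipc_event ev;

        if (recv(sd, &ev, sizeof(ev), 0) != (ssize_t)sizeof(ev))
            break;
        if (ev.event == TIPC_PUBLISHED)
            out[n++] = ev;
    }

    close(sd);   /* closing the socket also ends the subscription */
    return n;
}

/* Step 4: send one message to a discovered server over an already open
 * data socket (assumed here to be SOCK_RDM or SOCK_DGRAM). */
int send_to_server(int data_sd, const struct tipc_event *ev,
                   const void *msg, size_t len)
{
    struct sockaddr_tipc dst;

    memset(&dst, 0, sizeof(dst));
    dst.family = AF_TIPC;
    dst.addrtype = TIPC_ADDR_ID;
    dst.addr.id = ev->port;
    return sendto(data_sd, msg, len, 0,
                  (struct sockaddr *)&dst, sizeof(dst)) < 0 ? -1 : 0;
}
```

With TIPC_WAIT_FOREVER the kernel never arms a subscription timer, so the only
timeout involved is the 1 ms poll() in user space.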
From: Parthasarathy Bhuvaragan [mailto:[email protected]]
Sent: Thursday, November 30, 2017 00:32
To: Rune Torgersen <[email protected]>
Cc: [email protected]
Subject: Re: [tipc-discussion] FW: Kernel crash
Hi Rune,
What kind of timeout do you use in your subscription? Is it 0 or a low value
(< 10 ms)?
Apart from the fixes Jon has suggested, we still have issues, as low
timeouts are very racy.
/Partha
On Wed, Nov 29, 2017 at 3:05 PM, Rune Torgersen
<[email protected]> wrote:
(Resending as I think it got lost somewhere).
A bug that I thought had been fixed is rearing its ugly head again in the latest
Ubuntu 16.04 LTS kernel (4.4.0-97).
It is happening to me quite frequently (2-3 times a week).
The application where this happens uses lots of short-lived sockets, and also
lots of short-lived connections to the topology server.
[151611.149711] BUG: unable to handle kernel NULL pointer dereference at
0000000000000028
[151611.149946] IP: [<ffffffffc046a0a2>] tipc_nametbl_unsubscribe+0x72/0x100
[tipc]
[151611.150069] PGD 0
[151611.150104] Oops: 0002 [#1] SMP
[151611.150160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel
intel_powerclamp coretemp kvm_intel gpio_ich input_leds joydev kvm irqbypass
i7core_edac edac_core serio_raw lpc_ich shpchp hpilo 8250_fintek ipmi_ssif
acpi_power_meter mac_hid lp parport ipmi_watchdog ipmi_si ipmi_devintf
ipmi_msghandler autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear amdkfd
amd_iommu_v2 radeon hid_generic i2c_algo_bit ttm drm_kms_helper syscopyarea
sysfillrect psmouse sysimgblt fb_sys_fops usbhid drm hid pata_acpi bnx2 hpsa
netxen_nic scsi_transport_sas fjes
[151611.151291] CPU: 1 PID: 14873 Comm: kworker/u64:2 Tainted: G I
4.4.0-97-generic #120-Ubuntu
[151611.151429] Hardware name: HP ProLiant DL360 G6, BIOS P64 05/15/2010
[151611.151547] Workqueue: tipc_rcv tipc_recv_work [tipc]
[151611.151631] task: ffff880213c98cc0 ti: ffff8802131b8000 task.ti:
ffff8802131b8000
[151611.151740] RIP: 0010:[<ffffffffc046a0a2>] [<ffffffffc046a0a2>]
tipc_nametbl_unsubscribe+0x72/0x100 [tipc]
[151611.151889] RSP: 0018:ffff88021f443e10 EFLAGS: 00010246
[151611.151967] RAX: ffff880213d87f80 RBX: ffff880213d87f00 RCX:
0000000000000020
[151611.152071] RDX: 000000000000000e RSI: 0000000000000067 RDI:
ffff8802101a9638
[151611.152176] RBP: ffff88021f443e30 R08: ffff88021f45a0c0 R09:
ffff880217003b00
[151611.152280] R10: ffff8800da043f40 R11: ffff880213c98d20 R12:
ffff8802101a9600
[151611.152385] R13: ffff8800d9fa9120 R14: ffff8802101a9638 R15:
ffff880213d87f00
[151611.152490] FS: 0000000000000000(0000) GS:ffff88021f440000(0000)
knlGS:0000000000000000
[151611.152631] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[151611.152730] CR2: 0000000000000028 CR3: 0000000001e0a000 CR4:
00000000000006e0
[151611.152835] Stack:
[151611.152865] ffff880213d87f00 ffff8800d9fa8000 ffff880213c499c8
ffffffffc0468ed0
[151611.152989] ffff88021f443e50 ffffffffc04688bf ffff880213d87f00
ffff880213c499c0
[151611.153113] ffff88021f443e78 ffffffffc0468f15 ffff88021f44ddc0
ffff880213d87f30
[151611.153237] Call Trace:
[151611.153275] <IRQ>
[151611.153311] [<ffffffffc0468ed0>] ? tipc_subscrb_shutdown_cb+0xc0/0xc0
[tipc]
[151611.153422] [<ffffffffc04688bf>] tipc_subscrp_delete+0x2f/0x80 [tipc]
[151611.153523] [<ffffffffc0468f15>] tipc_subscrp_timeout+0x45/0x70 [tipc]
[151611.153624] [<ffffffff810ecfc5>] call_timer_fn+0x35/0x120
[151611.153735] [<ffffffffc0468ed0>] ? tipc_subscrb_shutdown_cb+0xc0/0xc0
[tipc]
[151611.153846] [<ffffffff810ed97a>] run_timer_softirq+0x23a/0x2f0
[151611.153936] [<ffffffff81085dc1>] __do_softirq+0x101/0x290
[151611.154017] [<ffffffff810860c3>] irq_exit+0xa3/0xb0
[151611.154091] [<ffffffff818462a2>] smp_apic_timer_interrupt+0x42/0x50
[151611.154185] [<ffffffff81844562>] apic_timer_interrupt+0x82/0x90
[151611.154272] <EOI>
[151611.154305] [<ffffffff81843225>] ? _raw_spin_unlock_irqrestore+0x15/0x20
[151611.154407] [<ffffffff810eefef>] mod_timer+0x10f/0x240
[151611.154489] [<ffffffffc0468be0>] tipc_subscrb_rcv_cb+0x1c0/0x390 [tipc]
[151611.154591] [<ffffffffc04755e2>] tipc_receive_from_sock+0xc2/0x120 [tipc]
[151611.154695] [<ffffffffc047526b>] tipc_recv_work+0x2b/0x60 [tipc]
[151611.154809] [<ffffffff8109a635>] process_one_work+0x165/0x480
[151611.159008] [<ffffffff8109a99b>] worker_thread+0x4b/0x4c0
[151611.163372] [<ffffffff8109a950>] ? process_one_work+0x480/0x480
[151611.167622] [<ffffffff810a0c75>] kthread+0xe5/0x100
[151611.171755] [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
[151611.175858] [<ffffffff81843b8f>] ret_from_fork+0x3f/0x70
[151611.179999] [<ffffffff810a0b90>] ? kthread_create_on_node+0x1e0/0x1e0
[151611.184004] Code: ff ff 48 85 c0 74 56 4c 8d 70 38 49 89 c4 4c 89 f7 e8 43
92 3d c1 48 8b 8b 80 00 00 00 48 8b 93 88 00 00 00 48 8d 83 80 00 00 00 <48> 89
51 08 48 89 0a 48 89 83 80 00 00 00 48 89 83 88 00 00 00
[151611.192678] RIP [<ffffffffc046a0a2>] tipc_nametbl_unsubscribe+0x72/0x100
[tipc]
[151611.196733] RSP <ffff88021f443e10>
[151611.200739] CR2: 0000000000000028
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion