On 01/18/2017 11:30 AM, Xue, Ying wrote:
> Hi John,
>
>
>
> Thank you for the testing.
>
>
>
> I think your suggestion is reasonable. But we need to find out its exact
> scenario. Regarding the following message, after one object refcnt is
> decreased to zero by one thread, another thread tries to increment its
> refcnt, which means that we have a race condition. So we must solve the
> race condition first before we adopt your advice.
>
>
I have fixed this in Patch #3 of v6 series that i have sent. The root 
case is:

We have another refcount warning when we grab the refcount in 
tipc_conn_lookup() for connections with flag with CF_CONNECTED not set.
This usually occurs at shutdown when the we stop the topology server and 
start reset CF_CONNECTED flag, while withdrawing TIPC_CFG_SRV.
This withdrawal of publication triggers a withdraw message to 
subscribers but the connection flag is not CF_CONNECTED.

This issue is seen now, as I moved the connection deletion from the 
topology server shutdown procedure to the connection refcount cleanup.
Thus the lifetime of the connection is longer and there might be users
who can find connections in the idr_list which have the connection flag 
is not CF_CONNECTED.

After this change the refcount handling at receive and send look identical.

/Partha

>
> Regards,
>
> Ying
>
> *From:*John Thompson [mailto:thompa....@gmail.com]
> *Sent:* Wednesday, January 18, 2017 12:23 PM
> *To:* Parthasarathy Bhuvaragan
> *Cc:* tipc-discussion@lists.sourceforge.net; Jon Maloy; Xue, Ying
> *Subject:* Re: [PATCH net-next v4 0/6] topology server fixes for
> nametable soft lockup
>
>
>
> Hi Partha,
>
>
>
> Thanks for the new patches.  I have tested them and had a kernel warning
> as per below.  This is running on a SMP system with 4 cores.
>
> The warning reads as though one thread has gone to free the item while
> another thread has gotten a reference to it.
>
> The suggestion is to use kref_get_unless_zero() instead of kref_get().
>
>
>
> ------------[ cut here ]------------
>
> WARNING: at /home/johnt/views/main/linux/include/linux/kref.h:46
>
> Modules linked in: tipc jitterentropy_rng echainiv drbg platform_driver(O)
>
> CPU: 1 PID: 1527 Comm: intrTokenReset Tainted: P           O
>
> task: a52c8600 ti: a4c98000 task.ti: a4c98000
>
> NIP: c161809c LR: c1618120 CTR: 80695270
>
> REGS: a4c99b30 TRAP: 0700   Tainted: P           O
>
> MSR: 00029002 <CE,EE,ME>  CR: 28002482  XER: 00000000
>
>
>
> GPR00: c1618694 a4c99be0 a52c8600 a4c4e840 a2763aa0 00000000 a4ab623c
> 00000030
>
> GPR08: 01001005 00000001 c1620000 00000005 80695270 107d12f0 6c23e000
> 00000009
>
> GPR16: 00000000 808e0000 808e0000 00040100 00040006 807c9428 808f0000
> a2e39ac0
>
> GPR24: 808dba00 01001005 a4ab623c a50fb300 00000030 a50fb31c a50fb300
> a4c4e840
>
> NIP [c161809c] tipc_nl_publ_dump+0x93c/0xf10 [tipc]
>
> LR [c1618120] tipc_nl_publ_dump+0x9c0/0xf10 [tipc]
>
> Call Trace:
>
> [a4c99be0] [800b9e2c] free_pages_prepare+0x18c/0x2a0 (unreliable)
>
> [a4c99c00] [c1618694] tipc_conn_sendmsg+0x24/0x150 [tipc]
>
> [a4c99c30] [c160ad5c] tipc_subscrp_report_overlap+0xbc/0xd0 [tipc]
>
> [a4c99c70] [c160b31c] tipc_topsrv_stop+0x45c/0x4f0 [tipc]
>
> [a4c99ca0] [c160b848] tipc_nametbl_remove_publ+0x58/0x110 [tipc]
>
> [a4c99cd0] [c160bd08] tipc_nametbl_withdraw+0x68/0x140 [tipc]
>
> [a4c99d00] [c1613ce4] tipc_nl_node_dump_link+0x1904/0x45d0 [tipc]
>
> [a4c99d30] [c16148f8] tipc_nl_node_dump_link+0x2518/0x45d0 [tipc]
>
> [a4c99d70] [804f5a40] sock_release+0x30/0xf0
>
> [a4c99d80] [804f5b14] sock_close+0x14/0x30
>
> [a4c99d90] [80105844] __fput+0x94/0x200
>
> [a4c99db0] [8003dca4] task_work_run+0xd4/0x100
>
> [a4c99dd0] [80023620] do_exit+0x280/0x980
>
> [a4c99e10] [80024c48] do_group_exit+0x48/0xb0
>
> [a4c99e30] [80030344] get_signal+0x244/0x4f0
>
> [a4c99e80] [80007734] do_signal+0x34/0x1c0
>
> [a4c99f30] [800079a8] do_notify_resume+0x68/0x80
>
> [a4c99f40] [8000fa1c] do_user_signal+0x74/0xc4
>
> --- interrupt: c00 at 0xf5b0cfc
>
>     LR = 0xf5b0ce8
>
> Instruction dump:
>
> 4bffffa8 7c0004ac 7d201828 31290001 7d20192d 40a2fff4 7c0004ac 2f890001
>
> 4dbd0020 3d40c162 892ac11e 69290001 <0f090000> 2f890000 4dbe0020 39200001
>
> ---[ end trace 544bc785f9258108 ]---
>
>
>
> JT
>
>
>
>
>
> On Thu, Jan 12, 2017 at 1:19 AM, Parthasarathy Bhuvaragan
> <parthasarathy.bhuvara...@ericsson.com
> <mailto:parthasarathy.bhuvara...@ericsson.com>> wrote:
>
> In this series, we revert the commit 333f796235a527 ("tipc: fix a
> race condition leading to subscriber refcnt bug") and provide an
> alternate solution to fix the race conditions in commits 2-4.
>
> We have to do this as the above commit introduced a nametbl soft
> lockup at module exit as described by patch#4.
>
> ---
> v3: introduce cleanup workqueue to fix nametbl soft lockup.
>
> Parthasarathy Bhuvaragan (6):
>   tipc: fix nametbl_lock soft lockup at node/link events
>   tipc: add subscription refcount
>   tipc: fix connection refcount error
>   tipc: fix nametbl_lock soft lockup at module exit
>   tipc: ignore requests when the connection state is not CONNECTED
>   tipc: fix cleanup at module unload
>
>  net/tipc/node.c   |   9 ++++-
>  net/tipc/server.c |  44 +++++++++------------
>  net/tipc/subscr.c | 116
> ++++++++++++++++++++++++++++++++----------------------
>  net/tipc/subscr.h |   1 +
>  4 files changed, 94 insertions(+), 76 deletions(-)
>
> --
> 2.1.4
>
>
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to