Hi Jon,

I understood your concern.

I have checked the possibility of merging patch #1, #4 and #5 as one. However, 
just merging the three patch is insufficient, and at least #2 seems necessary 
too, otherwise, another deadlock still exists due to two premature 'return's in 
subcsrb_report_overlap(). Even if we merged them as one, it will lose my 
initial purpose of dividing the series as so small patches. Although each patch 
is made a small change, it's often related to a policy adjustment of locking or 
holding refcount. Moreover, as our locking policy associated with topserver 
becomes complex, I want to use the comments in each patch header to record what 
policy has been adjusted. In the future, the information can guide whether our 
changes comply with the adjusted policy or not. 

In fact, all changes contained in the series is not big. But if we merged them 
as one, all useful messages will be lost forever.

Additionally, "net-next" tree reaches 4.10-rc8, and "net" tree is 4.10-rc7 now. 
I saw today there was one developer who submitted a patch to net-next and David 
also accepted it. However, if John's testing proved the series is okay 
tomorrow, probably I can send the series to net-next tomorrow. Even for the 
worst case, we cannot submit the series until net-next is open again. But I 
have checked nobody would maintain 4.10 as a stable version. So even if there 
is a big long time gap, it seems not to cause a series issue.

Regards,
Ying

-----Original Message-----
From: Jon Maloy [mailto:jon.ma...@ericsson.com] 
Sent: Tuesday, February 21, 2017 7:12 PM
To: Xue, Ying; Parthasarathy Bhuvaragan; thompa....@gmail.com
Cc: tipc-discussion@lists.sourceforge.net
Subject: RE: [net 0/5] solve two deadlock issues

Hi Ying,
These are good design changes, that definitely should go in asap. However, I 
feel deeply uncomfortable with such a big change going into 'net', especially 
since our previous, exceptionally large, contribution now has turned out to 
have problems. I wonder if we could not get away with something simpler for 
'net'.

Looking closer at your series, it seems to me that only patches ## 1, 4, and 
the lock removal part of #5 are really needed to solve the problem we have at 
hand now. Why not merge those into one patch and deliver this to 'net', while 
reference count redesign parts can go into net-next ?

Regards
///jon


> -----Original Message-----
> From: Ying Xue [mailto:ying....@windriver.com]
> Sent: Monday, February 20, 2017 06:39 AM
> To: Jon Maloy <jon.ma...@ericsson.com>; Parthasarathy Bhuvaragan 
> <parthasarathy.bhuvara...@ericsson.com>; thompa....@gmail.com
> Cc: tipc-discussion@lists.sourceforge.net
> Subject: [net 0/5] solve two deadlock issues
> 
> Commit d094c4d5f5 ("tipc: add subscription refcount to avoid invalid
> delete") accidently introduce the following deadlock scenarios:
> 
>    CPU1:                             CPU2:
> ----------                     ----------------
> tipc_nametbl_publish
> spin_lock_bh(&tn->nametbl_lock)
> tipc_nametbl_insert_publ
> tipc_nameseq_insert_publ
> tipc_subscrp_report_overlap
> tipc_subscrp_get
> tipc_subscrp_send_event
>                                tipc_close_conn
>                                tipc_subscrb_release_cb
>                                tipc_subscrb_delete
>                                tipc_subscrp_put
> tipc_subscrp_put
> tipc_subscrp_kref_release
> tipc_nametbl_unsubscribe
> spin_lock_bh(&tn->nametbl_lock)
> <<grab nametbl_lock again>>
> 
>    CPU1:                              CPU2:
> ----------                     ----------------
> tipc_nametbl_stop
> spin_lock_bh(&tn->nametbl_lock)
> tipc_purge_publications
> tipc_nameseq_remove_publ
> tipc_subscrp_report_overlap
> tipc_subscrp_get
> tipc_subscrp_send_event
>                                tipc_close_conn
>                                tipc_subscrb_release_cb
>                                tipc_subscrb_delete
>                                tipc_subscrp_put
> tipc_subscrp_put
> tipc_subscrp_kref_release
> tipc_nametbl_unsubscribe
> spin_lock_bh(&tn->nametbl_lock)
> <<grab nametbl_lock again>>
> 
> The root cause of two deadlocks is that we have to hold nametbl lock 
> when subscription is freed in tipc_subscrp_kref_release(). In order to 
> eliminate the need of taking nametbl lock in 
> tipc_subscrp_kref_release(), the functions protected by nametbl lock 
> in tipc_subscrp_kref_release() are moved to other places step by step in the 
> series.
> 
> Ying Xue (5):
>   tipc: advance the time of deleting subscription from
>     subscriber->subscrp_list
>   tipc: adjust the policy of holding subscription kref
>   tipc: adjust policy that sub->timer holds subscription kref
>   tipc: advance the time of calling tipc_nametbl_unsubscribe
>   tipc: remove unnecessary increasement of subscription refcount
> 
>  net/tipc/name_table.c |  2 ++
>  net/tipc/subscr.c     | 32 ++++++++++++++------------------
>  net/tipc/subscr.h     |  3 +++
>  3 files changed, 19 insertions(+), 18 deletions(-)
> 
> --
> 2.7.4


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
tipc-discussion mailing list
tipc-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tipc-discussion

Reply via email to