Hi Hoang,

After more pondering, I realize that my suggestion below was not a good solution either. In particular, we cannot have a transition rcast-to-bcast (bulk-to-publ/withdraw) without sending sync to all destinations, something we want to avoid.

I think our problem is that we (well, me...) are trying to solve two different problems at the same time, using a mechanism that was only made for one of them. So, what if we separate the problems again?

1) We send all bulk updates to all nodes, irrespective of whether they support broadcast binding table updates or not, as mandatory rcast. No bcast/rcast sync needs to be involved, and I don't think it is when we use 'mandatory'. tipc_node_broadcast() can be eliminated.

2) When we need to send out publish/withdraw, we just continue to send those as mandatory rcast to the nodes not supporting binding table bcast. I believe our current tipc_mcast_xmit(), given a list of legacy nodes, should do the trick even here.

3) To those nodes supporting broadcast updates we send all publish/withdraw as mandatory broadcast. (A sketch covering points 2) and 3) follows right after this list.)
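Roughly like this -- a sketch only, not tested. It reuses tipc_mcast_xmit(), struct tipc_mc_method and struct tipc_nlist as they exist today, assumes the code lives in node.c (like in Hoang's patch below) so the node list and capabilities are reachable, and assumes TIPC_MCAST_RBCTL is the capability bit separating the two groups; tipc_named_xmit_split() is just an illustrative name:

static int tipc_named_xmit_split(struct net *net, struct sk_buff *skb)
{
        /* 'mandatory' prevents the link layer from second-guessing us */
        struct tipc_mc_method rc_method = {.rcast = true, .mandatory = true};
        struct tipc_mc_method bc_method = {.rcast = false, .mandatory = true};
        struct tipc_nlist legacy, capable;
        struct sk_buff_head xmitq;
        struct tipc_node *n;
        struct sk_buff *_skb;
        u16 cong_link_cnt = 0;
        int rc1 = 0, rc2 = 0;

        tipc_nlist_init(&legacy, tipc_own_addr(net));
        tipc_nlist_init(&capable, tipc_own_addr(net));

        /* Sort the peers into legacy and bcast-capable destination lists */
        rcu_read_lock();
        list_for_each_entry_rcu(n, tipc_nodes(net), list) {
                if (in_own_node(net, n->addr) || !node_is_up(n))
                        continue;
                if (n->capabilities & TIPC_MCAST_RBCTL)
                        tipc_nlist_add(&capable, n->addr);
                else
                        tipc_nlist_add(&legacy, n->addr);
        }
        rcu_read_unlock();

        /* 2) Mandatory rcast to the legacy nodes */
        if (legacy.remote) {
                _skb = skb_clone(skb, GFP_ATOMIC);
                if (_skb) {
                        __skb_queue_head_init(&xmitq);
                        __skb_queue_tail(&xmitq, _skb);
                        rc1 = tipc_mcast_xmit(net, &xmitq, &rc_method,
                                              &legacy, &cong_link_cnt);
                        __skb_queue_purge(&xmitq);
                }
        }

        /* 3) Mandatory bcast to the nodes supporting binding table bcast */
        if (capable.remote) {
                __skb_queue_head_init(&xmitq);
                __skb_queue_tail(&xmitq, skb);
                rc2 = tipc_mcast_xmit(net, &xmitq, &bc_method,
                                      &capable, &cong_link_cnt);
                __skb_queue_purge(&xmitq);
        } else {
                kfree_skb(skb);
        }

        tipc_nlist_purge(&legacy);
        tipc_nlist_purge(&capable);
        return rc1 ?: rc2;
}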
Now it remains to solve the synchronization between 1) and 3) type messages. This we have discussed before, and we know the solution is a "NOT_LAST" bit.

This requires a slightly smarter tipc_named_rcv() function, which can sort between non-sequenced (broadcast) and unicast messages, but also refuses to dequeue any messages from the namedq until the 'not_last' bit is zero in an arrived bulk message. But no new synchronization queue should be needed.
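Something along these lines -- a sketch only. msg_is_bulk() and msg_not_last() are hypothetical accessors for the bulk message type and the proposed NOT_LAST bit, tipc_named_process() stands in for the existing per-item name table update, and the 'bulk_done' state would in reality have to live somewhere persistent, e.g. per peer node:

void tipc_named_rcv(struct net *net, struct sk_buff_head *namedq,
                    bool *bulk_done)
{
        struct sk_buff *skb;

        spin_lock_bh(&namedq->lock);
        /* Has the terminating bulk message (not_last == 0) been queued? */
        if (!*bulk_done) {
                skb_queue_walk(namedq, skb) {
                        struct tipc_msg *hdr = buf_msg(skb);

                        if (msg_is_bulk(hdr) && !msg_not_last(hdr)) {
                                *bulk_done = true;
                                break;
                        }
                }
        }
        /* Deliver nothing until then; no extra deferred queue is needed */
        while (*bulk_done && (skb = __skb_dequeue(namedq)))
                tipc_named_process(net, skb);
        spin_unlock_bh(&namedq->lock);
}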
I think we now have a viable solution that solves all our problems without too much new code.

Regards
///jon

Sent from my Samsung Galaxy smartphone.
-------- Original message --------
From: Jon Maloy <[email protected]>
Date: 2020-05-12 17:21 (GMT-05:00)
To: [email protected], [email protected], Xin Long <[email protected]>, Ying Xue <[email protected]>, Shuang Li <[email protected]>
Subject: [tipc-discussion] Fwd: Re: FW: [PATCH 2/2] tipc: update a binding service via broadcast

cc to tipc-discussion and others.
///jon

-------- Forwarded Message --------
Subject: Re: FW: [PATCH 2/2] tipc: update a binding service via broadcast
Date: Tue, 12 May 2020 14:38:29 -0400
From: Jon Maloy <[email protected]>
To: Hoang Huu Le <[email protected]>, [email protected] <[email protected]>

Hi Hoang,

I had to re-read the discussion we had back in November to refresh my memory about this. To me v2 looks like a better base for further development than the most recent version, so I am sorry to have wasted your time here. I am still convinced that the mcast/rcast sync protocol we developed can solve almost all our problems if we use it right.

What about the following:

Sending side:
- Even the initial bulk is sent via tipc_mcast_xmit(), but with only one destination node and as "mandatory rcast". The broadcast link may still choose to send it as bcast in case rcast has been manually disabled by the user, but the surplus messages will be dropped on the non-destination nodes. (In the case of legacy nodes this is certain; in the case of newer nodes we must make sure it happens by setting the 'destnode' field in the header and always checking for it in tipc_named_rcv().) I think this is a weird and rare case where we can accept the extra overhead of broadcast. Under all normal conditions rcast will be used here. (A sketch follows right after these two points.)

- Regular PUBLISH/WITHDRAW messages are sent as "mandatory bcast", or, if some nodes don't support TIPC_MCAST_RBCTL, as "mandatory rcast". The broadcast link may still choose to send it as rcast, but this is nothing we need to worry about as long as the synchronization mechanism is active. Under all normal conditions bcast will be selected when possible, since all nodes are destinations.
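For the bulk sending that could look roughly as follows -- a sketch only. msg_set_destnode() and tipc_mcast_xmit() exist today; tipc_named_bulk_xmit() is just an illustrative name, and 'pkts' is assumed to hold the ready-built bulk messages:

static int tipc_named_bulk_xmit(struct net *net, struct sk_buff_head *pkts,
                                u32 dnode)
{
        /* One destination, and rcast made obligatory */
        struct tipc_mc_method method = {.rcast = true, .mandatory = true};
        struct tipc_nlist dests;
        struct sk_buff *skb;
        u16 cong_link_cnt = 0;
        int rc;

        /* Stamp 'destnode' so receivers can drop surplus bcast copies */
        skb_queue_walk(pkts, skb)
                msg_set_destnode(buf_msg(skb), dnode);

        tipc_nlist_init(&dests, tipc_own_addr(net));
        tipc_nlist_add(&dests, dnode);
        rc = tipc_mcast_xmit(net, pkts, &method, &dests, &cong_link_cnt);
        tipc_nlist_purge(&dests);
        return rc;
}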
Receiving side:

- We need a deferred queue, but last time I missed that we don't need one per node; it is sufficient to have a global one, as you were suggesting. So to me the code is ok here.

- As mentioned earlier, tipc_named_rcv() only accepts messages where destnode is "self" (bulk) or zero (publish/withdraw). (See the sketch below.)
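I.e. a test like this -- sketch only, tipc_named_dest_ok() is an illustrative name:

static bool tipc_named_dest_ok(struct net *net, struct sk_buff *skb)
{
        u32 dnode = msg_destnode(buf_msg(skb));

        /* Zero: publish/withdraw meant for everybody; self: bulk meant
         * for us. Anything else is a surplus bcast copy meant for
         * another node, and must be dropped.
         */
        return !dnode || dnode == tipc_own_addr(net);
}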
The point with all this is that all messages will now be subject to the synchronization mechanism, so the bulk messages are guaranteed to be delivered in order and ahead of individual publications/withdraws, and even topology changes will almost never lead to a change of sending method and a need for synchronization.

We must make one change to the broadcast protocol though: even "mandatory" messages must be made subject to the synchronization mechanism, something that doesn't seem to be the case now. (Or maybe we don't need to set "mandatory" at all? This needs to be checked.)

The advantage of this method is simplicity:

- We don't need any separate synchronization mechanism for NAME_DISTR messages at all; we just use what is already there.

The disadvantage is that there will be a lot of SYN messages sent if we are not careful:

- Every time a new node comes up, the other nodes will send it a bcast SYN before they send the bulk, because they believe they have just switched from bcast (normal mode) to rcast (bulk mode). This one is in reality unnecessary, since we can be sure that the new node has not been sent any previous bcast that needs to be synched with.

- At the first publication/withdraw sent after the bulk, the neighbor nodes will send an rcast SYN to all other nodes, because they just "switched" back from rcast to bcast. In this case we really do need to send an rcast SYN to the new node that came up, since this is the only one where there is a risk of a race. This message in reality serves as an "end-of-bulk" message, but is handled like any other SYN by the receiving tipc_mcast_filter_msg().

So, to avoid this the broadcast protocol needs to be able to recognize NAME_DISTR messages and treat them slightly differently from the others, or it needs to be told what to do by the name_distr.c code via the API. Maybe a couple of new fields in struct tipc_mc_method? (A sketch follows below.)
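For instance something like this -- a sketch only. The first three fields are what struct tipc_mc_method contains today; the last two are purely hypothetical knobs that name_distr.c could use to suppress the redundant SYNs:

struct tipc_mc_method {
        bool rcast;             /* rcast (not bcast) was used at prev. xmit */
        bool mandatory;         /* method was forced by the user/caller */
        unsigned long expires;  /* re-evaluate method when past this time */
        bool no_sync;           /* hypothetical: never send bcast/rcast SYNs */
        bool named;             /* hypothetical: NAME_DISTR semantics apply */
};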
What do you think?

Regards
///jon

On 5/12/20 6:22 AM, Hoang Huu Le wrote:
> Just forward the patch I mentioned.
>
> -----Original Message-----
> From: Hoang Le <[email protected]>
> Sent: Tuesday, November 19, 2019 5:01 PM
> To: [email protected]; [email protected]; [email protected]
> Subject: [PATCH 2/2] tipc: update a binding service via broadcast
>
> Currently, updating the binding table (adding a service binding to the
> name table/withdrawing a service binding) is being sent over replicast.
> However, if we are scaling up clusters to more than 100 nodes/containers,
> this method is less efficient because of looping through the nodes in a
> cluster one by one.
>
> It is worth using broadcast to update a binding service. The binding
> table is then updated in all nodes in one shot.
>
> The mechanism is backwards compatible, as sync messages are silently
> dropped.
>
> v2: resolve synchronization problem when switching from unicast to
> broadcast
>
> Signed-off-by: Hoang Le <[email protected]>
> ---
>  net/tipc/bcast.c      |  3 ++-
>  net/tipc/link.c       |  6 ++++++
>  net/tipc/name_table.c | 33 ++++++++++++++++++++++++++++++---
>  net/tipc/name_table.h |  4 ++++
>  net/tipc/node.c       | 32 ++++++++++++++++++++++++++++++++
>  net/tipc/node.h       |  2 ++
>  6 files changed, 76 insertions(+), 4 deletions(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index e06f05d55534..44ed481fec47 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -324,7 +324,8 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
>          hdr = buf_msg(skb);
>          if (msg_user(hdr) == MSG_FRAGMENTER)
>                  hdr = msg_inner_hdr(hdr);
> -        if (msg_type(hdr) != TIPC_MCAST_MSG)
> +        if (msg_user(hdr) != NAME_DISTRIBUTOR &&
> +            msg_type(hdr) != TIPC_MCAST_MSG)
>                  return 0;
>
>          /* Allocate dummy message */
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index fb72031228c9..a2e9a64d5a0f 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1190,6 +1190,8 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
>                              struct sk_buff_head *inputq)
>  {
>          struct sk_buff_head *mc_inputq = l->bc_rcvlink->inputq;
> +        struct name_table *nt = tipc_name_table(l->net);
> +        struct sk_buff_head *defnq = &nt->defer_namedq;
>          struct tipc_msg *hdr = buf_msg(skb);
>
>          switch (msg_user(hdr)) {
> @@ -1211,6 +1213,10 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
>          case NAME_DISTRIBUTOR:
>                  l->bc_rcvlink->state = LINK_ESTABLISHED;
>                  skb_queue_tail(l->namedq, skb);
> +
> +                spin_lock_bh(&defnq->lock);
> +                tipc_mcast_filter_msg(l->net, defnq, l->namedq);
> +                spin_unlock_bh(&defnq->lock);
>                  return true;
>          case MSG_BUNDLER:
>          case TUNNEL_PROTOCOL:
> diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
> index 66a65c2cdb23..593dcd11357f 100644
> --- a/net/tipc/name_table.c
> +++ b/net/tipc/name_table.c
> @@ -615,9 +615,11 @@ struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
>          struct tipc_net *tn = tipc_net(net);
>          struct publication *p = NULL;
>          struct sk_buff *skb = NULL;
> +        bool rcast;
>
>          spin_lock_bh(&tn->nametbl_lock);
> +        rcast = nt->rcast;
>
>          if (nt->local_publ_count >= TIPC_MAX_PUBL) {
>                  pr_warn("Bind failed, max limit %u reached\n", TIPC_MAX_PUBL);
>                  goto exit;
> @@ -632,8 +634,18 @@ struct publication *tipc_nametbl_publish(struct net *net, u32 type, u32 lower,
>  exit:
>          spin_unlock_bh(&tn->nametbl_lock);
> -        if (skb)
> -                tipc_node_broadcast(net, skb);
> +        if (skb) {
> +                /* Use broadcast if all nodes support broadcast NAME_DISTR */
> +                if (tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) {
> +                        tipc_node_broadcast_named_publish(net, skb, &rcast);
> +                        spin_lock_bh(&tn->nametbl_lock);
> +                        nt->rcast = rcast;
> +                        spin_unlock_bh(&tn->nametbl_lock);
> +                } else {
> +                        /* Otherwise, be backwards compatible */
> +                        tipc_node_broadcast(net, skb);
> +                }
> +        }
>          return p;
>  }
> @@ -648,8 +660,10 @@ int tipc_nametbl_withdraw(struct net *net, u32 type, u32 lower,
>          u32 self = tipc_own_addr(net);
>          struct sk_buff *skb = NULL;
>          struct publication *p;
> +        bool rcast;
>
>          spin_lock_bh(&tn->nametbl_lock);
> +        rcast = nt->rcast;
>          p = tipc_nametbl_remove_publ(net, type, lower, upper, self, key);
>          if (p) {
> @@ -664,7 +678,16 @@ int tipc_nametbl_withdraw(struct net *net, u32 type, u32 lower,
>          spin_unlock_bh(&tn->nametbl_lock);
>          if (skb) {
> -                tipc_node_broadcast(net, skb);
> +                /* Use broadcast if all nodes support broadcast NAME_DISTR */
> +                if (tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) {
> +                        tipc_node_broadcast_named_publish(net, skb, &rcast);
> +                        spin_lock_bh(&tn->nametbl_lock);
> +                        nt->rcast = rcast;
> +                        spin_unlock_bh(&tn->nametbl_lock);
> +                } else {
> +                        /* Otherwise, be backwards compatible */
> +                        tipc_node_broadcast(net, skb);
> +                }
>                  return 1;
>          }
>          return 0;
> @@ -746,6 +769,9 @@ int tipc_nametbl_init(struct net *net)
>          INIT_LIST_HEAD(&nt->cluster_scope);
>          rwlock_init(&nt->cluster_scope_lock);
>          tn->nametbl = nt;
> +        /* 'bulk' update messages via unicast */
> +        nt->rcast = true;
> +        skb_queue_head_init(&nt->defer_namedq);
>          spin_lock_init(&tn->nametbl_lock);
>          return 0;
>  }
> @@ -784,6 +810,7 @@ void tipc_nametbl_stop(struct net *net)
>           * publications, then release the name table
>           */
>          spin_lock_bh(&tn->nametbl_lock);
> +        skb_queue_purge(&nt->defer_namedq);
>          for (i = 0; i < TIPC_NAMETBL_SIZE; i++) {
>                  if (hlist_empty(&nt->services[i]))
>                          continue;
> diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
> index f79066334cc8..b8cdf2a29d48 100644
> --- a/net/tipc/name_table.h
> +++ b/net/tipc/name_table.h
> @@ -95,6 +95,8 @@ struct publication {
>   *         - used by name_distr to send bulk updates to new nodes
>   *         - used by name_distr during re-init of name table
>   * @local_publ_count: number of publications issued by this node
> + * @defer_namedq: temporary queue for 'synching' updates
> + * @rcast: previous method used to publish/withdraw a service
>   */
>  struct name_table {
>          struct hlist_head services[TIPC_NAMETBL_SIZE];
> @@ -102,6 +104,8 @@ struct name_table {
>          struct list_head cluster_scope;
>          rwlock_t cluster_scope_lock;
>          u32 local_publ_count;
> +        struct sk_buff_head defer_namedq;
> +        bool rcast;
>  };
>
>  int tipc_nl_name_table_dump(struct sk_buff *skb, struct netlink_callback *cb);
> diff --git a/net/tipc/node.c b/net/tipc/node.c
> index aaf595613e6e..b058647fa78b 100644
> --- a/net/tipc/node.c
> +++ b/net/tipc/node.c
> @@ -2981,3 +2981,35 @@ void tipc_node_pre_cleanup_net(struct net *exit_net)
>          }
>          rcu_read_unlock();
>  }
> +
> +int tipc_node_broadcast_named_publish(struct net *net, struct sk_buff *skb,
> +                                      bool *rcast)
> +{
> +        struct tipc_mc_method method = {.rcast = *rcast};
> +        struct sk_buff_head xmitq;
> +        struct tipc_nlist dests;
> +        struct tipc_node *n;
> +        u16 cong_link_cnt;
> +        int rc = 0;
> +
> +        __skb_queue_head_init(&xmitq);
> +        __skb_queue_tail(&xmitq, skb);
> +
> +        tipc_nlist_init(&dests, tipc_own_addr(net));
> +        rcu_read_lock();
> +        list_for_each_entry_rcu(n, tipc_nodes(net), list) {
> +                if (in_own_node(net, n->addr))
> +                        continue;
> +                if (!node_is_up(n))
> +                        continue;
> +                tipc_nlist_add(&dests, n->addr);
> +        }
> +        rcu_read_unlock();
> +
> +        rc = tipc_mcast_xmit(net, &xmitq, &method, &dests, &cong_link_cnt);
> +        *rcast = method.rcast;
> +
> +        tipc_nlist_purge(&dests);
> +        __skb_queue_purge(&xmitq);
> +        return rc;
> +}
> diff --git a/net/tipc/node.h b/net/tipc/node.h
> index a6803b449a2c..d7d19f9932b1 100644
> --- a/net/tipc/node.h
> +++ b/net/tipc/node.h
> @@ -124,4 +124,6 @@ int tipc_nl_node_set_key(struct sk_buff *skb, struct genl_info *info);
>  int tipc_nl_node_flush_key(struct sk_buff *skb, struct genl_info *info);
>  #endif
>  void tipc_node_pre_cleanup_net(struct net *exit_net);
> +int tipc_node_broadcast_named_publish(struct net *net, struct sk_buff *skb,
> +                                      bool *rcast);
>  #endif
[email protected]https://lists.sourceforge.net/lists/listinfo/tipc-discussion
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion