Hi Howard,
This is the code executed when TIPC receives a NETDEV_CHANGE event:
switch (evt) {
	case NETDEV_CHANGE:
		if (netif_carrier_ok(dev) && netif_oper_up(dev)) {
			test_and_set_bit_lock(0, &b->up);
			break;
		}
		fallthrough;
	case NETDEV_GOING_DOWN:
		clear_bit_unlock(0, &b->up);
		tipc_reset_bearer(net, b);
		break;
	case NETDEV_UP:
		test_and_set_bit_lock(0, &b->up);
		break;
	case NETDEV_CHANGEMTU:
So, unless the bond interface actually reports that it has lost carrier or is
going down, TIPC doesn't reset any links. And if it *does* report that, what
else can we do?
To me this looks more like a problem with the bond device than with TIPC,
but we may of course have misunderstood its expected behavior.
We will look into this.
On a different note, you could instead omit the bond interface and try
using dual TIPC links, which work in active-active mode and give better
performance.
Is that an option for you?
BR
Jon Maloy
On 11/19/20 11:36 PM, Howard Finer wrote:
I am trying to use TIPC (kernel version 4.19) over a bond device that is
configured for active-backup mode with ARP monitoring of the slaves. If a
slave goes down, TIPC receives a NETDEV_CHANGE event while the bond device
is still bringing up the new slave. This causes TIPC to reset the bearer,
which in turn causes a temporary loss of communication between the nodes.
Instrumentation of the bond and tipc drivers shows the following:
<6> 1 2020-11-19T23:58:33.111549+01:00 LABNBS5A kernel - - - [ 153.655776] Enabled bearer <eth:bond0>, priority 10
<6> 1 2020-11-20T00:07:58.544040+01:00 LABNBS5A kernel - - - [ 718.799259] bond0: bond_ab_arp_commit: BOND_LINK_DOWN: link status definitely down for interface eth1, disabling it
<6> 1 2020-11-20T00:07:58.544063+01:00 LABNBS5A kernel - - - [ 718.799261] bond0: bond_ab_arp_commit: do_failover, block netpoll_tx and call select_active_slave
<6> 1 2020-11-20T00:07:58.544069+01:00 LABNBS5A kernel - - - [ 718.799263] bond0: bond_select_active_slave: bond_find_best_slave returned NULL
<6> 1 2020-11-20T00:07:58.544072+01:00 LABNBS5A kernel - - - [ 718.799347] bond0: bond_select_active_slave: now running without any active interface!
<6> 1 2020-11-20T00:07:58.544080+01:00 LABNBS5A kernel - - - [ 718.799349] bond0: bond_ab_arp_commit: do_failover, returned from select_active_slave and unblock netpoll tx
<6> 1 2020-11-20T00:07:58.544081+01:00 LABNBS5A kernel - - - [ 718.799611] Resetting bearer <eth:bond0>
<6> 1 2020-11-20T00:07:58.655535+01:00 LABNBS5A kernel - - - [ 718.907245] bond0: bond_ab_arp_commit: BOND_LINK_UP: link status definitely up for interface eth0
<6> 1 2020-11-20T00:07:58.655545+01:00 LABNBS5A kernel - - - [ 718.907247] bond0: bond_ab_arp_commit: do_failover, block netpoll_tx and call select_active_slave
<6> 1 2020-11-20T00:07:58.655548+01:00 LABNBS5A kernel - - - [ 718.907248] bond0: bond_select_active_slave: bond_find_best_slave returned slave eth0
<6> 1 2020-11-20T00:07:58.655559+01:00 LABNBS5A kernel - - - [ 718.907249] bond0: making interface eth0 the new active one
<6> 1 2020-11-20T00:07:58.655562+01:00 LABNBS5A kernel - - - [ 718.907560] bond0: bond_select_active_slave: first active interface up!
With ARP-based monitoring, only one slave is 'up' at a time. When the active
slave goes down, the other slave has to be brought up, and during that window
we see TIPC resetting the bearer. That defeats the entire purpose of using
the bond device.
It seems that the handling of the NETDEV_CHANGE event for an active-backup
bond device is not correct. TIPC needs to leave the bearer intact, so that
when the backup slave is brought up, communication is restored without any
upper-layer application being aware that something happened at the lower
level.
Thanks,
Howard
_______________________________________________
tipc-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tipc-discussion