Public bug reported:

The default gateway can vanish from the HA router namespace after
certain operations.

My setup:
Fedora 20
keepalived-1.2.13-1.fc20.x86_64
Network manager turned off.

I can reproduce this reliably on my system, but I cannot reproduce it at
will on a RHEL 7 system. The issue does manifest on that system on its
own; I just can't trigger it on demand.

How I reproduce on my system:
Create an HA router
Set it as a gateway
Go to the master instance
Observe that the namespace has a default gateway
Add an internal interface (make sure that its IP is 'lower' than the IP of
the external interface; the reason is explained below)
Observe that the default gateway no longer exists

Cause:
keepalived.conf has two sections for VIPs: virtual_ipaddress and
virtual_ipaddress_excluded. The difference is that any VIPs in the first
section are propagated on the wire, while any VIPs in the excluded section
are not. A traditional keepalived configuration places one VIP in the
normal section, henceforth known as the 'primary VIP', and all other VIPs
in the excluded section. Currently the keepalived manager does this by
sorting the VIPs (internal IPs, external SNAT IP, and all floating IPs),
using the lowest one (by string comparison) as the primary, and placing
the rest of the VIPs in the excluded section:
https://github.com/openstack/neutron/blob/master/neutron/agent/linux/keepalived.py#L155
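
To illustrate the selection described above, here is a minimal sketch. It
is not the neutron code; the function name pick_primary and all addresses
are made up for illustration. It shows that adding a VIP that sorts lower
as a string replaces the primary, and that string order is not the same as
numeric IP order.

```python
import ipaddress


def pick_primary(vips):
    """Split VIPs into (primary, excluded) using plain string ordering.

    Hypothetical helper that mimics the behaviour described in this
    report; it is not the actual neutron implementation.
    """
    ordered = sorted(vips)
    return ordered[0], ordered[1:]


# Router with only an external (gateway) address.
before = ["192.168.1.5/24"]
primary_before, excluded_before = pick_primary(before)

# Adding an internal interface whose address sorts lower as a string.
after = before + ["10.0.0.1/24"]
primary_after, excluded_after = pick_primary(after)

# The primary VIP moved from the external address to the new internal one.
primary_changed = primary_before != primary_after

# String comparison is not numeric comparison: "10..." sorts before "9...".
as_strings = sorted(["9.0.0.1", "10.0.0.1"])
as_addresses = sorted(["9.0.0.1", "10.0.0.1"], key=ipaddress.ip_address)
```

This is why the reproduction steps ask for an internal IP that is 'lower'
than the external one: only then does the first element of the sorted list
change.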

That code runs, and keepalived.conf is rebuilt, whenever a router is
updated. This means that the primary VIP can change on router updates.
According to a keepalived developer I talked to, keepalived assumes that
the order does not change (this is possibly a keepalived bug, depending
on your view on life, the ordering of the stars when keepalived is
executed and the wind speed in the Falkland Islands in the past leap
year). On my system, with the currently installed keepalived version,
whenever the primary VIP changes, the default gateway (present in the
virtual_routes section of keepalived.conf) is violently removed.
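
As a rough picture of what changes between two rebuilds, the sketch below
renders a simplified fragment with the three sections named in this
report. The helper names, the gateway address and the device name
qg-example are hypothetical, and the entries are stripped down compared to
what the agent really writes.

```python
def render_vips(primary, excluded, gateway, device):
    """Render a simplified fragment of a vrrp_instance block.

    Hypothetical helper for illustration; real configurations carry
    more options per entry.
    """
    lines = ["virtual_ipaddress {", "    " + primary, "}"]
    lines.append("virtual_ipaddress_excluded {")
    lines.extend("    " + vip for vip in excluded)
    lines.append("}")
    lines.append("virtual_routes {")
    lines.append("    0.0.0.0/0 via %s dev %s" % (gateway, device))
    lines.append("}")
    return "\n".join(lines)


def rebuild(vips, gateway, device):
    """Rebuild the fragment the way this report describes: sort, then split."""
    ordered = sorted(vips)
    return render_vips(ordered[0], ordered[1:], gateway, device)


# Before and after adding an internal interface (addresses are made up).
conf_before = rebuild(["192.168.1.5/24"], "192.168.1.1", "qg-example")
conf_after = rebuild(["192.168.1.5/24", "10.0.0.1/24"],
                     "192.168.1.1", "qg-example")
```

In both fragments the virtual_routes section is identical; only the entry
inside virtual_ipaddress differs, which is the change that coincides with
the default gateway disappearing on my system.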

Possible solution:
Make sure that the primary VIP never changes. For example: fabricate an IP
per HA router cluster (derived from the VRID?), add it as a VIP on the HA
device, and configure it as the primary VIP. I played around with a hacky
variation of this solution and could no longer reproduce the issue.
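
A minimal sketch of that idea, under stated assumptions: the address range
169.254.0.0/24 and the helper names are my own choices for illustration,
not something decided in this report. The primary VIP is derived from the
VRID alone, so it stays the same no matter which router addresses come and
go.

```python
import ipaddress

# Assumed range for fabricated primary VIPs (hypothetical choice).
PRIMARY_VIP_NET = ipaddress.ip_network("169.254.0.0/24")


def primary_vip_for(vrid):
    """Derive a fixed primary VIP from the VRID.

    VRIDs go up to 255, but this sketch stops at 254 because .255 is
    the broadcast address of the assumed /24.
    """
    if not 1 <= vrid <= 254:
        raise ValueError("VRID out of range for this sketch: %r" % vrid)
    return "%s/%d" % (PRIMARY_VIP_NET[vrid], PRIMARY_VIP_NET.prefixlen)


def split_vips(vrid, router_vips):
    """Return (primary, excluded); every real router VIP is excluded."""
    return primary_vip_for(vrid), sorted(router_vips)


# The primary is identical before and after adding an internal interface.
primary_before, excluded_before = split_vips(7, ["192.168.1.5/24"])
primary_after, excluded_after = split_vips(
    7, ["192.168.1.5/24", "10.0.0.1/24"])
```

With this split, router updates only ever touch the excluded section.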

** Affects: neutron
     Importance: Undecided
     Assignee: Assaf Muller (amuller)
         Status: New


** Tags: juno-backport-potential l3-ha

** Changed in: neutron
     Assignee: (unassigned) => Assaf Muller (amuller)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1404945

Title:
  Default gateway can vanish from HA routers, destroying external
  connectivity for all VMs on that network

Status in OpenStack Neutron (virtual network service):
  New
