Public bug reported:

neutron version: 14.0.2
general deployment version: stein
deployment method: kolla-ansible
neutron configuration:
 - l3 = ha
 - agent_mode = dvr_snat
 - ovs
general info: multi-node deployment, ~100 compute nodes
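
For context, that shorthand corresponds to roughly the following settings. This is a minimal sketch assuming a stock kolla-ansible layout; the file placement and the compute-side value are assumptions, not copies from this deployment.

# neutron.conf on the controller/network nodes
[DEFAULT]
router_distributed = true
l3_ha = true
max_l3_agents_per_router = 3

# l3_agent.ini on the controller/network nodes
[DEFAULT]
agent_mode = dvr_snat

# l3_agent.ini on the compute nodes (assumed, as is typical for DVR)
[DEFAULT]
agent_mode = dvr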

When spawning larger Heat stacks with multiple instances (think k8s
infrastructure), we sometimes (roughly 50% of the time) get a "split
brain" on SNAT namespaces.

The logs look like this on one of the three controller/network nodes:

11:53:43.402  Handling notification for router 2a218a31-2ef6-406a-a719-17965600e182, state master
11:53:43.403  enqueue /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/l3/ha.py:50
Router 2a218a31-2ef6-406a-a719-17965600e182 transitioned to master

And then this happens on another of the three controller/network nodes:

11:53:57.582  Handling notification for router 2a218a31-2ef6-406a-a719-17965600e182, state master
enqueue /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/l3/ha.py:50
11:53:57.583  Router 2a218a31-2ef6-406a-a719-17965600e182 transitioned to master

So neutron sets up all routes on both controller nodes, which wreaks havoc
on the sessions that instances open to the outside world. Deleting the
routes from the faulty namespace resolves the issue.
I can't find a reason for the second promotion to master, even when going
through the debug logs, and would greatly appreciate any helpful pointers.
The only thing I can think of is some kind of race condition, which would
also explain why everything in neutron otherwise looks fine.
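
For anyone chasing the same symptom, the sketch below is one way to confirm the split brain from outside neutron. It is only a sketch under assumptions: default kolla-ansible paths (keepalived state file under /var/lib/neutron/ha_confs inside the neutron_l3_agent container), HA namespaces visible on the host, and hypothetical hostnames ctl1..ctl3.

#!/usr/bin/env bash
# Sketch: check which network nodes believe they are VRRP master
# for one router. Hostnames, paths and container name are assumed.
ROUTER=2a218a31-2ef6-406a-a719-17965600e182

for node in ctl1 ctl2 ctl3; do   # hypothetical hostnames
  echo "== $node =="
  # keepalived's last recorded state for this router ("master"/"backup")
  ssh "$node" docker exec neutron_l3_agent \
      cat "/var/lib/neutron/ha_confs/$ROUTER/state"
  # routes programmed in the SNAT namespace; in a split brain, more
  # than one node carries the full route set
  ssh "$node" sudo ip netns exec "snat-$ROUTER" ip route
  # check that VRRP advertisements (IP protocol 112) are flowing on
  # the ha- interface; silence here is what produces a double master
  ha_if=$(ssh "$node" sudo ip netns exec "snat-$ROUTER" \
      ip -o link show | grep -o 'ha-[0-9a-f-]*' | head -n1)
  ssh "$node" sudo timeout 5 \
      ip netns exec "snat-$ROUTER" tcpdump -c 3 -ni "$ha_if" vrrp
done

If only one node prints "master" while two namespaces hold the full route set, the keepalived state and the namespace contents have diverged, which would point at the agent-side transition handling rather than at keepalived itself.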

** Affects: neutron
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1863110

Title:
  2/3 snat namespace transitions to master

Status in neutron:
  New
