** Also affects: networking-midonet
   Importance: Undecided
       Status: New

** Changed in: networking-midonet
   Importance: Undecided => Medium

** Changed in: networking-midonet
       Status: New => In Progress

** Changed in: networking-midonet
    Milestone: None => 5.0.0

** Changed in: networking-midonet
     Assignee: (unassigned) => YAMAMOTO Takashi (yamamoto)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1659760

Title:
  General scale issue on neutron-fwaas due to RPC broadcast usage
  (fanout)

Status in networking-midonet:
  In Progress
Status in neutron:
  Fix Released

Bug description:
  Currently, every CRUD method on FWaaS resources (Firewall,
  FirewallPolicy, FirewallRule, Firewallgroup, ...) sends an AMQP fanout
  cast to all L3 agents.
  This design is wrong: the AMQP cast should be sent only to the L3 agents
  managing routers with firewalls related to the tenant.
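
  The difference between the two notification strategies can be sketched
  as follows. This is only an illustration in plain Python, not
  neutron-fwaas code: RouterBinding, fanout_targets and targeted_hosts
  are hypothetical names, and the binding list stands in for whatever
  router-to-agent scheduling data the server actually has.

```python
from collections import namedtuple

# Hypothetical record: which L3 agent host serves which router, and for
# which tenant. Real scheduling data in neutron looks different.
RouterBinding = namedtuple(
    "RouterBinding", ["router_id", "tenant_id", "agent_host"])


def fanout_targets(all_agent_hosts):
    """Behaviour described in this bug: every L3 agent gets the cast."""
    return sorted(set(all_agent_hosts))


def targeted_hosts(bindings, tenant_id):
    """Suggested behaviour: only agents hosting a router of the tenant."""
    return sorted({b.agent_host for b in bindings
                   if b.tenant_id == tenant_id})


bindings = [
    RouterBinding("r1", "tenant-a", "agent1"),
    RouterBinding("r2", "tenant-b", "agent2"),
    RouterBinding("r3", "tenant-a", "agent3"),
]
all_hosts = ["agent1", "agent2", "agent3", "agent4"]

print(fanout_targets(all_hosts))             # all four agents are notified
print(targeted_hosts(bindings, "tenant-a"))  # only agent1 and agent3
```

  With targeted notification, agents that host no router for the tenant
  never receive the message and therefore never reply to it.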

  This flawed design has resulted in several bugs that are already reported:

  1) FirewallNotFound during firewall_deleted
  https://bugs.launchpad.net/neutron/+bug/1622460
  https://bugs.launchpad.net/neutron/+bug/1658060

  Explanation using 2 L3 agents:
  agent1: hosts a router with a firewall for the tenant
  agent2: doesn't host any router for the tenant

    1. neutron firewall-delete <firewall>
    2. neutron-server sends an AMQP call "delete_firewall" to agent1 and agent2
    3. agent1 cleans up the router firewall and sends "firewall_deleted" back
       to neutron-server
    4. neutron-server deletes the firewall resource from the database
    5. agent2 has nothing to clean up and sends "firewall_deleted" back to
       neutron-server
    6. neutron-server raises a "FirewallNotFound" exception
       http://paste.openstack.org/raw/94663/

    But it doesn't end there :(
    7. agent2 gets the "FirewallNotFound" exception back
    8. on an RPC error it performs a kind of "full synchronisation"
       (process_services_sync) and sends an AMQP call
       "get_tenants_with_firewalls"
    9. neutron-server responds with ALL tenants (even those not related to
       this agent)
    10. For each tenant, agent2 sends an AMQP call:
        get_firewalls_for_tenant()
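
  The sequence above can be reproduced with a small in-memory simulation.
  This is a sketch under simplifying assumptions, not the real plugin or
  agent code: the Server class and agent_handle_delete are hypothetical,
  RPC is modelled as direct method calls, and only the method names
  quoted in this report are reused.

```python
class FirewallNotFound(Exception):
    pass


class Server:
    """Toy stand-in for neutron-server; records every RPC it receives."""

    def __init__(self, firewalls_by_tenant):
        # tenant_id -> set of firewall ids
        self.firewalls = {t: set(f) for t, f in firewalls_by_tenant.items()}
        self.rpc_calls = []

    def firewall_deleted(self, tenant_id, firewall_id):
        self.rpc_calls.append("firewall_deleted")
        if firewall_id not in self.firewalls.get(tenant_id, set()):
            raise FirewallNotFound(firewall_id)
        self.firewalls[tenant_id].remove(firewall_id)

    def get_tenants_with_firewalls(self):
        self.rpc_calls.append("get_tenants_with_firewalls")
        return sorted(t for t, fws in self.firewalls.items() if fws)

    def get_firewalls_for_tenant(self, tenant_id):
        self.rpc_calls.append("get_firewalls_for_tenant")
        return sorted(self.firewalls[tenant_id])


def agent_handle_delete(server, tenant_id, firewall_id):
    """Agent side: report the deletion; on an RPC error, do a full sync."""
    try:
        server.firewall_deleted(tenant_id, firewall_id)
    except FirewallNotFound:
        for tenant in server.get_tenants_with_firewalls():
            server.get_firewalls_for_tenant(tenant)


server = Server({"t1": ["fw1"], "t2": ["fw2"], "t3": ["fw3"]})
agent_handle_delete(server, "t1", "fw1")   # agent1: steps 3 and 4
calls_after_agent1 = len(server.rpc_calls)
agent_handle_delete(server, "t1", "fw1")   # agent2: steps 5 to 10
print(server.rpc_calls)
```

  In this toy run agent1 costs the server one request, while agent2,
  which hosts nothing for the tenant, costs one failed request plus one
  request per remaining tenant.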

  The full sync bug is already reported here:
  https://bugs.launchpad.net/neutron/+bug/1618244

  2) Intermittent failures in the Tempest check are probably linked:
  https://bugs.launchpad.net/neutron/+bug/1649703

  3) More generally, on FWaaS CRUD operations neutron-server floods, and is
  flooded by, many AMQP requests.
  => this results in neutron-server RPC workers being fully busy
  => AMQP messages accumulate in the q-firewall-plugin queue
  => RPC timeouts appear on agents (after 60s)
  => full synchronisation is triggered
  => etc, etc...

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-midonet/+bug/1659760/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
