** Also affects: networking-midonet
   Importance: Undecided
       Status: New

** Changed in: networking-midonet
   Importance: Undecided => Medium

** Changed in: networking-midonet
       Status: New => In Progress

** Changed in: networking-midonet
    Milestone: None => 5.0.0

** Changed in: networking-midonet
     Assignee: (unassigned) => YAMAMOTO Takashi (yamamoto)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1659760

Title:
  General scale issue on neutron-fwaas due to RPC broadcast usage
  (fanout)

Status in networking-midonet:
  In Progress
Status in neutron:
  Fix Released

Bug description:
  Currently, every CRUD method on FWaaS resources (Firewall,
  FirewallPolicy, FirewallRule, Firewallgroup, ...) sends an AMQP fanout
  cast to all L3 agents.

  This design is wrong: the AMQP cast should be sent only to the L3
  agents that manage routers with firewalls belonging to the tenant.

  This design flaw has resulted in many bugs that are already reported:

  1) FirewallNotFound during firewall_deleted
  https://bugs.launchpad.net/neutron/+bug/1622460
  https://bugs.launchpad.net/neutron/+bug/1658060

  Explanation using 2 L3 agents:
  agent1: hosts a router with a firewall for the tenant
  agent2: doesn't host any router of the tenant

  1. neutron firewall-delete <firewall>
  2. neutron-server sends an AMQP call "delete_firewall" to agent1 and
     agent2
  3. agent1 cleans the router firewall and sends back
     "firewall_deleted" to neutron-server
  4. neutron-server deletes the firewall resource from the database
  5. agent2 has nothing to clean and sends back firewall_deleted to
     neutron-server
  6. neutron-server gets a "FirewallNotFound" exception
     http://paste.openstack.org/raw/94663/

  But it does not end there :(

  7. agent2 gets back the "FirewallNotFound" exception
  8. on the RPC error it performs a kind of "full synchronisation"
     (process_services_sync) and sends an AMQP call
     "get_tenants_with_firewalls"
  9. neutron-server responds with ALL tenants (even those not related
     to this agent)
  10. FOR each tenant, agent2 sends an AMQP call:
      get_firewalls_for_tenant()

  The full sync bug is already reported here:
  https://bugs.launchpad.net/neutron/+bug/1618244

  2) The intermittent failure in the Tempest check is probably related:
  https://bugs.launchpad.net/neutron/+bug/1649703

  3) More generally, on FWaaS CRUD operations neutron-server floods the
  agents with AMQP requests and is flooded by them in return.
  => this results in the neutron-server RPC workers being fully busy
  => AMQP messages accumulate in the q-firewall-plugin queue
  => RPC timeouts appear on agents after 60s
  => full synchronisation is triggered
  => etc, etc...

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-midonet/+bug/1659760/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
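
The race in steps 1 to 6 of the bug description can be reproduced with a small toy model. This is only an illustrative sketch, not neutron-fwaas code: the Server class, the delete_firewall helper and the "hosting" set are hypothetical names invented here, and the agent callback is collapsed into a direct method call instead of real AMQP messaging.

```python
# Toy model of the fanout race; all names here are hypothetical.
class FirewallNotFound(Exception):
    pass


class Server:
    def __init__(self, firewalls):
        self.firewalls = set(firewalls)  # firewall ids still in the database
        self.errors = []                 # agents whose callback failed

    def firewall_deleted(self, agent, fw_id):
        # Callback an agent sends once it has finished its cleanup.
        if fw_id not in self.firewalls:
            raise FirewallNotFound(fw_id)
        self.firewalls.remove(fw_id)


def delete_firewall(server, agents, fw_id, hosting, fanout):
    """Notify agents about a deletion; return how many messages were sent.

    fanout=True models the broadcast to every L3 agent, fanout=False
    models a cast limited to the agents in `hosting`.
    """
    targets = list(agents) if fanout else [a for a in agents if a in hosting]
    for agent in targets:
        try:
            # Every notified agent answers, even with nothing to clean.
            server.firewall_deleted(agent, fw_id)
        except FirewallNotFound:
            # In the report, this error is what triggers the full resync.
            server.errors.append(agent)
    return len(targets)


agents = ["agent1", "agent2"]
hosting = {"agent1"}  # only agent1 hosts the tenant router

broadcast = Server({"fw1"})
sent_broadcast = delete_firewall(broadcast, agents, "fw1", hosting, fanout=True)

targeted = Server({"fw1"})
sent_targeted = delete_firewall(targeted, agents, "fw1", hosting, fanout=False)
```

With the broadcast, two messages are sent and agent2 ends up in the error list; with the targeted cast, a single message is sent and no callback fails.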