Public bug reported:

# Summary

Race condition while processing security_groups_member_updated events
(ipset)

# Overview

We have a customer that uses heat templates to deploy large environments
(e.g. 21 instances) with a significant number of security groups (e.g.
60) that use bi-directional remote group references for both ingress and
egress filtering.  These heat stacks are deployed via a CI pipeline and
intermittently suffer application-layer failures due to broken network
connectivity.  We traced this to the ipsets that implement remote_group
memberships: they were missing IPs from their member lists.
Troubleshooting suggests a race condition, which I've attempted to
describe in detail below.

Version: `54e1a6b1bc378c0745afc03987d0fea241b826ae` (HEAD of
stable/rocky as of Jan 26, 2020), though I suspect this issue persists
through master.

I'm working on getting some multi-node environments deployed (I don't
think it's possible to reproduce this with a single hypervisor) and hope
to provide reproduction steps on Rocky and master soon.  I wanted to
submit this report as-is in the hope that an experienced Neutron dev
might spot possible solutions or provide diagnostic insight that I am
not yet able to produce.

I suspect this report may be easier to read with some markdown, so
please feel free to read it in a gist:
https://gist.github.com/cfarquhar/20fddf2000a83216021bd15b512f772b

Also, this diagram is probably critical to following along:
https://user-images.githubusercontent.com/1253665/87317744-0a75b180-c4ed-11ea-9bad-085019c0f954.png

# Race condition symptoms

Given the following security groups/rules:

```
| secgroup name | secgroup id                          | direction | remote group                         | dest port |
|---------------|--------------------------------------|-----------|--------------------------------------|-----------|
| server        | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | ingress   | b52c8c54-b97a-477d-8b68-f4075e7595d9 | 9092      |
| client        | b52c8c54-b97a-477d-8b68-f4075e7595d9 | egress    | fcd6cf12-2ac9-4704-9208-7c6cb83d1a71 | 9092      |
```

And the following instances:

```
| instance name | hypervisor | ip          | secgroup assignment |
|---------------|------------|-------------|---------------------|
| server01      | compute01  | 192.168.0.1 | server              |
| server02      | compute02  | 192.168.0.2 | server              |
| server03      | compute03  | 192.168.0.3 | server              |
| client01      | compute04  | 192.168.0.4 | client              |
```

We would expect to find the following ipset representing the `server`
security group members on `compute04`:

```
# ipset list NIPv4fcd6cf12-2ac9-4704-9208-
Name: NIPv4fcd6cf12-2ac9-4704-9208-
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 536
References: 4
Number of entries: 3
Members:
192.168.0.1
192.168.0.2
192.168.0.3
```

What we actually get when the race condition is triggered is an
incomplete member list in the ipset: anywhere from zero to two of the
three expected IPs.

# Triggering the race condition

The problem occurs when `security_group_member_updated` events arrive between 
`port_update` steps 12 and 22 (see diagram and process details below).  
  - `port_update` step 12 retrieves the remote security groups' member lists, 
which are not necessarily complete yet.
  - `port_update` step 22 adds the port to `IptablesFirewallDriver.ports()`.

This results in `security_group_member_updated` step 3 looking for the
port to apply the updated member list to (in
`IptablesFirewallDriver.ports()`) BEFORE `port_update` step 22 has added
it.  The membership update event is therefore effectively discarded, and
we are left with whatever the remote security group's member list was
when `port_update` retrieved it at step 12.  This stale state persists
until something re-adds the port to the `updated_ports` list (e.g. an
agent restart, another remote group membership change, a local security
group addition/removal, etc.).
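
To make the window concrete, here is a minimal, self-contained sketch of
the interleaving (hypothetical stand-in class; the dict keys and method
bodies are simplified assumptions, not Neutron code):

```python
# Hypothetical stand-in for the agent state; names mirror the flows below.
class FakeFirewall:
    def __init__(self):
        self.filtered_ports = {}   # populated by port_update step 22
        self.ipset_members = {}    # written during port_update steps 14-18

    def ports(self):
        return self.filtered_ports

fw = FakeFirewall()

# port_update step 12: the agent fetches the remote group's member list,
# which is still incomplete at this point.
fw.ipset_members['server-sg'] = {'192.168.0.1'}

# A security_group_member_updated event for 'server-sg' arrives NOW.
# Step 3 of that flow scans ports() for affected ports -- but the new
# port is not there yet, so nothing is queued for refiltering.
affected = [p for p in fw.ports().values()
            if 'server-sg' in p.get('security_group_source_groups', [])]
assert affected == []   # the membership update is effectively discarded

# port_update step 22: the port finally becomes visible via ports().
fw.filtered_ports['tap1234'] = {
    'device': 'tap1234',
    'security_group_source_groups': ['server-sg'],
}

# The ipset keeps the stale member list until something re-triggers a
# refilter for this port.
print(fw.ipset_members['server-sg'])   # {'192.168.0.1'} -- .2, .3 missing
```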


# Race condition details

The race condition occurs in the linuxbridge agent between the following
two operations:
  1) Processing a `port_update` event when an instance is first created
  2) Processing `security_group_member_updated` events for the instance's
remote security groups.

Either of these operations can result in creating or mutating an ipset
via `IpsetManager.set_members()`.  The relevant control flow sequence
for each operation is listed below; I've left out branches that did not
seem relevant to the race condition.
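
Both operations converge on a call shaped roughly like the following (the
signature is my assumption based on the call sites below, not verbatim
Neutron code); whichever caller runs last replaces the ipset's contents
with the member list it happened to retrieve:

```python
def set_members(sg_id, ethertype, member_ips):
    """Hypothetical stand-in for IpsetManager.set_members().

    In the agent this shells out to `ipset` to create or refresh the set
    for (sg_id, ethertype), converging its members to member_ips.
    """
    print('refresh ipset for %s/%s -> %s'
          % (ethertype, sg_id, sorted(member_ips)))

# port_update steps 16-18 (and a member_updated-triggered refilter, when
# it is not lost) both end up here with whatever member list the caller
# last saw:
set_members('fcd6cf12-2ac9-4704-9208-7c6cb83d1a71', 'IPv4',
            ['192.168.0.1', '192.168.0.2'])
```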

## Processing a `port_update` event:
  1) We receive an RPC port_update event via 
`LinuxBridgeRpcCallbacks.port_update()`, which adds the tap device to the 
`LinuxBridgeRpcCallbacks.updated_devices` list
  2) Sleep until the next `CommonAgentLoop.daemon_loop()` iteration
  3) `CommonAgentLoop.daemon_loop()` calls `CommonAgentLoop.scan_devices()`
  4) `CommonAgentLoop.scan_devices()` calls 
`LinuxBridgeRpcCallbacks.get_and_clear_updated_devices()` to retrieve and clear 
the `updated_devices` list from step 1.
  5) `CommonAgentLoop.scan_devices()` performs some calculations and returns 
control to `CommonAgentLoop.daemon_loop()` along with a list of added or 
updated devices.
  6) `CommonAgentLoop.daemon_loop()` calls 
`CommonAgentLoop.process_network_devices()` when there are added or updated 
devices from step 5.
  7) `CommonAgentLoop.process_network_devices()` calls 
`SecurityGroupAgentRpc.setup_port_filters()` with the added or updated devices 
from step 5.
  8) `SecurityGroupAgentRpc.setup_port_filters()` checks for devices that were 
added to the `SecurityGroupAgentRpc.devices_to_refilter` list (which happens in 
step 4 of the `security_group_member_updated` process) and appends them to the 
list of updated devices.  This is where devices from the 
`security_group_member_updated` process would have been processed if they 
weren't lost due to the race condition.
  9)  `SecurityGroupAgentRpc.setup_port_filters()` calls 
`SecurityGroupAgentRpc.prepare_devices_filter()` with the updated port IDs
  10) `SecurityGroupAgentRpc.prepare_devices_filter()` calls 
`SecurityGroupAgentRpc._apply_port_filter()` with the updated port IDs
  11) `SecurityGroupAgentRpc._apply_port_filter()` calls 
`SecurityGroupServerRpcApi.security_group_info_for_devices()` with the tap 
device id
  12) `SecurityGroupServerRpcApi.security_group_info_for_devices()` retrieves 
detailed information about the port via RPC.  The important detail here is a 
`sg_member_ips` dict which contains a list of member IPs for each remote 
security group applicable to the port.
  13) `SecurityGroupServerRpcApi.security_group_info_for_devices()` returns 
control to `SecurityGroupAgentRpc._apply_port_filter()` along with the port and 
security group details.
  14) `SecurityGroupAgentRpc._apply_port_filter()` calls 
`SecurityGroupAgentRpc._update_security_group_info()` with the remote security 
groups and their member IP list.
  15) `SecurityGroupAgentRpc._update_security_group_info()` iterates over the 
remote security groups and calls 
`IptablesFirewallDriver.update_security_group_members()` for each one
  16) `IptablesFirewallDriver.update_security_group_members()` calls 
`IptablesFirewallDriver._update_ipset_members()`
  17) `IptablesFirewallDriver._update_ipset_members()` calls 
`IpsetManager.set_members()`
  18) `IpsetManager.set_members()` calls a number of methods to create or 
mutate the ipset using Linux's `ipset` commands.
  19) The stack unwinds back up to `SecurityGroupAgentRpc._apply_port_filter()` 
(last seen in step 11)
  20) `SecurityGroupAgentRpc._apply_port_filter()` calls 
`IptablesFirewallDriver.prepare_port_filter()` with the port
  21) `IptablesFirewallDriver.prepare_port_filter()` calls 
`IptablesFirewallDriver._set_ports()` with the port details retrieved in step 12
  22) `IptablesFirewallDriver._set_ports()` adds the port to 
`IptablesFirewallDriver.filtered_ports`, which indicates the port is known to 
exist on the hypervisor.  It will now be returned by 
`IptablesFirewallDriver.ports()`.  (A condensed sketch of the step 11-22 
ordering follows this list.)
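
Condensing steps 11-22, the ordering inside
`SecurityGroupAgentRpc._apply_port_filter()` looks roughly like this (a
sketch with simplified signatures and no error handling; treat the exact
parameter names as assumptions):

```python
# Sketch of the step 11-22 ordering (simplified, not verbatim Neutron code).
# The ipsets are refreshed BEFORE the port becomes discoverable via
# ports(), which is exactly the window the race exploits.
def _apply_port_filter(self, device_ids):
    # Steps 11-13: one RPC round trip; the member lists in
    # info['sg_member_ips'] may already be stale on return.
    info = self.plugin_rpc.security_group_info_for_devices(
        self.context, devices=device_ids)
    # Steps 14-18: write the (possibly incomplete) member lists to ipsets.
    self._update_security_group_info(
        info['security_groups'], info['sg_member_ips'])
    # Steps 20-22: only now does the port enter filtered_ports / ports().
    for device in info['devices'].values():
        self.firewall.prepare_port_filter(device)
```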

## Processing a `security_group_member_updated` event:
  1) We receive an RPC security_group_member_updated event via 
`SecurityGroupAgentRpc.security_groups_member_updated()`
  2) `SecurityGroupAgentRpc.security_groups_member_updated()` calls 
`SecurityGroupAgentRpc._security_group_updated()` with the list of security 
groups whose membership was updated.
  3) `SecurityGroupAgentRpc._security_group_updated()` consults 
`IptablesFirewallDriver.ports()` to find ports affected by the membership 
updates.  In the race condition, `port_update` has not yet reached step 22 to 
add the port we need to find, so we stop here and the following step does not 
occur.  (A condensed sketch of this filtering follows the list.)
  4) `SecurityGroupAgentRpc._security_group_updated()` adds each affected port 
from the previous step to the `SecurityGroupAgentRpc.devices_to_refilter` list. 
 This list will be processed the next time we reach `port_update` step 8, but 
in the race condition we never get there.
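
And a condensed sketch of steps 3-4 (again simplified; the attribute name
and exact filtering are assumptions based on the description above, not
verbatim Neutron code):

```python
# Sketch of the steps 3-4 filtering (simplified, not verbatim Neutron code).
def _security_group_updated(self, security_groups, attribute):
    affected = set()
    for device in self.firewall.ports().values():
        # During the race, the just-created port is not in ports() yet,
        # so this intersection is empty for it and nothing is queued.
        if set(device.get(attribute, [])) & set(security_groups):
            affected.add(device['device'])
    # Step 4: only ports found above are queued; a port missed here is
    # never added to devices_to_refilter, so the update is lost.
    self.devices_to_refilter |= affected
```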

** Affects: neutron
     Importance: Undecided
         Status: New

** Attachment added: "ipset race condition diagram 20200713a.png"
   
https://bugs.launchpad.net/bugs/1887405/+attachment/5392348/+files/ipset%20race%20condition%20diagram%2020200713a.png
