[Yahoo-eng-team] [Bug 1923412] [NEW] [stable/stein] Tempest fails with unrecognized arguments: --exclude-regex

2021-04-12 Thread Bernard Cafarelli
Public bug reported:

Stein neutron-tempest-plugin jobs with exclude regexes now fail 100%, for 
example:
https://review.opendev.org/c/openstack/neutron/+/774258
https://zuul.opendev.org/t/openstack/build/cf9c6880833041ffabd7726059875090

all run-test: commands[1] | tempest run --regex 
'(^neutron_tempest_plugin.scenario)|(^tempest.api.compute.servers.test_attach_interfaces)|(^tempest.api.compute.servers.test_multiple_create)'
 --concurrency=3 
'--exclude-regex=(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)'
usage: tempest run [-h] [--workspace WORKSPACE]
   [--workspace-path WORKSPACE_PATH]
   [--config-file CONFIG_FILE] [--smoke | --regex REGEX]
   [--black-regex BLACK_REGEX]
   [--whitelist-file WHITELIST_FILE]
   [--blacklist-file BLACKLIST_FILE] [--load-list LOAD_LIST]
   [--list-tests] [--concurrency CONCURRENCY]
   [--parallel | --serial] [--save-state] [--subunit]
   [--combine]
tempest run: error: unrecognized arguments: 
--exclude-regex=(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)
ERROR: InvocationError for command /opt/stack/tempest/.tox/tempest/bin/tempest 
run --regex 
'(^neutron_tempest_plugin.scenario)|(^tempest.api.compute.servers.test_attach_interfaces)|(^tempest.api.compute.servers.test_multiple_create)'
 --concurrency=3 
'--exclude-regex=(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)'
 (exited with code 2)


Most probably caused by the
https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/775257 change;
the stein jobs should be updated to keep the old parameter name.
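
For reference, the tempest available on stable/stein only understands
--black-regex (as the usage output above shows), so a stein-compatible
invocation would look roughly like this, assuming the regexes themselves stay
unchanged:

  tempest run \
    --regex '(^neutron_tempest_plugin.scenario)|(^tempest.api.compute.servers.test_attach_interfaces)|(^tempest.api.compute.servers.test_multiple_create)' \
    --concurrency=3 \
    --black-regex='(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)'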

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923412

Title:
  [stable/stein] Tempest fails with unrecognized arguments: --exclude-
  regex

Status in neutron:
  New

Bug description:
  Stein neutron-tempest-plugin jobs with exclude regexes now fail 100%, for 
example:
  https://review.opendev.org/c/openstack/neutron/+/774258
  https://zuul.opendev.org/t/openstack/build/cf9c6880833041ffabd7726059875090

  all run-test: commands[1] | tempest run --regex 
'(^neutron_tempest_plugin.scenario)|(^tempest.api.compute.servers.test_attach_interfaces)|(^tempest.api.compute.servers.test_multiple_create)'
 --concurrency=3 
'--exclude-regex=(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)'
  usage: tempest run [-h] [--workspace WORKSPACE]
 [--workspace-path WORKSPACE_PATH]
 [--config-file CONFIG_FILE] [--smoke | --regex REGEX]
 [--black-regex BLACK_REGEX]
 [--whitelist-file WHITELIST_FILE]
 [--blacklist-file BLACKLIST_FILE] [--load-list LOAD_LIST]
 [--list-tests] [--concurrency CONCURRENCY]
 [--parallel | --serial] [--save-state] [--subunit]
 [--combine]
  tempest run: error: unrecognized arguments: 
--exclude-regex=(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)
  ERROR: InvocationError for command 
/opt/stack/tempest/.tox/tempest/bin/tempest run --regex 
'(^neutron_tempest_plugin.scenario)|(^tempest.api.compute.servers.test_attach_interfaces)|(^tempest.api.compute.servers.test_multiple_create)'
 --concurrency=3 
'--exclude-regex=(^neutron_tempest_plugin.scenario.test_vlan_transparency.VlanTransparencyTest)'
 (exited with code 2)

  
  Most probably caused by the
https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/775257 change;
the stein jobs should be updated to keep the old parameter name.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923412/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923413] [NEW] [stable/rocky and older] Tempest jobs fail on alembic dependency

2021-04-12 Thread Bernard Cafarelli
Public bug reported:

alembic apparently dropped support for some Python versions, and we do not have
an upper cap on it, so tempest jobs fail with POST_FAILURE, for example (rocky):
https://review.opendev.org/c/openstack/neutron/+/783544
https://zuul.opendev.org/t/openstack/build/f300e1a82627435da71bc133445bc279

Collecting alembic>=0.8.10 (from subunit2sql>=0.8.0->stackviz==0.0.1.dev320)
  Downloading 
http://mirror.gra1.ovh.opendev.org/wheel/ubuntu-16.04-x86_64/alembic/alembic-1.5.5-py2.py3-none-any.whl
 (156kB)

:stderr: alembic requires Python
'!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7' but the running
Python is 3.5.2

And similar failure in queens, for example in
https://review.opendev.org/c/openstack/neutron/+/776455
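
One way to unblock these jobs would be an upper cap on alembic in whatever
constraints/requirements file the jobs install it from. A rough sketch only;
the boundary below (the first alembic release assumed to drop Python 3.5
support) would need to be checked against alembic's release notes:

  # hypothetical cap; verify which alembic release first dropped Python 3.5
  alembic>=0.8.10,<1.5.0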

** Affects: neutron
 Importance: Undecided
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923413

Title:
  [stable/rocky and older] Tempest jobs fail on alembic dependency

Status in neutron:
  New

Bug description:
  alembic apparently dropped support for some Python versions, and we do not
have an upper cap on it, so tempest jobs fail with POST_FAILURE, for example (rocky):
  https://review.opendev.org/c/openstack/neutron/+/783544
  https://zuul.opendev.org/t/openstack/build/f300e1a82627435da71bc133445bc279

  Collecting alembic>=0.8.10 (from subunit2sql>=0.8.0->stackviz==0.0.1.dev320)
Downloading 
http://mirror.gra1.ovh.opendev.org/wheel/ubuntu-16.04-x86_64/alembic/alembic-1.5.5-py2.py3-none-any.whl
 (156kB)

  :stderr: alembic requires Python
  '!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7' but the
  running Python is 3.5.2

  And similar failure in queens, for example in
  https://review.opendev.org/c/openstack/neutron/+/776455

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923413/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923423] [NEW] l3_router.service_providers DriverController's _attrs_to_driver is not py3 compatible

2021-04-12 Thread Lajos Katona
Public bug reported:

Currently l3_router.service_providers.DriverController._attrs_to_driver has the 
following:
...
    drivers = self.drivers.values()
    # make sure default is tried before the rest if defined
    if self.default_provider:
        drivers.insert(0, self.drivers[self.default_provider])

Since in Python 3 dict.values() returns a "dict_values" view instead of a
list, the insert call fails with:
"AttributeError: 'dict_values' object has no attribute 'insert'"

** Affects: neutron
 Importance: Undecided
 Assignee: Lajos Katona (lajos-katona)
 Status: New


** Tags: trivial

** Tags added: trivial

** Changed in: neutron
 Assignee: (unassigned) => Lajos Katona (lajos-katona)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923423

Title:
  l3_router.service_providers DriverController's _attrs_to_driver is not
  py3 compatible

Status in neutron:
  New

Bug description:
  Currently l3_router.service_providers.DriverController._attrs_to_driver has 
the following:
  ...
      drivers = self.drivers.values()
      # make sure default is tried before the rest if defined
      if self.default_provider:
          drivers.insert(0, self.drivers[self.default_provider])

  Since in Python 3 dict.values() returns a "dict_values" view instead of a
  list, the insert call fails with:
  "AttributeError: 'dict_values' object has no attribute 'insert'"

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923423/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923449] [NEW] test_security_group_recreated_on_port_update fails with not yet created default group

2021-04-12 Thread Lajos Katona
Public bug reported:

test_security_group_recreated_on_port_update from neutron-tempest-plugin seems
to be failing sporadically, with no default security group present after the
port update; example:
https://40d1580bb656fd0ed240-3f272db0dacf207a646e9867f60c7e03.ssl.cf1.rackcdn.com/785830/1/check/neutron-tempest-plugin-api/f16b2f0/testr_results.html

Logstash query:
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22testtools.matchers._impl.MismatchError%3A%20'default'%20not%20in%20%5B%5D%5C%22

** Affects: neutron
 Importance: Medium
 Assignee: Lajos Katona (lajos-katona)
 Status: New


** Tags: tempest

** Tags added: tempest

** Changed in: neutron
 Assignee: (unassigned) => Lajos Katona (lajos-katona)

** Changed in: neutron
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923449

Title:
  test_security_group_recreated_on_port_update fails with not yet
  created default group

Status in neutron:
  New

Bug description:
  test_security_group_recreated_on_port_update from neutron-tempest-plugin
  seems to be failing sporadically, with no default security group present
  after the port update; example:
  
https://40d1580bb656fd0ed240-3f272db0dacf207a646e9867f60c7e03.ssl.cf1.rackcdn.com/785830/1/check/neutron-tempest-plugin-api/f16b2f0/testr_results.html

  Logstash query:
  
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22testtools.matchers._impl.MismatchError%3A%20'default'%20not%20in%20%5B%5D%5C%22

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923449/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1822849] Re: Timezone offset displayed in horizon / user / settings is always using daylight saving

2021-04-12 Thread Vishal Manchanda
It is fixed by https://review.opendev.org/c/openstack/horizon/+/649379/.

** Changed in: horizon
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1822849

Title:
  Timezone offset displayed in horizon / user / settings is always using
  daylight saving

Status in OpenStack Dashboard (Horizon):
  Fix Released

Bug description:
  Timezone offset displayed in horizon / user / settings is always using
  daylight saving

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1822849/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1869129] Re: neutron accepts CIDR in security groups that are invalid in ovn

2021-04-12 Thread Slawek Kaplonski
** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869129

Title:
  neutron accepts CIDR in security groups that are invalid in ovn

Status in neutron:
  Fix Released

Bug description:
  We have found that some CIDRs accepted by neutron do not work in
  networking-ovn. Specifically, these are network CIDRs with the host bits
  set.

  Steps to reproduce

  - Create VM. Attach a floating IP to it

  - Remove all security groups. Attach a blank security group to it

  - Add a security group rule and start ping

  For example, if my IP is 10.10.10.175/26 (first 3 octets changed for
  privacy), the following security group rules work

  openstack security group rule create --protocol icmp --remote-ip 
10.10.10.175/32 cidr
  openstack security group rule create --protocol icmp --remote-ip 
10.10.10.128/26 cidr

  However, the following security group rule does not work

  openstack security group rule create --protocol icmp --remote-ip
  10.10.10.175/26 cidr

  FWIW, in our testing, CIDRs like 10.10.10.175/26 work in other
  drivers, like linuxbridge and midonet.
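
  For clarity, "host bits set" means the address part of the CIDR is not the
  network address. Python's standard ipaddress module shows the difference
  (plain illustration, unrelated to the neutron/OVN code paths):

    import ipaddress

    print(ipaddress.ip_network('10.10.10.128/26'))                # proper network address
    print(ipaddress.ip_network('10.10.10.175/32'))                # single-host /32
    print(ipaddress.ip_network('10.10.10.175/26', strict=False))  # normalized to 10.10.10.128/26
    try:
        ipaddress.ip_network('10.10.10.175/26')                   # default strict mode
    except ValueError as exc:
        print(exc)                                                # 10.10.10.175/26 has host bits set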

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1869129/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1894843] Re: [dvr_snat] Router update deletes rfp interface from qrouter even when VM port is present on this host

2021-04-12 Thread Corey Bryant
This bug was fixed in the package neutron - 2:15.3.2-0ubuntu1~cloud2
---

 neutron (2:15.3.2-0ubuntu1~cloud2) bionic-train; urgency=medium
 .
   * Backport fix for dvr-snat missing rfp interfaces (LP: #1894843)
 - d/p/0001-Fix-deletion-of-rfp-interfaces-when-router-is-re-ena.patch


** Changed in: cloud-archive/train
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1894843

Title:
  [dvr_snat] Router update deletes rfp interface from qrouter even when
  VM port is present on this host

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Triaged
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  When neutron schedules snat namespaces it sometimes deletes the rfp interface 
from qrouter namespaces which breaks external network (fip) connectivity. The 
fix prevents this from happening.

  [Test Case]
   * deploy Openstack (Ussuri or above) with dvr_snat enabled in compute hosts.
   * ensure min. 2 compute hosts
   * create one ext network and one private network
   * add private subnet to router and ext as gateway
   * check which compute has the snat ns (ip netns| grep snat)
   * create a vm on each compute host
   * check that qrouter ns on both computes has rfp interface
   * ip netns| grep qrouter; ip netns exec  ip a s| grep rfp
   * disable and re-enable router
   * openstack router set --disable ;  openstack router set --enable 

   * check again
   * ip netns| grep qrouter; ip netns exec  ip a s| grep rfp
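
  A consolidated shell sketch of the check/toggle steps above; the router name
  and the qrouter namespace ID are placeholders, since the original steps leave
  them out:

    ROUTER=router1                                           # placeholder router name
    ip netns | grep snat                                     # which compute has the snat ns
    ip netns | grep qrouter
    ip netns exec qrouter-<router-uuid> ip a s | grep rfp    # rfp should be present
    openstack router set --disable "$ROUTER"
    openstack router set --enable  "$ROUTER"
    ip netns exec qrouter-<router-uuid> ip a s | grep rfp    # re-check after the toggle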

  [Where problems could occur]
  This patch is in fact restoring expected behaviour and is not expected to
  introduce any new regressions.

  -

  Hello,

  When dvr_snat L3 agents are deployed on hypervisors, there can be a race
  condition. The agent creates snat namespaces on each scheduled host and
  removes them in a second step. In this second step the agent removes the
  rfp interface from the qrouter namespace even when there is a VM with a
  floating IP on the host.

  When a VM is deployed at the time of this second step, we can lose
  external access to the VM's floating IP. The issue can be reproduced by
  hand:

  1. Create tenant network and router with external gateway
  2. Create VM with floating ip
  3. Ensure that the VM is on a hypervisor without a snat-* namespace
  4. Set the router to disabled state (openstack router set --disable )
  5. Set the router to enabled state (openstack router set --enable )
  6. External access to the VM's FIP is lost because the L3 agent recreates
  the qrouter namespace without the rfp interface.

  Environment:

  1. Neutron with ML2 OVS plugin.
  2. L3 agents in dvr_snat mode on each hypervisor
  3. openstack-neutron-common-15.1.1-0.2020061910.7d97420.el8ost.noarch

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1894843/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1894843] Re: [dvr_snat] Router update deletes rfp interface from qrouter even when VM port is present on this host

2021-04-12 Thread Corey Bryant
This bug was fixed in the package neutron - 2:13.0.7-0ubuntu1~cloud5
---

 neutron (2:13.0.7-0ubuntu1~cloud5) bionic-rocky; urgency=medium
 .
   * Backport fix for dvr-snat missing rfp interfaces (LP: #1894843)
 - d/p/0001-Fix-deletion-of-rfp-interfaces-when-router-is-re-ena.patch


** Changed in: cloud-archive/stein
   Status: Fix Committed => Fix Released

** Changed in: cloud-archive/rocky
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1894843

Title:
  [dvr_snat] Router update deletes rfp interface from qrouter even when
  VM port is present on this host

Status in Ubuntu Cloud Archive:
  Fix Committed
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Triaged
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  [Impact]
  When neutron schedules snat namespaces it sometimes deletes the rfp interface 
from qrouter namespaces which breaks external network (fip) connectivity. The 
fix prevents this from happening.

  [Test Case]
   * deploy Openstack (Ussuri or above) with dvr_snat enabled in compute hosts.
   * ensure min. 2 compute hosts
   * create one ext network and one private network
   * add private subnet to router and ext as gateway
   * check which compute has the snat ns (ip netns| grep snat)
   * create a vm on each compute host
   * check that qrouter ns on both computes has rfp interface
   * ip netns| grep qrouter; ip netns exec  ip a s| grep rfp
   * disable and re-enable router
   * openstack router set --disable ;  openstack router set --enable 

   * check again
   * ip netns| grep qrouter; ip netns exec  ip a s| grep rfp

  [Where problems could occur]
  This patch is in fact restoring expected behaviour and is not expected to
  introduce any new regressions.

  -

  Hello,

  When dvr_snat L3 agents are deployed on hypervisors, there can be a race
  condition. The agent creates snat namespaces on each scheduled host and
  removes them in a second step. In this second step the agent removes the
  rfp interface from the qrouter namespace even when there is a VM with a
  floating IP on the host.

  When a VM is deployed at the time of this second step, we can lose
  external access to the VM's floating IP. The issue can be reproduced by
  hand:

  1. Create tenant network and router with external gateway
  2. Create VM with floating ip
  3. Ensure that the VM is on a hypervisor without a snat-* namespace
  4. Set the router to disabled state (openstack router set --disable )
  5. Set the router to enabled state (openstack router set --enable )
  6. External access to the VM's FIP is lost because the L3 agent recreates
  the qrouter namespace without the rfp interface.

  Environment:

  1. Neutron with ML2 OVS plugin.
  2. L3 agents in dvr_snat mode on each hypervisor
  3. openstack-neutron-common-15.1.1-0.2020061910.7d97420.el8ost.noarch

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1894843/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1887148] Re: Network loop between physical networks with DVR

2021-04-12 Thread Corey Bryant
This bug was fixed in the package neutron - 2:12.1.1-0ubuntu4~cloud0
---

 neutron (2:12.1.1-0ubuntu4~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:12.1.1-0ubuntu4) bionic; urgency=medium
 .
   * Fix interrupt of VLAN traffic on reboot of neutron-ovs-agent:
   - d/p/0001-ovs-agent-signal-to-plugin-if-tunnel-refresh-needed.patch (LP: 
#1853613)
   - d/p/0002-Do-not-block-connection-between-br-int-and-br-phys-o.patch (LP: 
#1869808)
   - d/p/0003-Ensure-that-stale-flows-are-cleaned-from-phys_bridge.patch (LP: 
#1864822)
   - d/p/0004-DVR-Reconfigure-re-created-physical-bridges-for-dvr-.patch (LP: 
#1864822)
   - d/p/0005-Ensure-drop-flows-on-br-int-at-agent-startup-for-DVR.patch (LP: 
#1887148)
   - d/p/0006-Don-t-check-if-any-bridges-were-recrected-when-OVS-w.patch (LP: 
#1864822)
   - d/p/0007-Not-remove-the-running-router-when-MQ-is-unreachable.patch (LP: 
#1871850)


** Changed in: cloud-archive/queens
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887148

Title:
  Network loop between physical networks with DVR

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released

Bug description:
  (For SRU template, please see bug 1869808, as the SRU info there
  applies to this bug also)

  
  Our CI experienced a network loop due to 
https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than 
one physical bridge mapping, and the neutron server was not available when the 
ovs agents were started.

  Steps
  =
  # add more physical bridges
  ovs-vsctl add-br br-physnet1
  ip link set dev br-physnet1 up

  ovs-vsctl add-br br-physnet2
  ip link set dev br-physnet2 up

  # set a broadcast going from one bridge
  ip address add 1.1.1.1/31 dev br-physnet1
  arping -b -I br-physnet1 1.1.1.1

  # listen on the other
  tcpdump -eni br-physnet2

  # Update /etc/neutron/plugins/ml2/ml2_conf.ini
  [ml2_type_vlan]
  network_vlan_ranges = public,physnet1,physnet2

  [ovs]
  datapath_type = system
  bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
  tunnel_bridge = br-tun
  local_ip = 127.0.0.1

  [agent]
  tunnel_types = vxlan
  root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon 
/etc/neutron/rootwrap.conf
  root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
  enable_distributed_routing = True
  l2_population = True

  # stop server and agent
  systemctl stop devstack@q-svc
  systemctl stop devstack@q-agt

  # clear all flows
  for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows 
$BR; done

  # start agent
  systemctl start devstack@q-agt

  $ sudo tcpdump -eni br-physnet2
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 
bytes
  09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, 
length 28
  09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, 
length 28
  ...

  If there is more than one node running the ovs agent in this state,
  then there will be a network loop and packets can multiply quickly and
  overwhelm the network. We saw ~1 million packets/sec.

  I think that because the neutron server is not available, the
  get_dvr_mac_address rpc is blocked and the required drop flows are not
  installed:
  
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
  
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234
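
  A rough way to check whether those drop flows were installed after the agent
  start (a sketch only; whether br-int or the physical bridges is the right
  place to look depends on the DVR flow layout, so both are checked):

    # count flows with a drop action on the integration and physical bridges
    for BR in br-int br-physnet1 br-physnet2; do
        printf '%s: ' "$BR"
        sudo ovs-ofctl dump-flows "$BR" | grep -c 'actions=drop'
    done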

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1887148/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-04-12 Thread Corey Bryant
This bug was fixed in the package neutron - 2:12.1.1-0ubuntu4~cloud0
---

 neutron (2:12.1.1-0ubuntu4~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:12.1.1-0ubuntu4) bionic; urgency=medium
 .
   * Fix interrupt of VLAN traffic on reboot of neutron-ovs-agent:
   - d/p/0001-ovs-agent-signal-to-plugin-if-tunnel-refresh-needed.patch (LP: 
#1853613)
   - d/p/0002-Do-not-block-connection-between-br-int-and-br-phys-o.patch (LP: 
#1869808)
   - d/p/0003-Ensure-that-stale-flows-are-cleaned-from-phys_bridge.patch (LP: 
#1864822)
   - d/p/0004-DVR-Reconfigure-re-created-physical-bridges-for-dvr-.patch (LP: 
#1864822)
   - d/p/0005-Ensure-drop-flows-on-br-int-at-agent-startup-for-DVR.patch (LP: 
#1887148)
   - d/p/0006-Don-t-check-if-any-bridges-were-recrected-when-OVS-w.patch (LP: 
#1864822)
   - d/p/0007-Not-remove-the-running-router-when-MQ-is-unreachable.patch (LP: 
#1871850)


** Changed in: cloud-archive/queens
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  (SRU template copied from comment 42)

  [Impact]

  - When there is a RabbitMQ or neutron-api outage, the neutron-
  openvswitch-agent undergoes a "resync" process and temporarily blocks
  all VM traffic. This always happens for a short time period (maybe <1
  second) but in some high scale environments this lasts for minutes. If
  RabbitMQ is down again during the re-sync, traffic will also be
  blocked until it can connect which may be for a long period. This also
  affects situations where neutron-openvswitch-agent is intentionally
  restarted while RabbitMQ is down. Bug #1869808 addresses this issue
  and Bug #1887148 is a fix for that fix to prevent network loops during
  DVR startup.

  - In the same situation, the neutron-l3-agent can delete the L3 router
  (Bug #1871850), or may need to refresh the tunnel (Bug #1853613), or
  may need to update flows or reconfigure bridges (Bug #1864822)

  [Test Plan]

  (1) Deploy Openstack Bionic-Queens with DVR and a *VLAN* tenant
  network (VXLAN or FLAT will not reproduce the issue). With a standard
  deployment, simply enabling DHCP on the ext_net subnet will allow VMs
  to be booted directly on the ext_net provider network. "openstack
  subnet set --dhcp ext_net and then deploy the VM directly to ext_net"

  (2) Deploy a VM to the VLAN network

  (3) Start pinging the VM from an external network

  (4) Stop all RabbitMQ servers

  (5) Restart neutron-openvswitch-agent

  (6) Ping traffic should NOT see interruption

  (7) Start all RabbitMQ servers

  (8) Ping traffic should still be fine
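
  A rough command sketch of steps (3)-(8); VM_IP and the service names are
  placeholders for this particular deployment:

    VM_IP=203.0.113.10                                # placeholder: the VM's address
    ping "$VM_IP" &                                   # (3) keep pinging from outside
    sudo systemctl stop rabbitmq-server               # (4) on every RabbitMQ host
    sudo systemctl restart neutron-openvswitch-agent  # (5) on the compute host
    # (6) the ping should keep getting replies
    sudo systemctl start rabbitmq-server               # (7) on every RabbitMQ host
    # (8) the ping should still be fine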

  [Where problems could occur]

  These patches are all cherry-picked from the upstream stable branches,
  and have existed upstream including the stable/queens branch for many
  months and in Ubuntu all supported subsequent releases (Stein onwards)
  have also had these patches for many months with the exception of
  Queens.

  There is a chance that not installing these drop flows during startup
  could have traffic go somewhere that's not expected when the network
  is in a partially setup case, this was the case for DVR and in setups
  where more than 1 DVR external network port existed a network loop was
  possibly temporarily created. This was already addressed with the
  included patch for Bug #1869808. Checked and could not locate any
  other merged changes to this drop_port logic that also need to be
  backported.

  [Other Info]

  [original description]

  We are using Openstack Neutron 13.0.6 and it is deployed using
  OpenStack-helm.

  I tested pinging servers in the same VLAN while rebooting the
  neutron-ovs-agent. The result shows:

  root@mgt01:~# openstack server list
  
+--+-++--+--+---+
  | ID   | Name| Status | Networks  
   | Image| Flav

[Yahoo-eng-team] [Bug 1871850] Re: [L3] existing router resources are partial deleted unexpectedly when MQ is gone

2021-04-12 Thread Corey Bryant
This bug was fixed in the package neutron - 2:12.1.1-0ubuntu4~cloud0
---

 neutron (2:12.1.1-0ubuntu4~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:12.1.1-0ubuntu4) bionic; urgency=medium
 .
   * Fix interrupt of VLAN traffic on reboot of neutron-ovs-agent:
   - d/p/0001-ovs-agent-signal-to-plugin-if-tunnel-refresh-needed.patch (LP: 
#1853613)
   - d/p/0002-Do-not-block-connection-between-br-int-and-br-phys-o.patch (LP: 
#1869808)
   - d/p/0003-Ensure-that-stale-flows-are-cleaned-from-phys_bridge.patch (LP: 
#1864822)
   - d/p/0004-DVR-Reconfigure-re-created-physical-bridges-for-dvr-.patch (LP: 
#1864822)
   - d/p/0005-Ensure-drop-flows-on-br-int-at-agent-startup-for-DVR.patch (LP: 
#1887148)
   - d/p/0006-Don-t-check-if-any-bridges-were-recrected-when-OVS-w.patch (LP: 
#1864822)
   - d/p/0007-Not-remove-the-running-router-when-MQ-is-unreachable.patch (LP: 
#1871850)


** Changed in: cloud-archive/queens
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1871850

Title:
  [L3] existing router resources are partial deleted unexpectedly when
  MQ is gone

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Committed
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released

Bug description:
  (For SRU template, please see bug 1869808, as the SRU info there
  applies to this bug also)

  ENV: we hit this issue on our stable/queens deployment, but the master
  branch has the same code logic.

  When the L3 agent gets a router update notification, it tries to retrieve
  the router info from the DB server [1]. But if, at that moment, the message
  queue is down or unreachable, it gets message-queue-related exceptions and a
  resync action is scheduled [2]. In my experience a RabbitMQ cluster is not
  always easy to recover, and a long MQ recovery time means the router info
  sync RPC never succeeds before the max retry count is reached [3]. Then the
  bad thing happens: the L3 agent removes the router [4], which basically
  shuts down all existing L3 traffic of this router.

  [1] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L705
  [2] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L710
  [3] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L666
  [4] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L671
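
  A schematic, runnable sketch of the failure path described above; the names
  and the retry limit are illustrative and do not mirror the actual neutron L3
  agent code:

    class MQDown(Exception):
        pass                                           # stands in for messaging exceptions

    def fetch_router_info(router_id):
        raise MQDown('message queue unreachable')      # [1]/[2]: the RPC keeps failing

    def process_router_update(router_id, max_retries=5):
        for _ in range(max_retries):                   # [3]: bounded resync attempts
            try:
                return fetch_router_info(router_id)
            except MQDown:
                continue                               # schedule another resync
        # [4]: retries exhausted while the MQ is still down
        print('removing router %s -> existing L3 traffic is dropped' % router_id)

    process_router_update('router-1')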

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1871850/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923470] [NEW] test_security_group_recreated_on_port_update fails in CI

2021-04-12 Thread Oleg Bondarev
Public bug reported:

The neutron-tempest-plugin-api job started failing on
test_security_group_recreated_on_port_update:

Traceback (most recent call last):
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/api/admin/test_security_groups.py",
 line 43, in test_security_group_recreated_on_port_update
self.assertIn('default', names)
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 421, in assertIn
self.assertThat(haystack, Contains(needle), message)
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 502, in assertThat
raise mismatch_error
testtools.matchers._impl.MismatchError: 'default' not in []

The culprit seems to be patch
https://review.opendev.org/c/openstack/neutron/+/777605.

** Affects: neutron
 Importance: Critical
 Assignee: Oleg Bondarev (obondarev)
 Status: New


** Tags: gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923470

Title:
  test_security_group_recreated_on_port_update fails in CI

Status in neutron:
  New

Bug description:
  The neutron-tempest-plugin-api job started failing on
  test_security_group_recreated_on_port_update:

  Traceback (most recent call last):
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/neutron_tempest_plugin/api/admin/test_security_groups.py",
 line 43, in test_security_group_recreated_on_port_update
  self.assertIn('default', names)
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 421, in assertIn
  self.assertThat(haystack, Contains(needle), message)
File 
"/opt/stack/tempest/.tox/tempest/lib/python3.8/site-packages/testtools/testcase.py",
 line 502, in assertThat
  raise mismatch_error
  testtools.matchers._impl.MismatchError: 'default' not in []

  The culprit seems to be patch
  https://review.opendev.org/c/openstack/neutron/+/777605.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923470/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1923413] Re: [stable/rocky and older] Tempest jobs fail on alembic dependency

2021-04-12 Thread Bernard Cafarelli
OK, there were some failed tests, but not that specific issue. Maybe the
stackviz fixes from
https://review.opendev.org/q/Ifee04f28ecee52e74803f1623aba5cfe5ee5ec90
helped here too?

Anyway, marking as invalid

** Changed in: neutron
   Status: Incomplete => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1923413

Title:
  [stable/rocky and older] Tempest jobs fail on alembic dependency

Status in neutron:
  Invalid

Bug description:
  alembic apparently dropped support for some Python versions, and we do not
have an upper cap on it, so tempest jobs fail with POST_FAILURE, for example (rocky):
  https://review.opendev.org/c/openstack/neutron/+/783544
  https://zuul.opendev.org/t/openstack/build/f300e1a82627435da71bc133445bc279

  Collecting alembic>=0.8.10 (from subunit2sql>=0.8.0->stackviz==0.0.1.dev320)
Downloading 
http://mirror.gra1.ovh.opendev.org/wheel/ubuntu-16.04-x86_64/alembic/alembic-1.5.5-py2.py3-none-any.whl
 (156kB)

  :stderr: alembic requires Python
  '!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*,!=3.5.*,>=2.7' but the
  running Python is 3.5.2

  And similar failure in queens, for example in
  https://review.opendev.org/c/openstack/neutron/+/776455

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1923413/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1887148] Re: Network loop between physical networks with DVR

2021-04-12 Thread Corey Bryant
** Changed in: cloud-archive/rocky
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1887148

Title:
  Network loop between physical networks with DVR

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released

Bug description:
  (For SRU template, please see bug 1869808, as the SRU info there
  applies to this bug also)

  
  Our CI experienced a network loop due to 
https://review.opendev.org/#/c/733568/ . DVR is enabled and there is more than 
one physical bridge mapping, and the neutron server was not available when the 
ovs agents were started.

  Steps
  =
  # add more physical bridges
  ovs-vsctl add-br br-physnet1
  ip link set dev br-physnet1 up

  ovs-vsctl add-br br-physnet2
  ip link set dev br-physnet2 up

  # set a broadcast going from one bridge
  ip address add 1.1.1.1/31 dev br-physnet1
  arping -b -I br-physnet1 1.1.1.1

  # listen on the other
  tcpdump -eni br-physnet2

  # Update /etc/neutron/plugins/ml2/ml2_conf.ini
  [ml2_type_vlan]
  network_vlan_ranges = public,physnet1,physnet2

  [ovs]
  datapath_type = system
  bridge_mappings = public:br-ex,physnet1:br-physnet1,physnet2:br-physnet2
  tunnel_bridge = br-tun
  local_ip = 127.0.0.1

  [agent]
  tunnel_types = vxlan
  root_helper_daemon = sudo /usr/local/bin/neutron-rootwrap-daemon 
/etc/neutron/rootwrap.conf
  root_helper = sudo /usr/local/bin/neutron-rootwrap /etc/neutron/rootwrap.conf
  enable_distributed_routing = True
  l2_population = True

  # stop server and agent
  systemctl stop devstack@q-svc
  systemctl stop devstack@q-agt

  # clear all flows
  for BR in $(sudo ovs-vsctl list-br); do echo $BR; sudo ovs-ofctl del-flows 
$BR; done

  # start agent
  systemctl start devstack@q-agt

  $ sudo tcpdump -eni br-physnet2
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on br-physnet2, link-type EN10MB (Ethernet), capture size 262144 
bytes
  09:46:56.577183 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, 
length 28
  09:46:57.577568 e2:ab:d4:16:46:4d > ff:ff:ff:ff:ff:ff, ethertype ARP 
(0x0806), length 42: Request who-has 1.1.1.1 (ff:ff:ff:ff:ff:ff) tell 1.1.1.1, 
length 28
  ...

  If there is more than one node running the ovs agent in this state,
  then there will be a network loop and packets can multiply quickly and
  overwhelm the network. We saw ~1 million packets/sec.

  I think that because the neutron server is not available, the
  get_dvr_mac_address rpc is blocked and the required drop flows are not
  installed:
  
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L138
  
https://github.com/openstack/neutron/blob/5999716cfc4a00ac426e016eabbb51247ba0b190/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L230-L234

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1887148/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1871850] Re: [L3] existing router resources are partial deleted unexpectedly when MQ is gone

2021-04-12 Thread Corey Bryant
** Changed in: cloud-archive/rocky
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1871850

Title:
  [L3] existing router resources are partial deleted unexpectedly when
  MQ is gone

Status in Ubuntu Cloud Archive:
  Invalid
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released

Bug description:
  (For SRU template, please see bug 1869808, as the SRU info there
  applies to this bug also)

  ENV: we hit this issue on our stable/queens deployment, but the master
  branch has the same code logic.

  When the L3 agent gets a router update notification, it tries to retrieve
  the router info from the DB server [1]. But if, at that moment, the message
  queue is down or unreachable, it gets message-queue-related exceptions and a
  resync action is scheduled [2]. In my experience a RabbitMQ cluster is not
  always easy to recover, and a long MQ recovery time means the router info
  sync RPC never succeeds before the max retry count is reached [3]. Then the
  bad thing happens: the L3 agent removes the router [4], which basically
  shuts down all existing L3 traffic of this router.

  [1] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L705
  [2] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L710
  [3] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L666
  [4] 
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L671

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1871850/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1869808] Re: reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

2021-04-12 Thread Corey Bryant
** Changed in: cloud-archive/rocky
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1869808

Title:
  reboot neutron-ovs-agent introduces a short interrupt of vlan traffic

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Fix Released
Status in Ubuntu Cloud Archive rocky series:
  Fix Released
Status in Ubuntu Cloud Archive stein series:
  Fix Released
Status in Ubuntu Cloud Archive train series:
  Fix Released
Status in Ubuntu Cloud Archive ussuri series:
  Fix Released
Status in Ubuntu Cloud Archive victoria series:
  Fix Released
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Bionic:
  Fix Released
Status in neutron source package in Focal:
  Fix Released
Status in neutron source package in Groovy:
  Fix Released
Status in neutron source package in Hirsute:
  Fix Released

Bug description:
  (SRU template copied from comment 42)

  [Impact]

  - When there is a RabbitMQ or neutron-api outage, the neutron-
  openvswitch-agent undergoes a "resync" process and temporarily blocks
  all VM traffic. This always happens for a short time period (maybe <1
  second) but in some high scale environments this lasts for minutes. If
  RabbitMQ is down again during the re-sync, traffic will also be
  blocked until it can connect which may be for a long period. This also
  affects situations where neutron-openvswitch-agent is intentionally
  restarted while RabbitMQ is down. Bug #1869808 addresses this issue
  and Bug #1887148 is a fix for that fix to prevent network loops during
  DVR startup.

  - In the same situation, the neutron-l3-agent can delete the L3 router
  (Bug #1871850), or may need to refresh the tunnel (Bug #1853613), or
  may need to update flows or reconfigure bridges (Bug #1864822)

  [Test Plan]

  (1) Deploy Openstack Bionic-Queens with DVR and a *VLAN* tenant
  network (VXLAN or FLAT will not reproduce the issue). With a standard
  deployment, simply enabling DHCP on the ext_net subnet will allow VMs
  to be booted directly on the ext_net provider network. "openstack
  subnet set --dhcp ext_net and then deploy the VM directly to ext_net"

  (2) Deploy a VM to the VLAN network

  (3) Start pinging the VM from an external network

  (4) Stop all RabbitMQ servers

  (5) Restart neutron-openvswitch-agent

  (6) Ping traffic should NOT see interruption

  (7) Start all RabbitMQ servers

  (8) Ping traffic should still be fine

  [Where problems could occur]

  These patches are all cherry-picked from the upstream stable branches,
  and have existed upstream including the stable/queens branch for many
  months and in Ubuntu all supported subsequent releases (Stein onwards)
  have also had these patches for many months with the exception of
  Queens.

  There is a chance that not installing these drop flows during startup
  could have traffic go somewhere that's not expected when the network
  is in a partially setup case, this was the case for DVR and in setups
  where more than 1 DVR external network port existed a network loop was
  possibly temporarily created. This was already addressed with the
  included patch for Bug #1869808. Checked and could not locate any
  other merged changes to this drop_port logic that also need to be
  backported.

  [Other Info]

  [original description]

  We are using Openstack Neutron 13.0.6 and it is deployed using
  OpenStack-helm.

  I tested pinging servers in the same VLAN while rebooting the
  neutron-ovs-agent. The result shows:

  root@mgt01:~# openstack server list
  
+--+-++--+--+---+
  | ID   | Name| Status | Networks  
   | Image| Flavor|
  
+--+-++--+--+---+
  | 22d55077-b1b5-452e-8eba-cbcd2d1514a8 | test-1-1| ACTIVE | 
vlan105=172.31.10.4  | Cirros 0.4.0 64-bit  | 
m1.tiny   |
  | 726bc888-7767-44bc-b68a-7a1f3a6babf1 | test-1-2| ACTIVE | 
vlan105=172.31.10.18 | Cirros 0.4.0 64-bit  | 
m1.tiny   |

  $ ping 172.31.10.4
  PING 172.31.10.4 (172.31.10.4): 56 data bytes
  ..
  64 bytes from 172.31.10.4: seq=59 ttl=64 time=0.465 ms
  64 bytes from 172.31.10.4: seq=60 ttl=64 time=0.510 ms <
  64 bytes from 172.31.10.4: seq=61 ttl=64 time=0.446 ms
  64 bytes from 172.31.10.4: seq=63 ttl=64 time=0.744 ms
  64 bytes from 172.31.10.4: seq=64 ttl=64 time=0.477 ms
  64 bytes from 172.31.10.4: seq=65 ttl=64 time=0.441 ms
  64 bytes from 172.31.10.4: seq=66 ttl=

[Yahoo-eng-team] [Bug 1911891] Re: Post live migration at destination failed:

2021-04-12 Thread Launchpad Bug Tracker
[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1911891

Title:
  Post live migration at destination failed:

Status in OpenStack Compute (nova):
  Expired

Bug description:
  nova 2:21.1.0-0ubuntu1

  Recently migrated from OVS to OVN networking.

  Performing a series of live migrations (block migrations) to evacuate
  a broken hypervisor. Most of these have succeeded or have failed for
  understandable reasons. One of the migrations is stuck in this strange
  state, however. The instance UUID in this case is 1eb3c33c-8c2b-
  4d28-9050-70ac740ceb34.

  $ nova migration-list --instance-uuid 1eb3c33c-8c2b-4d28-9050-70ac740ceb34
  
+-+--+-+-++--+-+---+--++++++--+--+
  | Id  | UUID | Source Node | Dest Node   | 
Source Compute | Dest Compute | Dest Host   | Status| Instance UUID 
   | Old Flavor | New Flavor | Created At | 
Updated At | Type   | Project ID   
| User ID  |
  
+-+--+-+-++--+-+---+--++++++--+--+
  | 943 | f0013171-b3bf-4884-a94e-b627c48b1530 | oshv01.maas | oshv04.maas | 
oshv01 | oshv04.maas  | -   | completed | 
1eb3c33c-8c2b-4d28-9050-70ac740ceb34 | 35 | 35 | 
2021-01-14T14:20:34.00 | 2021-01-14T18:00:21.00 | live-migration | 
d25b3d096e5c47a3b1701b3fff2c2823 | 41f5b059d3bf4bebad08ffcabd0899bf |

  This shows this migration from oshv01 to oshv04 as "completed".
  However, the task state for the instance still shows it as migrating:

  $ openstack server show 1eb3c33c-8c2b-4d28-9050-70ac740ceb34 -fvalue 
-cOS-EXT-SRV-ATTR:host
  oshv01

  $ openstack server show 1eb3c33c-8c2b-4d28-9050-70ac740ceb34 -fvalue 
-cOS-EXT-STS:task_state
  migrating

  The instance appears to be running on the intended destination oshv04,
  but nova apparently thinks it is still running on the source host
  oshv01.

  I can find some log messages from around the time the migration
  "almost completed". In relation to the migration UUID I have:

  /var/log/syslog.1:Jan 14 17:59:40 oshv04 nova-compute: 2021-01-14 
17:59:40.714 2343417 INFO nova.compute.resource_tracker 
[req-4f8cf1b6-a022-4619-b1dc-2b4e92a35300 - - - - -] [instance: 
1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Updating resource usage from migration 
f0013171-b3bf-4884-a94e-b627c48b1530
  /var/log/syslog.1:Jan 14 18:00:20 oshv01 nova-compute: 2021-01-14 
18:00:20.511 2620747 INFO nova.compute.resource_tracker 
[req-e30fbdb7-7e5f-476b-b4f1-829ca65add60 - - - - -] [instance: 
1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Updating resource usage from migration 
f0013171-b3bf-4884-a94e-b627c48b1530
  /var/log/syslog.1:Jan 14 18:00:21 oshv01 nova-compute: 2021-01-14 
18:00:21.039 2620747 INFO nova.compute.resource_tracker 
[req-61749f8c-26d0-491a-8935-d8700d7a4de6 - - - - -] [instance: 
1eb3c33c-8c2b-4d28-9050-70ac740ceb34] Updating resource usage from migration 
f0013171-b3bf-4884-a94e-b627c48b1530
  /var/log/syslog.1:Jan 14 18:00:21 juju-533538-38-lxd-17 placement-api: 
2021-01-14 18:00:21.193 1189353 INFO placement.requestlog 
[req-3ef495bb-1c3d-4a0c-9a2a-8c9253f3e842 74f0533fa60b4c03ba0e472b4a951b8c 
0ccc5b9ded584deda3683e014cba8b67 - 174a4647a56b4d09bc6ab8952c0105f1 
174a4647a56b4d09bc6ab8952c0105f1] 127.0.0.1 "GET 
/allocations/f0013171-b3bf-4884-a94e-b627c48b1530" status: 200 len: 264 
microversion: 1.28
  /var/log/syslog.1:Jan 14 18:00:21 juju-533538-38-lxd-17 placement-api: 
2021-01-14 18:00:21.254 1189350 INFO placement.requestlog 
[req-3fc0ba8c-d8f8-4c23-8aea-563e9930df99 74f0533fa60b4c03ba0e472b4a951b8c 
0ccc5b9ded584deda3683e014cba8b67 - 174a4647a56b4d09bc6ab8952c0105f1 
174a4647a56b4d09bc6ab8952c0105f1] 127.0.0.1 "PUT 
/allocations/f0013171-b3bf-4884-a94e-b627c48b1530" status: 204 len: 0 
microversion: 1.28
  /var/log/syslog.1:Jan 14 18:00:21 oshv01 nova-compute: 2021-01-14 
18:00:21.256 2620747 INFO nova.scheduler.client.report 
[req-61749f8c-26d0-491a-8935-d8700d7a4de6 - - - - -] Deleted allocation for 
migration f0013171-b3bf-4884-a94e-b627c48b1530

  Also in relation to the instance UUI

[Yahoo-eng-team] [Bug 1923560] [NEW] retrieving security group is slow for server detail

2021-04-12 Thread norman shen
Public bug reported:

Description
===

Querying a large number of VMs through server detail is slow, and a lot of
time is wasted on calling the neutron API to obtain security group info.
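
A crude client-side way to see the effect (a sketch; it just times a full
server detail listing, which is what drives the security group lookups against
neutron on the nova-api side):

  time openstack server list --all-projects --long > /dev/null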


Expected result
===

Obtaining security group info should not consume half of the total query
time.

Actual result
=

too slow...

Environment
===
1. ubuntu 18.04 + nova 22

2. libvirt + qemu + kvm

3. ceph

4. vxlan + vlan

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1923560

Title:
  retrieving security group is slow for server detail

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===

  Querying a large number of VMs through server detail is slow, and a lot of
  time is wasted on calling the neutron API to obtain security group info.

  
  Expected result
  ===

  Obtaining security group info should not consume half of the total query
  time.

  Actual result
  =

  too slow...

  Environment
  ===
  1. ubuntu 18.04 + nova 22

  2. libvirt + qemu + kvm

  3. ceph

  4. vxlan + vlan

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1923560/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp