Public bug reported:

At the moment, if the cloud sustains a large number of VIF plugging
timeouts, this leads to a large number of leaked green threads, which
can cause the nova-compute process to stop reporting/responding.

The tracebacks that occur look like the following:

===
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last):
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 7230, in _create_guest_with_network
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     guest = self._create_guest(
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/usr/lib/python3.8/contextlib.py", line 120, in __exit__
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     next(self.gen)
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 
479, in wait_for_instance_event
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     actual_event = event.wait()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/eventlet/event.py", line 125, 
in wait
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     result = hub.switch()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 
313, in switch
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     return self.greenlet.switch()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] eventlet.timeout.Timeout: 300 seconds
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] During handling of the above exception, 
another exception occurred:
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] Traceback (most recent call last):
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 
2409, in _build_and_run_instance
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     self.driver.spawn(context, instance, 
image_meta,
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 4193, in spawn
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     self._create_guest_with_network(
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]   File 
"/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", 
line 7256, in _create_guest_with_network
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b]     raise 
exception.VirtualInterfaceCreateException()
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
nova.exception.VirtualInterfaceCreateException: Virtual Interface creation 
failed
2022-04-17 00:21:28.651 877893 ERROR nova.compute.manager [instance: 
0c0d2422-781c-4bd2-b6bd-e5e3c94b602b] 
===
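
For context, the first half of the traceback is nova's
`wait_for_instance_event`, which arms an eventlet timeout around
`event.wait()` while waiting for the network-vif-plugged notification
from Neutron (the 300 seconds matches the default vif_plugging_timeout).
A minimal sketch of that pattern, using simplified, illustrative names
rather than nova's actual code:

===
# Minimal sketch of the waiting pattern the traceback shows: an eventlet
# Timeout armed around event.wait(). Names are illustrative only.
from eventlet import event
from eventlet.timeout import Timeout


def wait_for_vif_plugged(deadline=300):
    """Block the current green thread until the event fires or
    `deadline` seconds pass (vif_plugging_timeout defaults to 300)."""
    ev = event.Event()
    # In nova, the Neutron external-event callback would call ev.send();
    # if it never arrives, the hub timer throws Timeout into this green
    # thread, which is the `result = hub.switch()` frame seen above.
    with Timeout(deadline):
        return ev.wait()


try:
    wait_for_vif_plugged(deadline=0.1)
except Timeout:
    # nova converts this into VirtualInterfaceCreateException, as shown
    # in the second half of the traceback.
    print("timed out waiting for network-vif-plugged")
===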

Eventually, with enough of these, the nova-compute process hangs. The
output of a Guru Meditation Report (GMR) shows roughly 6094 threads,
with around 3038 of them having the traceback below:

===
------                        Green Thread                        ------

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:355 in run
    `self.fire_timers(self.clock())`

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/hub.py:476 in 
fire_timers
    `timer()`

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/timer.py:59 in 
__call__
    `cb(*args, **kw)`

/var/lib/openstack/lib/python3.8/site-packages/eventlet/hubs/__init__.py:151 in 
_timeout
    `current.throw(exc)`
===

In addition, 3039 of those threads show only the following:

===
------                        Green Thread                        ------

No Traceback!
===
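
As a sanity check on the GMR numbers, the live green threads can also
be counted in-process. A hedged sketch, assuming some kind of debug
shell attached to the nova-compute process; this is a generic greenlet
technique that walks the garbage collector, not anything nova exposes:

===
# Count live greenlet objects via the garbage collector (generic
# eventlet/greenlet debugging, not a nova API).
import gc
import greenlet

greenlets = [o for o in gc.get_objects()
             if isinstance(o, greenlet.greenlet)]
print("live greenlets:", len(greenlets))

# Greenlets with no current Python frame (never run, or already dead)
# are presumably what the GMR prints as "No Traceback!".
print("without a frame:", sum(1 for g in greenlets if g.gr_frame is None))
===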

That puts 6077 green threads in total in this odd state. We had a
discussion about this on IRC (linked below), and it seems that it may
be related to the use of `spawn_n`.

https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2022-05-05.log.html#t2022-05-05T16:20:37
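
To illustrate why `spawn_n` is the suspect (my reading of that
discussion, not a confirmed root cause): `eventlet.spawn` returns a
GreenThread the caller can wait() on, link() to, or kill(), whereas
`spawn_n` returns a bare greenlet with none of that, so a worker that
dies on a Timeout is simply orphaned. A sketch of the difference, with
a hypothetical worker standing in for the real code path:

===
# Standard eventlet behaviour; the timing-out worker is hypothetical.
import eventlet
from eventlet.timeout import Timeout


def worker():
    with Timeout(0.1):
        eventlet.sleep(10)  # stand-in for an event.wait() that never fires


gt = eventlet.spawn(worker)   # GreenThread: outcome is retrievable
try:
    gt.wait()                 # re-raises the worker's Timeout here
except Timeout:
    print("caller saw the timeout")

eventlet.spawn_n(worker)      # bare greenlet: fire-and-forget; the
eventlet.sleep(0.2)           # caller cannot observe the Timeout at all
===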

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1971760

Title:
  nova-compute leaks green threads

Status in OpenStack Compute (nova):
  New

