Public bug reported:

The cellsv1 job has been failing fairly consistently over the last week
or two due to a libvirt connection reset:

http://logs.openstack.org/36/536936/1/check/legacy-tempest-dsvm-cells/a9ff792/logs/libvirt/libvirtd.txt.gz#_2018-01-28_01_25_23_762

2018-01-28 01:25:23.762+0000: 3896: error :
virKeepAliveTimerInternal:143 : internal error: connection closed due to
keepalive timeout
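
For context, the timeout that libvirtd reports here is governed by its
keepalive settings in /etc/libvirt/libvirtd.conf. The values below are
libvirt's documented defaults (shown only for reference, not taken from
this job's config): the server probes the client every
keepalive_interval seconds and drops the connection after
keepalive_count consecutive unanswered probes, i.e. roughly 25 seconds
of an unresponsive client with the defaults.

```
# /etc/libvirt/libvirtd.conf (libvirt defaults shown)
# Send a keepalive probe to the client every keepalive_interval seconds;
# close the connection after keepalive_count unanswered probes.
keepalive_interval = 5
keepalive_count = 5
```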

http://logs.openstack.org/36/536936/1/check/legacy-tempest-dsvm-cells/a9ff792/logs/screen-n-cpu.txt.gz?level=TRACE#_2018-01-28_01_25_23_766

2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager [req-392410f9-c834-4bdc-a439-ac20476fe212 - -] Error updating resources for node ubuntu-xenial-inap-mtl01-0002208439.
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager Traceback (most recent call last):
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/manager.py", line 6590, in update_available_resource_for_node
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     rt.update_available_resource(context, nodename)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 535, in update_available_resource
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5675, in get_available_resource
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     data["vcpus_used"] = self._get_vcpu_used()
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 5316, in _get_vcpu_used
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     for guest in self._host.list_guests():
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/virt/libvirt/host.py", line 573, in list_guests
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     only_running=only_running, only_guests=only_guests)]
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/opt/stack/new/nova/nova/virt/libvirt/host.py", line 593, in list_instance_domains
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     alldoms = self.get_connection().listAllDomains(flags)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     result = proxy_call(self._autowrap, f, *args, **kwargs)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     rv = execute(f, *args, **kwargs)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     six.reraise(c, e, tb)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     rv = meth(*args, **kwargs)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager   File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 4953, in listAllDomains
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager     raise libvirtError("virConnectListAllDomains() failed", conn=self)
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager libvirtError: Cannot recv data: Connection reset by peer
2018-01-28 01:25:23.766 16360 ERROR nova.compute.manager 
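
The traceback shows the periodic resource-update task dying on the
first libvirt call it makes after the daemon dropped the connection.
Code that polls libvirt like this can guard against a one-off reset by
re-fetching the connection and retrying. This is only an illustrative
sketch, not nova's actual handling: the libvirtError stand-in,
list_domains_with_retry helper, and FakeConn class are all hypothetical,
made self-contained so the example runs without libvirt installed.

```python
class libvirtError(Exception):
    """Stand-in for libvirt.libvirtError so this sketch runs without libvirt."""


def list_domains_with_retry(get_connection, flags=0, retries=1):
    """Call listAllDomains(), fetching a fresh connection and retrying on error.

    get_connection: callable returning a (possibly re-established) connection.
    """
    for attempt in range(retries + 1):
        conn = get_connection()
        try:
            return conn.listAllDomains(flags)
        except libvirtError:
            if attempt == retries:
                raise  # out of retries; let the caller handle the failure


class FakeConn(object):
    """Tiny fake connection that fails once, then succeeds."""

    def __init__(self):
        self.calls = 0

    def listAllDomains(self, flags):
        self.calls += 1
        if self.calls == 1:
            raise libvirtError("Cannot recv data: Connection reset by peer")
        return ["instance-00000001"]


conn = FakeConn()
print(list_domains_with_retry(lambda: conn))  # → ['instance-00000001']
```

In nova itself the driver simply fails this periodic task and relies on
its connection-reset handling on the next run; the retry wrapper above
just shows the shape of a more defensive approach.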

The failures appear random. It's not clear what differs about this job
running on stable versus master, but the error does not show up on
master:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22libvirtError%3A%20Cannot%20recv%20data%3A%20Connection%20reset%20by%20peer%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22%20AND%20build_name%3A%5C%22legacy-tempest-dsvm-cells%5C%22&from=7d

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: cells libvirt testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1745838

Title:
  legacy-tempest-dsvm-cells constantly failing on stable pike and ocata
  due to libvirt connection reset

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1745838/+subscriptions
