Public bug reported: When sending request to NSX controller, the TCP connection is reset by remote (in our case, by Load-Balancer), which is a transient failure (could be caused by LB ratelimit). The return code is [Errno 104] Connection reset by peer.
In this case, retry should work. However, error 104 is returned immediately after sending the request, and the retry logic in NSX plugin doesn't delay at all, so continuous retries in a row failed. >From the log below we can see all the 3 requests were performed and failed in 1 ms. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port ... Above log is from Havana. Below is my patch based on master branch to solve the issue: diff --git a/neutron/plugins/vmware/api_client/eventlet_request.py b/neutron/plugins/vmware/api_client/eventlet_request.py index c55ccfa..46232ea 100644 --- a/neutron/plugins/vmware/api_client/eventlet_request.py +++ b/neutron/plugins/vmware/api_client/eventlet_request.py @@ -17,6 +17,8 @@ import eventlet import httplib import urllib +import socket +import errno from neutron.openstack.common import jsonutils from neutron.openstack.common import log as logging @@ -152,6 +154,9 @@ class EventletApiRequest(request.ApiRequest): self._request_error = None response = req else: + if (isinstance(req, socket.error) and + req.errno == errno.ECONNRESET): + timeout = 0.5 LOG.info(_('[%(rid)d] Error while handling request: %(req)s'), {'rid': self._rid(), 'req': req}) self._request_error = req ** Affects: neutron Importance: Undecided Status: New ** Tags: vmware -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1381094 Title: NSX plugin request retry should delay in case of "[Errno 104] Connection reset by peer" Status in OpenStack Neutron (virtual network service): New Bug description: When sending request to NSX controller, the TCP connection is reset by remote (in our case, by Load-Balancer), which is a transient failure (could be caused by LB ratelimit). The return code is [Errno 104] Connection reset by peer. In this case, retry should work. However, error 104 is returned immediately after sending the request, and the retry logic in NSX plugin doesn't delay at all, so continuous retries in a row failed. From the log below we can see all the 3 requests were performed and failed in 1 ms. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,163 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,164 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,164 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,165 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,165 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,166 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,166 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,166 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Acquired connection https://os-nvp.vip.ppp01.corp.com:443. 14 connection(s) available. 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] [13168] Issuing - request GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,167 43793136 DEBUG [neutron.plugins.nicira.api_client.request] Setting X-Nvp-Wait-For-Config-Generation request header: '377462645' 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Exception issuing request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,167 43793136 WARNING [neutron.plugins.nicira.api_client.request] [13168] Failed request 'GET https://os-nvp.vip.ppp01.corp.com:443//ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus': '[Errno 104] Connection reset by peer' (0.00 seconds) 2014-10-12 11:07:18,168 43793136 WARNING [neutron.plugins.nicira.api_client.client] [13168] Connection returned in bad state, reconnecting to https://os-nvp.vip.ppp01.corp.com:443 2014-10-12 11:07:18,168 43793136 DEBUG [neutron.plugins.nicira.api_client.client] [13168] Released connection https://os-nvp.vip.ppp01.corp.com:443. 15 connection(s) available. 2014-10-12 11:07:18,168 43793136 INFO [neutron.plugins.nicira.api_client.request_eventlet] [13168] Error while handling request: [Errno 104] Connection reset by peer 2014-10-12 11:07:18,168 133917104 ERROR [NVPApiHelper] Request timed out: GET to /ws.v1/lswitch/7d13735a-ca65-43b3-a3fb-ef8b1ca0f552?relations=LogicalSwitchStatus 2014-10-12 11:07:18,169 133917104 ERROR [NeutronPlugin] An exception occured while selecting logical switch for the port ... Above log is from Havana. Below is my patch based on master branch to solve the issue: diff --git a/neutron/plugins/vmware/api_client/eventlet_request.py b/neutron/plugins/vmware/api_client/eventlet_request.py index c55ccfa..46232ea 100644 --- a/neutron/plugins/vmware/api_client/eventlet_request.py +++ b/neutron/plugins/vmware/api_client/eventlet_request.py @@ -17,6 +17,8 @@ import eventlet import httplib import urllib +import socket +import errno from neutron.openstack.common import jsonutils from neutron.openstack.common import log as logging @@ -152,6 +154,9 @@ class EventletApiRequest(request.ApiRequest): self._request_error = None response = req else: + if (isinstance(req, socket.error) and + req.errno == errno.ECONNRESET): + timeout = 0.5 LOG.info(_('[%(rid)d] Error while handling request: %(req)s'), {'rid': self._rid(), 'req': req}) self._request_error = req To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1381094/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp