Public bug reported:

It has been visible in a couple of jobs already that random tempest scenario jobs
are failing due to a timeout while SSHing to the guest VM.
In the VM's console log there is clearly a problem with reaching the metadata server:

2024-02-02 17:37:28.665832 | controller | forked to background, child pid 250
2024-02-02 17:37:28.665857 | controller | OK
2024-02-02 17:37:28.665883 | controller | checking http://169.254.169.254/2009-04-04/instance-id
2024-02-02 17:37:28.665908 | controller | failed 1/20: up 26.07. request failed
2024-02-02 17:37:28.665933 | controller | failed 2/20: up 28.37. request failed
2024-02-02 17:37:28.665958 | controller | failed 3/20: up 30.67. request failed
2024-02-02 17:37:28.665983 | controller | failed 4/20: up 32.96. request failed
2024-02-02 17:37:28.666008 | controller | failed 5/20: up 82.24. request failed
2024-02-02 17:37:28.666033 | controller | failed 6/20: up 131.56. request failed
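
For reference, the check failing above is just a plain HTTP GET against the
EC2-style metadata endpoint, so it can be reproduced by hand from inside the
guest (a minimal sketch, assuming the image ships curl, as cirros does):

  # Run inside the guest VM, e.g. over the console
  $ curl -sv --max-time 10 http://169.254.169.254/2009-04-04/instance-id
  # On a healthy network this returns the EC2-style instance id, e.g. i-00000001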


Looking at the logs of the neutron-ovn-metadata-agent and then at the journal
log, it seems to me that those requests are never delivered to the haproxy
spawned in the ovnmeta-xxx namespace, as there is no log entry with the log-tag
configured in haproxy for that network.
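
For reference, the absence of those haproxy log entries can be checked on the
node like this (a sketch with placeholder values: <network-uuid> is the neutron
network UUID, and the config path and log-tag format assume a default
devstack-style deployment):

  # The haproxy serving metadata for a network runs in its own namespace
  $ sudo ip netns list | grep ovnmeta

  # Verify haproxy is actually listening on 169.254.169.254:80 in there
  $ sudo ip netns exec ovnmeta-<network-uuid> ss -tlnp

  # The tag haproxy logs with is the log-tag from its generated config
  $ sudo grep log-tag /var/lib/neutron/ovn-metadata-proxy/<network-uuid>.conf

  # No journal entries with that tag means the requests never reached haproxy
  $ sudo journalctl -t haproxy-metadata-proxy-<network-uuid>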

Examples of such failures:
https://3c8c3cc132d3ca41c1a0-8df332a8f6cbb54ee498032ff97f9d17.ssl.cf1.rackcdn.com/882350/2/check/cinder-plugin-ceph-tempest-mn-aa/df2995a/job-output.txt
https://ac3deee033df2f80309a-9b1010a8ed0ed23e4a7e66dfa043a295.ssl.cf5.rackcdn.com/907418/2/check/tempest-slow-py3/6dff044/job-output.txt

** Affects: neutron
     Importance: Critical
     Assignee: Slawek Kaplonski (slaweq)
         Status: Confirmed


** Tags: gate-failure tempest

https://bugs.launchpad.net/bugs/2052787

Title:
  SSH timeouts due to problems with metadata server in ML2/OVN backend

Status in neutron:
  Confirmed


