Public bug reported:

Running tempest test
tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group
fails sometimes when trying to authenticate via public key to the access
point instance [0].

After debugging, I managed to connect to the instance via virsh console
and check that the instance had not the SSH key installed:

  ____               ____  ____
 / __/ __ ____ ____ / __ \/ __/
/ /__ / // __// __// /_/ /\ \
\___//_//_/  /_/   \____/___/
   http://cirros-cloud.net
login as 'cirros' user. default password: 'cubswin:)'. use 'sudo' for root.
tempest-server-tempest-testsecuritygroupsbasicops-679466304-acc login: cirros
Password:
$ cat .ssh/authorized_keys
$


Checking up in the console log, I can see the following:
cirros-ds 'net' up at 6.37
checking http://169.254.169.254/2009-04-04/instance-id
successful after 1/20 tries: up 6.54. iid=i-0000006e
failed to get 
http://169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key
warning: no ec2 metadata for public-keys
failed to get http://169.254.169.254/2009-04-04/user-data
warning: no ec2 metadata for user-data
found datasource (ec2, net)


So it looks like it is able to fetch the instance-id but not getting the 
public-key.
When I try to do it manually, it retrieves it successfully:

$ curl 169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key
ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ
 Generated-by-Nova

Also, running the following command doesn't work:

$ sudo cirros-apply net -v
$ cat .ssh/authorized_keys
$

If, instead I run the following command and reboot, it will get properly 
installed:
$ sudo cirros-per boot cirros-apply-net cirros-apply net && reboot
...
$ cat .ssh/authorized_keys
ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ
 Generated-by-Nova


After checking the ovn metadata proxy log and also nova-metadata-api logs, I 
can see the requests and the 200 OK responses:

2018-04-19 22:31:37.383 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200  len: 146 time: 
4.0820560
2018-04-19 22:31:38.800 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200  len: 183 time: 
1.1210902
2018-04-19 22:31:49.148 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200  len: 146 time: 
0.0230849
2018-04-19 22:31:49.387 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/ami-launch-index HTTP/1.1" status: 200  len: 136 
time: 0.0262089
2018-04-19 22:31:50.225 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/instance-type HTTP/1.1" status: 200  len: 142 time: 
0.7244408
2018-04-19 22:31:50.482 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/local-ipv4 HTTP/1.1" status: 200  len: 146 time: 
0.0143349
2018-04-19 22:31:50.612 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-ipv4 HTTP/1.1" status: 200  len: 144 time: 
0.0130348
2018-04-19 22:31:50.793 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/hostname HTTP/1.1" status: 200  len: 199 time: 
0.0100901
2018-04-19 22:31:51.039 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/local-hostname HTTP/1.1" status: 200  len: 199 time: 
0.0094490
2018-04-19 22:31:51.197 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/user-data HTTP/1.1" status: 404  len: 297 time: 0.0226381
2018-04-19 22:31:51.475 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/block-device-mapping HTTP/1.1" status: 200  len: 143 
time: 0.0118120
2018-04-19 22:31:51.579 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/block-device-mapping/ami HTTP/1.1" status: 200  len: 
138 time: 0.0084291
2018-04-19 22:31:51.672 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200  
len: 535 time: 12.7038500
2018-04-19 22:31:51.735 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/block-device-mapping/root HTTP/1.1" status: 200  
len: 143 time: 0.0147779
2018-04-19 22:31:51.930 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-hostname HTTP/1.1" status: 200  len: 199 
time: 0.0083301

nova-api-metadata.log:


2018-04-19 22:31:38.852 5 INFO nova.metadata.wsgi.server 
[req-454724c8-a0cb-4c16-8c85-dd5940c751e2 - - - - -] 10.100.0.7,172.16.2.18 
"GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 164 time: 
1.1147470
2018-04-19 22:31:51.474 5 INFO nova.metadata.wsgi.server 
[req-f1ef4033-3378-4074-baaf-6dfae80b81cf - - - - -] 10.100.0.7,172.16.2.18 
"GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 
516 time: 12.6919320


This is a TripleO (tripleo-ci-master) setup running 3 controllers and 1 compute.
Please, note that the test doesn't always fail... I ran it 10 times in a row 
and it failed with this error in 3 of them.

Even though it is an OVN environment, I suspect that it might have to do
either with Nova or with the cirros image itself.

+--------------------------------------+----------------------------------+--------+
| ID                                   | Name                             | 
Status |
+--------------------------------------+----------------------------------+--------+
| 671c2d8e-5909-4f61-9cd3-d81dbdb892d4 | cirros-0.3.5-x86_64-disk.img     | 
active |
| e4d8721c-743a-4cbb-b026-b0ec1b5c84bf | cirros-0.3.5-x86_64-disk.img_alt | 
active |

The test would use the image with id
671c2d8e-5909-4f61-9cd3-d81dbdb892d4.

I can't see any error traces in ovn-metadata-agent, ovn-controller nor
neutron-server.

Thanks,
Daniel 


[0] 
https://github.com/openstack/tempest/blob/c961a656ccdc0f1242b4ff3237a16d4a7cdf4e07/tempest/scenario/test_security_groups_basic_ops.py#L609

** Affects: networking-ovn
     Importance: Undecided
         Status: New

** Affects: nova
     Importance: Undecided
         Status: New

** Also affects: nova
   Importance: Undecided
       Status: New

** Summary changed:

- 
tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group
 fails due to metadata error
+ 
tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group
 fails due to instances failing to retrieve public key via metadata service

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1765545

Title:
  
tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group
  fails due to instances failing to retrieve public key via metadata
  service

Status in networking-ovn:
  New
Status in OpenStack Compute (nova):
  New

Bug description:
  Running tempest test
  
tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group
  fails sometimes when trying to authenticate via public key to the
  access point instance [0].

  After debugging, I managed to connect to the instance via virsh
  console and check that the instance had not the SSH key installed:

    ____               ____  ____
   / __/ __ ____ ____ / __ \/ __/
  / /__ / // __// __// /_/ /\ \
  \___//_//_/  /_/   \____/___/
     http://cirros-cloud.net
  login as 'cirros' user. default password: 'cubswin:)'. use 'sudo' for root.
  tempest-server-tempest-testsecuritygroupsbasicops-679466304-acc login: cirros
  Password:
  $ cat .ssh/authorized_keys
  $

  
  Checking up in the console log, I can see the following:
  cirros-ds 'net' up at 6.37
  checking http://169.254.169.254/2009-04-04/instance-id
  successful after 1/20 tries: up 6.54. iid=i-0000006e
  failed to get 
http://169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key
  warning: no ec2 metadata for public-keys
  failed to get http://169.254.169.254/2009-04-04/user-data
  warning: no ec2 metadata for user-data
  found datasource (ec2, net)

  
  So it looks like it is able to fetch the instance-id but not getting the 
public-key.
  When I try to do it manually, it retrieves it successfully:

  $ curl 169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key
  ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ
 Generated-by-Nova

  Also, running the following command doesn't work:

  $ sudo cirros-apply net -v
  $ cat .ssh/authorized_keys
  $

  If, instead I run the following command and reboot, it will get properly 
installed:
  $ sudo cirros-per boot cirros-apply-net cirros-apply net && reboot
  ...
  $ cat .ssh/authorized_keys
  ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ
 Generated-by-Nova

  
  After checking the ovn metadata proxy log and also nova-metadata-api logs, I 
can see the requests and the 200 OK responses:

  2018-04-19 22:31:37.383 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200  len: 146 time: 
4.0820560
  2018-04-19 22:31:38.800 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200  len: 183 time: 
1.1210902
  2018-04-19 22:31:49.148 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200  len: 146 time: 
0.0230849
  2018-04-19 22:31:49.387 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/ami-launch-index HTTP/1.1" status: 200  len: 136 
time: 0.0262089
  2018-04-19 22:31:50.225 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/instance-type HTTP/1.1" status: 200  len: 142 time: 
0.7244408
  2018-04-19 22:31:50.482 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/local-ipv4 HTTP/1.1" status: 200  len: 146 time: 
0.0143349
  2018-04-19 22:31:50.612 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-ipv4 HTTP/1.1" status: 200  len: 144 time: 
0.0130348
  2018-04-19 22:31:50.793 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/hostname HTTP/1.1" status: 200  len: 199 time: 
0.0100901
  2018-04-19 22:31:51.039 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/local-hostname HTTP/1.1" status: 200  len: 199 time: 
0.0094490
  2018-04-19 22:31:51.197 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/user-data HTTP/1.1" status: 404  len: 297 time: 0.0226381
  2018-04-19 22:31:51.475 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/block-device-mapping HTTP/1.1" status: 200  len: 143 
time: 0.0118120
  2018-04-19 22:31:51.579 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/block-device-mapping/ami HTTP/1.1" status: 200  len: 
138 time: 0.0084291
  2018-04-19 22:31:51.672 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200  
len: 535 time: 12.7038500
  2018-04-19 22:31:51.735 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/block-device-mapping/root HTTP/1.1" status: 200  
len: 143 time: 0.0147779
  2018-04-19 22:31:51.930 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> 
"GET /2009-04-04/meta-data/public-hostname HTTP/1.1" status: 200  len: 199 
time: 0.0083301

  nova-api-metadata.log:

  
  2018-04-19 22:31:38.852 5 INFO nova.metadata.wsgi.server 
[req-454724c8-a0cb-4c16-8c85-dd5940c751e2 - - - - -] 10.100.0.7,172.16.2.18 
"GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 164 time: 
1.1147470
  2018-04-19 22:31:51.474 5 INFO nova.metadata.wsgi.server 
[req-f1ef4033-3378-4074-baaf-6dfae80b81cf - - - - -] 10.100.0.7,172.16.2.18 
"GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 
516 time: 12.6919320

  
  This is a TripleO (tripleo-ci-master) setup running 3 controllers and 1 
compute.
  Please, note that the test doesn't always fail... I ran it 10 times in a row 
and it failed with this error in 3 of them.

  Even though it is an OVN environment, I suspect that it might have to
  do either with Nova or with the cirros image itself.

  
+--------------------------------------+----------------------------------+--------+
  | ID                                   | Name                             | 
Status |
  
+--------------------------------------+----------------------------------+--------+
  | 671c2d8e-5909-4f61-9cd3-d81dbdb892d4 | cirros-0.3.5-x86_64-disk.img     | 
active |
  | e4d8721c-743a-4cbb-b026-b0ec1b5c84bf | cirros-0.3.5-x86_64-disk.img_alt | 
active |

  The test would use the image with id
  671c2d8e-5909-4f61-9cd3-d81dbdb892d4.

  I can't see any error traces in ovn-metadata-agent, ovn-controller nor
  neutron-server.

  Thanks,
  Daniel 

  
  [0] 
https://github.com/openstack/tempest/blob/c961a656ccdc0f1242b4ff3237a16d4a7cdf4e07/tempest/scenario/test_security_groups_basic_ops.py#L609

To manage notifications about this bug go to:
https://bugs.launchpad.net/networking-ovn/+bug/1765545/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

Reply via email to