Public bug reported: Running tempest test tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group fails sometimes when trying to authenticate via public key to the access point instance [0].
After debugging, I managed to connect to the instance via virsh console and check that the instance had not the SSH key installed: ____ ____ ____ / __/ __ ____ ____ / __ \/ __/ / /__ / // __// __// /_/ /\ \ \___//_//_/ /_/ \____/___/ http://cirros-cloud.net login as 'cirros' user. default password: 'cubswin:)'. use 'sudo' for root. tempest-server-tempest-testsecuritygroupsbasicops-679466304-acc login: cirros Password: $ cat .ssh/authorized_keys $ Checking up in the console log, I can see the following: cirros-ds 'net' up at 6.37 checking http://169.254.169.254/2009-04-04/instance-id successful after 1/20 tries: up 6.54. iid=i-0000006e failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key warning: no ec2 metadata for public-keys failed to get http://169.254.169.254/2009-04-04/user-data warning: no ec2 metadata for user-data found datasource (ec2, net) So it looks like it is able to fetch the instance-id but not getting the public-key. When I try to do it manually, it retrieves it successfully: $ curl 169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ Generated-by-Nova Also, running the following command doesn't work: $ sudo cirros-apply net -v $ cat .ssh/authorized_keys $ If, instead I run the following command and reboot, it will get properly installed: $ sudo cirros-per boot cirros-apply-net cirros-apply net && reboot ... $ cat .ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ Generated-by-Nova After checking the ovn metadata proxy log and also nova-metadata-api logs, I can see the requests and the 200 OK responses: 2018-04-19 22:31:37.383 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200 len: 146 time: 4.0820560 2018-04-19 22:31:38.800 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 183 time: 1.1210902 2018-04-19 22:31:49.148 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200 len: 146 time: 0.0230849 2018-04-19 22:31:49.387 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/ami-launch-index HTTP/1.1" status: 200 len: 136 time: 0.0262089 2018-04-19 22:31:50.225 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/instance-type HTTP/1.1" status: 200 len: 142 time: 0.7244408 2018-04-19 22:31:50.482 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/local-ipv4 HTTP/1.1" status: 200 len: 146 time: 0.0143349 2018-04-19 22:31:50.612 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-ipv4 HTTP/1.1" status: 200 len: 144 time: 0.0130348 2018-04-19 22:31:50.793 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/hostname HTTP/1.1" status: 200 len: 199 time: 0.0100901 2018-04-19 22:31:51.039 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/local-hostname HTTP/1.1" status: 200 len: 199 time: 0.0094490 2018-04-19 22:31:51.197 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/user-data HTTP/1.1" status: 404 len: 297 time: 0.0226381 2018-04-19 22:31:51.475 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/block-device-mapping HTTP/1.1" status: 200 len: 143 time: 0.0118120 2018-04-19 22:31:51.579 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/block-device-mapping/ami HTTP/1.1" status: 200 len: 138 time: 0.0084291 2018-04-19 22:31:51.672 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 535 time: 12.7038500 2018-04-19 22:31:51.735 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/block-device-mapping/root HTTP/1.1" status: 200 len: 143 time: 0.0147779 2018-04-19 22:31:51.930 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-hostname HTTP/1.1" status: 200 len: 199 time: 0.0083301 nova-api-metadata.log: 2018-04-19 22:31:38.852 5 INFO nova.metadata.wsgi.server [req-454724c8-a0cb-4c16-8c85-dd5940c751e2 - - - - -] 10.100.0.7,172.16.2.18 "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 164 time: 1.1147470 2018-04-19 22:31:51.474 5 INFO nova.metadata.wsgi.server [req-f1ef4033-3378-4074-baaf-6dfae80b81cf - - - - -] 10.100.0.7,172.16.2.18 "GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 516 time: 12.6919320 This is a TripleO (tripleo-ci-master) setup running 3 controllers and 1 compute. Please, note that the test doesn't always fail... I ran it 10 times in a row and it failed with this error in 3 of them. Even though it is an OVN environment, I suspect that it might have to do either with Nova or with the cirros image itself. +--------------------------------------+----------------------------------+--------+ | ID | Name | Status | +--------------------------------------+----------------------------------+--------+ | 671c2d8e-5909-4f61-9cd3-d81dbdb892d4 | cirros-0.3.5-x86_64-disk.img | active | | e4d8721c-743a-4cbb-b026-b0ec1b5c84bf | cirros-0.3.5-x86_64-disk.img_alt | active | The test would use the image with id 671c2d8e-5909-4f61-9cd3-d81dbdb892d4. I can't see any error traces in ovn-metadata-agent, ovn-controller nor neutron-server. Thanks, Daniel [0] https://github.com/openstack/tempest/blob/c961a656ccdc0f1242b4ff3237a16d4a7cdf4e07/tempest/scenario/test_security_groups_basic_ops.py#L609 ** Affects: networking-ovn Importance: Undecided Status: New ** Affects: nova Importance: Undecided Status: New ** Also affects: nova Importance: Undecided Status: New ** Summary changed: - tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group fails due to metadata error + tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group fails due to instances failing to retrieve public key via metadata service -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1765545 Title: tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group fails due to instances failing to retrieve public key via metadata service Status in networking-ovn: New Status in OpenStack Compute (nova): New Bug description: Running tempest test tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_port_security_disable_security_group fails sometimes when trying to authenticate via public key to the access point instance [0]. After debugging, I managed to connect to the instance via virsh console and check that the instance had not the SSH key installed: ____ ____ ____ / __/ __ ____ ____ / __ \/ __/ / /__ / // __// __// /_/ /\ \ \___//_//_/ /_/ \____/___/ http://cirros-cloud.net login as 'cirros' user. default password: 'cubswin:)'. use 'sudo' for root. tempest-server-tempest-testsecuritygroupsbasicops-679466304-acc login: cirros Password: $ cat .ssh/authorized_keys $ Checking up in the console log, I can see the following: cirros-ds 'net' up at 6.37 checking http://169.254.169.254/2009-04-04/instance-id successful after 1/20 tries: up 6.54. iid=i-0000006e failed to get http://169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key warning: no ec2 metadata for public-keys failed to get http://169.254.169.254/2009-04-04/user-data warning: no ec2 metadata for user-data found datasource (ec2, net) So it looks like it is able to fetch the instance-id but not getting the public-key. When I try to do it manually, it retrieves it successfully: $ curl 169.254.169.254/2009-04-04/meta-data/public-keys/0/openssh-key ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ Generated-by-Nova Also, running the following command doesn't work: $ sudo cirros-apply net -v $ cat .ssh/authorized_keys $ If, instead I run the following command and reboot, it will get properly installed: $ sudo cirros-per boot cirros-apply-net cirros-apply net && reboot ... $ cat .ssh/authorized_keys ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDICvVroPErVzHbx+a1lhI4RU33f0Nb4DT2FiNbKhaI1ZBl4/zRbqFY5a4lMipV810dCzJSViGJVw0VzNgDOf/zCt6Joosem5qC8hKwRgX5tcEXQ0UnVCiXddP1bydbRVt4BofTCTUPb4SZ3Z4zl0+L4WWB1CY58KYl19Lr7H4zqMXPqa6Mw+k1dpo0YBk3ZZR4pIxGtN916w6x6vtSIy2oDg4zaxUuewGaQNp9wENEuP3+TOseTymBxpbdys2RpUKXM2vhWWDDbrzG0+juOFxn111SgFYom05sjONDM310xHX5KBm6QuJO6ObCkSIKre9wvU60i19YW7pxBtyfztIJ Generated-by-Nova After checking the ovn metadata proxy log and also nova-metadata-api logs, I can see the requests and the 200 OK responses: 2018-04-19 22:31:37.383 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200 len: 146 time: 4.0820560 2018-04-19 22:31:38.800 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 183 time: 1.1210902 2018-04-19 22:31:49.148 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 200 len: 146 time: 0.0230849 2018-04-19 22:31:49.387 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/ami-launch-index HTTP/1.1" status: 200 len: 136 time: 0.0262089 2018-04-19 22:31:50.225 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/instance-type HTTP/1.1" status: 200 len: 142 time: 0.7244408 2018-04-19 22:31:50.482 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/local-ipv4 HTTP/1.1" status: 200 len: 146 time: 0.0143349 2018-04-19 22:31:50.612 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-ipv4 HTTP/1.1" status: 200 len: 144 time: 0.0130348 2018-04-19 22:31:50.793 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/hostname HTTP/1.1" status: 200 len: 199 time: 0.0100901 2018-04-19 22:31:51.039 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/local-hostname HTTP/1.1" status: 200 len: 199 time: 0.0094490 2018-04-19 22:31:51.197 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/user-data HTTP/1.1" status: 404 len: 297 time: 0.0226381 2018-04-19 22:31:51.475 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/block-device-mapping HTTP/1.1" status: 200 len: 143 time: 0.0118120 2018-04-19 22:31:51.579 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/block-device-mapping/ami HTTP/1.1" status: 200 len: 138 time: 0.0084291 2018-04-19 22:31:51.672 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 535 time: 12.7038500 2018-04-19 22:31:51.735 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/block-device-mapping/root HTTP/1.1" status: 200 len: 143 time: 0.0147779 2018-04-19 22:31:51.930 24 INFO eventlet.wsgi.server [-] 10.100.0.7,<local> "GET /2009-04-04/meta-data/public-hostname HTTP/1.1" status: 200 len: 199 time: 0.0083301 nova-api-metadata.log: 2018-04-19 22:31:38.852 5 INFO nova.metadata.wsgi.server [req-454724c8-a0cb-4c16-8c85-dd5940c751e2 - - - - -] 10.100.0.7,172.16.2.18 "GET /2009-04-04/meta-data/public-keys HTTP/1.1" status: 200 len: 164 time: 1.1147470 2018-04-19 22:31:51.474 5 INFO nova.metadata.wsgi.server [req-f1ef4033-3378-4074-baaf-6dfae80b81cf - - - - -] 10.100.0.7,172.16.2.18 "GET /2009-04-04/meta-data/public-keys/0/openssh-key HTTP/1.1" status: 200 len: 516 time: 12.6919320 This is a TripleO (tripleo-ci-master) setup running 3 controllers and 1 compute. Please, note that the test doesn't always fail... I ran it 10 times in a row and it failed with this error in 3 of them. Even though it is an OVN environment, I suspect that it might have to do either with Nova or with the cirros image itself. +--------------------------------------+----------------------------------+--------+ | ID | Name | Status | +--------------------------------------+----------------------------------+--------+ | 671c2d8e-5909-4f61-9cd3-d81dbdb892d4 | cirros-0.3.5-x86_64-disk.img | active | | e4d8721c-743a-4cbb-b026-b0ec1b5c84bf | cirros-0.3.5-x86_64-disk.img_alt | active | The test would use the image with id 671c2d8e-5909-4f61-9cd3-d81dbdb892d4. I can't see any error traces in ovn-metadata-agent, ovn-controller nor neutron-server. Thanks, Daniel [0] https://github.com/openstack/tempest/blob/c961a656ccdc0f1242b4ff3237a16d4a7cdf4e07/tempest/scenario/test_security_groups_basic_ops.py#L609 To manage notifications about this bug go to: https://bugs.launchpad.net/networking-ovn/+bug/1765545/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp