Public bug reported: Sometimes during instance initialization, the metadata step fails.
In metadata-agent.log there are lots of 404s:

  "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404 len: 297 time: 0.0771070

In nova-api.log we get 404s too:

  "GET /2009-04-04/meta-data/instance-id HTTP/1.1" status: 404

After some debugging we found that the problem occurs when a new instance gets the same IP that a deleted instance was using. The problem is related to the cache implementation of the method "_get_ports_for_remote_address()" in "/neutron/agent/metadata/agent.py": it returns a port belonging to the deleted instance (with the same IP), and therefore the wrong instance ID, which is then sent to nova-api and fails because that instance ID no longer exists. This problem only occurs with the cache enabled on the neutron metadata-agent.

Version: Queens

How to reproduce:

---
#!/bin/bash

computenodelist=(
  'computenode00.test.openstack.net'
  'computenode01.test.openstack.net'
  'computenode02.test.openstack.net'
  'computenode03.test.openstack.net'
)

validate_metadata(){
cat << EOF > /tmp/metadata
#!/bin/sh -x
if curl 192.168.10.2
then
  echo "ControllerNode00 - OK"
else
  echo "ControllerNode00 - ERROR"
fi
EOF

  #SUBNAME=$(date +%s)
  openstack server delete "${node}" 2>/dev/null
  source /root/admin-openrc
  openstack server create --image cirros --nic net-id=internal --flavor Cirros --security-group default --user-data /tmp/metadata --availability-zone "nova:${node}" --wait "${node}" &> /dev/null

  i=0
  until [ $i -gt 3 ] || openstack console log show "${node}" | grep -q "ControllerNode00"
  do
    i=$((i+1))
    sleep 1
  done

  if openstack console log show "${node}" | grep -q "ControllerNode00 - OK"; then
    echo "Metadata Servers OK: ${node}"
  else
    echo "Metadata Servers ERROR: ${node}"
  fi

  rm /tmp/metadata
}

for node in "${computenodelist[@]}"
do
  export node
  validate_metadata
done

echo -e "\n"
---

** Affects: neutron
   Importance: Undecided
       Status: New
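The failure mode can be sketched with a minimal, self-contained simulation (plain Python; the FakeNeutron and CachingMetadataAgent classes and all names in them are invented stand-ins, not neutron code): a cache keyed only by the requesting IP keeps serving the deleted instance's port after a new instance reuses that IP.

```python
import uuid


class FakeNeutron:
    """Invented stand-in for the Neutron server's live port table."""

    def __init__(self):
        self.ports_by_ip = {}

    def boot(self, ip):
        # A new instance gets a port with a fresh device_id (its instance ID).
        instance_id = str(uuid.uuid4())
        self.ports_by_ip[ip] = {'device_id': instance_id}
        return instance_id

    def delete(self, ip):
        self.ports_by_ip.pop(ip, None)


class CachingMetadataAgent:
    """Invented stand-in for the agent's cached port lookup: the cache key is
    only the requesting IP, so it cannot tell that the port behind that IP
    has been replaced."""

    def __init__(self, neutron):
        self.neutron = neutron
        self._cache = {}

    def get_instance_id(self, remote_ip):
        if remote_ip not in self._cache:  # miss: ask the server once
            self._cache[remote_ip] = self.neutron.ports_by_ip[remote_ip]
        return self._cache[remote_ip]['device_id']


neutron = FakeNeutron()
agent = CachingMetadataAgent(neutron)

first = neutron.boot('192.168.10.50')
assert agent.get_instance_id('192.168.10.50') == first  # correct while alive

neutron.delete('192.168.10.50')
second = neutron.boot('192.168.10.50')  # new instance reuses the same IP

stale = agent.get_instance_id('192.168.10.50')
print(stale == second)  # prints False: the deleted instance's ID is returned
```

With a result like this, nova-api is asked for the metadata of an instance ID that no longer exists, which matches the 404s in the logs above.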
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1836253

Title:
  Sometimes the InstanceMetadata API returns 404 due to an invalid
  InstanceID returned by _get_instance_and_tenant_id()

Status in neutron:
  New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1836253/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp