[Yahoo-eng-team] [Bug 2052916] [NEW] HTTP GET on s3tokens and ec2tokens endpoint gives 500 internal error
Public bug reported: An HTTP GET against the s3tokens and ec2tokens endpoints should return a 405 Method Not Allowed. Instead, because the GET method is still routed through RBAC enforcement, the request trips the enforcement assertion and returns a 500 Internal Server Error:

    AssertionError: PROGRAMMING ERROR: enforcement
    (`keystone.common.rbac_enforcer.enforcer.RBACEnforcer.enforce_call()`)
    has not been called; API is unenforced.

** Affects: keystone
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2052916
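A minimal Flask sketch, purely illustrative and not keystone's actual code, of the failure mode and the desired behavior; the after-request assertion stands in for keystone's enforcement check:

    from flask import Flask, g

    app = Flask(__name__)

    @app.route('/v3/s3tokens', methods=['POST'])
    def s3tokens():
        g.enforce_called = True  # stand-in for RBACEnforcer.enforce_call()
        return {'access': {}}

    @app.after_request
    def assert_enforced(resp):
        # Like keystone, assert that enforcement ran; a handler that is
        # routed but never enforces trips this and surfaces as a 500.
        assert g.get('enforce_called') or resp.status_code >= 400, (
            'PROGRAMMING ERROR: API is unenforced')
        return resp

    # Because no GET route is registered at all, Flask itself answers
    # GET /v3/s3tokens with 405 Method Not Allowed -- the expected result.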
[Yahoo-eng-team] [Bug 2049899] [NEW] disk remaining in logs during live migration says 100 when no disk is migrated
Public bug reported: When live migrating boot-from-volume instances, the "disk remaining" figure in the nova log says 100 even when there is no disk to migrate.

** Affects: nova
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2049899
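The report does not name the code path, but the shape of the problem is easy to picture. A hedged sketch of a remaining-disk calculation that treats the no-disk case explicitly instead of reporting 100:

    def disk_remaining_pct(bytes_remaining: int, bytes_total: int) -> int:
        # Illustrative only, not nova's code: with no local disk to copy
        # (boot-from-volume), report 0 remaining rather than a misleading 100.
        if bytes_total == 0:
            return 0
        return int(bytes_remaining * 100 / bytes_total)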
[Yahoo-eng-team] [Bug 2049903] [NEW] nova-compute starts even if resource provider creation fails with conflict
Public bug reported: If an operator reinstalls a compute node but forgets to run "openstack compute service delete" first (which would remove the nova-compute service record and the resource provider in placement), the reinstalled compute node with the same hostname happily reports its state as up, even though the resource provider creation that nova-compute attempted failed due to a conflict with the existing resource provider.

To do operators a big favor, we should make nova-compute fail to start when resource provider creation fails (such as on a conflict).

** Affects: nova
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2049903
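A hedged sketch of the proposed behavior; the names and the placement call are illustrative, not nova's internals:

    import requests

    class ResourceProviderCreationFailed(Exception):
        pass

    def ensure_resource_provider(placement_url, uuid, name):
        # Abort startup on 409 Conflict instead of logging and carrying on.
        resp = requests.post(placement_url + '/resource_providers',
                             json={'uuid': uuid, 'name': name})
        if resp.status_code == 409:
            raise ResourceProviderCreationFailed(
                'provider %r conflicts with an existing one; was "openstack '
                'compute service delete" run before the reinstall?' % name)
        resp.raise_for_status()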
[Yahoo-eng-team] [Bug 2034035] [NEW] neutron allowed address pair with same ip address causes ValueError
Public bug reported: When managing allowed address pairs for a neutron port in horizon, creating two pairs with an identical ip_address but different mac_address values crashes horizon, because the row ID in the table is the same for both; see the traceback below. The solution is to concatenate mac_address, when set, into the ID for that row (see the sketch at the end of this report).

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/django/core/handlers/exception.py", line 47, in inner
    response = get_response(request)
  File "/usr/lib/python3.6/site-packages/django/core/handlers/base.py", line 181, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/decorators.py", line 51, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/decorators.py", line 35, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/decorators.py", line 35, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/decorators.py", line 111, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/decorators.py", line 83, in dec
    return view_func(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/generic/base.py", line 70, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/django/views/generic/base.py", line 98, in dispatch
    return handler(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/tabs/views.py", line 156, in post
    return self.get(request, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/horizon/tabs/views.py", line 135, in get
    handled = self.handle_table(self._table_dict[table_name])
  File "/usr/lib/python3.6/site-packages/horizon/tabs/views.py", line 116, in handle_table
    handled = tab._tables[table_name].maybe_handle()
  File "/usr/lib/python3.6/site-packages/horizon/tables/base.py", line 1802, in maybe_handle
    return self.take_action(action_name, obj_id)
  File "/usr/lib/python3.6/site-packages/horizon/tables/base.py", line 1644, in take_action
    response = action.multiple(self, self.request, obj_ids)
  File "/usr/lib/python3.6/site-packages/horizon/tables/actions.py", line 305, in multiple
    return self.handle(data_table, request, object_ids)
  File "/usr/lib/python3.6/site-packages/horizon/tables/actions.py", line 760, in handle
    datum = table.get_object_by_id(datum_id)
  File "/usr/lib/python3.6/site-packages/horizon/tables/base.py", line 1480, in get_object_by_id
    % matches)
ValueError: Multiple matches were returned for that id: [, ].

** Affects: horizon
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/2034035
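A hedged sketch of the proposed fix; the function name and pair structure are illustrative, not horizon's actual table code:

    def allowed_address_pair_row_id(pair):
        # Append the MAC when one is set, so two pairs sharing an IP
        # still get distinct table row IDs.
        if pair.get('mac_address'):
            return '%s:%s' % (pair['ip_address'], pair['mac_address'])
        return pair['ip_address']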
[Yahoo-eng-team] [Bug 1787385] Re: vpnaas and dynamic-routing missing neutron-tempest-plugin in test-requirements.txt
I have no idea what I was referring to there, so I will set this as invalid; past me should have posted more details :p

** Changed in: neutron
   Status: New => Invalid

https://bugs.launchpad.net/bugs/1787385

Bug description: The vpnaas and dynamic-routing projects are missing the neutron-tempest-plugin in test-requirements.txt.
[Yahoo-eng-team] [Bug 1960230] [NEW] resize fails with FileExistsError if earlier resize attempt failed to cleanup
Public bug reported: This bug concerns resize with the libvirt driver.

If a resize fails, the _cleanup_remote_migration() [1] function in the libvirt driver tries to clean up the /var/lib/nova/instances/<uuid>_resize directory on the remote side [2]. If that cleanup fails, the _resize directory is left behind and blocks any future resize attempt:

2021-12-14 14:40:12.535 175177 INFO nova.virt.libvirt.driver [req-9d3477d4-3bb2-456f-9be6-dce9893b0e95 23d6aa8884ab44ef9f214ad195d273c0 050c556faa5944a8953126c867313770 - default default] [instance: 99287438-c37b-44b0-834e-55685b6e83eb] Deletion of /var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize failed

Then, on the next resize attempt a long time later:

2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 10429, in migrate_disk_and_power_off
2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server     os.rename(inst_base, inst_base_resize)
2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server FileExistsError: [Errno 17] File exists: '/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb' -> '/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize'

This happens here [3]: os.rename tries to rename the /var/lib/nova/instances/<uuid> dir to <uuid>_resize, which already exists, and fails with FileExistsError. We should check whether the directory exists before renaming, and delete it first if it does (see the sketch after this report).

[1] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10773
[2] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10965
[3] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10915

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1960230
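A hedged sketch of the suggested guard; the variable names follow the traceback, but this is illustrative rather than nova's actual patch:

    import os
    import shutil

    def rename_to_resize_dir(inst_base):
        inst_base_resize = inst_base + '_resize'
        if os.path.exists(inst_base_resize):
            # Leftover from an earlier resize whose cleanup failed.
            shutil.rmtree(inst_base_resize)
        os.rename(inst_base, inst_base_resize)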
[Yahoo-eng-team] [Bug 1957167] [NEW] glance constraint for sqlalchemy is too low for xena
Public bug reported: The glance requirement for sqlalchemy says >= 1.0.10, but using 1.3.2 gives an error when trying to db sync.

These are the xena release versions:

openstack-glance-21.1.0-1.el8.noarch
python3-glance-store-2.3.0-2.el8.noarch
python3-glanceclient-3.5.0-1.el8.noarch
python3-glance-21.1.0-1.el8.noarch

Upgrading sqlalchemy to 1.4.18 makes it work, which means the requirement floor is not set properly.

2022-01-11 17:38:48.627 196461 CRITICAL glance [-] Unhandled error: TypeError: 'int' object is not iterable
2022-01-11 17:38:48.627 196461 ERROR glance Traceback (most recent call last):
2022-01-11 17:38:48.627 196461 ERROR glance   File "/bin/glance-manage", line 10, in <module>
2022-01-11 17:38:48.627 196461 ERROR glance     sys.exit(main())
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/glance/cmd/manage.py", line 557, in main
2022-01-11 17:38:48.627 196461 ERROR glance     return CONF.command.action_fn()
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/glance/cmd/manage.py", line 391, in sync
2022-01-11 17:38:48.627 196461 ERROR glance     self.command_object.sync(CONF.command.version)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/glance/cmd/manage.py", line 152, in sync
2022-01-11 17:38:48.627 196461 ERROR glance     curr_heads = alembic_migrations.get_current_alembic_heads()
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/glance/db/sqlalchemy/alembic_migrations/__init__.py", line 46, in get_current_alembic_heads
2022-01-11 17:38:48.627 196461 ERROR glance     engine = db_api.get_engine()
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/glance/db/sqlalchemy/api.py", line 98, in get_engine
2022-01-11 17:38:48.627 196461 ERROR glance     facade = _create_facade_lazily()
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/glance/db/sqlalchemy/api.py", line 88, in _create_facade_lazily
2022-01-11 17:38:48.627 196461 ERROR glance     _FACADE = session.EngineFacade.from_config(CONF)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1370, in from_config
2022-01-11 17:38:48.627 196461 ERROR glance     expire_on_commit=expire_on_commit, _conf=conf)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1291, in __init__
2022-01-11 17:38:48.627 196461 ERROR glance     slave_connection=slave_connection)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 506, in _start
2022-01-11 17:38:48.627 196461 ERROR glance     engine_args, maker_args)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 530, in _setup_for_connection
2022-01-11 17:38:48.627 196461 ERROR glance     sql_connection=sql_connection, **engine_kwargs)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/debtcollector/renames.py", line 43, in decorator
2022-01-11 17:38:48.627 196461 ERROR glance     return wrapped(*args, **kwargs)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/engines.py", line 211, in create_engine
2022-01-11 17:38:48.627 196461 ERROR glance     test_conn = _test_connection(engine, max_retries, retry_interval)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/engines.py", line 386, in _test_connection
2022-01-11 17:38:48.627 196461 ERROR glance     return engine.connect()
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2193, in connect
2022-01-11 17:38:48.627 196461 ERROR glance     return self._connection_cls(self, **kwargs)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 125, in __init__
2022-01-11 17:38:48.627 196461 ERROR glance     self.dispatch.engine_connect(self, self.__branch)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib64/python3.6/site-packages/sqlalchemy/event/attr.py", line 297, in __call__
2022-01-11 17:38:48.627 196461 ERROR glance     fn(*args, **kw)
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/engines.py", line 73, in _connect_ping_listener
2022-01-11 17:38:48.627 196461 ERROR glance     connection.scalar(select(1))
2022-01-11 17:38:48.627 196461 ERROR glance   File "<string>", line 2, in select
2022-01-11 17:38:48.627 196461 ERROR glance   File "<string>", line 2, in __init__
2022-01-11 17:38:48.627 196461 ERROR glance   File "/usr/lib64/python3.6/site-packages/sqlalchemy/util/deprecations.py", line 130, in warned
[Yahoo-eng-team] [Bug 1948676] [NEW] rpc response timeout for agent report_state is not possible
Public bug reported: When hosting a large number of routers and/or networks, the RPC calls from the agents can take a long time, which requires us to increase rpc_response_timeout from the default of 60 seconds to a higher value so the agents do not time out.

This has a side effect: if rabbitmq or neutron-server is restarted, every agent that is reporting at that moment will hang for a long time until report_state times out; during this time neutron-server has not received any reports, causing it to mark the agents as down. When the call times out and the agents retry, the reporting succeeds, but a full sync is triggered for every agent that was previously marked dead. This in itself can put a very high load on the control plane. Consider a configuration change deployed by tooling to all neutron-server nodes, which are then restarted: all agents will be marked dead, and when they either 1) come back after rpc_response_timeout is reached and retry, or 2) are restarted manually, all of them will do a full sync.

We should have a configuration option that applies only to the RPC timeout of the report_state call from agents, because that one could be lowered to stay within the bounds of the agent not being seen as down (see the sketch after this report). The old behavior can be kept by simply falling back to rpc_response_timeout by default instead of introducing a new default in this override.

** Affects: neutron
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/1948676
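A hedged sketch of such an option using oslo.config; the option name is illustrative, not necessarily the one that merged:

    from oslo_config import cfg

    opts = [
        cfg.IntOpt('rpc_report_state_timeout',
                   help='Seconds to wait for a report_state RPC response. '
                        'Falls back to rpc_response_timeout when unset.'),
    ]

    def report_state_timeout(conf):
        # Preserve the old behavior unless the operator opts in.
        return conf.rpc_report_state_timeout or conf.rpc_response_timeout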
[Yahoo-eng-team] [Bug 1928875] Re: Neutron L3 HA state transition fails with KeyError and agent stops processing
That seems about right. We upgraded from Train to Victoria recently, so I would assume this was related to that. Ideally it should have handled the word 'master' as a transition, and perhaps converted any /var/lib/neutron/ha_confs/<router id>/state files or similar, since the service did not work after one reboot, I assume. Restarting the service fixes all the state files, so for anybody wondering: restart the l3-agent service multiple times after upgrading with that change.

** Changed in: neutron
   Status: New => Invalid

https://bugs.launchpad.net/bugs/1928875
[Yahoo-eng-team] [Bug 1928875] [NEW] Neutron L3 HA state transition fails with KeyError and agent stops processing
Public bug reported: When L3 HA is enabled for routers, the state transitions sometimes report master instead of primary, causing a KeyError in the TRANSLATION_MAP. This causes the agent to fail and stop processing routers altogether. If you then move a router to the agent (I haven't tried new routers), it will become the primary, but since the state transition never happens it will not get any routes.

Victoria release:

python3-neutronclient-6.14.1-1.el8.noarch
python3-neutron-17.1.0-1.el8.noarch
openstack-neutron-openvswitch-17.1.0-1.el8.noarch
python3-neutron-lib-2.6.1-2.el8.noarch
openstack-neutron-common-17.1.0-1.el8.noarch
openstack-neutron-ml2-17.1.0-1.el8.noarch
python3-neutron-dynamic-routing-17.0.0-2.el8.noarch
openstack-neutron-bgp-dragent-17.0.0-2.el8.noarch
openstack-neutron-17.1.0-1.el8.noarch
openstack-neutron-dynamic-routing-common-17.0.0-2.el8.noarch
keepalived-2.0.10-11.el8_3.1.x86_64

2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: a67b5215-a905-4303-8dc6-75e0f45aa6c6: KeyError: 'master'
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 788, in _process_routers_if_compatible
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent     self._process_router_if_compatible(router)
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 617, in _process_router_if_compatible
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent     self._process_updated_router(router)
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py", line 671, in _process_updated_router
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent     router['id'], router.get(lib_const.HA_ROUTER_STATE_KEY))
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/osprofiler/profiler.py", line 160, in wrapper
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent     result = f(*args, **kwargs)
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent   File "/usr/lib/python3.6/site-packages/neutron/agent/l3/ha.py", line 102, in check_ha_state_for_router
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent     if current_state != TRANSLATION_MAP[ha_state]:
2021-05-11 00:58:10.449 808339 ERROR neutron.agent.l3.agent KeyError: 'master'

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1928875
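A hedged sketch of a defensive mapping; the state values here are illustrative, not neutron's actual constants:

    # Accept both the legacy and the renamed keepalived state, so a stray
    # 'master' report cannot raise KeyError and stall the agent.
    TRANSLATION_MAP = {
        'primary': 'active',
        'master': 'active',   # legacy spelling still seen across upgrades
        'backup': 'standby',
        'fault': 'standby',
        'unknown': 'standby',
    }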
[Yahoo-eng-team] [Bug 1795280] Re: netns deletion on newer kernels fails with errno 16
** Changed in: neutron
   Status: New => Invalid

https://bugs.launchpad.net/bugs/1795280

Bug description:

This is probably not neutron related, but I need help with some input. On a 3.10 kernel on CentOS 7.5, simply creating a network and then deleting it properly terminates all processes, removes the interfaces, and deletes the network namespace.

[root@controller ~]# uname -r
3.10.0-862.11.6.el7.x86_64

If running a later kernel like 4.18, something changed that causes the namespace deletion to fail with OSError errno 16, device or resource busy. Before something like kernel 3.19 the netns filesystem was provided in proc, but it has since been moved to its own nsfs; maybe this has something to do with it, but I haven't seen this issue on Ubuntu before.

[root@controller ~]# mount | grep qdhcp
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)

[root@controller ~]# uname -r
4.18.8-1.el7.elrepo.x86_64

nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)

Perhaps some CentOS or Red Hat person can chime in about this. I can reproduce this every single time:

* Create a network: it spawns dnsmasq, haproxy, and the interfaces in a netns
* Delete the network: it terminates all processes and deletes the interfaces, but the netns cannot be deleted and throws the error below

Seen on both queens and rocky, fwiw.

2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent [req-28a9e37f-a2ca-4375-a3f0-8384711414dd - - - - -] Unable to disable dhcp for 1fb24615-fd9e-4804-aade-5668bb2cdecb.: OSError: [Errno 16] Device or resource busy
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 241, in disable
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     self._destroy_namespace_and_port()
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 255, in _destroy_namespace_and_port
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     ip_lib.delete_network_namespace(self.network.namespace)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1105, in delete_network_namespace
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     privileged.remove_netns(namespace, **kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     return self.channel.remote_call(name, args, kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     raise exc_type(*result[2])
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent OSError: [Errno 16] Device or resource busy
[Yahoo-eng-team] [Bug 1926978] [NEW] Leaking username and backend in RBD driver
Public bug reported: The RBD utils get_pool_info() function raises a processutils.ProcessExecutionError from oslo.concurrency if it fails. That error message contains the Ceph username, and reveals that the backend is Ceph, in output an end user can view:

| fault | {"code": 500, "created": "2021-05-03T14:00:57Z", "message": "Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 28c36a23-8e2b-4425-aeb3-502c536f43e8. Last exception: Unexpected error while running command.
|       | Command: ceph df --format=json --id openstack --conf /etc/ceph/ceph.conf

This information should not be available to end users.

** Affects: nova
   Importance: Undecided
   Status: In Progress

https://bugs.launchpad.net/bugs/1926978
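A hedged sketch of the kind of fix this implies, not nova's actual patch: surface only a generic message to the user, leaving the detailed failure for the service log:

    from oslo_concurrency import processutils

    def get_pool_info_safe(conf_path, user):
        try:
            out, _err = processutils.execute(
                'ceph', 'df', '--format=json',
                '--id', user, '--conf', conf_path)
        except processutils.ProcessExecutionError:
            # The raw exception embeds the full command line (username,
            # backend); re-raise something generic instead. A real fix
            # would also log the details server-side before doing so.
            raise RuntimeError('error retrieving storage pool information')
        return out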
[Yahoo-eng-team] [Bug 1901707] [NEW] race condition on port binding vs instance being resumed for live-migrations
Public bug reported: This is a separation from the discussion in bug https://bugs.launchpad.net/neutron/+bug/1815989

The comment https://bugs.launchpad.net/neutron/+bug/1815989/comments/52 goes through in detail the flow on a Train deployment using neutron 15.1.0 (controller) and 15.3.0 (compute) and nova 20.4.0.

There is a race condition where nova live migration waits for neutron to send the network-vif-plugged event, but once nova receives that event, the live migration completes faster than the OVS L2 agent can bind the port on the destination compute node. This causes the RARP frames sent out to update the switches' MAC learning tables to be lost, leaving the instance completely inaccessible after a live migration until those RARP frames are sent again or traffic is initiated egress from the instance. See Sean's comments afterward for the view from the nova side.

The correct behavior would be for the port to be ready for use when nova gets the external event, but maybe that is not possible from the neutron side; again, see the comments in the other bug.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1901707
[Yahoo-eng-team] [Bug 1869929] Re: RuntimeError: maximum recursion depth exceeded while calling a Python object
I think this isn't a bug but was related to SELinux. The issue happened when I upgraded nova on our compute node. I commented out the @db.select_db_reader_mode decorator usage in nova/objects/service.py to make it start, then proceeded to upgrade Neutron and Ceilometer on the compute nodes. Neutron requires the following SELinux packages to be updated in order for it to work:

libselinux
libselinux-python
libselinux-utils
selinux-policy
selinux-policy-targeted

When I had upgraded those plus neutron and ceilometer, I didn't bother testing again. I have now restored the decorators I had commented out, restarted nova-compute, and it worked. This is the install log:

Mar 31 17:22:07 Installed: 1:python2-nova-20.1.1-1.el7.noarch
Mar 31 17:22:08 Updated: 1:openstack-nova-common-20.1.1-1.el7.noarch
Mar 31 17:22:09 Updated: 1:openstack-nova-compute-20.1.1-1.el7.noarch
Mar 31 17:22:09 Erased: python-dogpile-cache-0.6.2-1.el7.noarch
Mar 31 17:22:11 Erased: 1:python-nova-18.2.3-1.el7.noarch
Mar 31 17:22:11 Erased: python-dogpile-core-0.4.1-2.el7.noarch
Apr 01 11:49:46 Updated: python2-os-traits-0.16.0-1.el7.noarch
Apr 01 11:55:16 Installed: python2-os-ken-0.4.1-1.el7.noarch
Apr 01 11:55:17 Updated: python2-neutron-lib-1.29.1-1.el7.noarch
Apr 01 11:55:17 Updated: python2-pyroute2-0.5.6-1.el7.noarch
Apr 01 11:55:19 Installed: 1:python2-neutron-15.0.2-1.el7.noarch
Apr 01 11:55:20 Updated: 1:openstack-neutron-common-15.0.2-1.el7.noarch
Apr 01 11:55:21 Updated: 1:openstack-neutron-openvswitch-15.0.2-1.el7.noarch
Apr 01 11:55:22 Updated: 1:openstack-neutron-15.0.2-1.el7.noarch
Apr 01 11:55:25 Erased: 1:python-neutron-13.0.6-1.el7.noarch
Apr 01 11:55:44 Installed: python2-zaqarclient-1.12.0-1.el7.noarch
Apr 01 11:55:45 Installed: 1:python2-ceilometer-13.1.0-1.el7.noarch
Apr 01 11:55:46 Updated: 1:openstack-ceilometer-common-13.1.0-1.el7.noarch
Apr 01 11:55:46 Updated: 1:openstack-ceilometer-polling-13.1.0-1.el7.noarch
Apr 01 11:55:48 Erased: 1:python-ceilometer-11.0.1-1.el7.noarch

The chance that any of the additional packages installed after nova-compute fixed it is very low. The only thing I did manually besides that was upgrading the SELinux packages mentioned above, because that's required by Neutron.

** Changed in: nova
   Status: New => Invalid

** Changed in: oslo.config
   Status: New => Invalid

https://bugs.launchpad.net/bugs/1869929

Bug description:

When testing an upgrade of the nova packages from Rocky to Train, the following issue occurs.

Versions:
oslo.config 6.11.2
oslo.concurrency 3.30.0
oslo.versionedobjects 1.36.1
oslo.db 5.0.2
oslo.cache 1.37.0

It happens here, where register_opts is called for options.database_opts:
https://github.com/openstack/oslo.db/blob/5.0.2/oslo_db/api.py#L304

in this cmp operation:
https://github.com/openstack/oslo.config/blob/6.11.2/oslo_config/cfg.py#L363

If I edit the cmp operation above and add print statements before it, like this:

if opt.dest in opts:
    print('left: %s' % str(opts[opt.dest]['opt'].name))
    print('right: %s' % str(opt.name))
    if opts[opt.dest]['opt'] != opt:
        raise DuplicateOptError(opt.name)

it stops here:

$ nova-compute --help
left: sqlite_synchronous
right: sqlite_synchronous
Traceback (most recent call last):
(same exception: RuntimeError: maximum recursion depth exceeded while calling a Python object)

$ /usr/bin/nova-compute --help
Traceback (most recent call last):
  File "/usr/bin/nova-compute", line 6, in <module>
    from nova.cmd.compute import main
  File "/usr/lib/python2.7/site-packages/nova/cmd/compute.py", line 29, in <module>
    from nova.compute import rpcapi as compute_rpcapi
  File "/usr/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 30, in <module>
    from nova.objects import service as service_obj
  File "/usr/lib/python2.7/site-packages/nova/objects/service.py", line 170, in <module>
    base.NovaObjectDictCompat):
  File "/usr/lib/python2.7/site-packages/nova/objects/service.py", line 351, in Service
    def _db_service_get_by_compute_host(context, host, use_slave=False):
  File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 91, in select_db_reader_mode
    return IMPL.select_db_reader_mode(f)
  File "/usr/lib/python2.7/site-packages/oslo_db/concurrency.py", line 72, in __getattr__
    return getattr(self._api, key)
  File "/usr/lib/python2.7/site-packages/oslo_db/concurrency.py", line 72, in __getattr__
    return getattr(self._api, key)
  File "/usr/lib/python2.7/site-packages/oslo_db/concurrency.py", line 72, in __getattr__
    return getattr(self._api, key)
[Yahoo-eng-team] [Bug 1869929] Re: RuntimeError: maximum recursion depth exceeded while calling a Python object
** Also affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1869929
[Yahoo-eng-team] [Bug 1585699] Re: Neutron Metadata Agent Configuration - nova_metadata_ip
** Changed in: puppet-neutron
   Status: New => Fix Released

https://bugs.launchpad.net/bugs/1585699

Bug description:

I am not sure if this constitutes the tag 'bug'; however, it has led us to some confusion and I feel it should be updated. This option in the neutron metadata agent configuration (and install docs) is misleading:

{{{
# IP address used by Nova metadata server. (string value)
#nova_metadata_ip = 127.0.0.1
}}}

It implies the need to provide an IP address for the nova metadata API, whereas in actual fact this can be a hostname or an IP address. When using TLS-encrypted sessions, this *has* to be a hostname, or else it ends in an SSL issue, as the hostname is embedded in the certificates.

I am seeing this issue with OpenStack Liberty, but it appears to be in the configuration reference for Mitaka too, so I guess this is across the board. If this needs to be listed in a different forum, please let me know! Thanks.
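For illustration, a TLS-friendly setting in the metadata agent configuration might look like the following; the hostname is an example value, not a real deployment's:

{{{
[DEFAULT]
# Despite the option name, a hostname is accepted -- and required when the
# metadata API is served over TLS, so that it matches the certificate.
nova_metadata_ip = nova-metadata.example.com
}}}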
[Yahoo-eng-team] [Bug 1828406] Re: neutron-dynamic-routing bgp ryu hold timer expired but never tried to recover
Thanks Ryan, I'll mark it as invalid and wait until we are on Stein.

** Changed in: neutron
   Status: New => Invalid

https://bugs.launchpad.net/bugs/1828406
[Yahoo-eng-team] [Bug 1828547] [NEW] neutron-dynamic-routing TypeError: argument of type 'NoneType' is not iterable
Public bug reported: Rocky with Ryu; we don't have a reproducer for this one and don't know what caused it in the first place.

python-neutron-13.0.3-1.el7.noarch
openstack-neutron-openvswitch-13.0.3-1.el7.noarch
python2-neutron-dynamic-routing-13.0.1-1.el7.noarch
openstack-neutron-bgp-dragent-13.0.1-1.el7.noarch
openstack-neutron-common-13.0.3-1.el7.noarch
openstack-neutron-ml2-13.0.3-1.el7.noarch
python2-neutronclient-6.9.0-1.el7.noarch
openstack-neutron-13.0.3-1.el7.noarch
openstack-neutron-dynamic-routing-common-13.0.1-1.el7.noarch
python2-neutron-lib-1.18.0-1.el7.noarch
python-ryu-common-4.26-1.el7.noarch
python2-ryu-4.26-1.el7.noarch

2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 185, in bgp_speaker_create_end
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.add_bgp_speaker_helper(bgp_speaker_id)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 249, in add_bgp_speaker_helper
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.add_bgp_speaker_on_dragent(bgp_speaker)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 359, in add_bgp_speaker_on_dragent
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.add_bgp_peers_to_bgp_speaker(bgp_speaker)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 390, in add_bgp_peers_to_bgp_speaker
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     bgp_peer)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     result = f(*args, **kwargs)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 399, in add_bgp_peer_to_bgp_speaker
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     self.cache.put_bgp_peer(bgp_speaker_id, bgp_peer)
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/neutron_dynamic_routing/services/bgp/agent/bgp_dragent.py", line 604, in put_bgp_peer
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server     if bgp_peer['peer_ip'] in self.get_bgp_peer_ips(bgp_speaker_id):
2019-05-09 16:52:41.970 1659 ERROR oslo_messaging.rpc.server TypeError: argument of type 'NoneType' is not
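The failure is in the dragent's speaker cache: put_bgp_peer() does a membership test against get_bgp_peer_ips(), which evidently returned None for this speaker. A minimal sketch of a defensive guard, using a simplified stand-in for the cache class named in the traceback; this is an illustration, not the merged fix:

    class BgpSpeakerCache(object):
        """Simplified stand-in for the dragent cache in bgp_dragent.py."""

        def __init__(self):
            self.cache = {}  # bgp_speaker_id -> {'peers': {peer_ip: peer}}

        def get_bgp_peer_ips(self, bgp_speaker_id):
            speaker = self.cache.get(bgp_speaker_id)
            # Returning an empty set instead of None keeps the membership
            # test below from raising "argument of type 'NoneType' ...".
            return set(speaker['peers']) if speaker else set()

        def put_bgp_peer(self, bgp_speaker_id, bgp_peer):
            if bgp_peer['peer_ip'] in self.get_bgp_peer_ips(bgp_speaker_id):
                return
            entry = self.cache.setdefault(bgp_speaker_id, {'peers': {}})
            entry['peers'][bgp_peer['peer_ip']] = bgp_peer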
[Yahoo-eng-team] [Bug 1828406] [NEW] neutron-dynamic-routing bgp ryu hold timer expired but never tried to recover
Public bug reported: Lost connection to the peer and the hold timer expired but it never tried to recover. 2019-05-09 13:26:24.921 2461284 INFO bgpspeaker.speaker [-] Negotiated hold time 40 expired. 2019-05-09 13:26:24.922 2461284 INFO bgpspeaker.peer [-] Connection to peer lost, reason: failed to write to socket Resetting retry connect loop: False 2019-05-09 13:26:24.922 2461284 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 80, in _launch return func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/utils/evtlet.py", line 63, in __call__ self._funct(*self._args, **self._kwargs) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 542, in _expired self.send_notification(code, subcode) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 374, in send_notification self._send_with_lock(notification) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 386, in _send_with_lock self.connection_lost('failed to write to socket') File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 596, in connection_lost self._peer.connection_lost(reason) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/peer.py", line 2323, in connection_lost self._protocol.stop() File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 405, in stop Activity.stop(self) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/base.py", line 314, in stop raise ActivityException(desc='Cannot call stop when activity is ' ActivityException: 100.1 - Cannot call stop when activity is not started or has been stopped already. : ActivityException: 100.1 - Cannot call stop when activity is not started or has been stopped already. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1828406 Title: neutron-dynamic-routing bgp ryu hold timer expired but never tried to recover Status in neutron: New Bug description: Lost connection to the peer and the hold timer expired but it never tried to recover. 2019-05-09 13:26:24.921 2461284 INFO bgpspeaker.speaker [-] Negotiated hold time 40 expired. 
2019-05-09 13:26:24.922 2461284 INFO bgpspeaker.peer [-] Connection to peer lost, reason: failed to write to socket Resetting retry connect loop: False 2019-05-09 13:26:24.922 2461284 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 80, in _launch return func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/utils/evtlet.py", line 63, in __call__ self._funct(*self._args, **self._kwargs) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 542, in _expired self.send_notification(code, subcode) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 374, in send_notification self._send_with_lock(notification) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 386, in _send_with_lock self.connection_lost('failed to write to socket') File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 596, in connection_lost self._peer.connection_lost(reason) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/peer.py", line 2323, in connection_lost self._protocol.stop() File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/speaker.py", line 405, in stop Activity.stop(self) File "/usr/lib/python2.7/site-packages/ryu/services/protocols/bgp/base.py", line 314, in stop raise ActivityException(desc='Cannot call stop when activity is ' ActivityException: 100.1 - Cannot call stop when activity is not started or has been stopped already. : ActivityException: 100.1 - Cannot call stop when activity is not started or has been stopped already. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1828406/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
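The traceback shows the hold-timer expiry path calling stop() on a protocol that connection_lost() has already stopped; the resulting ActivityException escapes into the hub greenthread, so no reconnect is ever attempted. A minimal sketch of the kind of idempotent-stop guard that would let the agent recover; names mirror ryu's Activity, but this is not the upstream fix:

    class Activity(object):
        def __init__(self):
            self._started = False

        def start(self):
            self._started = True

        def stop(self):
            # Make stopping twice a no-op instead of raising
            # ActivityException inside the hub thread, which is what
            # kills the BGP retry loop in the log above.
            if not self._started:
                return
            self._started = False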
[Yahoo-eng-team] [Bug 1768807] Re: Live Migration failure: 'ascii' codec can't encode characters in position 251-252
** Changed in: nova/rocky Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1768807 Title: Live Migration failure: 'ascii' codec can't encode characters in position 251-252 Status in OpenStack Compute (nova): Fix Released Status in OpenStack Compute (nova) rocky series: Fix Released Bug description: When I do a live migration, it raises the error below:

2018-05-03 18:38:00.838 1570085 ERROR nova.virt.libvirt.driver [req-7a3691d2-f850-4258-8c7a-54dcaa6189aa 659e4083e38046f8a23060addb53bd96 58942649d31846858f033ee805fcb5bc - default default] [instance: b9f91fe7-70b0-4efc-800a-0482914da186] Live Migration failure: 'ascii' codec can't encode characters in position 251-252: ordinal not in range(128): UnicodeEncodeError: 'ascii' codec can't encode characters in position 251-252: ordinal not in range(128)

I have two compute nodes: compute1 and compute2. An instance created on compute2 migrates to compute1 and back to compute2 without problems, but an instance created on compute1 fails to migrate to compute2 with the fault above. The configuration files on the two nodes are identical. To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1768807/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
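The underlying failure mode is Python 2's implicit ASCII encode: somewhere in the migration path a unicode string containing non-ASCII characters (instance description, hostname or similar; positions 251-252 point into generated XML or URI text) goes through str(). A toy illustration, not nova code:

    # -*- coding: utf-8 -*-
    text = u'instance description: k\xf6ttbullar'
    try:
        text.encode('ascii')  # what an implicit str() does on Python 2
    except UnicodeEncodeError as e:
        print(e)  # 'ascii' codec can't encode character ...
    safe = text.encode('utf-8')  # explicit utf-8 sidesteps the crash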
[Yahoo-eng-team] [Bug 1799455] [NEW] AttributeError: 'BgpDrAgentNotifyApi' object has no attribute 'agent_updated'
Public bug reported: When installing a new bgp-dragent and it's disabled by default (enable_new_agents=False) and you try to enable it, it throws an error on the first attempt then works on the second. openstack network agent set --enable 8139752e-5f08-424b-9b99-09772da3ec7d # fails openstack network agent set --enable 8139752e-5f08-424b-9b99-09772da3ec7d # succeeds 2018-10-23 14:45:12.020 2545 INFO neutron.wsgi [req-8805bf5b-377b-4cba-a0e7-bea10ac5893f 3a78e58e45b84317ad3bb8731112acb3 cc37d5e9495a97e8039314c88d5f - default default] :::xx "GET /v2.0/agents/8139752e-5f08-424b-9b99-09772da3ec7d HTTP/1.1" status: 200 len: 652 time: 0.0748830 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource [req-b47200a6-1fa7-46b7-9b5c-95470d47d2ef 3a78e58e45b84317ad3bb8731112acb3 cc37d5e9495a97e8039314c88d5f - default default] update failed: No details.: AttributeError: 'BgpDrAgentNotifyApi' object has no attribute 'agent_updated' 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource Traceback (most recent call last): 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 98, in resource 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource result = method(request=request, **args) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/base.py", line 626, in update 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource return self._update(request, id, body, **kwargs) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 140, in wrapped 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource setattr(e, '_RETRY_EXCEEDED', True) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource self.force_reraise() 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 136, in wrapped 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource return f(*args, **kwargs) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource ectxt.value = e.inner_exc 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource self.force_reraise() 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource return f(*args, **kwargs) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 183, in wrapped 2018-10-23 
14:45:12.097 2545 ERROR neutron.api.v2.resource LOG.debug("Retry wrapper got retriable exception: %s", e) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__ 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource self.force_reraise() 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron_lib/db/api.py", line 179, in wrapped 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource return f(*dup_args, **dup_kwargs) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/base.py", line 682, in _update 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource obj = obj_updater(request.context, id, **kwargs) 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/agentschedulers_db.py", line 81, in update_agent 2018-10-23 14:45:12.097 2545 ERROR neutron.api.v2.resource
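The AttributeError comes from neutron's agentschedulers_db.update_agent() invoking agent_updated() on the registered agent notifier, and BgpDrAgentNotifyApi simply lacks that method; the second attempt "works" because the admin state was already persisted before the notification blew up. A minimal sketch of the missing hook with the RPC plumbing stubbed; the payload shape is illustrative, not the merged fix:

    import logging

    LOG = logging.getLogger(__name__)

    class BgpDrAgentNotifyApi(object):
        """Sketch: only the missing method; self.client is an
        oslo.messaging RPCClient in the real notifier."""

        def __init__(self, rpc_client):
            self.client = rpc_client

        def agent_updated(self, context, admin_state_up, host):
            # Called by the agent-scheduler DB code path when an agent's
            # admin state changes; without it the first
            # "openstack network agent set --enable" raises AttributeError.
            cctxt = self.client.prepare(server=host)
            cctxt.cast(context, 'agent_updated',
                       payload={'admin_state_up': admin_state_up})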
[Yahoo-eng-team] [Bug 1795280] [NEW] netns deletion on newer kernels fails with errno 16
Public bug reported: This is probably not neutron related, but I need help with some input. On a 3.10 kernel on CentOS 7.5, simply creating a network and then deleting it properly terminates all processes, removes the interfaces and deletes the network namespace.

[root@controller ~]# uname -r
3.10.0-862.11.6.el7.x86_64

If running a later kernel like 4.18, some change causes the namespace deletion to fail with OSError errno 16, device or resource busy. Before something like kernel 3.19 the netns filesystem was provided in proc but has since been moved to its own nsfs; maybe this has something to do with it, but I haven't seen this issue on Ubuntu before.

[root@controller ~]# mount | grep qdhcp
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)
proc on /run/netns/qdhcp-51e47959-9a2b-4372-a204-aff75de9bd01 type proc (rw,nosuid,nodev,noexec,relatime)

[root@controller ~]# uname -r
4.18.8-1.el7.elrepo.x86_64

nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)
nsfs on /run/netns/qdhcp-1fb24615-fd9e-4804-aade-5668bb2cdecb type nsfs (rw,seclabel)

Perhaps some CentOS or Red Hat person can chime in on this. Can reproduce this every single time:
* Create a network: it spawns dnsmasq, haproxy and the interfaces in a netns
* Delete the network: it terminates all processes and deletes the interfaces, but the netns cannot be deleted and throws the error below

Seen on both queens and rocky fwiw

2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent [req-28a9e37f-a2ca-4375-a3f0-8384711414dd - - - - -] Unable to disable dhcp for 1fb24615-fd9e-4804-aade-5668bb2cdecb.: OSError: [Errno 16] Device or resource busy
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 144, in call_driver
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 241, in disable
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     self._destroy_namespace_and_port()
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 255, in _destroy_namespace_and_port
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     ip_lib.delete_network_namespace(self.network.namespace)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1105, in delete_network_namespace
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     privileged.remove_netns(namespace, **kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 207, in _wrap
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     return self.channel.remote_call(name, args, kwargs)
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 202, in remote_call
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent     raise exc_type(*result[2])
2018-10-01 00:03:27.662 2093 ERROR neutron.agent.dhcp.agent OSError: [Errno 16] Device or resource busy

** Affects: neutron Importance: Undecided Status: New
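Not a fix, but the sort of mitigation one might try while this is investigated: retry the namespace removal briefly, since on newer kernels the nsfs mount can stay busy for a short window after the last process exits. Sketch with illustrative function names, not neutron's actual ip_lib API:

    import subprocess
    import time

    def delete_netns_with_retry(name, attempts=5, delay=0.5):
        for i in range(attempts):
            try:
                subprocess.check_call(['ip', 'netns', 'delete', name])
                return
            except subprocess.CalledProcessError:
                # "Device or resource busy" often clears once the kernel
                # finishes tearing down the namespace's last references.
                if i == attempts - 1:
                    raise
                time.sleep(delay)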
[Yahoo-eng-team] [Bug 1794259] [NEW] rocky upgrade path broken requirements pecan too low
Public bug reported: When upgrading to Rocky we noticed that the pecan requirement is: pecan!=1.0.2,!=1.0.3,!=1.0.4,!=1.2,>=1.1.1 # BSD https://github.com/openstack/neutron/blob/stable/rocky/requirements.txt#L11 But with python2-pecan-1.1.2 installed, which should satisfy this requirement, we get the traceback below. Upgrading to python2-pecan-1.3.2 solved the issue.

2018-09-25 11:03:37.579 416002 INFO neutron.wsgi [-] 172.20.106.11 "GET / HTTP/1.0" status: 500 len: 2523 time: 0.0019162
2018-09-25 11:03:39.582 416002 INFO neutron.wsgi [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 490, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/paste/urlmap.py", line 203, in __call__
    return app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
    resp = self.call_func(req, *args, **kw)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
    resp = self.call_func(req, *args, **kw)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/pecan/middleware/recursive.py", line 56, in __call__
    return self.application(environ, start_response)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 835, in __call__
    return super(Pecan, self).__call__(environ, start_response)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 677, in __call__
    controller, args, kwargs = self.find_controller(state)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 853, in find_controller
    controller, args, kw = super(Pecan, self).find_controller(_state)
  File "/usr/lib/python2.7/site-packages/pecan/core.py", line 480, in find_controller
    accept.startswith('text/html,') and
AttributeError: 'NoneType' object has no attribute 'startswith'

** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1794259 Title: rocky upgrade path broken requirements pecan too low Status in neutron: New Bug description: When upgrading to Rocky we noticed that the pecan requirement is: pecan!=1.0.2,!=1.0.3,!=1.0.4,!=1.2,>=1.1.1 # BSD https://github.com/openstack/neutron/blob/stable/rocky/requirements.txt#L11 But with python2-pecan-1.1.2 installed, which should satisfy this requirement, we get the traceback below. Upgrading to python2-pecan-1.3.2 solved the issue.

2018-09-25 11:03:37.579 416002 INFO neutron.wsgi [-] 172.20.106.11 "GET / HTTP/1.0" status: 500 len: 2523 time: 0.0019162
2018-09-25 11:03:39.582 416002 INFO neutron.wsgi [-] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 490, in handle_one_response
    result = self.application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/paste/urlmap.py", line 203, in __call__
    return app(environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
    resp = self.call_func(req, *args, **kw)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in call_func
    return self.func(req, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__
    response = req.get_response(self.application)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1313, in send
    application, catch_exc_info=False)
  File "/usr/lib/python2.7/site-packages/webob/request.py", line 1277, in call_application
    app_iter = application(self.environ, start_response)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 129, in __call__
    resp = self.call_func(req, *args, **kw)
  File "/usr/lib/python2.7/site-packages/webob/dec.py", line 193, in
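The AttributeError itself is a pecan 1.1.x bug: find_controller dereferences the request's Accept header without a None check, so any client that omits Accept (a load balancer health probe doing "GET /", for instance) turns a controller lookup into a 500. The guard that newer pecan effectively applies looks like this; illustrative, not pecan's exact source:

    def wants_html(environ):
        # pecan 1.1.2 called accept.startswith('text/html,') directly;
        # with no Accept header, accept is None and the call raises.
        accept = environ.get('HTTP_ACCEPT')
        return bool(accept) and accept.startswith('text/html,')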
[Yahoo-eng-team] [Bug 1793353] [NEW] broken upgrade path q->r requirement for oslo.db
Public bug reported: Nova is using async_ introduced in oslo.db 4.40.0 but requirements.txt says oslo.db>=4.27.0 https://github.com/openstack/oslo.db/commit/df6bf3401266f42271627c1e408f87c71a06cef7 So if you still have an old oslo.db version from queens that satisfies that requirement services will fail with below: 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service Traceback (most recent call last): 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 729, in run_service 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service service.start() 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/service.py", line 180, in start 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service self.manager.pre_start_hook() 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1249, in pre_start_hook 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service startup=True) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7757, in update_available_resource 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service startup=startup) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 7788, in _get_compute_nodes_in_db 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service use_slave=use_slave) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 177, in wrapper 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service args, kwargs) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/conductor/rpcapi.py", line 241, in object_class_action_versions 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service args=args, kwargs=kwargs) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service retry=self.retry) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service retry=retry) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service call_monitor_timeout, retry=retry) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 575, in _send 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service raise result 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service AttributeError: '_TransactionContextManager' object has no attribute 'async_' 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service Traceback (most recent call last): 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 126, in _object_dispatch 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service return getattr(target, method)(*args, **kwargs) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service 
2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 184, in wrapper 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service result = fn(cls, context, *args, **kwargs) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/objects/compute_node.py", line 437, in get_all_by_host 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service use_slave=use_slave) 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 205, in wrapper 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service reader_mode = get_context_manager(context).async_ 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service 2018-09-19 16:56:35.965 136178 ERROR oslo_service.service AttributeError: '_TransactionContextManager' object has no attribute 'async_' ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1793353 Title:
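The incompatibility in one line: oslo.db 4.40.0 renamed the transaction context manager's reader-mode attribute to async_ (async became a keyword in Python 3.7), and nova rocky consumes the new name while the requirement still admits 4.27.0. A compatibility shim shown purely to illustrate the rename; the proper fix is raising the oslo.db minimum in requirements.txt:

    def get_reader_mode(ctxt_mgr):
        try:
            return ctxt_mgr.async_  # oslo.db >= 4.40.0
        except AttributeError:
            # Older oslo.db spells it "async", a keyword from Python 3.7
            # onwards, hence getattr() rather than a direct access.
            return getattr(ctxt_mgr, 'async')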
[Yahoo-eng-team] [Bug 1793347] [NEW] keystone upgrade fails q->r oslo.log requirement too low
Public bug reported: When upgrading Keystone from queens to rocky, the requirements.txt for rocky says oslo.log >= 3.36.0, but versionutils.deprecated.ROCKY is not introduced until 3.37.0. requirements.txt should be bumped to at least 3.37.0. Error when running db sync:

Traceback (most recent call last):
  File "/bin/keystone-manage", line 6, in <module>
    from keystone.cmd.manage import main
  File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 19, in <module>
    from keystone.cmd import cli
  File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 29, in <module>
    from keystone.cmd import bootstrap
  File "/usr/lib/python2.7/site-packages/keystone/cmd/bootstrap.py", line 17, in <module>
    from keystone.common import driver_hints
  File "/usr/lib/python2.7/site-packages/keystone/common/driver_hints.py", line 18, in <module>
    from keystone import exception
  File "/usr/lib/python2.7/site-packages/keystone/exception.py", line 20, in <module>
    import keystone.conf
  File "/usr/lib/python2.7/site-packages/keystone/conf/__init__.py", line 27, in <module>
    from keystone.conf import default
  File "/usr/lib/python2.7/site-packages/keystone/conf/default.py", line 60, in <module>
    deprecated_since=versionutils.deprecated.ROCKY,
AttributeError: type object 'deprecated' has no attribute 'ROCKY'

** Affects: keystone Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1793347 Title: keystone upgrade fails q->r oslo.log requirement too low Status in OpenStack Identity (keystone): New Bug description: When upgrading Keystone from queens to rocky, the requirements.txt for rocky says oslo.log >= 3.36.0, but versionutils.deprecated.ROCKY is not introduced until 3.37.0. requirements.txt should be bumped to at least 3.37.0. Error when running db sync:

Traceback (most recent call last):
  File "/bin/keystone-manage", line 6, in <module>
    from keystone.cmd.manage import main
  File "/usr/lib/python2.7/site-packages/keystone/cmd/manage.py", line 19, in <module>
    from keystone.cmd import cli
  File "/usr/lib/python2.7/site-packages/keystone/cmd/cli.py", line 29, in <module>
    from keystone.cmd import bootstrap
  File "/usr/lib/python2.7/site-packages/keystone/cmd/bootstrap.py", line 17, in <module>
    from keystone.common import driver_hints
  File "/usr/lib/python2.7/site-packages/keystone/common/driver_hints.py", line 18, in <module>
    from keystone import exception
  File "/usr/lib/python2.7/site-packages/keystone/exception.py", line 20, in <module>
    import keystone.conf
  File "/usr/lib/python2.7/site-packages/keystone/conf/__init__.py", line 27, in <module>
    from keystone.conf import default
  File "/usr/lib/python2.7/site-packages/keystone/conf/default.py", line 60, in <module>
    deprecated_since=versionutils.deprecated.ROCKY,
AttributeError: type object 'deprecated' has no attribute 'ROCKY'

To manage notifications about this bug go to: https://bugs.launchpad.net/keystone/+bug/1793347/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
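A quick preflight for the condition above, to run before keystone-manage db_sync; diagnostic snippet only, not part of keystone:

    from oslo_log import versionutils

    # The ROCKY constant first appears in oslo.log 3.37.0; on 3.36.x this
    # fails the same way keystone's import chain does.
    assert hasattr(versionutils.deprecated, 'ROCKY'), \
        'oslo.log too old, need >= 3.37.0'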
[Yahoo-eng-team] [Bug 1788384] [NEW] self-service password change UI is confusing for end users
Public bug reported: When an end user wants to use the self-service feature to change their own password, it's very common that they go under Identity -> Users and press the "Change password" button for their own user, which does not work unless they are admin because it calls the update_user keystone API. Instead, users should go into [top right dropdown] -> Settings, then move their eyes to the left in the settings menu that appears, click Change password and perform the password change there, which calls the change_password keystone API. The "Change password" button should not be shown if the user does not have access to perform the action; another fix would be changing the link behind the "Change password" button to the change_password API call when the logged-in user is the one whose password will be changed. ** Affects: horizon Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Dashboard (Horizon). https://bugs.launchpad.net/bugs/1788384 Title: self-service password change UI is confusing for end users Status in OpenStack Dashboard (Horizon): New Bug description: When an end user wants to use the self-service feature to change their own password, it's very common that they go under Identity -> Users and press the "Change password" button for their own user, which does not work unless they are admin because it calls the update_user keystone API. Instead, users should go into [top right dropdown] -> Settings, then move their eyes to the left in the settings menu that appears, click Change password and perform the password change there, which calls the change_password keystone API. The "Change password" button should not be shown if the user does not have access to perform the action; another fix would be changing the link behind the "Change password" button to the change_password API call when the logged-in user is the one whose password will be changed. To manage notifications about this bug go to: https://bugs.launchpad.net/horizon/+bug/1788384/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
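A sketch of the first suggested fix, following horizon's usual pattern of gating a table action on a policy check; the class is illustrative rather than horizon's actual users-table code, and the rule name assumes keystone's default policy for user updates:

    from horizon import tables
    from openstack_dashboard import policy

    class ChangePasswordLink(tables.LinkAction):
        name = "change_password"
        verbose_name = "Change Password"
        url = "horizon:identity:users:change_password"

        def allowed(self, request, user):
            # Hide the row action entirely when the caller cannot call
            # update_user; self-service users then only see the working
            # Settings -> Change password view.
            return policy.check((("identity", "identity:update_user"),),
                                request)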
[Yahoo-eng-team] [Bug 1787919] [NEW] Upgrade router to L3 HA broke IPv6
Public bug reported: When I disabled a router, changed it to L3 HA and enabled it again, all the logic that was implemented in [1] did not seem to work. Please see the thread on the ML [2] for details. The backup router had the net.ipv6.conf.qr-.accept_ra values for the qr interfaces (one for ipv4 and one for ipv6) set to 1. On the active router the net.ipv6.conf.all.forwarding option was set to 0. After removing the SLAAC addresses and setting accept_ra to 0 on the backup router, and enabling ipv6 forwarding on the active router, it started working again. Please let me know if you need anything to troubleshoot this, here or on IRC (tobias-urdin). Best regards Tobias [1] https://review.openstack.org/#/q/topic:bug/1667756+(status:open+OR+status:merged [2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133499.html ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1787919 Title: Upgrade router to L3 HA broke IPv6 Status in neutron: New Bug description: When I disabled a router, changed it to L3 HA and enabled it again, all the logic that was implemented in [1] did not seem to work. Please see the thread on the ML [2] for details. The backup router had the net.ipv6.conf.qr-.accept_ra values for the qr interfaces (one for ipv4 and one for ipv6) set to 1. On the active router the net.ipv6.conf.all.forwarding option was set to 0. After removing the SLAAC addresses and setting accept_ra to 0 on the backup router, and enabling ipv6 forwarding on the active router, it started working again. Please let me know if you need anything to troubleshoot this, here or on IRC (tobias-urdin). Best regards Tobias [1] https://review.openstack.org/#/q/topic:bug/1667756+(status:open+OR+status:merged [2] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133499.html To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1787919/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
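The manual recovery described above, expressed as the sysctl writes involved; namespace and interface names are placeholders to substitute, and this is a mitigation sketch, not the neutron code path that should be doing it:

    import subprocess

    def netns_sysctl(namespace, key, value):
        subprocess.check_call(['ip', 'netns', 'exec', namespace,
                               'sysctl', '-w', '%s=%s' % (key, value)])

    # backup router: must not autoconfigure addresses from RAs
    netns_sysctl('qrouter-<id>', 'net.ipv6.conf.qr-<if>.accept_ra', 0)
    # active router: must forward IPv6
    netns_sysctl('qrouter-<id>', 'net.ipv6.conf.all.forwarding', 1)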
[Yahoo-eng-team] [Bug 1787385] [NEW] vpnaas and dynamic-routing missing neutron-tempest-plugin in test-requirements.txt
Public bug reported: The vpnaas and dynamic routing projects are missing the neutron-tempest-plugin in test-requirements.txt ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1787385 Title: vpnaas and dynamic-routing missing neutron-tempest-plugin in test-requirements.txt Status in neutron: New Bug description: The vpnaas and dynamic routing projects are missing the neutron-tempest-plugin in test-requirements.txt To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1787385/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
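For reference, the fix is a one-line addition to each project's test-requirements.txt; the version pin is omitted here since the minimum depends on the branch:

    neutron-tempest-plugin  # Apache-2.0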
[Yahoo-eng-team] [Bug 1784590] [NEW] neutron-dynamic-routing bgp agent should have options for MP-BGP
Public bug reported: neutron-dynamic-routing: the implementation of BGP with Ryu supports IPv4 and IPv6 peers, but the MP-BGP capabilities are announced based on whether the peer is a v4 or v6 address. If you want to use an IPv4 peer but announce IPv6 prefixes, this will not work, because the add_bgp_peer() function in services/bgp/agent/driver/ryu/driver.py disables the IPv6 MP-BGP capability if the peer IP is an IPv4 address. This should be extended to support setting the capabilities manually; if you change the enable_ipv6 variable in the add_bgp_peer() function to True, it will correctly announce IPv6 prefixes over the IPv4 BGP peer, provided the upstream router (the other side) supports the MP-BGP IPv6 capability. Should be easy to implement with a "mode" config option that can be set to auto or manual, plus options to override the capabilities. ** Affects: neutron Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1784590 Title: neutron-dynamic-routing bgp agent should have options for MP-BGP Status in neutron: New Bug description: neutron-dynamic-routing: the implementation of BGP with Ryu supports IPv4 and IPv6 peers, but the MP-BGP capabilities are announced based on whether the peer is a v4 or v6 address. If you want to use an IPv4 peer but announce IPv6 prefixes, this will not work, because the add_bgp_peer() function in services/bgp/agent/driver/ryu/driver.py disables the IPv6 MP-BGP capability if the peer IP is an IPv4 address. This should be extended to support setting the capabilities manually; if you change the enable_ipv6 variable in the add_bgp_peer() function to True, it will correctly announce IPv6 prefixes over the IPv4 BGP peer, provided the upstream router (the other side) supports the MP-BGP IPv6 capability. Should be easy to implement with a "mode" config option that can be set to auto or manual, plus options to override the capabilities. To manage notifications about this bug go to: https://bugs.launchpad.net/neutron/+bug/1784590/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
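A sketch of the suggested auto-or-manual behaviour around the capability selection that the description locates in add_bgp_peer(); option names are illustrative, not an implemented neutron config option:

    import netaddr

    def mp_bgp_capabilities(peer_ip, mode='auto',
                            enable_ipv4=True, enable_ipv6=True):
        if mode == 'manual':
            # Operator overrides, e.g. announce IPv6 prefixes over an
            # IPv4 peer when the other side supports MP-BGP for v6.
            return {'enable_ipv4': enable_ipv4, 'enable_ipv6': enable_ipv6}
        # Current behaviour: derive capabilities from the peer's
        # address family only.
        is_v6 = netaddr.IPAddress(peer_ip).version == 6
        return {'enable_ipv4': not is_v6, 'enable_ipv6': is_v6}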
[Yahoo-eng-team] [Bug 1784342] [NEW] AttributeError: 'Subnet' object has no attribute '_obj_network_id'
Public bug reported: Running rally caused subnets to be created without a network_id, causing this AttributeError. OpenStack Queens RDO packages.

[root@controller1 ~]# rpm -qa | grep -i neutron
python-neutron-12.0.2-1.el7.noarch
openstack-neutron-12.0.2-1.el7.noarch
python2-neutron-dynamic-routing-12.0.1-1.el7.noarch
python2-neutron-lib-1.13.0-1.el7.noarch
openstack-neutron-dynamic-routing-common-12.0.1-1.el7.noarch
python2-neutronclient-6.7.0-1.el7.noarch
openstack-neutron-bgp-dragent-12.0.1-1.el7.noarch
openstack-neutron-common-12.0.2-1.el7.noarch
openstack-neutron-ml2-12.0.2-1.el7.noarch

MariaDB [neutron]> select project_id, id, name, network_id, cidr from subnets where network_id is null;
+----------------------------------+--------------------------------------+---------------------------+------------+-------------+
| project_id                       | id                                   | name                      | network_id | cidr        |
+----------------------------------+--------------------------------------+---------------------------+------------+-------------+
| b80468629bc5410ca2c53a7cfbf002b3 | 7a23c72b-3df8-4641-a494-af7642563c8e | s_rally_1e4bebf1_1s3IN6mo | NULL       | 1.9.13.0/24 |
| b80468629bc5410ca2c53a7cfbf002b3 | f7a57946-4814-477a-9649-cc475fb4e7b2 | s_rally_1e4bebf1_qWSFSMs9 | NULL       | 1.5.20.0/24 |
+----------------------------------+--------------------------------------+---------------------------+------------+-------------+

2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation [req-c921b9fb-499b-41c1-9103-93e71a70820c b6b96932bbef41fdbf957c2dc01776aa 050c556faa5944a8953126c867313770 - default default] GET failed.: AttributeError: 'Subnet' object has no attribute '_obj_network_id'
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation Traceback (most recent call last):
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 678, in __call__
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     self.invoke_controller(controller, args, kwargs, state)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 569, in invoke_controller
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     result = controller(*args, **kwargs)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 91, in wrapped
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     setattr(e, '_RETRY_EXCEEDED', True)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 87, in wrapped
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 147, in wrapper
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     ectxt.value = e.inner_exc
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     self.force_reraise()
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     six.reraise(self.type_, self.value, self.tb)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 135, in wrapper
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     return f(*args, **kwargs)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/neutron/db/api.py", line 126, in wrapped
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation     LOG.debug("Retry wrapper got retriable exception: %s", e)
2018-07-30 10:35:13.351 42618 ERROR neutron.pecan_wsgi.hooks.translation   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line
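What the error means in oslo.versionedobjects terms: reading a declared field that was never actually set makes the generated property getter call obj_load_attr() and then fetch the hidden '_obj_<field>' attribute; if the loader cannot supply a value (here, a subnet row with NULL network_id), that getattr raises the AttributeError above. A self-contained repro with a simplified stand-in for neutron's Subnet object:

    from oslo_versionedobjects import base
    from oslo_versionedobjects import fields

    @base.VersionedObjectRegistry.register
    class Subnet(base.VersionedObject):
        fields = {'network_id': fields.StringField(nullable=True)}

        def obj_load_attr(self, attrname):
            # Stand-in for a loader that finds nothing to set for the
            # field (e.g. the orphaned rows in the table above).
            pass

    s = Subnet()
    try:
        s.network_id
    except AttributeError as e:
        print(e)  # 'Subnet' object has no attribute '_obj_network_id'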
[Yahoo-eng-team] [Bug 1771517] [NEW] Quota update unexpected behavior with no access to keystone
Public bug reported: Distro: OpenStack Queens running on Ubuntu 16.04. From this commit [1], nova now needs access to keystone to perform quota operations (this bug is mostly related to an issue we had with quota update). When keystone is not available, the nova-api (running in eventlet) tries to use the endpoints ordered in [keystone]/valid_interfaces; we did not have access to the internal endpoint, which caused this issue:

2018-05-14 15:54:46.134 1241 INFO nova.api.openstack.identity [req-8b383cf0-7f99-41e6-9de3-5e694fb24449 f13940ac09924d8582fe6612e838c7a7 9387d3a7be2a487784a90660b6e182cb - default default] Unable to contact keystone to verify project_id

You'll also see:

2018-05-14 15:54:46.419 1241 INFO nova.osapi_compute.wsgi.server [req-e9da4d33-05be-42fe-891d-0d201d2e8311 83e8a17bf7874682a86f9aa58f4c9507 e83ea76e472f48679f6fa6070a8a16e1 - default default] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 512, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/dist-packages/eventlet/wsgi.py", line 453, in write
    wfile.flush()
  File "/usr/lib/python2.7/socket.py", line 307, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 385, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 379, in send
    return self._send_loop(self.fd.send, data, flags)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio/base.py", line 366, in _send_loop
    return send_method(data, *args)
error: [Errno 104] Connection reset by peer

Now this is correct; however, what happens next is imo not correct: it generates a 200 OK response when it actually failed to perform the requested action.

2018-05-14 15:54:46.420 1241 INFO nova.osapi_compute.wsgi.server [req-e9da4d33-05be-42fe-891d-0d201d2e8311 83e8a17bf7874682a86f9aa58f4c9507 e83ea76e472f48679f6fa6070a8a16e1 - default default] :::195.74.38.54,172.20.104.11 "PUT /v2/e83ea76e472f48679f6fa6070a8a16e1/os-quota-sets/e83ea76e472f48679f6fa6070a8a16e1 HTTP/1.1" status: 200 len: 0 time: 128.0125880

For us this surfaced as a 504 gateway error, because the time the request took (128 seconds) was too long for our load balancer to allow. I think at least catching the exception and setting the return code to 500 would be appropriate, and it should also be logged as an error and not an INFO message.

[1] https://github.com/openstack/nova/commit/1f120b5649ba03aa5b2490a82c08b77c580f12d7

** Affects: nova Importance: Undecided Status: New
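A sketch of the suggested handling, i.e. turning an unreachable keystone into a hard 500 with an ERROR log instead of an INFO line and an empty 200. The wiring is simplified and the helper name is ours; only the keystoneauth1/webob calls are real APIs:

    import logging

    import webob.exc
    from keystoneauth1 import exceptions as ks_exc

    LOG = logging.getLogger(__name__)

    def verify_project_id(session, project_id):
        # 'session' is a keystoneauth1 Session pointed at the identity
        # service; failure to reach any configured endpoint should be
        # loud, not a silent pass-through.
        try:
            resp = session.get('/projects/%s' % project_id,
                               endpoint_filter={'service_type': 'identity'},
                               raise_exc=False)
        except ks_exc.ClientException:
            LOG.error('Unable to contact keystone to verify project_id')
            raise webob.exc.HTTPInternalServerError(
                explanation='Keystone is unavailable, cannot verify project.')
        return resp.status_code == 200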
[Yahoo-eng-team] [Bug 1649616] Re: Keystone Token Flush job does not complete in HA deployed environment
** Changed in: puppet-keystone Status: Triaged => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Identity (keystone). https://bugs.launchpad.net/bugs/1649616 Title: Keystone Token Flush job does not complete in HA deployed environment Status in Ubuntu Cloud Archive: Invalid Status in Ubuntu Cloud Archive mitaka series: Fix Released Status in Ubuntu Cloud Archive newton series: Fix Released Status in Ubuntu Cloud Archive ocata series: Fix Released Status in OpenStack Identity (keystone): Fix Released Status in OpenStack Identity (keystone) newton series: In Progress Status in OpenStack Identity (keystone) ocata series: In Progress Status in puppet-keystone: Fix Released Status in tripleo: Fix Released Status in keystone package in Ubuntu: Invalid Status in keystone source package in Xenial: Fix Released Status in keystone source package in Yakkety: Fix Released Status in keystone source package in Zesty: Fix Released Bug description:

[Impact]
* The Keystone token flush job can get into a state where it will never complete, because the transaction size exceeds the MySQL Galera transaction size - wsrep_max_ws_size (1073741824).

[Test Case]
1. Authenticate many times
2. Observe that the keystone token flush job runs (it can take a very long time depending on disk; >20 hours in my environment)
3. Observe errors in mysql.log indicating a transaction that is too large

Actual results: Expired tokens are not actually flushed from the database, and no errors appear in keystone.log. Errors appear only in mysql.log.

Expected results: Expired tokens are removed from the database.

[Additional info:]
It is likely that you can demonstrate this with fewer than 1 million tokens, as the >1 million row token table is larger than 13GiB and the max transaction size is 1GiB; my token bench-marking Browbeat job creates more than needed. Once the token flush job cannot complete, the token table will never decrease in size and eventually the cloud will run out of disk space. Furthermore, the flush job will keep consuming disk I/O while it runs. This was demonstrated on slow disks (a single 7.2K SATA disk). On faster disks you will have more capacity to generate tokens, so you can exceed the transaction size even faster.

Log evidence:

[root@overcloud-controller-0 log]# grep " Total expired" /var/log/keystone/keystone.log
2016-12-08 01:33:40.530 21614 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1082434
2016-12-09 09:31:25.301 14120 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1084241
2016-12-11 01:35:39.082 4223 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1086504
2016-12-12 01:08:16.170 32575 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1087823
2016-12-13 01:22:18.121 28669 INFO keystone.token.persistence.backends.sql [-] Total expired tokens removed: 1089202

[root@overcloud-controller-0 log]# tail mysqld.log
161208 1:33:41 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161208 1:33:41 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161209 9:31:26 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161209 9:31:26 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161211 1:35:39 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161211 1:35:40 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161212 1:08:16 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161212 1:08:17 [ERROR] WSREP: rbr write fail, data_len: 0, 2
161213 1:22:18 [Warning] WSREP: transaction size limit (1073741824) exceeded: 1073774592
161213 1:22:19 [ERROR] WSREP: rbr write fail, data_len: 0, 2

The disk utilization graph is attached. The entire job in that graph runs from the first spike in disk util (~5:18 UTC) and culminates in about 90 minutes of pegging the disk (between 1:09 UTC and 2:43 UTC).

[Regression Potential]
* Not identified

To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1649616/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
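The standard remedy, and what the eventual fixes do in spirit, is to delete expired rows in bounded batches so no single transaction approaches wsrep_max_ws_size. A MySQL-flavoured sketch with SQLAlchemy; the table and column names follow keystone's token backend, the batch size is arbitrary, and this is our illustration rather than keystone's code:

    import datetime

    import sqlalchemy as sa

    def flush_expired_tokens(engine, batch_size=10000):
        now = datetime.datetime.utcnow()
        total = 0
        while True:
            # Each DELETE commits as its own transaction, so the Galera
            # writeset stays far below the 1GiB wsrep_max_ws_size limit.
            with engine.begin() as conn:
                result = conn.execute(
                    sa.text('DELETE FROM token WHERE expires < :now LIMIT :n'),
                    {'now': now, 'n': batch_size})
            if result.rowcount == 0:
                return total
            total += result.rowcount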
[Yahoo-eng-team] [Bug 1644187] Re: ValueError on creating new nova instance
Was caused by our internal infrastructure sending an invalid API request. I'm sorry for the hassle, marking as invalid. ** Changed in: nova Status: New => Invalid ** Changed in: nova (Ubuntu) Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1644187 Title: ValueError on creating new nova instance Status in OpenStack Compute (nova): Invalid Status in nova package in Ubuntu: Invalid Bug description: New bug introduced when upgrading nova-api on Ubuntu 14.04 for cloud archive liberty stable repo. (deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/liberty main) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions [req-e3cb4d1c-a286-4450-aaa2-066072d1f306 3eb3f6111be04a8a957d8a3e8cb2dd86 6470df6b106e47508ebb00db08b557cf - - -] Unexpected exception in API method 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions Traceback (most recent call last): 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions return f(*args, **kwargs) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions return func(*args, **kwargs) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 611, in create 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions **create_kwargs) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 149, in inner 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions rv = f(*args, **kwargs) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 1587, in create 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions check_server_group_quota=check_server_group_quota) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 1203, in _create_instance 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions block_device_mapping, legacy_bdm) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 862, in _check_and_transform_bdm 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions context, block_device_mapping) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/nova/objects/block_device.py", line 314, in block_device_make_list_from_dicts 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions for bdm in bdm_dicts_list] 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 294, in __init__ 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions 
setattr(self, key, kwargs[key]) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 71, in setter 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions field_value = field.coerce(self, name, value) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 189, in coerce 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions return self._type.coerce(obj, attr, value) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 304, in coerce 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions return int(value) 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions ValueError: invalid literal for int() with base 10: '' 2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions 2016-11-23 11:31:15.361 1600 INFO nova.api.openstack.wsgi [req-e3cb4d1c-a286-4450-aaa2-066072d1f306 3eb3f6111be04a8a957d8a3e8cb2dd86 6470df6b106e47508ebb00db08b557cf - - -] HTTP exception thrown: Unexpected API Error. Please report this at
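The failing coercion, isolated: the create request carried an empty string for an integer block-device-mapping field, and oslo.versionedobjects' IntegerField coerces with int(value), so int('') raises exactly the ValueError above. The traceback does not say which field; volume_size below is a hypothetical example, and the guard is ours, not nova's:

    def coerce_bdm_int(value, field='volume_size'):
        # int('') -> ValueError: invalid literal for int() with base 10: ''
        # Reject empty input explicitly so the API can return a clean 400
        # instead of bubbling up a 500.
        if value is None or (isinstance(value, str) and not value.strip()):
            raise ValueError('%s requires an integer, got %r' % (field, value))
        return int(value)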
[Yahoo-eng-team] [Bug 1644187] Re: ValueError on creating new nova instance
Booting a nova instance with a block device as root volume.

** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1644187

Title:
  ValueError on creating new nova instance

Status in OpenStack Compute (nova):
  Confirmed
Status in nova package in Ubuntu:
  Confirmed

Bug description:
  New bug introduced when upgrading nova-api on Ubuntu 14.04 for cloud
  archive liberty stable repo.
  (deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/liberty main)

  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions [req-e3cb4d1c-a286-4450-aaa2-066072d1f306 3eb3f6111be04a8a957d8a3e8cb2dd86 6470df6b106e47508ebb00db08b557cf - - -] Unexpected exception in API method
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions Traceback (most recent call last):
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 611, in create
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     **create_kwargs)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 149, in inner
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     rv = f(*args, **kwargs)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 1587, in create
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     check_server_group_quota=check_server_group_quota)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 1203, in _create_instance
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     block_device_mapping, legacy_bdm)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 862, in _check_and_transform_bdm
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     context, block_device_mapping)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/objects/block_device.py", line 314, in block_device_make_list_from_dicts
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     for bdm in bdm_dicts_list]
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 294, in __init__
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     setattr(self, key, kwargs[key])
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 71, in setter
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     field_value = field.coerce(self, name, value)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 189, in coerce
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return self._type.coerce(obj, attr, value)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 304, in coerce
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return int(value)
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions ValueError: invalid literal for int() with base 10: ''
  2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions
  2016-11-23 11:31:15.361 1600 INFO nova.api.openstack.wsgi [req-e3cb4d1c-a286-4450-aaa2-066072d1f306 3eb3f6111be04a8a957d8a3e8cb2dd86 6470df6b106e47508ebb00db08b557cf - - -] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1644187/+subscriptions
[Yahoo-eng-team] [Bug 1644187] [NEW] ValueError on creating new nova instance
Public bug reported:

New bug introduced when upgrading nova-api on Ubuntu 14.04 for cloud archive liberty stable repo.
(deb http://ubuntu-cloud.archive.canonical.com/ubuntu trusty-updates/liberty main)

2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions [req-e3cb4d1c-a286-4450-aaa2-066072d1f306 3eb3f6111be04a8a957d8a3e8cb2dd86 6470df6b106e47508ebb00db08b557cf - - -] Unexpected exception in API method
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/extensions.py", line 478, in wrapped
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/validation/__init__.py", line 73, in wrapper
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return func(*args, **kwargs)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/api/openstack/compute/servers.py", line 611, in create
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     **create_kwargs)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/hooks.py", line 149, in inner
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     rv = f(*args, **kwargs)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 1587, in create
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     check_server_group_quota=check_server_group_quota)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 1203, in _create_instance
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     block_device_mapping, legacy_bdm)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/compute/api.py", line 862, in _check_and_transform_bdm
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     context, block_device_mapping)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/nova/objects/block_device.py", line 314, in block_device_make_list_from_dicts
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     for bdm in bdm_dicts_list]
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 294, in __init__
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     setattr(self, key, kwargs[key])
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 71, in setter
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     field_value = field.coerce(self, name, value)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 189, in coerce
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return self._type.coerce(obj, attr, value)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/dist-packages/oslo_versionedobjects/fields.py", line 304, in coerce
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions     return int(value)
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions ValueError: invalid literal for int() with base 10: ''
2016-11-23 11:31:15.341 1600 ERROR nova.api.openstack.extensions
2016-11-23 11:31:15.361 1600 INFO nova.api.openstack.wsgi [req-e3cb4d1c-a286-4450-aaa2-066072d1f306 3eb3f6111be04a8a957d8a3e8cb2dd86 6470df6b106e47508ebb00db08b557cf - - -] HTTP exception thrown: Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.

** Affects: nova
   Importance: Undecided
   Status: Confirmed

** Affects: nova (Ubuntu)
   Importance: Undecided
   Status: Confirmed

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1644187

Title:
  ValueError on creating new nova instance

Status in OpenStack Compute (nova):
  Confirmed
Status in nova package in Ubuntu:
  Confirmed

Bug description:
  New bug introduced when upgrading nova-api
[Yahoo-eng-team] [Bug 1583977] Re: liberty neutron-l3-agent ha fails to spawn keepalived
Moving the pid files for the affected router solves the issue:

mv /var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.* /root

Found the fix thanks to frickler on IRC. It has been merged for liberty:
https://review.openstack.org/#/c/299138/3

** Changed in: cloud-archive
   Status: New => Invalid

** Changed in: neutron
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1583977

Title:
  liberty neutron-l3-agent ha fails to spawn keepalived

Status in Ubuntu Cloud Archive:
  Invalid
Status in neutron:
  Invalid

Bug description:
  After upgrading to 7.0.4 I have several routers that fail to spawn
  the keepalived process. The logs say

  2016-05-20 11:01:11.181 23023 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid c1cc1a5d-c0ef-47b7-8d5c-88403e134725 not found. The process should not have died
  2016-05-20 11:01:11.181 23023 ERROR neutron.agent.linux.external_process [-] respawning keepalived for uuid c1cc1a5d-c0ef-47b7-8d5c-88403e134725
  2016-05-20 11:01:11.182 23023 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-c1cc1a5d-c0ef-47b7-8d5c-88403e134725', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.pid', '-r', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.pid-vrrp'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:85

  All these spawns fail and keepalived outputs to syslog

  May 20 11:01:11 neutron1 Keepalived[46558]: Starting Keepalived v1.2.19 (09/04,2015)
  May 20 11:01:11 neutron1 Keepalived[46558]: daemon is already running

  but the daemon is not running; the only thing running is the
  neutron-keepalived-state-change process:

  root@neutron1:~# ps auxf | grep c1cc1a5d
  root     48137  0.0  0.0  11740   936 pts/4    S+   11:03   0:00  |   \_ grep --color=auto c1cc1a5d
  neutron  21671  0.0  0.0 124924 40172 ?        S    May19   0:00 /usr/bin/python /usr/bin/neutron-keepalived-state-change --router_id=c1cc1a5d-c0ef-47b7-8d5c-88403e134725 --namespace=qrouter-c1cc1a5d-c0ef-47b7-8d5c-88403e134725 --conf_dir=/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725 --monitor_interface=ha-ef4e2a2f-66 --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.monitor.pid --state_path=/var/lib/neutron --user=107 --group=112

  ProblemType: Bug
  DistroRelease: Ubuntu 14.04
  Package: neutron-l3-agent 2:7.0.4-0ubuntu1~cloud0 [origin: Canonical]
  ProcVersionSignature: Ubuntu 3.13.0-86.131-generic 3.13.11-ckt39
  Uname: Linux 3.13.0-86-generic x86_64
  NonfreeKernelModules: hcpdriver
  ApportVersion: 2.14.1-0ubuntu3.20
  Architecture: amd64
  CrashDB: { "impl": "launchpad", "project": "cloud-archive", "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml" }
  Date: Fri May 20 11:00:01 2016
  PackageArchitecture: all
  SourcePackage: neutron
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1583977/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1583977] [NEW] liberty neutron-l3-agent ha fails to spawn keepalived
Public bug reported:

After upgrading to 7.0.4 I have several routers that fail to spawn the keepalived process. The logs say

2016-05-20 11:01:11.181 23023 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid c1cc1a5d-c0ef-47b7-8d5c-88403e134725 not found. The process should not have died
2016-05-20 11:01:11.181 23023 ERROR neutron.agent.linux.external_process [-] respawning keepalived for uuid c1cc1a5d-c0ef-47b7-8d5c-88403e134725
2016-05-20 11:01:11.182 23023 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-c1cc1a5d-c0ef-47b7-8d5c-88403e134725', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.pid', '-r', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.pid-vrrp'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:85

All these spawns fail and keepalived outputs to syslog

May 20 11:01:11 neutron1 Keepalived[46558]: Starting Keepalived v1.2.19 (09/04,2015)
May 20 11:01:11 neutron1 Keepalived[46558]: daemon is already running

but the daemon is not running; the only thing running is the neutron-keepalived-state-change process:

root@neutron1:~# ps auxf | grep c1cc1a5d
root     48137  0.0  0.0  11740   936 pts/4    S+   11:03   0:00  |   \_ grep --color=auto c1cc1a5d
neutron  21671  0.0  0.0 124924 40172 ?        S    May19   0:00 /usr/bin/python /usr/bin/neutron-keepalived-state-change --router_id=c1cc1a5d-c0ef-47b7-8d5c-88403e134725 --namespace=qrouter-c1cc1a5d-c0ef-47b7-8d5c-88403e134725 --conf_dir=/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725 --monitor_interface=ha-ef4e2a2f-66 --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.monitor.pid --state_path=/var/lib/neutron --user=107 --group=112

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: neutron-l3-agent 2:7.0.4-0ubuntu1~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 3.13.0-86.131-generic 3.13.11-ckt39
Uname: Linux 3.13.0-86-generic x86_64
NonfreeKernelModules: hcpdriver
ApportVersion: 2.14.1-0ubuntu3.20
Architecture: amd64
CrashDB: { "impl": "launchpad", "project": "cloud-archive", "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml" }
Date: Fri May 20 11:00:01 2016
PackageArchitecture: all
SourcePackage: neutron
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: cloud-archive
   Importance: Undecided
   Status: New

** Affects: neutron
   Importance: Undecided
   Status: New

** Tags: amd64 apport-bug regression-update third-party-packages trusty

** Also affects: neutron
   Importance: Undecided
   Status: New

** Tags added: regression-update

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1583977

Title:
  liberty neutron-l3-agent ha fails to spawn keepalived

Status in Ubuntu Cloud Archive:
  New
Status in neutron:
  New

Bug description:
  After upgrading to 7.0.4 I have several routers that fail to spawn
  the keepalived process. The logs say

  2016-05-20 11:01:11.181 23023 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid c1cc1a5d-c0ef-47b7-8d5c-88403e134725 not found. The process should not have died
  2016-05-20 11:01:11.181 23023 ERROR neutron.agent.linux.external_process [-] respawning keepalived for uuid c1cc1a5d-c0ef-47b7-8d5c-88403e134725
  2016-05-20 11:01:11.182 23023 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-c1cc1a5d-c0ef-47b7-8d5c-88403e134725', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.pid', '-r', '/var/lib/neutron/ha_confs/c1cc1a5d-c0ef-47b7-8d5c-88403e134725.pid-vrrp'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:85

  All these spawns fail and keepalived outputs to syslog

  May 20 11:01:11 neutron1 Keepalived[46558]: Starting Keepalived v1.2.19 (09/04,2015)
  May 20 11:01:11 neutron1 Keepalived[46558]: daemon is already running

  but the daemon is not running; the only thing running is the
  neutron-keepalived-state-change process:

  root@neutron1:~# ps auxf | grep c1cc1a5d
  root     48137  0.0  0.0  11740   936 pts/4    S+   11:03   0:00  |   \_ grep --color=auto c1cc1a5d
  neutron  21671  0.0  0.0 124924 40172 ?        S    May19   0:00 /usr/bin/python /usr/bin/neutron-keepalived-state-change --router_id=c1cc1a5d-c0ef-47b7-8d5c-88403e134725
[Yahoo-eng-team] [Bug 1525802] Re: live migration with multipath cinder volumes crashes node
Resolved by changing the no_path_retry option in multipath.conf from "queue" to "0". The issue was that when I/O was queued and the path was about to be removed, the removal was blocked and never happened, so flushing the multipath device failed because the device was still in use by this stuck path.

I also removed the VIR_MIGRATE_TUNNELLED value from the live_migration_flag option in nova.conf, on recommendation from Kashyap Chamarthy (kashyapc).

To reload multipath.conf while multipathd is running (this won't stop or break your multipath devices):

multipathd -k reconfigure

Resolved with good help from these links:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=623613
http://linux.die.net/man/5/multipath.conf
http://christophe.varoqui.free.fr/refbook.html

Right now, 26 live migrations and counting without any issues.

Best regards

** Changed in: nova
   Status: New => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1525802

Title:
  live migration with multipath cinder volumes crashes node

Status in OpenStack Compute (nova):
  Invalid

Bug description:
  Hello,

  When issuing a live migration between kvm nodes that have multipath
  cinder volumes, it sometimes hangs and causes qemu-kvm to crash; the
  only solution is a restart of the kvm node.

  Sometimes when live migrating you get stuck when it tries to migrate
  the active RAM; you will see something like this in the
  nova-compute.log: http://paste.openstack.org/show/481773/

  As you can see it gets nowhere. What is happening in the background
  is that for some reason the multipath volumes, when viewed with
  'multipath -ll', go into a 'faulty running' state and cause issues
  with the block device, causing the qemu-kvm process to hang. The kvm
  node also tries to run blkid and kpartx, but all of those hang too,
  which means you can get 100+ load just from those stuck processes.

  [1015086.978188] end_request: I/O error, dev sdg, sector 41942912
  [1015086.978398] device-mapper: multipath: Failing path 8:96.
  [1015088.547034] qbr8eff45f7-ed: port 1(qvb8eff45f7-ed) entered disabled state
  [1015088.791695] INFO: task qemu-system-x86:19383 blocked for more than 120 seconds.
  [1015088.791940] Tainted: P OX 3.13.0-68-generic #111-Ubuntu
  [1015088.792147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [1015088.792396] qemu-system-x86 D 88301f2f3180 0 19383 1 0x
  [1015088.792404] 8817440ada88 0086 8817fa574800 8817440adfd8
  [1015088.792414] 00013180 00013180 8817fa574800 88301f2f3a18
  [1015088.792420] 882ff7ab5280 8817fa574800
  [1015088.792426] Call Trace:
  [1015088.792440] [] io_schedule+0x9d/0x140
  [1015088.792449] [] do_blockdev_direct_IO+0x1ce4/0x2910
  [1015088.792456] [] ? I_BDEV+0x10/0x10
  [1015088.792462] [] __blockdev_direct_IO+0x55/0x60
  [1015088.792467] [] ? I_BDEV+0x10/0x10
  [1015088.792472] [] blkdev_direct_IO+0x56/0x60
  [1015088.792476] [] ? I_BDEV+0x10/0x10
  [1015088.792482] [] generic_file_direct_write+0xc1/0x180
  [1015088.792487] [] __generic_file_aio_write+0x305/0x3d0
  [1015088.792492] [] blkdev_aio_write+0x46/0x90
  [1015088.792501] [] do_sync_write+0x5a/0x90
  [1015088.792507] [] vfs_write+0xb4/0x1f0
  [1015088.792512] [] SyS_pwrite64+0x72/0xb0
  [1015088.792519] [] system_call_fastpath+0x1a/0x1f

  root 19410 0.0 0.0     0    0 ?  D  08:12 0:00 [blkid]
  root 19575 0.0 0.0     0    0 ?  D  08:13 0:00 [blkid]
  root 19584 0.0 0.0 28276 1076 ?  S  08:13 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
  root 21734 0.0 0.0 28276 1080 ?  D  08:15 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
  root 21735 0.0 0.0 28276 1076 ?  S  08:15 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
  root 22419 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
  root 22420 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
  root 22864 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
  root 22865 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
  root 23316 0.0 0.0 28276 1076 ?  D  08:17 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
  root 23317 0.0 0.0 28276 1072 ?  D  08:17 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
  root 23756
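For reference, a minimal multipath.conf excerpt matching the workaround described above. This is a sketch under the assumption that the setting lives in the defaults section; device- or multipath-specific sections can override it in a real deployment, so check the whole file.

    defaults {
        # Fail I/O immediately when no path is available instead of
        # queueing it; queued I/O is what kept the dying path busy and
        # blocked the flush of the multipath device.
        no_path_retry 0
    }

After editing, the running daemon can be told to re-read the file with the multipathd command quoted above.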
[Yahoo-eng-team] [Bug 1525802] [NEW] live migration with multipath cinder volumes crashes node
Public bug reported:

Hello,

When issuing a live migration between kvm nodes that have multipath cinder volumes, it sometimes hangs and causes qemu-kvm to crash; the only solution is a restart of the kvm node.

Sometimes when live migrating you get stuck when it tries to migrate the active RAM; you will see something like this in the nova-compute.log: http://paste.openstack.org/show/481773/

As you can see it gets nowhere. What is happening in the background is that for some reason the multipath volumes, when viewed with 'multipath -ll', go into a 'faulty running' state and cause issues with the block device, causing the qemu-kvm process to hang. The kvm node also tries to run blkid and kpartx, but all of those hang too, which means you can get 100+ load just from those stuck processes.

[1015086.978188] end_request: I/O error, dev sdg, sector 41942912
[1015086.978398] device-mapper: multipath: Failing path 8:96.
[1015088.547034] qbr8eff45f7-ed: port 1(qvb8eff45f7-ed) entered disabled state
[1015088.791695] INFO: task qemu-system-x86:19383 blocked for more than 120 seconds.
[1015088.791940] Tainted: P OX 3.13.0-68-generic #111-Ubuntu
[1015088.792147] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1015088.792396] qemu-system-x86 D 88301f2f3180 0 19383 1 0x
[1015088.792404] 8817440ada88 0086 8817fa574800 8817440adfd8
[1015088.792414] 00013180 00013180 8817fa574800 88301f2f3a18
[1015088.792420] 882ff7ab5280 8817fa574800
[1015088.792426] Call Trace:
[1015088.792440] [] io_schedule+0x9d/0x140
[1015088.792449] [] do_blockdev_direct_IO+0x1ce4/0x2910
[1015088.792456] [] ? I_BDEV+0x10/0x10
[1015088.792462] [] __blockdev_direct_IO+0x55/0x60
[1015088.792467] [] ? I_BDEV+0x10/0x10
[1015088.792472] [] blkdev_direct_IO+0x56/0x60
[1015088.792476] [] ? I_BDEV+0x10/0x10
[1015088.792482] [] generic_file_direct_write+0xc1/0x180
[1015088.792487] [] __generic_file_aio_write+0x305/0x3d0
[1015088.792492] [] blkdev_aio_write+0x46/0x90
[1015088.792501] [] do_sync_write+0x5a/0x90
[1015088.792507] [] vfs_write+0xb4/0x1f0
[1015088.792512] [] SyS_pwrite64+0x72/0xb0
[1015088.792519] [] system_call_fastpath+0x1a/0x1f

root 19410 0.0 0.0     0    0 ?  D  08:12 0:00 [blkid]
root 19575 0.0 0.0     0    0 ?  D  08:13 0:00 [blkid]
root 19584 0.0 0.0 28276 1076 ?  S  08:13 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 21734 0.0 0.0 28276 1080 ?  D  08:15 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 21735 0.0 0.0 28276 1076 ?  S  08:15 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
root 22419 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
root 22420 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 22864 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
root 22865 0.0 0.0 28276 1076 ?  D  08:16 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 23316 0.0 0.0 28276 1076 ?  D  08:17 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 23317 0.0 0.0 28276 1072 ?  D  08:17 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650ed
root 23756 0.0 0.0 28276 1076 ?  D  08:17 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 24200 0.0 0.0 28276 1076 ?  D  08:18 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 24637 0.0 0.0 28276 1072 ?  D  08:18 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root 25058 0.0 0.0 28276 1076 ?  D  08:19 0:00 /sbin/kpartx -a -p -part /dev/mapper/36000d31000a650c6
root@kvm3:~#

Ultimately this will cause so many issues on your kvm node that the only fix is a restart: because of all the libvirt locks you won't be able to stop, restart or destroy the qemu-kvm process, and issuing a kill -9 won't help you either. What will happen is that your live migration will fail with something like this:

2015-12-14 08:19:51.577 23821 ERROR nova.compute.manager [req-99771cf6-d17e-49f7-a01d-38201afbce69 212f451de64b4ae89c853f1430510037 e47ebdf3f3934025b37df3b85bdfd565 - - -] [instance: babf696c-55d1-4bde-be83-3124be2ac7f2] Live migration failed.
2015-12-14 08:19:51.577 23821 ERROR nova.compute.manager [instance: babf696c-55d1-4bde-be83-3124be2ac7f2] Traceback (most recent call last):
2015-12-14
[Yahoo-eng-team] [Bug 1459791] Re: Juno to Kilo upgrade breaks default domain id
Setting to Invalid to clean up, since it seems like I was the only one having this issue.

** Changed in: keystone
   Status: In Progress => Invalid

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1459791

Title:
  Juno to Kilo upgrade breaks default domain id

Status in OpenStack Identity (keystone):
  Invalid

Bug description:
  Hello,

  Upgrading from Keystone Juno to Kilo breaks my build. I have had a
  close look at warnings and debug output in keystone.log and read the
  notes on
  https://wiki.openstack.org/wiki/ReleaseNotes/Kilo#OpenStack_Identity_.28Keystone.29
  but without any luck. I could simply bypass this, but it's here for a
  reason.

  2015-05-28 22:51:59.400 1559 ERROR keystone.common.wsgi [-] 'NoneType' object has no attribute 'get'
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi Traceback (most recent call last):
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 239, in __call__
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     result = method(context, **params)
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/identity/controllers.py", line 51, in get_users
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     return {'users': self.v3_to_v2_user(user_list)}
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 309, in v3_to_v2_user
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     return [_normalize_and_filter_user_properties(x) for x in ref]
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 301, in _normalize_and_filter_user_properties
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     V2Controller.filter_domain(ref)
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 235, in filter_domain
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     if ref['domain'].get('id') != CONF.identity.default_domain_id:
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi AttributeError: 'NoneType' object has no attribute 'get'

  It occurs here, "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 235:

      @staticmethod
      def filter_domain(ref):
          """Remove domain since v2 calls are not domain-aware.

          V3 Fernet tokens builds the users with a domain in the token data.
          This method will ensure that users create in v3 belong to the
          default domain.
          """
          if 'domain' in ref:
              if ref['domain'].get('id') != CONF.identity.default_domain_id:
                  raise exception.Unauthorized(
                      _('Non-default domain is not supported'))
              del ref['domain']
          return ref

  Configuration:

  [DEFAULT]
  debug = false
  verbose = true
  [assignment]
  [auth]
  [cache]
  [catalog]
  [credential]
  [database]
  connection=mysql://keystone:xxx@xxx/keystone
  [domain_config]
  [endpoint_filter]
  [endpoint_policy]
  [eventlet_server]
  [eventlet_server_ssl]
  [federation]
  [fernet_tokens]
  [identity]
  [identity_mapping]
  [kvs]
  [ldap]
  [matchmaker_redis]
  [matchmaker_ring]
  [memcache]
  servers = localhost:11211
  [oauth1]
  [os_inherit]
  [oslo_messaging_amqp]
  [oslo_messaging_qpid]
  [oslo_messaging_rabbit]
  [oslo_middleware]
  [oslo_policy]
  [paste_deploy]
  [policy]
  [resource]
  [revoke]
  driver = keystone.contrib.revoke.backends.sql.Revoke
  [role]
  [saml]
  [signing]
  [ssl]
  [token]
  provider = keystone.token.providers.uuid.Provider
  driver = keystone.token.persistence.backends.memcache.Token
  [trust]

  Best regards

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1459791/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1508907] [NEW] local_gb_used wrong in compute_nodes table when using Dell Cinder backend
Public bug reported:

We have compute nodes with a very small amount of disk, so we deploy our instances to Cinder volumes with a Dell backend. The issue is that when creating instances with a Cinder volume, the volume still gets counted towards the local storage used (local_gb_used in the compute_nodes table of the nova database), which results in faulty information about what's actually stored on local disk.

Before:

nova hypervisor-stats
+---------------+-------+
| Property      | Value |
+---------------+-------+
| local_gb      | 425   |
| local_gb_used | 80    |
+---------------+-------+

cinder list
+----+--------+------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+----+--------+------+------+-------------+----------+-------------+
+----+--------+------+------+-------------+----------+-------------+

nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

After booting a new instance with a 40 GB cinder volume:

nova hypervisor-stats
+---------------+-------+
| Property      | Value |
+---------------+-------+
| local_gb      | 425   |
| local_gb_used | 120   |
+---------------+-------+

cinder list
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| ID                                   | Status | Name | Size | Volume Type | Bootable | Attached to                          |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+
| 15345aa2-efc5-4a02-924a-963c0572a399 | in-use | None | 40   | None        | true     | 29cbe001-4eca-4b2c-972e-c19121a7cc31 |
+--------------------------------------+--------+------+------+-------------+----------+--------------------------------------+

nova list
+--------------------------------------+--------+--------+------------+-------------+--------------------+
| ID                                   | Name   | Status | Task State | Power State | Networks           |
+--------------------------------------+--------+--------+------------+-------------+--------------------+
| 29cbe001-4eca-4b2c-972e-c19121a7cc31 | tester | ACTIVE | -          | Running     | test=192.168.28.25 |
+--------------------------------------+--------+--------+------------+-------------+--------------------+

So the volume is counted as local storage, which is wrong and prevents us from knowing if an instance has been booted on local disk; we need to know that, since we don't have any local disk for usage. Anybody got any clues?

Best regards

** Affects: nova
   Importance: Undecided
   Status: New

** Tags: cinder

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1508907

Title:
  local_gb_used wrong in compute_nodes table when using Dell Cinder
  backend

Status in OpenStack Compute (nova):
  New

Bug description:
  We have compute nodes with a very small amount of disk, so we deploy
  our instances to Cinder volumes with a Dell backend. The issue is
  that when creating instances with a Cinder volume, the volume still
  gets counted towards the local storage used (local_gb_used in the
  compute_nodes table of the nova database), which results in faulty
  information about what's actually stored on local disk.

  Before:

  nova hypervisor-stats
  +---------------+-------+
  | Property      | Value |
  +---------------+-------+
  | local_gb      | 425   |
  | local_gb_used | 80    |
  +---------------+-------+

  cinder list
  +----+--------+------+------+-------------+----------+-------------+
  | ID | Status | Name | Size | Volume Type | Bootable | Attached to |
  +----+--------+------+------+-------------+----------+-------------+
  +----+--------+------+------+-------------+----------+-------------+

  nova list
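A toy Python sketch of the accounting flaw this report describes, using only the numbers from the output above (illustration only, not nova code): the requested disk size is charged against local storage even though the root disk actually lives on a Cinder volume.

    # Numbers taken from the hypervisor-stats output in this report.
    local_gb_used_before = 80
    requested_disk_gb = 40   # instance booted from a 40 GB Cinder volume
    volume_backed = True

    # Observed behaviour: the volume size is counted as local storage anyway.
    observed = local_gb_used_before + requested_disk_gb  # -> 120

    # What the reporter expects: volume-backed instances use no local disk.
    expected = local_gb_used_before + (0 if volume_backed else requested_disk_gb)  # -> 80

    print(observed, expected)  # 120 80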
[Yahoo-eng-team] [Bug 1459791] [NEW] Juno to Kilo upgrade breaks default domain id
Public bug reported:

Hello,

Upgrading from Keystone Juno to Kilo breaks my build. I have had a close look at warnings and debug output in keystone.log and read the notes on https://wiki.openstack.org/wiki/ReleaseNotes/Kilo#OpenStack_Identity_.28Keystone.29 but without any luck. I could simply bypass this, but it's here for a reason.

2015-05-28 22:51:59.400 1559 ERROR keystone.common.wsgi [-] 'NoneType' object has no attribute 'get'
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi Traceback (most recent call last):
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 239, in __call__
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     result = method(context, **params)
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/identity/controllers.py", line 51, in get_users
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     return {'users': self.v3_to_v2_user(user_list)}
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 309, in v3_to_v2_user
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     return [_normalize_and_filter_user_properties(x) for x in ref]
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 301, in _normalize_and_filter_user_properties
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     V2Controller.filter_domain(ref)
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 235, in filter_domain
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     if ref['domain'].get('id') != CONF.identity.default_domain_id:
2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi AttributeError: 'NoneType' object has no attribute 'get'

It occurs here, "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 235:

    @staticmethod
    def filter_domain(ref):
        """Remove domain since v2 calls are not domain-aware.

        V3 Fernet tokens builds the users with a domain in the token data.
        This method will ensure that users create in v3 belong to the
        default domain.
        """
        if 'domain' in ref:
            if ref['domain'].get('id') != CONF.identity.default_domain_id:
                raise exception.Unauthorized(
                    _('Non-default domain is not supported'))
            del ref['domain']
        return ref

Configuration:

[DEFAULT]
debug = false
verbose = true
[assignment]
[auth]
[cache]
[catalog]
[credential]
[database]
connection=mysql://keystone:xxx@xxx/keystone
[domain_config]
[endpoint_filter]
[endpoint_policy]
[eventlet_server]
[eventlet_server_ssl]
[federation]
[fernet_tokens]
[identity]
[identity_mapping]
[kvs]
[ldap]
[matchmaker_redis]
[matchmaker_ring]
[memcache]
servers = localhost:11211
[oauth1]
[os_inherit]
[oslo_messaging_amqp]
[oslo_messaging_qpid]
[oslo_messaging_rabbit]
[oslo_middleware]
[oslo_policy]
[paste_deploy]
[policy]
[resource]
[revoke]
driver = keystone.contrib.revoke.backends.sql.Revoke
[role]
[saml]
[signing]
[ssl]
[token]
provider = keystone.token.providers.uuid.Provider
driver = keystone.token.persistence.backends.memcache.Token
[trust]

Best regards

** Affects: keystone
   Importance: Undecided
   Status: New

** Description changed:

  Hello,

  Upgrading from Keystone Juno to Kilo breaks my build.
  I have had a close look at warnings and debug output in keystone.log
  and read the notes on
  https://wiki.openstack.org/wiki/ReleaseNotes/Kilo#OpenStack_Identity_.28Keystone.29
  but without any luck. I could simply bypass this, but it's here for a
  reason.

  2015-05-28 22:51:59.400 1559 ERROR keystone.common.wsgi [-] 'NoneType' object has no attribute 'get'
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi Traceback (most recent call last):
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/wsgi.py", line 239, in __call__
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     result = method(context, **params)
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/identity/controllers.py", line 51, in get_users
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     return {'users': self.v3_to_v2_user(user_list)}
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 309, in v3_to_v2_user
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi     return [_normalize_and_filter_user_properties(x) for x in ref]
  2015-05-28 22:51:59.400 1559 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/site-packages/keystone/common/controller.py", line 301, in _normalize_and_filter_user_properties