[Yahoo-eng-team] [Bug 1992161] [NEW] Unknown quota resource security_group_rule in neutron-rpc-server
Public bug reported:

When restarting our linuxbridge-agents, we see exceptions for some of the
networks: Unknown quota resources ['security_group_rule']. This stops the
linuxbridge-agent from fully bringing up that network.

Prerequisites:
* run api-server and rpc-server in different processes

We have neutron-server running with uWSGI and start the neutron-rpc-server
in another container.

Steps to reproduce:
* have a project with server/network/ports
* have an unused default security group
* delete the default security group
* restart the appropriate linuxbridge-agent

Version:
* Ussuri with custom patches on top: https://github.com/sapcc/neutron

Expected behavior:
The linuxbridge-agent should bring up all networks even if the user deleted
the default security group. Either don't create a default security group
when called via the linuxbridge-agent instead of the API, or make the quota
resources available in the rpc-server so the default security group can be
created.

Creating/updating a port or creating a network via the API will create the
default security group and fix the problem on the linuxbridge-agent, too. I
just don't think it's acceptable to require the user/admin to perform API
actions because a user did something they maybe shouldn't have.

We've also seen the same exception from a dhcp-agent. Attached are
tracebacks from both the linuxbridge-agent and the dhcp-agent.

While debugging this, we found that no quota resources are registered in
neutron-rpc-server at all. This can be seen in the eventlet backdoor with
these commands:

    from neutron.quota import resource_registry; resource_registry.get_all_resources()

** Affects: neutron
   Importance: Undecided
   Status: New

** Attachment added: "tracebacks from dhcp-agent and linuxbridge agent calling neutron-rpc-server"
   https://bugs.launchpad.net/bugs/1992161/+attachment/5622035/+files/rpc-no-default-security-group-creation.txt
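Expanded a bit, the backdoor check above looks like this (plain Python; the
print and the comments are ours, not neutron code, and it only runs inside a
neutron process):

    # Run inside the eventlet backdoor of neutron-rpc-server, then compare
    # with the same call inside an API worker.
    from neutron.quota import resource_registry

    resources = resource_registry.get_all_resources()
    # On the rpc-server this printed an empty list for us; on the API side
    # it includes 'security_group_rule' among the tracked resources.
    print(sorted(resources))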
[Yahoo-eng-team] [Bug 1989361] Re: extension using collection_actions and collection_methods with path_prefix doesn't get proper URLs
Looking a little more into it, the tests [0] actually always have a "/"
prefix in their "path_prefix", which works fine, because the "routes"
library calls "stripslashes()" in the "resource()" call and thus we
shouldn't end up with double slashes.

[0] https://github.com/sapcc/neutron/blob/64bef10cd97d1f56647a4d20a7ce0644c18b8ece/neutron/tests/unit/api/test_extensions.py#L237

** Changed in: neutron
   Status: New => Invalid
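For completeness, a quick standalone illustration with the "routes" library
directly (assuming "routes" is installed; the printed paths are approximate,
and this is not neutron code):

    from routes import Mapper

    m = Mapper()
    # A leading "/" in path_prefix is harmless for resource(): routes
    # normalizes consecutive slashes there, so '/prefix' and 'prefix'
    # end up generating the same routes.
    m.resource('widget', 'widgets', path_prefix='/prefix')
    print([r.routepath for r in m.matchlist][:3])
    # e.g. ['/prefix/widgets', '/prefix/widgets.:(format)', ...]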
[Yahoo-eng-team] [Bug 1989361] [NEW] extension using collection_actions and collection_methods with path_prefix doesn't get proper URLs
Public bug reported:

We're creating a new extension downstream to add some special-sauce API
endpoints. During that, we tried to use "collection_actions" to create some
special actions for our resource. Those ended up being uncallable, always
returning a 404, as the call was interpreted as a standard "update" call
instead of calling our special function.

We debugged this and it turns out the Route object created when registering
the API endpoint in [0] ff. doesn't contain a "/" at the start of its
regexp. Therefore, it doesn't match. This seems to come from the fact that
we - other than e.g. the quotasv2 extension [1] - have to set a
"path_prefix". Looking at the underlying "routes" library, we automatically
get a "/" prefixed in the "resource()" call [2], while the submapper
created via "submapper()" needs the path to already contain the leading "/"
as exemplified in [3].

Therefore, I propose to prepend a "/" to the "path_prefix" in the code
handling "collection_actions" and "collection_methods" and will open a
review request for this. A sketch follows below.

[0] https://github.com/sapcc/neutron/blob/64bef10cd97d1f56647a4d20a7ce0644c18b8ece/neutron/api/extensions.py#L159
[1] https://github.com/sapcc/neutron/blob/64bef10cd97d1f56647a4d20a7ce0644c18b8ece/neutron/extensions/quotasv2.py#L210-L215
[2] https://github.com/bbangert/routes/blob/main/routes/mapper.py#L1126-L1132
[3] https://github.com/bbangert/routes/blob/main/routes/mapper.py#L78

** Affects: neutron
   Importance: Undecided
   Status: New
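A minimal sketch of the proposed fix (the helper name is hypothetical; the
real change belongs in the collection_actions/collection_methods handling
around [0]):

    def ensure_leading_slash(path_prefix):
        # routes' submapper() - unlike resource() - does not prepend "/",
        # so the generated Route regexp never matches without it.
        if path_prefix and not path_prefix.startswith('/'):
            return '/' + path_prefix
        return path_prefix

    assert ensure_leading_slash('special') == '/special'
    assert ensure_leading_slash('/special') == '/special'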
[Yahoo-eng-team] [Bug 1949767] [NEW] FIP ports count into quota as they get a project_id set
Public bug reported:

With https://github.com/openstack/neutron/commit/d0c172afa6ea38e94563afb4994471420b27cddf
Neutron started adding a "project_id" to a FIP's external port, even though
https://github.com/openstack/neutron/blob/f97baa0b16687453735e46e7a0f73fe03d7d4db7/neutron/db/l3_db.py#L326
states that this is "intentionally not set".

This makes the ports visible to the customer in "openstack port list" and
lets the ports count toward quota, which was not the case pre-Train.

Is this change intentional?

** Affects: neutron
   Importance: Undecided
   Status: New
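For context, a paraphrased sketch of the port attributes at the referenced
l3_db.py line (field names approximate, reconstructed from the linked code,
not verbatim):

    # Device-owner constant as used by neutron for FIP ports.
    DEVICE_OWNER_FLOATINGIP = 'network:floatingip'

    def make_fip_port_data(floating_network_id, floatingip_id):
        # The FIP port is created with an empty tenant_id on purpose: that
        # hides it from the owner's "openstack port list" and keeps it out
        # of the owner's port quota. Filling in project_id defeats both.
        return {
            'tenant_id': '',  # intentionally not set
            'network_id': floating_network_id,
            'admin_state_up': True,
            'device_id': floatingip_id,
            'device_owner': DEVICE_OWNER_FLOATINGIP,
            'name': '',
        }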
[Yahoo-eng-team] [Bug 1930406] [NEW] parallel volume-attachment requests might starve out nova-api for others
Public bug reported:

When doing volume attachments, nova-api does an RPC call (with a
long_rpc_timeout) into nova-compute to reserve_block_device_name(). This
takes a lock on the instance. If a volume attachment was already in
progress, which also takes the instance lock, nova-api's RPC call needs to
wait.

Having RPC calls in nova-api that can take a long time will block the
process handling the request. If a project does a lot of volume attachments
(e.g. for a k8s workload, > 10 attachments per instance), this could starve
out other users of nova-api by occupying all available processes.

When running nova-api with eventlet, a small number of processes can handle
a lot of requests in parallel and some blocking RPC calls don't matter too
much. When switching to uWSGI, the number of processes would have to be
increased drastically to accommodate for that - unless it's possible to map
those requests to threads and use a high number of threads instead.

What's the recommended way to run nova-api on uWSGI to handle this? A low
number of processes with a high number of threads, to mimic eventlet?

** Affects: nova
   Importance: Undecided
   Status: New
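To make the starvation pattern concrete, a toy model (plain Python, not nova
code; the worker count and sleep are illustrative):

    from concurrent.futures import ThreadPoolExecutor
    import time

    def api_request(_):
        # Stands in for the blocking reserve_block_device_name() RPC call,
        # which can wait up to long_rpc_timeout on the instance lock.
        time.sleep(2)
        return 'attached'

    # 4 workers ~ 4 uWSGI processes: a 5th concurrent request cannot even
    # start until a worker frees up, no matter how cheap it would be.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(api_request, range(5))))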
[Yahoo-eng-team] [Bug 1915815] [NEW] vmware: Rescue impossible if VM folder renamed
Public bug reported:

Steps to reproduce
==================
* storage-vMotion a VM (this renames the folder to the VM name, i.e. "$uuid" to "$name ($uuid)")
* openstack server rescue $uuid

Actual Result
=============
Nova's vmware driver raises an exception:

    Traceback (most recent call last):
      File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/compute/manager.py", line 3621, in rescue_instance
        rescue_image_meta, admin_password)
      File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/driver.py", line 601, in rescue
        self._vmops.rescue(context, instance, network_info, image_meta)
      File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/vmops.py", line 1802, in rescue
        vi.cache_image_path, rescue_disk_path)
      File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/ds_util.py", line 311, in disk_copy
        session._wait_for_task(copy_disk_task)
      File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/driver.py", line 725, in _wait_for_task
        return self.wait_for_task(task_ref)
      File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/api.py", line 402, in wait_for_task
        return evt.wait()
      File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
        return hubs.get_hub().switch()
      File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
        return self.greenlet.switch()
      File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/common/loopingcall.py", line 75, in _inner
        self.f(*self.args, **self.kw)
      File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/api.py", line 449, in _poll_task
        raise exceptions.translate_fault(task_info.error)
    FileNotFoundException: File [eph-bb145-3] 551e5570-cf70-4ca0-9f37-e50210c4d2f5/ was not found

Expected Result
===============
VM is put into rescue mode and boots the rescue image.

Environment
===========
This happened on queens, but the same code is still there in master:
https://github.com/openstack/nova/blob/a7dd1f8881484ba0bf4270dd48109c2be142c333/nova/virt/vmwareapi/vmops.py#L1228-L1229

** Affects: nova
   Importance: Undecided
   Assignee: Johannes Kulik (jkulik)
   Status: New

** Changed in: nova
   Assignee: (unassigned) => Johannes Kulik (jkulik)
"$uuid" to "$name ($uuid)") * openstack server rescue $uuid Actual Result = Nova's vmware driver raises an exception: Traceback (most recent call last): File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/compute/manager.py", line 3621, in rescue_instance rescue_image_meta, admin_password) File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/driver.py", line 601, in rescue self._vmops.rescue(context, instance, network_info, image_meta) File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/vmops.py", line 1802, in rescue vi.cache_image_path, rescue_disk_path) File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/ds_util.py", line 311, in disk_copy session._wait_for_task(copy_disk_task) File "/nova-base-source/nova-base-archive-stable-queens-m3/nova/virt/vmwareapi/driver.py", line 725, in _wait_for_task return self.wait_for_task(task_ref) File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/api.py", line 402, in wait_for_task return evt.wait() File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait return hubs.get_hub().switch() File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch return self.greenlet.switch() File "/plugins/openstack-base-plugin-oslo-vmware-archive-stable-queens-m3/oslo_vmware/common/loopingcall.py", line 75, in _inner self.f(*self.args, **self.kw) File "/plugins/openstack-base-plugin-oslo-vmware-archive-
[Yahoo-eng-team] [Bug 1870096] [NEW] soft-affinity weight not normalized based on server group's maximum
Public bug reported:

Description
===========
When using soft-affinity to schedule instances on the same host, the weight
is unexpectedly low if a server was previously scheduled to any server
group with more members on a host.

Steps to reproduce
==================
Do not restart nova-scheduler in the process, or the bug doesn't appear.
* Create a server-group with soft-affinity (let's call it A)
* Create 6 servers in server-group A, one after the other, so they end up on the same host.
* Create another server-group with soft-affinity (B)
* Create 1 server in server-group B
* Create 1 server in server-group B and look at the scheduler's weights assigned to the hosts by the ServerGroupSoftAffinityWeigher.

Expected result
===============
The weight assigned to the host by the ServerGroupSoftAffinityWeigher
should be 1, as the maximum number of instances for server-group B is on
that host (the one we created there before).

Actual result
=============
The weight assigned to the host by the ServerGroupSoftAffinityWeigher is
0.2, as the maximum number of instances ever encountered on a host is 5.

Environment
===========
We noticed this on a queens version of nova a year ago. Can't give the
exact commit anymore, but the code still looks broken in current master.

I've opened a review request to fix this bug here:
https://review.opendev.org/#/c/713863/

** Affects: nova
   Importance: Undecided
   Assignee: Johannes Kulik (jkulik)
   Status: In Progress
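A toy model of the reported behaviour (plain Python, not the actual weigher
code): if the normalizer keeps a running maximum across scheduling requests,
a later group peaking at 1 member on a host is weighted against the old
maximum instead of its own.

    class StickyMaxNormalizer:
        """Mimics normalizing against a maximum remembered across requests."""

        def __init__(self):
            self.max_seen = 0

        def weigh(self, members_on_host):
            self.max_seen = max(self.max_seen, members_on_host)
            return members_on_host / self.max_seen

    norm = StickyMaxNormalizer()
    norm.weigh(5)         # server-group A: 5 members on the host -> 1.0
    print(norm.weigh(1))  # server-group B: 0.2, although 1.0 is expected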