[Yahoo-eng-team] [Bug 1952467] [NEW] OVN cannot create its files when built from source
Public bug reported:

With OVN built from source, devstack fails when accessing /opt/stack/data/ovn: the OVN services run as an unprivileged user, while that directory is created with elevated privileges. As a result, the following crash can be observed:

+ lib/neutron_plugins/ovn_agent:_start_process:227 : sudo systemctl enable devstack@ovn-northd.service
Created symlink /etc/systemd/system/multi-user.target.wants/devstack@ovn-northd.service → /etc/systemd/system/devstack@ovn-northd.service.
+ lib/neutron_plugins/ovn_agent:_start_process:228 : sudo systemctl restart devstack@ovn-northd.service
Job for devstack@ovn-northd.service failed because a timeout was exceeded.
See "systemctl status devstack@ovn-northd.service" and "journalctl -xe" for details.
+ lib/neutron_plugins/ovn_agent:_start_process:1 : exit_trap
+ ./devstack/stack.sh:exit_trap:507: local r=1
++ ./devstack/stack.sh:exit_trap:508: jobs -p
+ ./devstack/stack.sh:exit_trap:508: jobs=76154
+ ./devstack/stack.sh:exit_trap:511: [[ -n 76154 ]]
+ ./devstack/stack.sh:exit_trap:511: [[ -n /opt/stack/logs/devstacklog.txt.2021-11-26-151741 ]]
+ ./devstack/stack.sh:exit_trap:511: [[ True == \T\r\u\e ]]
+ ./devstack/stack.sh:exit_trap:512: echo 'exit_trap: cleaning up child processes'
exit_trap: cleaning up child processes
+ ./devstack/stack.sh:exit_trap:513: kill 76154
+ ./devstack/stack.sh:exit_trap:517: '[' -f /tmp/tmp.vXTRCtpEh5 ']'
+ ./devstack/stack.sh:exit_trap:518: rm /tmp/tmp.vXTRCtpEh5
+ ./devstack/stack.sh:exit_trap:522: kill_spinner
+ ./devstack/stack.sh:kill_spinner:417 : '[' '!' -z '' ']'
+ ./devstack/stack.sh:exit_trap:524: [[ 1 -ne 0 ]]
+ ./devstack/stack.sh:exit_trap:525: echo 'Error on exit'
Error on exit
+ ./devstack/stack.sh:exit_trap:527: type -p generate-subunit
+ ./devstack/stack.sh:exit_trap:528: generate-subunit 1637939860 1408 fail
+ ./devstack/stack.sh:exit_trap:530: [[ -z /opt/stack/logs ]]
+ ./devstack/stack.sh:exit_trap:533: /usr/bin/python3.8 /home/ubuntu/devstack/tools/worlddump.py -d /opt/stack/logs
+ ./devstack/stack.sh:exit_trap:542: exit 1

and the status of the unit reveals the issue:

$ sudo systemctl status devstack@ovn-northd.service
● devstack@ovn-northd.service - Devstack devstack@ovn-northd.service
     Loaded: loaded (/etc/systemd/system/devstack@ovn-northd.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Fri 2021-11-26 15:41:08 UTC; 3min 52s ago
    Process: 108582 ExecStart=/bin/bash /usr/local/share/ovn/scripts/ovn-ctl --no-monitor start_northd (code=killed, signal=TERM)
      Tasks: 0 (limit: 19175)
     Memory: 1.6M
     CGroup: /system.slice/system-devstack.slice/devstack@ovn-northd.service

Nov 26 15:39:38 devstack bash[108640]: chown: cannot access '/usr/local/etc/ovn': No such file or directory
Nov 26 15:39:38 devstack ovsdb-server[108641]: ovs|1|vlog|INFO|opened log file /opt/stack/logs/ovsdb-server-sb.log
Nov 26 15:39:38 devstack ovsdb-server[108643]: ovs|2|lockfile|WARN|/opt/stack/data/ovn/.ovnsb_db.db.~lock~: failed to open lock file: Permission denied
Nov 26 15:39:38 devstack ovsdb-server[108643]: ovs|3|lockfile|WARN|/opt/stack/data/ovn/.ovnsb_db.db.~lock~: failed to lock file: Resource temporarily unavailable
Nov 26 15:39:38 devstack bash[108643]: ovsdb-server: I/O error: /opt/stack/data/ovn/ovnsb_db.db: failed to lock lockfile (Resource temporarily unavailable)
Nov 26 15:39:38 devstack ovn-sbctl[108647]: ovs|1|sbctl|INFO|Called as ovn-sbctl --no-leader-only init
Nov 26 15:41:08 devstack systemd[1]: devstack@ovn-northd.service: start operation timed out. Terminating.
Nov 26 15:41:08 devstack systemd[1]: devstack@ovn-northd.service: Killing process 108647 (ovn-sbctl) with signal SIGKILL.
Nov 26 15:41:08 devstack systemd[1]: devstack@ovn-northd.service: Failed with result 'timeout'.
Nov 26 15:41:08 devstack systemd[1]: Failed to start Devstack devstack@ovn-northd.service.

** Affects: devstack
   Importance: Undecided
   Assignee: Roman Dobosz (roman-dobosz)
   Status: In Progress

** Changed in: neutron
   Assignee: (unassigned) => Roman Dobosz (roman-dobosz)

** Project changed: neutron => devstack

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1952467

Title: OVN cannot create its files when built from source
Status in devstack: In Progress
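The failure above is an ownership mismatch rather than a missing file: /opt/stack/data/ovn exists, but the user running ovsdb-server cannot take the database lock there. A minimal Python sketch of the check involved (the helper `dir_owned_by` is purely illustrative and not part of devstack):

```python
import os
import pwd


def dir_owned_by(path, username):
    """Return True if *path* exists and is owned by *username*.

    Illustrative check only: devstack creates /opt/stack/data/ovn with
    elevated privileges, so for the unprivileged stack user this would
    return False, and ovsdb-server cannot create its lock file there.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False
    return pwd.getpwuid(st.st_uid).pw_name == username
```

The fix on devstack's side would be to create (or chown) the directory as the same user that later runs the OVN services.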
[Yahoo-eng-team] [Bug 1837529] [NEW] Cannot use push-notification with custom objects
Public bug reported:

We have a custom object which we would like to have updated in the remote resource cache. Currently, in CacheBackedPluginApi the resource cache is created on initialization by the create_cache_for_l2_agent function, which has a fixed list of resources to subscribe to. If we want to use an additional resource type, there is no way other than either copying the entire class and using a custom cache creation function, or altering the list in the neutron code, both of which are bad. This isn't a bug so much as an annoying inconvenience, which might be easily fixed.

** Affects: neutron
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1837529

Title: Cannot use push-notification with custom objects
Status in neutron: New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1837529/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
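One possible shape of a fix is sketched below. This is hypothetical code, not neutron's: the real create_cache_for_l2_agent builds a remote resource cache over a hard-coded resource list, while a parameterised factory would let callers subscribe extra resource types without copying the class.

```python
# Hypothetical sketch of a parameterised cache factory. The resource
# names below are an illustrative subset, not neutron's actual list.
L2_AGENT_RESOURCES = ['Port', 'SecurityGroup', 'SecurityGroupRule']


def create_cache(extra_resource_types=None):
    """Build the list of resource types to subscribe to.

    In neutron this would go on to construct the remote resource cache
    and start listening for push notifications; here we only return the
    computed subscription list to show the extension point.
    """
    resource_types = list(L2_AGENT_RESOURCES)  # copy, don't mutate the default
    if extra_resource_types:
        resource_types.extend(extra_resource_types)
    return resource_types
```

A consumer with a custom object would then call `create_cache(['MyCustomObject'])` instead of duplicating the whole API class.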
[Yahoo-eng-team] [Bug 1547066] [NEW] Test test_live_migration_pause_vm_invalid_migration_state is never executed
Public bug reported:

In the test test_live_migration_pause_vm_invalid_migration_state there is an inner function _do_test() defined which should actually be called, but never is.

** Affects: nova
   Importance: Undecided
   Assignee: Roman Dobosz (roman-dobosz)
   Status: In Progress

** Tags: live-migration

** Changed in: nova
   Assignee: (unassigned) => Roman Dobosz (roman-dobosz)

** Changed in: nova
   Status: New => In Progress

https://bugs.launchpad.net/bugs/1547066

Title: Test test_live_migration_pause_vm_invalid_migration_state is never executed
Status in OpenStack Compute (nova): In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1547066/+subscriptions
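The defect pattern is easy to reproduce outside nova. A minimal sketch (the function names mirror the report; the body is invented for illustration):

```python
# Minimal reproduction of the defect pattern: a test defines an inner
# _do_test() but forgets to invoke it, so its assertions never run and
# the test passes vacuously. The fix is the single call at the end.

def test_live_migration_pause_vm_invalid_migration_state():
    executed = []

    def _do_test():
        executed.append(True)
        # the real assertions would live here

    _do_test()  # without this line the inner function is dead code
    return executed
```

Linters such as pyflakes usually flag an unused local *name*, but a defined-and-never-called inner function is bound and therefore "used", which is why this slips through review.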
[Yahoo-eng-team] [Bug 1546433] [NEW] nova.service.Service.kill() is unused and orphaned
Public bug reported:

oslo.service.Service doesn't provide a kill method in its interface [1]. Nova implements one (it removes the service record from the DB), but obviously it is never actually called. It was probably orphaned a long time ago (last changes in 2011). I think the method should go away.

[1] https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L88-L109

** Affects: nova
   Importance: Undecided
   Assignee: Roman Dobosz (roman-dobosz)
   Status: New

** Changed in: nova
   Assignee: (unassigned) => Roman Dobosz (roman-dobosz)

https://bugs.launchpad.net/bugs/1546433

Title: nova.service.Service.kill() is unused and orphaned
Status in OpenStack Compute (nova): New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1546433/+subscriptions
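Why the method is dead code: the service framework only drives the lifecycle methods declared on the base class, so a method that exists solely on the subclass has no caller. A miniature illustration (class names and the return value are invented for this sketch, not nova's actual code):

```python
# Stand-in for oslo_service.service.Service: the lifecycle interface
# exposes start/stop/wait, and nothing here ever calls kill().
class BaseService:
    def start(self):
        pass

    def stop(self):
        pass

    def wait(self):
        pass


class NovaService(BaseService):
    def kill(self):
        # Orphaned: no code path in the BaseService lifecycle reaches
        # this, so the DB cleanup it performs never happens via kill().
        return 'service record removed'
```

Since nothing dispatches through `kill()`, removing it (and moving any needed cleanup into `stop()`) loses no behaviour.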
[Yahoo-eng-team] [Bug 1442024] Re: AvailabilityZoneFilter does not filter when doing live migration
I have performed a test which I hoped would shed some light on this (potential) behaviour; however, it turns out it does not occur. The idea was to prepare two AZs separating two groups of computes (in my case simply a 3-node devstack), so that the first AZ would contain one compute and the second AZ the other one. There is also one host aggregate which contains all the computes. With this setup it might happen that the host aggregate takes precedence over the AZ.

The actors:

1. ctrl (controller node)
2. Altered nova.conf:
   scheduler_available_filters=nova.scheduler.filters.all_filters
   scheduler_default_filters=RetryFilter,AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ImagePropertiesFilter
3. cpu1 and cpu2 (compute nodes)
4. availability zone az1, which includes cpu1 and has metadata set to some.hw=true
5. availability zone az2, which includes cpu2
6. host aggregate aggr3, which includes cpu1 and cpu2
7. flavor aztest with the extra spec set to some.hw=true

The action: create the VMs with the aztest flavor - all of them should be spawned on cpu1. Note that cirrosXXX has to be available; I've used an i386 image to be able to successfully perform live migration on my devstack setup.

$ nova boot --flavor aztest --image cirrosXXX --min-count 4 vm
$ nova list --fields host,name,status
+--------------------------------------+------+------+--------+
| ID                                   | Host | Name | Status |
+--------------------------------------+------+------+--------+
| 1569be1a-1289-4d52-b3d1-c3008f7c865f | cpu1 | vm-4 | ACTIVE |
| 217cb74e-74c6-4e46-abbc-3582d7e5fb4d | cpu1 | vm-3 | ACTIVE |
| 7dc98646-db5a-4433-b000-fd0ae671f3c7 | cpu1 | vm-2 | ACTIVE |
| a6ddd4d8-d05f-45c3-9e6a-4c9fa33da2ea | cpu1 | vm-1 | ACTIVE |
+--------------------------------------+------+------+--------+

Now, try to live migrate vm-1:

$ nova live-migration --block-migrate vm-1
ERROR (BadRequest): No valid host was found. There are not enough hosts available. (HTTP 400) (Request-ID: req-2b1cd8d2-2316-40f2-8600-98c748ae565d)

After adding another compute to the cluster, and adding it to az1, live migration works as expected:

$ nova aggregate-add-host aggr1 cpu3
$ nova live-migration --block-migrate vm-1

So I've failed to reproduce the reported behaviour, which might be a result of not enough data provided, and might be a configuration issue on the production side.

** Changed in: nova
   Status: Confirmed => Invalid

https://bugs.launchpad.net/bugs/1442024

Title: AvailabilityZoneFilter does not filter when doing live migration
Status in OpenStack Compute (nova): Invalid

Bug description:

Last night our ops team live migrated (nova live-migration --block-migrate $vm) a group of VMs to do hardware maintenance. The VMs ended up on a different AZ, making them unusable (we have different upstream network connectivity on each AZ). It never happened before. I tested, of course; I have set up the AZ filter:

scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ImagePropertiesFilter

I'm using icehouse 2014.1.2-0ubuntu1.1~cloud0. I will clean and upload logs right away.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1442024/+subscriptions
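The observed "No valid host was found" is exactly what availability-zone filtering should produce here. A deliberately simplified model of the host-selection logic (this is not nova's actual AvailabilityZoneFilter implementation, just an illustration of the test's reasoning):

```python
# Simplified model of AZ filtering during live migration: a destination
# host passes only if it sits in the requested AZ, and the source host
# itself is excluded. With az1 = {cpu1}, vm-1 (running on cpu1) has no
# valid destination, which matches the BadRequest error in the test.

def az_filter(hosts_by_az, requested_az, exclude_host=None):
    """Return candidate destination hosts within *requested_az*."""
    return [host for host in hosts_by_az.get(requested_az, [])
            if host != exclude_host]
```

In the test scenario, `az_filter({'az1': ['cpu1'], 'az2': ['cpu2']}, 'az1', exclude_host='cpu1')` is empty until a third compute is added to az1, after which the migration succeeds.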
[Yahoo-eng-team] [Bug 1443910] [NEW] Zookeeper servicegroup driver crashes
Public bug reported:

The Zookeeper driver is based on the zookeeper and evzookeeper modules. The latter is the source of a nasty crash, which is well visible on nova-conductor. To reproduce, it is enough to enable zookeeper in nova.conf, provide configuration for the zookeeper service address, and stack the thing.

The traceback:

2015-04-14 13:23:22.622 TRACE nova Traceback (most recent call last):
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/bin/nova-conductor", line 10, in <module>
2015-04-14 13:23:22.622 TRACE nova     sys.exit(main())
2015-04-14 13:23:22.622 TRACE nova   File "/opt/stack/nova/nova/cmd/conductor.py", line 44, in main
2015-04-14 13:23:22.622 TRACE nova     manager=CONF.conductor.manager)
2015-04-14 13:23:22.622 TRACE nova   File "/opt/stack/nova/nova/service.py", line 277, in create
2015-04-14 13:23:22.622 TRACE nova     db_allowed=db_allowed)
2015-04-14 13:23:22.622 TRACE nova   File "/opt/stack/nova/nova/service.py", line 146, in __init__
2015-04-14 13:23:22.622 TRACE nova     self.servicegroup_api = servicegroup.API(db_allowed=db_allowed)
2015-04-14 13:23:22.622 TRACE nova   File "/opt/stack/nova/nova/servicegroup/api.py", line 76, in __init__
2015-04-14 13:23:22.622 TRACE nova     *args, **kwargs)
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/importutils.py", line 38, in import_object
2015-04-14 13:23:22.622 TRACE nova     return import_class(import_str)(*args, **kwargs)
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/importutils.py", line 27, in import_class
2015-04-14 13:23:22.622 TRACE nova     __import__(mod_str)
2015-04-14 13:23:22.622 TRACE nova   File "/opt/stack/nova/nova/servicegroup/drivers/zk.py", line 28, in <module>
2015-04-14 13:23:22.622 TRACE nova     evzookeeper = importutils.try_import('evzookeeper')
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/importutils.py", line 71, in try_import
2015-04-14 13:23:22.622 TRACE nova     return import_module(import_str)
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/importutils.py", line 57, in import_module
2015-04-14 13:23:22.622 TRACE nova     __import__(import_str)
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/lib/python2.7/dist-packages/evzookeeper/__init__.py", line 26, in <module>
2015-04-14 13:23:22.622 TRACE nova     from evzookeeper import utils
2015-04-14 13:23:22.622 TRACE nova   File "/usr/local/lib/python2.7/dist-packages/evzookeeper/utils.py", line 26, in <module>
2015-04-14 13:23:22.622 TRACE nova     class _SocketDuckForFdTimeout(greenio._SocketDuckForFd):
2015-04-14 13:23:22.622 TRACE nova AttributeError: 'module' object has no attribute '_SocketDuckForFd'

The root cause of the problem is a change made in eventlet 0.17, on which the evzookeeper module depends. Because of the change in the way eventlet.greenio is exposed, there is no way to reach the class _SocketDuckForFd other than explicitly importing it via the eventlet.greenio.py2 module. The proper solution might go upstream to the evzookeeper author; however, development of the evzookeeper module seems stalled (no activity for the last 2 years), so maybe it's worth considering changing the zk driver implementation to use a different zookeeper module (kazoo seems quite active).

** Affects: nova
   Importance: Undecided
   Status: New

https://bugs.launchpad.net/bugs/1443910

Title: Zookeeper servicegroup driver crashes
Status in OpenStack Compute (Nova): New
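The breakage is an import-compatibility problem, so it could be papered over with a fallback lookup. A hedged sketch (the helper `import_first` is invented here, not part of oslo's importutils; the eventlet module paths are as described in the report):

```python
import importlib


def import_first(*candidates):
    """Return the first attribute that resolves among 'module:attr' specs.

    Illustrative helper: evzookeeper could use this shape to find
    _SocketDuckForFd in eventlet.greenio (eventlet < 0.17) or in
    eventlet.greenio.py2 (eventlet >= 0.17).
    """
    for spec in candidates:
        mod_name, _, attr = spec.partition(':')
        try:
            return getattr(importlib.import_module(mod_name), attr)
        except (ImportError, AttributeError):
            continue  # try the next candidate location
    raise ImportError('none of %r could be resolved' % (candidates,))
```

With eventlet installed, the call would look like `import_first('eventlet.greenio:_SocketDuckForFd', 'eventlet.greenio.py2:_SocketDuckForFd')`; but since upstream evzookeeper is unmaintained, switching the driver to kazoo, as suggested above, avoids the problem entirely.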