[Yahoo-eng-team] [Bug 1341420] Re: gap between scheduler selection and claim causes spurious failures when the instance is the last one to fit
** Also affects: tripleo
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1341420

Title:
  gap between scheduler selection and claim causes spurious failures
  when the instance is the last one to fit

Status in OpenStack Compute (nova):
  In Progress
Status in tripleo:
  New

Bug description:
  There is a race between the scheduler in select_destinations, which
  selects a set of hosts, and the nova compute manager, which claims
  resources on those hosts when building the instance. The race is
  particularly noticeable with Ironic, where every request consumes a
  full host, but it can turn up with libvirt etc. too. Multiple
  schedulers will likely exacerbate this unless they run on a version
  of Python with randomised dictionary ordering, in which case they
  will make it better :).

  I've put https://review.openstack.org/106677 up to remove a comment
  which predates the introduction of this race.

  One mitigating aspect: the filter scheduler's _schedule method
  attempts to randomly select hosts to avoid returning the same host in
  repeated requests, but the default minimum set it selects from is
  size 1 - so when heat requests a single instance, the same candidate
  is chosen every time. Setting that number higher can stop all
  concurrent requests from hitting the same host, but it is still a
  race, and still likely to fail fairly hard in near-capacity
  situations (e.g. deploying all machines in a cluster with Ironic and
  Heat).

  Folk wanting to reproduce this: take a decent-size cloud - e.g. 5 or
  10 hypervisor hosts (KVM is fine). Deploy up to 1 VM short of
  capacity on each hypervisor. Then deploy a bunch of VMs one at a time
  but very close together - e.g. use the python API to get cached
  keystone credentials, and boot 5 in a loop. If using Ironic you will
  want https://review.openstack.org/106676 to let you see which host is
  being returned from the selection.

  Possible fixes:
  - have the scheduler be a bit smarter about returning hosts - e.g.
    track destination selection counts since the last refresh and
    weight hosts by that count as well (see the sketch at the end of
    this message)
  - reinstate actioning claims into the scheduler, allowing the audit
    to correct any claimed-but-not-started resource counts
    asynchronously
  - special-case the retry behaviour if there are lots of resources
    available elsewhere in the cluster.

  Stats-wise: I just tested a 29-instance deployment with Ironic and a
  heat stack, with 45 machines to deploy onto (so 45 hosts in the
  scheduler set), and 4 instances failed with this race - which means
  they rescheduled and failed 3 times each - or 12 cases of the
  scheduler racing *at minimum*.

  background chat

  15:43 < lifeless> mikal: around? I need to sanity check something
  15:44 < lifeless> ulp, nope, am sure of it. filing a bug.
  15:45 < mikal> lifeless: ok
  15:46 < lifeless> mikal: oh, you're here, I will run it past you :)
  15:46 < lifeless> mikal: if you have ~5m
  15:46 < mikal> Sure
  15:46 < lifeless> so, symptoms
  15:46 < lifeless> nova boot <...> --num-instances 45 -> works fairly reliably. Some minor timeout related things to fix but nothing dramatic.
  15:47 < lifeless> heat create-stack <...> with a stack with 45 instances in it -> about 50% of instances fail to come up
  15:47 < lifeless> this is with Ironic
  15:47 < mikal> Sure
  15:47 < lifeless> the failure on all the instances is the retry-three-times failure-of-death
  15:47 < lifeless> what I believe is happening is this
  15:48 < lifeless> the scheduler is allocating the same weighed list of hosts for requests that happen close enough together
  15:49 < lifeless> and I believe its able to do that because the target hosts (from select_destinations) need to actually hit the compute node manager and have
  15:49 < lifeless> with rt.instance_claim(context, instance, limits):
  15:49 < lifeless> happen in _build_and_run_instance
  15:49 < lifeless> before the resource usage is assigned
  15:49 < mikal> Is heat making 45 separate requests to the nova API?
  15:49 < lifeless> eys
  15:49 < lifeless> yes
  15:49 < lifeless> thats the key difference
  15:50 < lifeless> same flavour, same image
  15:50 < openstackgerrit> Sam Morrison proposed a change to openstack/nova: Remove cell api overrides for lock and unlock  https://review.openstack.org/89487
  15:50 < mikal> And you have enough quota for these instances, right?
  15:50 < lifeless> yes
  15:51 < mikal> I'd have to dig deeper to have an answer, but it sure does seem worth filing a bug for
  15:51 < lifeless> my theory is that there is enough time between select_destinations in the conductor, and _build_and_run_instance in compute for another request to come in the front door and be scheduled to the same host
  15:51 < mikal> That seems possible to me
  15:52 <
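The first of the proposed fixes lends itself to a small illustration.
A minimal sketch of the idea only - the class below is an invented
name, not nova's scheduler code: track how many times each host has
been handed out since the last resource refresh and penalise repeat
selections, so near-simultaneous requests fan out across hosts instead
of piling onto the single best-weighted one.

    # Illustrative sketch only; SelectionCountTracker and pick() are
    # invented names, not part of nova's filter scheduler.
    import collections

    class SelectionCountTracker(object):
        def __init__(self):
            self._counts = collections.Counter()

        def reset(self):
            # Call whenever fresh resource usage arrives from the
            # compute nodes, discarding the stale selection history.
            self._counts.clear()

        def pick(self, weighed_hosts):
            # weighed_hosts: iterable of (host, weight) pairs.
            # Penalise each host by the number of times it has already
            # been handed out since the last refresh, then take the best.
            best = max(weighed_hosts,
                       key=lambda hw: hw[1] - self._counts[hw[0]])
            self._counts[best[0]] += 1
            return best[0]

With something like this in place, two back-to-back single-instance
requests over identically weighted hosts would land on different
hosts, which is exactly the behaviour the race needs.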
[Yahoo-eng-team] [Bug 1531881] [NEW] AttributeError: 'module' object has no attribute 'dump_as_bytes'
Public bug reported:

Seeing the following traceback from nova-compute when trying to launch
instances in tripleo-ci for stable/liberty (using the ironic driver):

2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79] Traceback (most recent call last):
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2155, in _build_resources
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     yield resources
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2009, in _build_and_run_instance
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     block_device_info=block_device_info)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 802, in spawn
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     files=injected_files)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 716, in _generate_configdrive
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     "error: %s"), e, instance=instance)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     six.reraise(self.type_, self.value, self.tb)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 711, in _generate_configdrive
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     with configdrive.ConfigDriveBuilder(instance_md=i_meta) as cdb:
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/virt/configdrive.py", line 72, in __init__
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     self.add_instance_metadata(instance_md)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/virt/configdrive.py", line 93, in add_instance_metadata
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     for (path, data) in instance_md.metadata_for_config_drive():
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File "/usr/lib/python2.7/site-packages/nova/api/metadata/base.py", line 465, in metadata_for_config_drive
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]     yield (filepath, jsonutils.dump_as_bytes(data['meta-data']))
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79] AttributeError: 'module' object has no attribute 'dump_as_bytes'
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79]
2016-01-07 13:32:27.693 19349 INFO nova.compute.manager [req-6b73f4c5-c031-496e-b2f0-a5380d3ca7ba 285d1c33eca8410e9ed03bbe3de03d15 9448d5b54ff84bd6a8a04b1083eb920f - - -] [instance: 5a7c299b-f6b6-48d8-a20e-36e72c7bed79] Termi

I believe it's caused by this commit:
https://review.openstack.org/#/c/246792/
which I've submitted a revert for:
https://review.openstack.org/#/c/264793/

The failed tripleo-ci job:
http://logs.openstack.org/46/254946/4/check-tripleo/gate-tripleo-ci-f22-nonha/1363b32/
from this patch: https://review.openstack.org/#/c/254946/

The version of oslo.serialization in use on the job is
python2-oslo-serialization-1.9.1-dev3.el7.centos.noarch; tripleo-ci
uses delorean, which builds RPMs based on the latest from
stable/liberty.

** Affects: nova
   Importance: Undecided
       Status: New

** Affects: tripleo
   Importance: Critical
     Assignee: James Slagle (james-slagle)
       Status: In Progress

** Also affects: tripleo
   Importance: Undecided
       Status: New

** Changed in: tripleo
   Importan
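The merged revert aside, the version mismatch is easy to paper over in
a pinch. A hedged compatibility sketch, assuming only the documented
jsonutils.dumps() API; this is not the fix that was merged, and the
exact oslo.serialization release that introduced dump_as_bytes() is an
assumption here:

    # Compatibility sketch - not the merged fix. Fall back to dumps()
    # when the installed oslo.serialization predates dump_as_bytes().
    from oslo_serialization import jsonutils

    if hasattr(jsonutils, 'dump_as_bytes'):
        dump_as_bytes = jsonutils.dump_as_bytes
    else:
        def dump_as_bytes(obj, encoding='utf-8', **kwargs):
            # jsonutils.dumps() returns text; encode it ourselves to
            # match the bytes contract the caller expects.
            return jsonutils.dumps(obj, **kwargs).encode(encoding)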
[Yahoo-eng-team] [Bug 1373430] Re: Error while compressing files
** Changed in: tripleo
       Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1373430

Title:
  Error while compressing files

Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in tripleo - openstack on openstack:
  Fix Released

Bug description:
  All ci jobs failing

  Earliest Failure : 2014-09-24 09:51:55 UTC

  Example :
  http://logs.openstack.org/50/123150/3/check-tripleo/check-tripleo-ironic-undercloud-precise-nonha/3c60b32/console.html

  Sep 24 11:51:43 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: dib-run-parts Wed Sep 24 11:51:43 UTC 2014 Running /opt/stack/os-config-refresh/post-configure.d/14-horizon
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: CommandError: An error occured during rendering /opt/stack/venvs/openstack/lib/python2.7/site-packages/horizon/templates/horizon/_scripts.html: 'horizon/lib/bootstrap_datepicker/locales/bootstrap-datepicker..js' could not be found in the COMPRESS_ROOT '/opt/stack/venvs/openstack/lib/python2.7/site-packages/openstack_dashboard/static' or with staticfiles.
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: Found 'compress' tags in:
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: /opt/stack/venvs/openstack/lib/python2.7/site-packages/horizon/templates/horizon/_scripts.html
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: /opt/stack/venvs/openstack/lib/python2.7/site-packages/horizon/templates/horizon/_conf.html
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: /opt/stack/venvs/openstack/lib/python2.7/site-packages/openstack_dashboard/templates/_stylesheets.html
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: Compressing...
  [2014-09-24 11:51:53,459] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/opt/stack/os-config-refresh/post-configure.d']' returned non-zero exit status 1]

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1373430/+subscriptions
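A quick way to confirm the missing-file side of this is to ask
Django's staticfiles machinery for the exact path the compressor
wanted. A diagnostic sketch, assuming it runs inside the Horizon
virtualenv and that the settings module path below is right; note the
doubled dot in the filename, which suggests an empty locale token when
the template was rendered:

    # Diagnostic sketch - settings module path is an assumption.
    import os
    os.environ.setdefault('DJANGO_SETTINGS_MODULE',
                          'openstack_dashboard.settings')
    import django
    if hasattr(django, 'setup'):  # needed on Django >= 1.7
        django.setup()
    from django.contrib.staticfiles import finders

    missing = ('horizon/lib/bootstrap_datepicker/locales/'
               'bootstrap-datepicker..js')
    # find() returns None when no app or STATICFILES_DIRS entry
    # provides the file - the same condition the compressor hit above.
    print(finders.find(missing))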
[Yahoo-eng-team] [Bug 1346424] Re: Baremetal node id not supplied to driver
** Changed in: tripleo
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1346424

Title:
  Baremetal node id not supplied to driver

Status in OpenStack Compute (Nova):
  Fix Committed
Status in tripleo - openstack on openstack:
  Fix Released

Bug description:
  A random overcloud baremetal node fails to boot during
  check-tripleo-overcloud-f20. Occurs intermittently.

  Full logs:
  http://logs.openstack.org/26/105326/4/check-tripleo/check-tripleo-overcloud-f20/9292247/
  http://logs.openstack.org/81/106381/2/check-tripleo/check-tripleo-overcloud-f20/ca8a59b/
  http://logs.openstack.org/08/106908/2/check-tripleo/check-tripleo-overcloud-f20/e9894ca/

  Seed's nova-compute log shows this exception:

  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 ERROR oslo.messaging.rpc.dispatcher [req-9f090bea-a974-4f3c-ab06-ebd2b7a5c9e6 ] Exception during message handling: Baremetal node id not supplied to driver for 'e13f2660-b72d-4a97-afac-64ff0eecc448'
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 133, in _dispatch_and_reply
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     incoming.message))
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 176, in _dispatch
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     return self._do_dispatch(endpoint, method, ctxt, args)
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py", line 122, in _do_dispatch
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     result = getattr(endpoint, method)(ctxt, **new_args)
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     payload)
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 82, in __exit__
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     return f(self, context, *args, **kw)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/compute/manager.py", line 291, in decorated_function
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     pass
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/excutils.py", line 82, in __exit__
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File "/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/compute/manager.py", line 277, in decorated_function
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21
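For context, the error text corresponds to a guard of roughly the
following shape in the driver. This is a paraphrased sketch - the
function name and exception type are assumptions, not the actual nova
source: the scheduler is expected to have recorded its chosen node on
the instance before the driver runs, and the driver refuses to proceed
when that id is missing.

    # Sketch - illustrates where "Baremetal node id not supplied to
    # driver" comes from; _require_node is an invented name.
    def _require_node(instance):
        node_uuid = instance.get('node')
        if not node_uuid:
            raise RuntimeError(
                "Baremetal node id not supplied to driver for %r"
                % instance['uuid'])
        return node_uuid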
[Yahoo-eng-team] [Bug 1174132] Re: Only one instance of a DHCP agent is run per network
** Changed in: tripleo
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1174132

Title:
  Only one instance of a DHCP agent is run per network

Status in OpenStack Neutron (virtual network service):
  Fix Released
Status in tripleo - openstack on openstack:
  Fix Released

Bug description:
  DHCP agents are stateless and can run HA, but the scheduler only
  schedules one instance per network - so when a network node fails (or
  is rebooted / has maintenance done on it), this results in
  user-visible downtime. See also bug 1174591, which covers the
  administrative overhead of dealing with failed nodes.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1174132/+subscriptions
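For operators, the knob that later addressed this lets the scheduler
place more than one DHCP agent on each network; treat the option name
and its availability in your release as an assumption to verify:

    [DEFAULT]
    # assumption: present once the fix for this bug landed in neutron;
    # schedules two DHCP agents for every network instead of one
    dhcp_agents_per_network = 2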
[Yahoo-eng-team] [Bug 1293784] [NEW] Need better default for nova_url
Public bug reported:

While looking into https://bugs.launchpad.net/tripleo/+bug/1293782, we
noticed that it seems the default for nova_url in neutron.conf is
http://127.0.0.1:8774

From common/config.py:

    cfg.StrOpt('nova_url',
               default='http://127.0.0.1:8774',
               help=_('URL for connection to nova')),

Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be
more correct?

** Affects: neutron
   Importance: Undecided
       Status: New

** Description changed:

  While looking into https://bugs.launchpad.net/tripleo/+bug/1293782, we
  noticed that it seems the default for nova_url in neutron.conf is
  http://127.0.0.1:8774

  From common/config.py:

- cfg.StrOpt('nova_url',
-default='http://127.0.0.1:8774',
-help=_('URL for connection to nova')),
+ cfg.StrOpt('nova_url',
+            default='http://127.0.0.1:8774',
+            help=_('URL for connection to nova')),

- Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be a
- more sane default?
+ Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be more
+ correct?

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1293784

Title:
  Need better default for nova_url

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  While looking into https://bugs.launchpad.net/tripleo/+bug/1293782, we
  noticed that it seems the default for nova_url in neutron.conf is
  http://127.0.0.1:8774

  From common/config.py:

      cfg.StrOpt('nova_url',
                 default='http://127.0.0.1:8774',
                 help=_('URL for connection to nova')),

  Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be
  more correct?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1293784/+subscriptions
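The change the report is asking about would be a small tweak to the
option's default. A self-contained sketch of the suggested definition,
under the report's own assumption that /v2 is the right suffix (it is
not a merged patch, and the '_' stand-in below replaces neutron's real
i18n helper):

    from oslo_config import cfg

    _ = lambda s: s  # stand-in for neutron's i18n helper in this sketch

    core_opts = [
        cfg.StrOpt('nova_url',
                   default='http://127.0.0.1:8774/v2',
                   help=_('URL for connection to nova')),
    ]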
[Yahoo-eng-team] [Bug 1262878] [NEW] Serial console device required for runcmd to take effect
Public bug reported:

Not sure if this should actually be considered a bug or not. But,
figured I'd submit it anyway.

My cloud image is ubuntu-13.10, amd64, downloaded from here:
http://cloud-images.ubuntu.com/saucy/current/saucy-server-cloudimg-amd64-disk1.img.
I'm testing locally in libvirt with virsh, etc.

I'm using the OpenStack Config Drive as my only data source. It
provides a user_data file that has the following contents:

    #cloud-config
    runcmd:
      - touch /cloudinit-runcmd-done

I could not get that touch command to run. To help debug, I added the
following line as well:

    output: {all: '| tee -a /var/log/cloud-init-output.log'}

After adding that line, I noticed that the touch command ran fine. So,
to summarize: w/o an output:{...} line in the user-data, my runcmd does
not execute.

Eventually I narrowed the issue down to the fact that my vm was missing
a defined serial console. After adding the following to my libvirt xml
for my vm, my runcmd runs fine with *or* without an output line in the
user-data:

    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>

The only thing I can figure is that adding the output line causes
util.fix_output to get executed, which must trigger something so that
cloud-init keeps running to the stage where runcmds are executed.

** Affects: cloud-init
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1262878

Title:
  Serial console device required for runcmd to take effect

Status in Init scripts for use on cloud images:
  New

Bug description:
  Not sure if this should actually be considered a bug or not. But,
  figured I'd submit it anyway.

  My cloud image is ubuntu-13.10, amd64, downloaded from here:
  http://cloud-images.ubuntu.com/saucy/current/saucy-server-cloudimg-amd64-disk1.img.
  I'm testing locally in libvirt with virsh, etc.

  I'm using the OpenStack Config Drive as my only data source. It
  provides a user_data file that has the following contents:

      #cloud-config
      runcmd:
        - touch /cloudinit-runcmd-done

  I could not get that touch command to run. To help debug, I added the
  following line as well:

      output: {all: '| tee -a /var/log/cloud-init-output.log'}

  After adding that line, I noticed that the touch command ran fine.
  So, to summarize: w/o an output:{...} line in the user-data, my
  runcmd does not execute.

  Eventually I narrowed the issue down to the fact that my vm was
  missing a defined serial console. After adding the following to my
  libvirt xml for my vm, my runcmd runs fine with *or* without an
  output line in the user-data:

      <serial type='pty'>
        <target port='0'/>
      </serial>
      <console type='pty'>
        <target type='serial' port='0'/>
      </console>

  The only thing I can figure is that adding the output line causes
  util.fix_output to get executed, which must trigger something so that
  cloud-init keeps running to the stage where runcmds are executed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1262878/+subscriptions
[Yahoo-eng-team] [Bug 1260072] [NEW] TypeError trying to write a file specified in OpenStack Configuration Drive
Public bug reported:

I'm using cloud-init with the OpenStack Configuration Drive.

On the mounted iso serving as the Configuration Drive I have a
/openstack/latest/meta_data.json file that looks like:

    {
        "availability_zone":"nova",
        "files":[
            {
                "content_path":"/content/foo",
                "path":"/etc/foo"
            }
        ],
        "hostname":"foo"
    }

(there's some other contents as well, but I've snipped it here for
brevity)

cloud-init fails writing the /etc/foo file with the following
traceback:

    2013-12-11 20:52:59,259 - util.py[ERROR]: in method 'matchpathcon', argument 1 of type 'char const *'
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 177, in __exit__
        self.selinux.matchpathcon(path, stats[stat.ST_MODE])
    TypeError: in method 'matchpathcon', argument 1 of type 'char const *'
    2013-12-11 20:52:59,260 - util.py[WARNING]: Failed writing files

Note I had to add my own LOG.exception line in __exit__ of the
SeLinuxGuard class in utils.py in order to see the exception.

The issue seems to be that when the json from meta_data.json is passed
to json.loads, you get back a dictionary with the keys and values in
unicode, not strings. This is pretty easy to verify from the command
line:

    /home/jslagle $ python
    Python 2.7.5 (default, Nov 12 2013, 16:18:42)
    [GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import json
    >>> json.loads('{"foo":"bar"}')
    {u'foo': u'bar'}

Later on in cloud-init write_files is called with the file paths in
unicode, which apparently selinux.matchpathcon does not support (it
only supports strings).

This is on Fedora 19, cloud-init version:
cloud-init-0.7.2-7.fc19.noarch

I'll be happy to work on a patch for this as well. What do you think
the right fix for this would be? It'd be easy enough to cast path to
str before calling matchpathcon.

** Affects: cloud-init
   Importance: Undecided
       Status: New

** Description changed:

  I'm using cloud-init with the OpenStack Configuration Drive.

  On the mounted iso serving as the Configuration Drive I have a
- /openstack/latest/meta_json file that looks like:
+ /openstack/latest/meta_data.json file that looks like:

  {
-"availability_zone":"nova",
-"files":[
- {
-"content_path":"/content/foo",
-"path":"/etc/foo"
- }
-],
-"hostname":"foo"
+ "availability_zone":"nova",
+ "files":[
+ {
+ "content_path":"/content/foo",
+ "path":"/etc/foo"
+ }
+ ],
+ "hostname":"foo"
  }

  (there's some other contents as well, but I've snipped it here for
  brevity)

  cloud-init fails writing the /etc/foo file with the following
  traceback:

  2013-12-11 20:52:59,259 - util.py[ERROR]: in method 'matchpathcon', argument 1 of type 'char const *'
  Traceback (most recent call last):
- File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 177, in __exit__
- self.selinux.matchpathcon(path, stats[stat.ST_MODE])
+   File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 177, in __exit__
+     self.selinux.matchpathcon(path, stats[stat.ST_MODE])
  TypeError: in method 'matchpathcon', argument 1 of type 'char const *'
  2013-12-11 20:52:59,260 - util.py[WARNING]: Failed writing files

  Note I had to add my own LOG.exception line in __exit__ of the
  SeLinuxGuard class in utils.py in order to see the exception.

  The issue seems to be that when the json from meta_data.json is passed
  to json.loads, you get back a dictionary with the keys and values in
  unicode, not strings.

  This is pretty easy to verify from the command line:

  /home/jslagle $ python
- Python 2.7.5 (default, Nov 12 2013, 16:18:42)
+ Python 2.7.5 (default, Nov 12 2013, 16:18:42)
  [GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import json
  >>> json.loads('{"foo":"bar"}')
  {u'foo': u'bar'}

  Later on in cloud-init write_files is called with the file paths in
  unicode, which apparently selinux.matchpathcon does not support (it
  only supports strings).

  This is on Fedora 19, cloud-init version:
  cloud-init-0.7.2-7.fc19.noarch

  I'll be happy to work on a patch for this as well. What do you think
  the right fix for this would be? It'd be easy enough to cast path to
  str before calling matchpathcon.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1260072

Title:
  TypeError trying to write a file specified in OpenStack Configuration
  Drive

Status in Init scripts for use on cloud images:
  New

Bug description:
  I'm using cloud-init with the OpenStack Configuration Drive.

  On the mounted iso serving as the Configuration Drive I have a
  /openstack/latest/meta_data.json file that looks like:

  {
      "availability_zone":"nova",
      "files":[
          {
              "content_path":"/content/foo",
              "path":"/etc/foo"
          }
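The cast the reporter proposes would look roughly like this. A hedged
sketch - the helper name is invented, the utf-8 encoding choice is an
assumption, and cloud-init's eventual fix may well differ:

    # Sketch of the proposed cast; _match_path is an invented helper,
    # not cloud-init code (Python 2, matching the report).
    def _match_path(selinux_mod, path, mode):
        if isinstance(path, unicode):
            # json.loads() hands back unicode paths; this Python 2
            # selinux binding only accepts native byte strings.
            path = path.encode('utf-8')  # assumption: utf-8 paths
        return selinux_mod.matchpathcon(path, mode)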