[Yahoo-eng-team] [Bug 1341420] Re: gap between scheduler selection and claim causes spurious failures when the instance is the last one to fit

2016-03-08 Thread James Slagle
** Also affects: tripleo
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1341420

Title:
  gap between scheduler selection and claim causes spurious failures
  when the instance is the last one to fit

Status in OpenStack Compute (nova):
  In Progress
Status in tripleo:
  New

Bug description:
  There is a race between the scheduler in select_destinations, which
  selects a set of hosts, and the nova compute manager, which claims
  resources on those hosts when building the instance. The race is
  particularly noticeable with Ironic, where every request will consume a
  full host, but can turn up on libvirt etc too. Multiple schedulers
  will likely exacerbate this too unless they are in a version of python
  with randomised dictionary ordering, in which case they will make it
  better :).

  I've put https://review.openstack.org/106677 up to remove a comment
  which comes from before we introduced this race.

  One mitigation already exists: the filter scheduler's _schedule method
  attempts to randomly select hosts to avoid returning the same host for
  repeated requests, but the default size of the set it selects from is 1 -
  so when heat requests a single instance, the same candidate is chosen
  every time. Setting that number higher can avoid all concurrent requests
  hitting the same host, but it will still be a race, and still likely to
  fail fairly hard in near-capacity situations (e.g. deploying all machines
  in a cluster with Ironic and Heat).
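
  For reference, a minimal nova.conf tweak along those lines - assuming the
  option name of that era, scheduler_host_subset_size, and an illustrative
  value:

      [DEFAULT]
      # pick randomly among the top 5 weighed hosts instead of only the top 1
      scheduler_host_subset_size = 5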

  Folk wanting to reproduce this: take a decent size cloud - e.g. 5 or
  10 hypervisor hosts (KVM is fine). Deploy VMs until each hypervisor has
  capacity for exactly one more. Then deploy a bunch of VMs one at a time
  but very close together - e.g. use the python API with cached keystone
  credentials, and boot 5 in a loop.
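
  As a rough illustration of that loop (hypothetical names; assumes a
  python-novaclient of that era and the usual OS_* environment variables):

  import os
  from novaclient import client

  # Reuse one authenticated client so the boot requests land close together.
  nova = client.Client('2',
                       os.environ['OS_USERNAME'],
                       os.environ['OS_PASSWORD'],
                       os.environ['OS_TENANT_NAME'],
                       os.environ['OS_AUTH_URL'])

  image = nova.images.find(name='cirros')      # placeholder image name
  flavor = nova.flavors.find(name='m1.small')  # placeholder flavor name

  # Each create() is a separate scheduling request, so with one slot left per
  # hypervisor they can all race through select_destinations to the same host.
  for i in range(5):
      nova.servers.create(name='race-test-%d' % i, image=image, flavor=flavor)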

  If using Ironic you will want https://review.openstack.org/106676 to
  let you see which host is being returned from the selection.

  Possible fixes:
   - have the scheduler be a bit smarter about returning hosts - e.g. track 
destination selection counts since the last refresh and weight hosts by that 
count as well (see the sketch after this list)
   - reinstate actioning claims into the scheduler, allowing the audit to 
correct any claimed-but-not-started resource counts asynchronously
   - special case the retry behaviour if there are lots of resources available 
elsewhere in the cluster.
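
  A minimal standalone sketch of the first idea - not nova's real weigher
  interface, just the bookkeeping it would need:

  import collections

  class SelectionCountTracker(object):
      """Toy illustration: penalise hosts already handed out since the last
      resource-usage refresh (hypothetical helper, not nova code)."""

      def __init__(self):
          self.counts = collections.defaultdict(int)

      def pick(self, weighed_hosts):
          # weighed_hosts: list of (host_name, weight) tuples from the scheduler
          adjusted = [(name, weight - self.counts[name])
                      for name, weight in weighed_hosts]
          adjusted.sort(key=lambda entry: entry[1], reverse=True)
          chosen = adjusted[0][0]
          self.counts[chosen] += 1  # remember this host was just selected
          return chosen

      def refresh(self):
          # call when fresh usage reports arrive from the compute nodes
          self.counts.clear()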

  Stats-wise: I just tested a 29-instance deployment with Ironic and a
  heat stack, with 45 machines to deploy onto (so 45 hosts in the
  scheduler set), and 4 failed with this race - which means they
  rescheduled and failed 3 times each, or 12 cases of scheduler racing
  *at minimum*.

  Background chat:

  15:43 < lifeless> mikal: around? I need to sanity check something
  15:44 < lifeless> ulp, nope, am sure of it. filing a bug.
  15:45 < mikal> lifeless: ok
  15:46 < lifeless> mikal: oh, you're here, I will run it past you :)
  15:46 < lifeless> mikal: if you have ~5m
  15:46 < mikal> Sure
  15:46 < lifeless> so, symptoms
  15:46 < lifeless> nova boot <...> --num-instances 45 -> works fairly 
reliably. Some minor timeout related things to fix but nothing dramatic.
  15:47 < lifeless> heat create-stack <...> with a stack with 45 instances in 
it -> about 50% of instances fail to come up
  15:47 < lifeless> this is with Ironic
  15:47 < mikal> Sure
  15:47 < lifeless> the failure on all the instances is the retry-three-times 
failure-of-death
  15:47 < lifeless> what I believe is happening is this
  15:48 < lifeless> the scheduler is allocating the same weighed list of hosts 
for requests that happen close enough together
  15:49 < lifeless> and I believe its able to do that because the target hosts 
(from select_destinations) need to actually hit the compute node manager and 
have
  15:49 < lifeless> with rt.instance_claim(context, instance, 
limits):
  15:49 < lifeless> happen in _build_and_run_instance
  15:49 < lifeless> before the resource usage is assigned
  15:49 < mikal> Is heat making 45 separate requests to the nova API?
  15:49 < lifeless> eys
  15:49 < lifeless> yes
  15:49 < lifeless> thats the key difference
  15:50 < lifeless> same flavour, same image
  15:50 < openstackgerrit> Sam Morrison proposed a change to openstack/nova: 
Remove cell api overrides for lock and unlock  
https://review.openstack.org/89487
  15:50 < mikal> And you have enough quota for these instances, right?
  15:50 < lifeless> yes
  15:51 < mikal> I'd have to dig deeper to have an answer, but it sure does 
seem worth filing a bug for
  15:51 < lifeless> my theory is that there is enough time between 
select_destinations in the conductor, and _build_and_run_instance in compute 
for another request to come in the front door and be scheduled to the same host
  15:51 < mikal> That seems possible to me
  15:52 < 

[Yahoo-eng-team] [Bug 1531881] [NEW] AttributeError: 'module' object has no attribute 'dump_as_bytes'

2016-01-07 Thread James Slagle
Public bug reported:

Seeing the following traceback from nova-compute when trying to launch
instances in tripleo-ci for stable/liberty (using the ironic driver):

2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] Traceback (most recent call last):
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2155, in 
_build_resources
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] yield resources
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2009, in 
_build_and_run_instance
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] block_device_info=block_device_info)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 802, in 
spawn
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] files=injected_files)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 716, in 
_generate_configdrive
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] "error: %s"), e, instance=instance)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 195, in __exit__
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] six.reraise(self.type_, self.value, 
self.tb)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 711, in 
_generate_configdrive
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] with 
configdrive.ConfigDriveBuilder(instance_md=i_meta) as cdb:
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/virt/configdrive.py", line 72, in 
__init__
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] 
self.add_instance_metadata(instance_md)
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/virt/configdrive.py", line 93, in 
add_instance_metadata
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] for (path, data) in 
instance_md.metadata_for_config_drive():
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]   File 
"/usr/lib/python2.7/site-packages/nova/api/metadata/base.py", line 465, in 
metadata_for_config_drive
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] yield (filepath, 
jsonutils.dump_as_bytes(data['meta-data']))
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] AttributeError: 'module' object has no 
attribute 'dump_as_bytes'
2016-01-07 13:32:27.691 19349 ERROR nova.compute.manager [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79]
2016-01-07 13:32:27.693 19349 INFO nova.compute.manager 
[req-6b73f4c5-c031-496e-b2f0-a5380d3ca7ba 285d1c33eca8410e9ed03bbe3de03d15 
9448d5b54ff84bd6a8a04b1083eb920f - - -] [instance: 
5a7c299b-f6b6-48d8-a20e-36e72c7bed79] Termi


I believe it's caused by this commit:
https://review.openstack.org/#/c/246792/

which I've submitted a revert for:
https://review.openstack.org/#/c/264793/

The failed tripleo-ci job:
http://logs.openstack.org/46/254946/4/check-tripleo/gate-tripleo-ci-f22-nonha/1363b32/

from this patch:
https://review.openstack.org/#/c/254946/

The version of oslo.serialization in use on the job is
python2-oslo-serialization-1.9.1-dev3.el7.centos.noarch; tripleo-ci uses
delorean, which builds RPMs based on the latest from stable/liberty.
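
As a stopgap only (a sketch, not the proposed fix, which is the revert above),
a caller could fall back when the attribute is missing:

from oslo_serialization import jsonutils

def dump_as_bytes_compat(obj):
    # dump_as_bytes is missing from the oslo.serialization 1.9.x used on this
    # job; fall back to dumps() plus an explicit encode on such releases.
    if hasattr(jsonutils, 'dump_as_bytes'):
        return jsonutils.dump_as_bytes(obj)
    return jsonutils.dumps(obj).encode('utf-8')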

** Affects: nova
 Importance: Undecided
     Status: New

** Affects: tripleo
 Importance: Critical
 Assignee: James Slagle (james-slagle)
 Status: In Progress

** Also affects: tripleo
   Importance: Undecided
   Status: New

** Changed in: tripleo
   Importan

[Yahoo-eng-team] [Bug 1373430] Re: Error while compressing files

2014-10-07 Thread James Slagle
** Changed in: tripleo
   Status: Triaged => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Dashboard (Horizon).
https://bugs.launchpad.net/bugs/1373430

Title:
  Error while compressing files

Status in OpenStack Dashboard (Horizon):
  Fix Released
Status in tripleo - openstack on openstack:
  Fix Released

Bug description:
  All ci jobs failing

  Earliest Failure : 2014-09-24 09:51:55 UTC
  Example : 
http://logs.openstack.org/50/123150/3/check-tripleo/check-tripleo-ironic-undercloud-precise-nonha/3c60b32/console.html

  
  Sep 24 11:51:43 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
dib-run-parts Wed Sep 24 11:51:43 UTC 2014 Running 
/opt/stack/os-config-refresh/post-configure.d/14-horizon
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
CommandError: An error occured during rendering 
/opt/stack/venvs/openstack/lib/python2.7/site-packages/horizon/templates/horizon/_scripts.html:
 'horizon/lib/bootstrap_datepicker/locales/bootstrap-datepicker..js' could not 
be found in the COMPRESS_ROOT 
'/opt/stack/venvs/openstack/lib/python2.7/site-packages/openstack_dashboard/static'
 or with staticfiles.
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
Found 'compress' tags in:
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
/opt/stack/venvs/openstack/lib/python2.7/site-packages/horizon/templates/horizon/_scripts.html
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
/opt/stack/venvs/openstack/lib/python2.7/site-packages/horizon/templates/horizon/_conf.html
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
/opt/stack/venvs/openstack/lib/python2.7/site-packages/openstack_dashboard/templates/_stylesheets.html
  Sep 24 11:51:53 overcloud-controller0-dxjfgv3agarr os-collect-config[724]: 
Compressing... [2014-09-24 11:51:53,459] (os-refresh-config) [ERROR] during 
post-configure phase. [Command '['dib-run-parts', 
'/opt/stack/os-config-refresh/post-configure.d']' returned non-zero exit status 
1]

To manage notifications about this bug go to:
https://bugs.launchpad.net/horizon/+bug/1373430/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1346424] Re: Baremetal node id not supplied to driver

2014-08-15 Thread James Slagle
** Changed in: tripleo
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1346424

Title:
  Baremetal node id not supplied to driver

Status in OpenStack Compute (Nova):
  Fix Committed
Status in tripleo - openstack on openstack:
  Fix Released

Bug description:
  A random overcloud baremetal node fails to boot during check-tripleo-
  overcloud-f20. Occurs intermittently.

  Full logs:

  
http://logs.openstack.org/26/105326/4/check-tripleo/check-tripleo-overcloud-f20/9292247/
  
http://logs.openstack.org/81/106381/2/check-tripleo/check-tripleo-overcloud-f20/ca8a59b/
  
http://logs.openstack.org/08/106908/2/check-tripleo/check-tripleo-overcloud-f20/e9894ca/

  
  Seed's nova-compute log shows this exception:

  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 ERROR oslo.messaging.rpc.dispatcher 
[req-9f090bea-a974-4f3c-ab06-ebd2b7a5c9e6 ] Exception during message handling: 
Baremetal node id not supplied to driver for 
'e13f2660-b72d-4a97-afac-64ff0eecc448'
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent 
call last):
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py",
 line 133, in _dispatch_and_reply
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher incoming.message))
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py",
 line 176, in _dispatch
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher return 
self._do_dispatch(endpoint, method, ctxt, args)
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/oslo/messaging/rpc/dispatcher.py",
 line 122, in _do_dispatch
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher result = 
getattr(endpoint, method)(ctxt, **new_args)
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/exception.py", line 88, 
in wrapped
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher payload)
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/excutils.py",
 line 82, in __exit__
  Jul 21 13:46:07 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher 
six.reraise(self.type_, self.value, self.tb)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/exception.py", line 71, 
in wrapped
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher return f(self, 
context, *args, **kw)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/compute/manager.py", 
line 291, in decorated_function
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher pass
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/openstack/common/excutils.py",
 line 82, in __exit__
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher 
six.reraise(self.type_, self.value, self.tb)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher   File 
"/opt/stack/venvs/nova/lib/python2.7/site-packages/nova/compute/manager.py", 
line 277, in decorated_function
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 
13:46:07.981 3608 TRACE oslo.messaging.rpc.dispatcher return function(self, 
context, *args, **kwargs)
  Jul 21 13:46:08 host-192-168-1-236 nova-compute[3608]: 2014-07-21 

[Yahoo-eng-team] [Bug 1174132] Re: Only one instance of a DHCP agent is run per network

2014-04-03 Thread James Slagle
** Changed in: tripleo
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1174132

Title:
  Only one instance of a DHCP agent is run per network

Status in OpenStack Neutron (virtual network service):
  Fix Released
Status in tripleo - openstack on openstack:
  Fix Released

Bug description:
  DHCP agents are stateless and can run HA, but the scheduler only
  schedules one instance per network - when a network node fails (or is
  rebooted / has maintenance done on it) this results in user-visible
  downtime.

  See also bug 1174591 which talks about the administrative overhead of
  dealing with failed nodes.
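
  For reference, a hedged neutron.conf example of the option that lets the
  DHCP scheduler place a network on more than one agent (assuming
  dhcp_agents_per_network; the value is illustrative):

      [DEFAULT]
      # schedule each network onto two DHCP agents instead of one
      dhcp_agents_per_network = 2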

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1174132/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1293784] [NEW] Need better default for nova_url

2014-03-17 Thread James Slagle
Public bug reported:

While looking into https://bugs.launchpad.net/tripleo/+bug/1293782, we
noticed that it seems the default for nova_url in neutron.conf is
http://127.0.0.1:8774

From common/config.py:
cfg.StrOpt('nova_url',
   default='http://127.0.0.1:8774',
   help=_('URL for connection to nova')),

Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be more
correct?
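
For illustration, a hedged neutron.conf override (hypothetical controller
address; the point is the versioned /v2 path):

[DEFAULT]
# hypothetical deployment value; note the versioned endpoint
nova_url = http://192.0.2.10:8774/v2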

** Affects: neutron
 Importance: Undecided
 Status: New

** Description changed:

  While looking into https://bugs.launchpad.net/tripleo/+bug/1293782, we
  noticed that it seems the default for nova_url in neutron.conf is
  http://127.0.0.1:8774
  
  From common/config.py:
- cfg.StrOpt('nova_url',
-default='http://127.0.0.1:8774',
-help=_('URL for connection to nova')),
+ cfg.StrOpt('nova_url',
+    default='http://127.0.0.1:8774',
+    help=_('URL for connection to nova')),
  
- Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be a
- more sane default?
+ Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be more
+ correct?

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1293784

Title:
  Need better default for nova_url

Status in OpenStack Neutron (virtual network service):
  New

Bug description:
  While looking into https://bugs.launchpad.net/tripleo/+bug/1293782, we
  noticed that it seems the default for nova_url in neutron.conf is
  http://127.0.0.1:8774

  From common/config.py:
  cfg.StrOpt('nova_url',
     default='http://127.0.0.1:8774',
     help=_('URL for connection to nova')),

  Is this really a sane default? Wouldn't http://127.0.0.1:8774/v2 be
  more correct?

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1293784/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1262878] [NEW] Serial console device required for runcmd to take effect

2013-12-19 Thread James Slagle
Public bug reported:

Not sure if this should actually be considered a bug or not.  But,
figured I'd submit it anyway.

My cloud image is ubuntu-13.10, amd64, downloaded from here:
http://cloud-images.ubuntu.com/saucy/current/saucy-server-cloudimg-amd64-disk1.img.
I'm testing locally in libvirt with virsh, etc.

I'm using the OpenStack Config Drive as my only data source.  It
provides a user_data file that has the following contents:

#cloud-config
runcmd:
  - touch /cloudinit-runcmd-done

I could not get that touch command to run.  To help debug, I added the
following line as well:

output: {all: '| tee -a /var/log/cloud-init-output.log'}

After adding that line, I noticed that the touch command ran fine.  So,
to summarize: w/o an output:{...} line in the user-data, my runcmd does
not execute.  Eventually I narrowed the issue down to the fact that my
vm was missing a defined serial console.  After adding the following to
my libvirt xml for my vm, my runcmd runs fine with *or* without an
output line in the user-data:

<serial type='pty'>
  <target port='0'/>
</serial>
<console type='pty'>
  <target type='serial' port='0'/>
</console>

The only thing I can figure is that adding the output line causes
util.fix_output to get executed, which must trigger something so that
cloud-init keeps running to the stage where runcmds are executed.

** Affects: cloud-init
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1262878

Title:
  Serial console device required for runcmd to take effect

Status in Init scripts for use on cloud images:
  New

Bug description:
  Not sure if this should actually be considered a bug or not.  But,
  figured I'd submit it anyway.

  My cloud image is ubuntu-13.10, amd64, downloaded from here:
  http://cloud-images.ubuntu.com/saucy/current/saucy-server-cloudimg-amd64-disk1.img.
  I'm testing locally in libvirt with virsh, etc.

  I'm using the OpenStack Config Drive as my only data source.  It
  provides a user_data file that has the following contents:

  #cloud-config
  runcmd:
- touch /cloudinit-runcmd-done

  I could not get that touch command to run.  To help debug, I added
  the following line as well:

  output: {all: '| tee -a /var/log/cloud-init-output.log'}

  After adding that line, I noticed that the touch command ran fine.
  So, to summarize: w/o an output:{...} line in the user-data, my runcmd
  does not execute.  Eventually I narrowed the issue down to the fact
  that my vm was missing a defined serial console.  After adding the
  following to my libvirt xml for my vm, my runcmd runs fine with *or*
  without an output line in the user-data:

  <serial type='pty'>
    <target port='0'/>
  </serial>
  <console type='pty'>
    <target type='serial' port='0'/>
  </console>

  The only thing I can figure is that adding the output line causes
  util.fix_output to get executed, which must trigger something so that
  cloud-init keeps running to the stage where runcmds are executed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1262878/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1260072] [NEW] TypeError trying to write a file specified in OpenStack Configuration Drive

2013-12-11 Thread James Slagle
Public bug reported:

I'm using cloud-init with the OpenStack Configuration Drive.  On the
mounted iso serving as the Configuration Drive I have a
/openstack/latest/meta_data.json file that looks like:

{
   "availability_zone": "nova",
   "files": [
     {
       "content_path": "/content/foo",
       "path": "/etc/foo"
     }
   ],
   "hostname": "foo"
}

(there's some other contents as well, but I've snipped it here for
brevity)

cloud-init fails writing the /etc/foo file with the following traceback:

2013-12-11 20:52:59,259 - util.py[ERROR]: in method 'matchpathcon', argument 1 
of type 'char const *'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/util.py", line 177, in 
__exit__
self.selinux.matchpathcon(path, stats[stat.ST_MODE])
TypeError: in method 'matchpathcon', argument 1 of type 'char const *'
2013-12-11 20:52:59,260 - util.py[WARNING]: Failed writing files

Note I had to add my own LOG.exception line in __exit__ of the
SeLinuxGuard class in utils.py in order to see the exception.

The issue seems to be that when the json from meta_data.json is passed
to json.loads, you get back a dictionary with the keys and values in
unicode, not strings.  This is pretty easy to verify from the command
line:

/home/jslagle $ python
Python 2.7.5 (default, Nov 12 2013, 16:18:42)
[GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.loads('{"foo": "bar"}')
{u'foo': u'bar'}

Later on in cloud-init, write_files is called with the file paths in
unicode, which apparently selinux.matchpathcon does not support (it only
supports strings).

This is on Fedora 19, cloud-init version: cloud-init-0.7.2-7.fc19.noarch

I'll be happy to work on a patch for this as well.

What do you think the right fix for this would be?  It'd be easy enough
to cast path to str before calling matchpathcon.
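
A minimal sketch of that cast (illustrative only, not a merged fix):

def matchpathcon_compat(selinux_module, path, mode):
    # selinux.matchpathcon() only accepts byte strings here; coerce the
    # unicode path coming out of json.loads before calling it.
    if isinstance(path, unicode):
        path = path.encode('utf-8')
    return selinux_module.matchpathcon(path, mode)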

** Affects: cloud-init
 Importance: Undecided
 Status: New

** Description changed:

  I'm using cloud-init with the OpenStack Configuration Drive.  On the
  mounted iso serving as the Configuration Drive I have a
- /openstack/latest/meta_json file that looks like:
+ /openstack/latest/meta_data.json file that looks like:
  
  {
-availability_zone:nova,
-files:[
-  {
-content_path:/content/foo,
-path:/etc/foo
-  }
-],
-hostname:foo
+    availability_zone:nova,
+    files:[
+  {
+    content_path:/content/foo,
+    path:/etc/foo
+  }
+    ],
+    hostname:foo
  }
  
  (there's some other contents as well, but I've snipped it here for
  brevity)
  
  cloud-init fails writing the /etc/foo file with the following traceback:
  
  2013-12-11 20:52:59,259 - util.py[ERROR]: in method 'matchpathcon', argument 
1 of type 'char const *'
  Traceback (most recent call last):
-   File /usr/lib/python2.7/site-packages/cloudinit/util.py, line 177, in 
__exit__
- self.selinux.matchpathcon(path, stats[stat.ST_MODE])
+   File /usr/lib/python2.7/site-packages/cloudinit/util.py, line 177, in 
__exit__
+ self.selinux.matchpathcon(path, stats[stat.ST_MODE])
  TypeError: in method 'matchpathcon', argument 1 of type 'char const *'
  2013-12-11 20:52:59,260 - util.py[WARNING]: Failed writing files
  
  Note I had to add my own LOG.exception line in __exit__ of the
  SeLinuxGuard class in utils.py in order to see the exception.
  
  The issue seems to be that when the json from meta_data.json is passed
  to json.loads, you get back a dictionary with the keys and values in
  unicode, not strings.  This is pretty easy to verify from the command
  line:
  
  /home/jslagle $ python
- Python 2.7.5 (default, Nov 12 2013, 16:18:42) 
+ Python 2.7.5 (default, Nov 12 2013, 16:18:42)
  [GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
  Type help, copyright, credits or license for more information.
   import json
   json.loads('{foo:bar}')
  {u'foo': u'bar'}
  
  Later on in cloud-init  write_files is called with the file paths in
  unicode, which apparently selinux.matchpathcon does not support (it only
  supports strings).
  
  This is on Fedora 19, cloud-init version: cloud-init-0.7.2-7.fc19.noarch
  
  I'll be happy to work on a patch for this as well.
  
  What do you think the right fix for this would be?  It'd be easy enough
  to cast path to str before calling matchpathcon.

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1260072

Title:
  TypeError trying to write a file specified in OpenStack Configuration
  Drive

Status in Init scripts for use on cloud images:
  New

Bug description:
  I'm using cloud-init with the OpenStack Configuration Drive.  On the
  mounted iso serving as the Configuration Drive I have a
  /openstack/latest/meta_data.json file that looks like:

  {
     "availability_zone": "nova",
     "files": [
       {
         "content_path": "/content/foo",
         "path": "/etc/foo"
       }