OK, so I have 2 other systems that are showing this failure now. I was able to ssh into them, though: walinuxagent had provisioned the user, populated the ssh keys, and then also started sshd (which it actually should not do).
It shouldn't start sshd because it is possibly doing that before sshd has the required facilities up (sshd starts on 'filesystem or runlevel [2345]'). That wouldn't seem to be the problem here, and it has actually allowed us into the instance to debug.

$ ls -tr --full-time /var/log/upstart/*.log
-rw-r----- 1 root root  46 2013-06-28 15:07:55.843772000 +0000 /var/log/upstart/container-detect.log
-rw-r----- 1 root root  95 2013-06-28 15:07:56.095772000 +0000 /var/log/upstart/console-setup.log
-rw-r----- 1 root root 282 2013-06-28 15:07:56.183772000 +0000 /var/log/upstart/procps-virtual-filesystems.log
-rw-r----- 1 root root 118 2013-06-28 15:07:56.311772000 +0000 /var/log/upstart/module-init-tools.log
-rw-r----- 1 root root 282 2013-06-28 15:07:58.310376600 +0000 /var/log/upstart/procps-static-network-up.log
-rw-r----- 1 root root 110 2013-06-28 15:08:02.993943800 +0000 /var/log/upstart/udev-fallback-graphics.log
-rw-r----- 1 root root 158 2013-06-28 15:08:09.876561300 +0000 /var/log/upstart/ureadahead-other.log
-rw-r----- 1 root root  64 2013-06-28 15:09:30.346411301 +0000 /var/log/upstart/rsyslog.log
-rw-r----- 1 root root  64 2013-06-28 15:09:30.370411301 +0000 /var/log/upstart/dbus.log

$ cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=335336k,nr_inodes=83834,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,relatime,size=137672k,mode=755 0 0
/dev/disk/by-uuid/65a0705a-7afe-482f-917d-c59e75cf0c52 / ext4 rw,relatime,user_xattr,barrier=1,data=ordered,discard 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
/dev/sdb1 /mnt/resource ext4 rw,relatime,user_xattr,barrier=1,data=ordered 0 0

$ cat /etc/fstab
UUID=65a0705a-7afe-482f-917d-c59e75cf0c52 / ext4 defaults,discard 0 0

mountall is not running.

$ sudo status mountall
mountall stop/waiting

$ ls -altr /var/run/landscape
ls: cannot access /var/run/landscape: No such file or directory

$ runlevel
N 2

$ ps axw
..
root       389     1  0 15:07 ?        00:00:00 upstart-udev-bridge --daemon
root       391     1  0 15:07 ?        00:00:00 /sbin/udevd --daemon
root       508     1  0 15:07 ?        00:00:00 /usr/bin/python /usr/sbin/waagent -daemo
root       574   391  0 15:07 ?        00:00:00 /sbin/udevd --daemon
root       577   391  0 15:07 ?        00:00:00 /sbin/udevd --daemon
root       598     2  0 15:07 ?        00:00:00 [kpsmoused]
root       633     1  0 15:07 ?        00:00:00 upstart-socket-bridge --daemon
root       906     2  0 15:08 ?        00:00:00 [jbd2/sdb1-8]
root       907     2  0 15:08 ?        00:00:00 [ext4-dio-unwrit]
root       931   508  0 15:08 ?        00:00:00 [sh] <defunct>
root      1015     1  0 15:08 ?        00:00:00 dhclient3 -e IF_METRIC=100 -pf /var/run/
root      1025     1  0 15:08 ?        00:00:00 /bin/sh /etc/network/if-up.d/ntpdate
root      1028  1025  0 15:08 ?        00:00:00 lockfile-create /var/lock/ntpdate-ifup
root      1121     1  0 15:09 ?        00:00:00 /usr/sbin/sshd -D
syslog    1137     1  0 15:09 ?        00:00:00 rsyslogd -c5
102       1142     1  0 15:09 ?        00:00:00 dbus-daemon --system --fork --activation
root      1200     1  0 15:09 tty4     00:00:00 /sbin/getty -8 38400 tty4
root      1207     1  0 15:09 tty5     00:00:00 /sbin/getty -8 38400 tty5
root      1214     1  0 15:09 tty2     00:00:00 /sbin/getty -8 38400 tty2
root      1215     1  0 15:09 tty3     00:00:00 /sbin/getty -8 38400 tty3
root      1218     1  0 15:09 tty6     00:00:00 /sbin/getty -8 38400 tty6
root      1248     1  0 15:09 ?        00:00:00 /usr/sbin/hv_kvp_daemon_3.2.0-48-virtual
root      1250     1  0 15:09 ?        00:00:00 acpid -c /etc/acpi/events -s /var/run/ac
root      1251     1  0 15:09 ?        00:00:00 cron
daemon    1252     1  0 15:09 ?        00:00:00 atd
root      1265     1  0 15:09 tty1     00:00:00 /sbin/getty -8 38400 tty1
whoopsie  1279     1  0 15:09 ?        00:00:00 whoopsie
root      1308  1121  0 15:23 ?        00:00:00 sshd: test [priv]
test      1412  1308  0 15:24 ?        00:00:00 sshd: test@pts/0
test      1413  1412  0 15:24 pts/0    00:00:01 -bash
root      1755     2  0 15:33 ?        00:00:00 [kworker/0:0]
root      2054     2  0 15:38 ?        00:00:00 [kworker/0:2]
root      2274     2  0 15:43 ?        00:00:00 [kworker/0:1]
test      2450  1413  0 15:45 pts/0    00:00:00 ps -ef

Note, it seems that 'lockfile-create /var/lock/ntpdate-ifup' is hung.

$ sudo sh -c "tr '\0' ' ' < /proc/1025/environ" ; echo
METHOD=dhcp MODE=start LOGICAL=eth0 PHASE=post-up ADDRFAM=inet VERBOSITY=0 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin IF_METRIC=100 IFACE=eth0 PWD=/var/lib/waagent

$ ls -l /proc/1025/cmdline --full-time
-r--r--r-- 1 root root 0 2013-06-28 15:23:53.965119600 +0000 /proc/1025/cmdline

$ ls -l /run/network/ --full-time
total 4
-rw-r--r-- 1 root root 16 2013-06-28 15:08:13.333041000 +0000 ifstate
-rw-r--r-- 1 root root  0 2013-06-28 15:08:13.321041000 +0000 ifup.eth0
-rw-r--r-- 1 root root  0 2013-06-28 15:07:57.774376600 +0000 ifup.lo
drwxr-xr-x 2 root root 40 2013-06-28 15:07:58.298376600 +0000 static-network-up-emitted

--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to walinuxagent in Ubuntu.
https://bugs.launchpad.net/bugs/1195524

Title:
  race condition / transient failure to provision