Greetings,

OK, another try - I've simplified the test environment as much as possible, in hopes of getting this working. Ubuntu 16.04, up to date.
Two changes:
I've reverted to the default lxc network configuration to eliminate corner cases and focus on live migration issues.
I've uninstalled the criu package and built criu from git (https://github.com/tych0/criu/tree/cgroup-root-mount).

Steps to reproduce failure:
Create 1 new ubuntu 16.04 container on each of the 2 lxd hosts
Issue the lxc move command

Result:

root@ronnie:~# lxc move second lxd:
error: Error transferring container data: checkpoint failed:
(03.725544) Error (net.c:1048): mount failed: Device or resource busy
(03.742659) Error (namespaces.c:910): Namespaces dumping finished with error 65280
(03.747443) Error (cr-dump.c:1600): Dumping FAILED.
root@ronnie:~#

Tail of /var/log/lxd/second/migration_dump_2016-04-10T14\:39\:43-07\:00.log:

(03.722460) Process: 579(23837)
(03.722464) ----------------------------------------
(03.722477) Dumping 1(23057)'s namespaces
(03.723377) Dump CGROUP namespace info 14 via 23057
(03.724066) Dump UTS namespace 11 via 23057
(03.724662) Dump IPC namespace 10 via 23057
(03.724924) IPC shared memory segments: 0
(03.724934) IPC message queues: 0
(03.724941) IPC semaphore sets: 0
(03.725346) Dump NET namespace info 9 via 23057
(03.725525) Mount ns' sysfs in crtools-sys.JZ9n0X
(03.725544) Error (net.c:1048): mount failed: Device or resource busy
(03.742659) Error (namespaces.c:910): Namespaces dumping finished with error 65280
(03.742876) Unlock network
(03.742883) Running network-unlock scripts
(03.747031) Unfreezing tasks into 1
(03.747049) Unseizing 23057 into 1
(03.747061) Unseizing 23147 into 1
(03.747069) Unseizing 23148 into 1
(03.747077) Unseizing 23182 into 1
(03.747084) Unseizing 23272 into 1
(03.747091) Unseizing 23279 into 1
(03.747125) Unseizing 23282 into 1
(03.747140) Unseizing 23297 into 1
(03.747162) Unseizing 23301 into 1
(03.747181) Unseizing 23306 into 1
(03.747223) Unseizing 23309 into 1
(03.747232) Unseizing 23330 into 1
(03.747247) Unseizing 23345 into 1
(03.747287) Unseizing 23474 into 1
(03.747299) Unseizing 23577 into 1
(03.747310) Unseizing 23675 into 1
(03.747319) Unseizing 23688 into 1
(03.747328) Unseizing 23691 into 1
(03.747339) Unseizing 23835 into 1
(03.747348) Unseizing 23836 into 1
(03.747357) Unseizing 23837 into 1
(03.747443) Error (cr-dump.c:1600): Dumping FAILED.

I'm happy to supply additional info, or test patches -

Regards,

Jake

On Fri, Apr 8, 2016 at 4:40 PM, jjs - mainphrame <j...@mainphrame.com> wrote:
> Ah, never mind - it doesn't appear to be solely a criu issue - even
> migration of stopped containers hangs forever now.
>
> Jake
>
> On Fri, Apr 8, 2016 at 4:23 PM, jjs - mainphrame <j...@mainphrame.com> wrote:
>> Ubuntu 16.04, up to date -
>>
>> After today's updates, including a kernel upgrade to 4.4.0-18, I tried
>> live migration again:
>>
>> root@raskolnikov:~# lxc move third lxd2:
>>
>> One hour later:
>>
>> root@raskolnikov:~# lxc move third lxd2:
>>
>> Still stuck, and the migration file in /var/log/lxd/third has not been
>> created.
>>
>> Tycho said on Mar 30 that the situation should be sorted soon, but
>> mentioned the git repo:
>> https://github.com/tych0/criu/tree/cgroup-root-mount
>>
>> Should live migration work with criu from git?
>>
>> Feel free to advise me on what information I can supply, not only for
>> the ct migration issues, but also for the new dhcp issue.
>>
>> Regards,
>>
>> Jake
>>
>>
>> On Thu, Apr 7, 2016 at 11:01 PM, jjs - mainphrame <j...@mainphrame.com> wrote:
>>> (Bump) -
>>>
>>> Any thoughts on what to try for the CT migration and dhcp issues?
>>> Running up to date ubuntu 16.04 beta -
>>>
>>> Regards,
>>>
>>> Jake
>>>
>>> On Wed, Apr 6, 2016 at 3:18 PM, jjs - mainphrame <j...@mainphrame.com> wrote:
>>>> Greetings -
>>>>
>>>> I've not yet been able to reproduce that one shining moment from Mar
>>>> 29 when live migration of privileged containers was working, under
>>>> kernel 4.4.0-15.
>>>>
>>>> To recap,
>>>> live container migration broke with 4.4.0-16, and is still
>>>> broken in 4.4.0-17 - but now, instead of producing an error message,
>>>> an attempt to live migrate a container merely hangs forever. Is that
>>>> expected, or should I be seeing something more? BTW - the migration
>>>> dump log for that container hasn't been touched for a week. I'll be
>>>> glad to supply more info if this is not a known issue.
>>>>
>>>> Recent updates seem to have created a new problem: the CTs which
>>>> configure their own network settings work (aside from migration) but
>>>> none of the CTs which depend on dhcp are getting IPs. BTW I'm using a
>>>> bridge connected to my local network and dhcp, not the default lxc
>>>> dhcp server. I see the packets on the host bridge, but they don't
>>>> reach the dhcp server. I'd be curious to know if there have been any
>>>> dhcp issues since recent updates. If not, I'll need to troubleshoot
>>>> other causes, but it's odd that dhcp simply stops working for all CTs
>>>> on both lxd hosts after updates.
>>>>
>>>> Jake
>>>>
>>>>
>>>> On Wed, Mar 30, 2016 at 6:27 AM, Tycho Andersen <tycho.ander...@canonical.com> wrote:
>>>>> On Tue, Mar 29, 2016 at 11:17:26PM -0700, jjs - mainphrame wrote:
>>>>>> Well, I've found some interesting things here today. I created a couple of
>>>>>> privileged xenial containers, and sure enough, I was able to live migrate
>>>>>> them back and forth between the 2 lxd hosts.
>>>>>>
>>>>>> So far, so good.
>>>>>>
>>>>>> Then I did an apt upgrade - among the changes was a kernel change from
>>>>>> 4.4.0-15 to 4.4.0-16 - and live migration stopped working.
>>>>>>
>>>>>> Here are the failure messages that resulted from attempting the very same
>>>>>> live migrations that worked before the upgrade and reboot into 4.4.0-16:
>>>>>>
>>>>>> root@raskolnikov:~# lxc move akira lxd2:
>>>>>> error: Error transferring container data: checkpoint failed:
>>>>>> (00.092234) Error (mount.c:740): mnt: 83:./sys/fs/cgroup/devices doesn't have a proper root mount
>>>>>> (00.098187) Error (cr-dump.c:1600): Dumping FAILED.
>>>>>>
>>>>>>
>>>>>> root@ronnie:~# lxc move third lxd:
>>>>>> error: Error transferring container data: checkpoint failed:
>>>>>> (00.076107) Error (mount.c:740): mnt: 326:./sys/fs/cgroup/perf_event doesn't have a proper root mount
>>>>>> (00.080388) Error (cr-dump.c:1600): Dumping FAILED.
>>>>>
>>>>> Yep, this is a known issue with -16. We need both a kernel patch and a
>>>>> patch to CRIU before it will start working again. I have a branch at:
>>>>>
>>>>> https://github.com/tych0/criu/tree/cgroup-root-mount
>>>>>
>>>>> which should work if you want to keep playing with it, but hopefully
>>>>> we'll have the situation sorted out in the next few days.
>>>>>
>>>>> Tycho
>>>>>
>>>>>> Jake
>>>>>>
>>>>>> PS - Thanks for the html mail heads-up - I've been using google mail
>>>>>> services for this domain. I'll have to look into the config options, and
>>>>>> see if I can do the needful.
>>>>>
>>>>>>
>>>>>> On Tue, Mar 29, 2016 at 12:45 PM, Andrey Repin <anrdae...@yandex.ru> wrote:
>>>>>>
>>>>>> > Greetings, jjs - mainphrame!
>>>>>> >
>>>>>> > >> On Mon, Mar 28, 2016 at 08:47:24PM -0700, jjs - mainphrame wrote:
>>>>>> > >>> I've looked at ct migration between 2 ubuntu 16.04 hosts today, and had
>>>>>> > >>> some interesting problems; I find that migration of stopped containers
>>>>>> > >>> works fairly reliably; but live migration, well, it transfers a lot of
>>>>>> > >>> data, then exits with a failure message.
>>>>>> > >>> I can then move the same
>>>>>> > >>> container, stopped, with no problem.
>>>>>> > >>>
>>>>>> > >>> The error is the same every time, a failure of "mkdtemp" -
>>>>>> > >>
>>>>>> > >> It looks like your host /tmp isn't writable by the uid map that the
>>>>>> > >> container is being restored as?
>>>>>> >
>>>>>> >
>>>>>> > > Which is odd, since /tmp has 1777 perms on both hosts, so I don't see how
>>>>>> > > it could be a permissions problem. Surely the default apparmor profile is
>>>>>> > > not the cause? You did give me a new idea though, and I'll set up a test
>>>>>> > > with privileged containers for comparison. Is there a switch to enable
>>>>>> > > verbose logging?
>>>>>> >
>>>>>> > I've run into the same issue once. Stumbled upon it for nearly a month,
>>>>>> > falsely blaming LXC.
>>>>>> > Recreating a container's rootfs from scratch resolved the issue.
>>>>>> > I know not of what caused it to begin with, must've been some kind of
>>>>>> > glitch.
>>>>>> >
>>>>>> > P.S.
>>>>>> > It would be great if you can configure your mail client to not use HTML
>>>>>> > format for lists.
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > With best regards,
>>>>>> > Andrey Repin
>>>>>> > Tuesday, March 29, 2016 22:43:04
>>>>>> >
>>>>>> > Sorry for my terrible English...
>>>>>> > _______________________________________________
>>>>>> > lxc-users mailing list
>>>>>> > lxc-users@lists.linuxcontainers.org
>>>>>> > http://lists.linuxcontainers.org/listinfo/lxc-users