Mesos is not only about running stateless microservices that handle HTTP requests. There are long-running workloads that would benefit from being rescheduled to a different host without being interrupted, for example to implement dynamic bin packing in the cluster.
CRIU has shown that the networking issues can be handled, even at the socket level. As for moving IP addresses around, Project Calico <https://www.projectcalico.org/> offers a way to do that; we tried it with some homemade modifications using Docker and OSPF, and it worked very well.

On Fri, Feb 19, 2016 at 11:49 AM, Sharma Podila <spod...@netflix.com> wrote:

> Moving stateless services can be trivial or a non-problem, as others have suggested.
>
> Migrating stateful services becomes a function of migrating the state, including any network connections, etc. To think aloud, from some past considerations in HPC-like systems: some systems relied upon the underlying systems to support migration (vMotion, etc.), others on 3rd-party libraries (was that Meiosys?) that could work on existing application binaries, and others on libraries (BLCR <http://crd.lbl.gov/departments/computer-science/CLaSS/research/BLCR/>) that need support from the application developer. I was involved with providing support for BLCR-based applications. One of the challenges was the time to checkpoint an application with a large memory footprint, say, 100 GB or more, which isn't uncommon in HPC. Incremental checkpointing wasn't an option, at least at that point.
>
> Regardless, Mesos' support for checkpoint-restore would have to consider the type of checkpoint-restore being used. I would imagine that the core part of the solution would be simple-ish: providing a "workflow" for the checkpoint-restore system (sort of: send a signal to start the checkpoint, then wait a certain time for it to complete or time out). Relatively less simple would be the actual integration of the checkpoint-restore system and dealing with its constraints and idiosyncrasies.
>
> On Fri, Feb 19, 2016 at 4:50 AM, Dick Davies <d...@hellooperator.net> wrote:
>
>> Agreed, vMotion always struck me as something for those monolithic apps with a lot of local state.
>>
>> The industry seems to be moving away from that as fast as its little legs will carry it.
>>
>> On 19 February 2016 at 11:35, Jason Giedymin <jason.giedy...@gmail.com> wrote:
>> > Food for thought:
>> >
>> > One should refrain from monolithic apps. If they're small and stateless, you should be doing rolling upgrades.
>> >
>> > If you find yourself with one container and you can't easily distribute that workload by just scaling and load balancing, then you have a monolith. Time to enhance it.
>> >
>> > Containers should not be treated like VMs.
>> >
>> > -Jason
>> >
>> > On Feb 19, 2016, at 6:05 AM, Mike Michel <mike.mic...@mmbash.de> wrote:
>> >
>> > The question is whether you really need this when you are moving in the world of containers/microservices, where it is about building stateless 12-factor apps (except databases). Why move a service when you can just kill it and let the work be done by 10 other containers doing the same thing? I remember a talk at DockerCon about containers and live migration. It went like: "And now that you know how to do it, don't do it!"
>> >
>> > From: Avinash Sridharan [mailto:avin...@mesosphere.io]
>> > Sent: Friday, February 19, 2016 05:48
>> > To: user@mesos.apache.org
>> > Subject: Re: Feature request: move in-flight containers w/o stopping them
>> >
>> > One problem with implementing something like vMotion for Mesos is addressing seamless movement of network connectivity as well. This effectively requires moving the IP address of the container across hosts. If the container shares the host network stack, this won't be possible, since it would imply moving the host IP address from one host to another. When a container has its own network namespace attached to the host via a bridge, moving across L2 segments might be possible.
>> > To move across L3 segments you will need some form of overlay (VXLAN, maybe?).
>> >
>> > On Thu, Feb 18, 2016 at 7:34 PM, Jay Taylor <outtat...@gmail.com> wrote:
>> >
>> > Is this theoretically feasible with Linux checkpoint and restore, perhaps via CRIU? http://criu.org/Main_Page
>> >
>> > On Feb 18, 2016, at 4:35 AM, Paul Bell <arach...@gmail.com> wrote:
>> >
>> > Hello All,
>> >
>> > Has there ever been any consideration of the ability to move in-flight containers from one Mesos host node to another?
>> >
>> > I see this as analogous to VMware's "vMotion" facility, wherein VMs can be moved from one ESXi host to another.
>> >
>> > I suppose something like this could be useful from a load-balancing perspective.
>> >
>> > Just curious whether it's ever been considered and, if so and rejected, why it was rejected?
>> >
>> > Thanks.
>> >
>> > -Paul
>> >
>> > --
>> > Avinash Sridharan, Mesosphere
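The checkpoint-restore "workflow" Sharma Podila describes in the thread (signal the task to start a checkpoint, then wait a certain time for it to complete or time out) can be sketched as a small orchestration loop. This is a minimal sketch under stated assumptions, not anything Mesos provides: `run_checkpoint_workflow`, `CheckpointResult`, and the two callables are hypothetical names, and the actual checkpoint mechanism (CRIU, BLCR, or something else) is hidden behind `request_checkpoint` and `is_complete`.

```python
import time
from enum import Enum
from typing import Callable


class CheckpointResult(Enum):
    # Hypothetical outcome type for the sketch below.
    COMPLETED = "completed"
    TIMED_OUT = "timed_out"


def run_checkpoint_workflow(
    request_checkpoint: Callable[[], None],
    is_complete: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> CheckpointResult:
    """Hypothetical workflow: trigger a checkpoint once, then poll until it
    reports completion or the deadline passes.

    The checkpoint mechanism itself is an assumption hidden behind the two
    callables; this function only implements the signal/wait/timeout logic.
    """
    deadline = clock() + timeout_s
    # Start the checkpoint, e.g. by signalling the task (assumed mechanism).
    request_checkpoint()
    while True:
        if is_complete():
            return CheckpointResult.COMPLETED
        remaining = deadline - clock()
        if remaining <= 0:
            return CheckpointResult.TIMED_OUT
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, remaining))
```

Injecting `clock` and `sleep` keeps the timeout logic testable without real waiting. In practice, `request_checkpoint` might send a signal with `os.kill(pid, signal.SIGUSR1)`, assuming the task has been written to checkpoint itself on that signal; a caller would then decide whether to restore the task on another host or abort the move on `TIMED_OUT`.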