If Apache JIRA were up, I'd point you to the ticket noting the problem with naming Docker containers `mesos-*`: Mesos reserves that prefix, and kills every container it considers "unknown".
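To make the naming problem concrete, here is a minimal sketch of a check you could run over your own container names before starting the agent. The only fact taken from the above is the `mesos-` prefix; the helper names are hypothetical, and the real matching rules inside Mesos may be stricter than a plain prefix test, so treat this as an illustration rather than what Mesos actually does.

```python
# Sketch only: flags container names that start with the prefix Mesos
# reserves for its own Docker containers. The prefix comes from the
# message above; the exact pattern Mesos matches is an assumption here.
RESERVED_PREFIX = "mesos-"


def uses_reserved_prefix(container_name: str) -> bool:
    """Return True if the name starts with the reserved prefix.

    `docker inspect` reports names with a leading "/", so strip it first.
    """
    return container_name.lstrip("/").startswith(RESERVED_PREFIX)


def at_risk(container_names):
    """Return the names that the agent might treat as its own containers."""
    return [name for name in container_names if uses_reserved_prefix(name)]


if __name__ == "__main__":
    # "mesos-slave" is the container name visible in the `docker events` output.
    print(at_risk(["mesos-slave", "zookeeper", "/mesos-master"]))
```

Renaming the agent container to something outside that prefix would sidestep the check entirely, which may be simpler than changing agent flags.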
As a quick workaround, try setting this flag to false:
https://github.com/apache/mesos/blob/1.1.x/src/slave/flags.cpp#L590-L596

On Thu, Jan 12, 2017 at 4:41 PM, Giulio Eulisse <giulio.euli...@gmail.com> wrote:

> MMm... it seems to die after a long sequence of forks, and mesos itself
> seems to be issuing the sigkill. I wonder if it's trying to do some cleanup
> and it does not realise one of the containers is the agent itself??? Notice
> I do have `MESOS_DOCKER_MESOS_IMAGE=alisw/mesos-slave:1.0.1` set.
>
> On 13 Jan 2017, 01:23 +0100, Giulio Eulisse <giulio.euli...@gmail.com>, wrote:
>
> Ciao,
>
> the only thing I could find is by running a parallel `docker events`
>
> ```
> 2017-01-13T01:18:20.766593692+01:00 network connect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)
> 2017-01-13T01:18:20.846137793+01:00 container start 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> 2017-01-13T01:18:20.847965921+01:00 container resize 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, height=16, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS, width=134)
> 2017-01-13T01:18:21.610141857+01:00 container kill 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, signal=15, vendor=CentOS)
> 2017-01-13T01:18:21.610491564+01:00 container kill 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, signal=9, vendor=CentOS)
> 2017-01-13T01:18:21.646229213+01:00 container die 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, exitCode=143, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> 2017-01-13T01:18:21.652894124+01:00 network disconnect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)
> 2017-01-13T01:18:21.705874041+01:00 container stop 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> ```
>
> Ciao,
> Giulio
>
> On 13 Jan 2017, 01:06 +0100, haosdent <haosd...@gmail.com>, wrote:
>
> Hi, @Giuliio According to your log, it looks normal. Do you have any logs
> related to "SIGKILL"?
>
> On Fri, Jan 13, 2017 at 8:00 AM, Giulio Eulisse <giulio.euli...@gmail.com> wrote:
>
>> Hi,
>>
>> I’ve a setup where I run mesos in docker which works perfectly when I use
>> 0.28.2. I now migrated to 1.0.1 (but it’s the same with 1.1.0 and 1.0.0)
>> and it seems to receive a sigkill right after saying:
>>
>> WARNING: Logging before InitGoogleLogging() is written to STDERR
>> I0112 23:22:09.889120 4934 main.cpp:243] Build: 2016-08-26 23:06:27 by centos
>> I0112 23:22:09.889181 4934 main.cpp:244] Version: 1.0.1
>> I0112 23:22:09.889184 4934 main.cpp:247] Git tag: 1.0.1
>> I0112 23:22:09.889188 4934 main.cpp:251] Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
>> W0112 23:22:09.890808 4934 openssl.cpp:398] Failed SSL connections will be downgraded to a non-SSL socket
>> W0112 23:22:09.891237 4934 process.cpp:881] Failed SSL connections will be downgraded to a non-SSL socket
>> E0112 23:22:10.129096 4934 shell.hpp:106] Command 'hadoop version 2>&1' failed; this is the output:
>> sh: hadoop: command not found
>> 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
>> 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@730: Client environment:host.name=XXXX.XXX.ch
>> 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
>> 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@738: Client environment:os.arch=3.10.0-229.14.1.el7.x86_64
>> 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@739: Client environment:os.version=#1 SMP Tue Sep 15 15:05:51 UTC 2015
>> 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
>> 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@755: Client environment:user.home=/root
>> 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@767: Client environment:user.dir=/
>> 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=XXX1.YYY.ch:2181,XXX2.YYY.ch:2181,XXX3.YYY.ch:2181 sessionTimeout=10000 watcher=0x7f950ee20300 sessionId=0 sessionPasswd=<null> context=0x7f94f0000c60 flags=0
>> 2017-01-12 23:22:10,134:4934(0x7f9501fd7700):ZOO_INFO@check_events@1728: initiated connection to server [XX.YY.ZZ.WW:2181]
>> 2017-01-12 23:22:10,146:4934(0x7f9501fd7700):ZOO_INFO@check_events@1775: session establishment complete on server [XX.YY.ZZ.WW:2181], sessionId=0x35828ae70fb2065, negotiated timeout=10000
>>
>> Any idea of what might be going on? Looks like an OOM, but I do not see
>> it in /var/log/messages and it also happens with --oom-kill-disable.
>>
>> --
>> Ciao,
>> Giulio
>
> --
> Best Regards,
> Haosdent Huang