docker rm mesos-slave
/usr/bin/docker run --pids-limit -1 --net host -m 0b --privileged \
  --oom-kill-disable \
  -e LIBPROCESS_SSL_KEY_FILE=/etc/grid-security/hostkey.pem \
  -e LIBPROCESS_SSL_CERT_FILE=/etc/grid-security/hostcert.pem \
  -e LIBPROCESS_SSL_VERIFY_CERT=false \
  -e LIBPROCESS_SSL_SUPPORT_DOWNGRADE=true \
  -e LIBPROCESS_SSL_ENABLED=true \
  -e MESOS_MASTER_ZK=zk://XXX:2181,XXX:2181,XXX:2181/mesos \
  -e MESOS_ATTRIBUTES="os:Linux;is_virtual:true;cpu:GenuineIntel" \
  -e MESOS_MASTER_WORKDIR=/build/mesos \
  -e MESOS_SYSTEMD_ENABLE_SUPPORT=false \
  -e MESOS_LAUNCHER=posix \
  -e MESOS_DOCKER_MESOS_IMAGE=alisw/mesos-slave:1.0.1 \
  -e MESOS_IMAGE_PROVIDERS=docker \
  -e MESOS_ISOLATION=docker/runtime \
  -e MESOS_EXTRA_CPUS=1 \
  -e MESOS_MODULES=file://etc/mesos-slave/modules \
  -e MESOS_RESOURCE_ESTIMATOR=org_apache_mesos_FixedResourceEstimator \
  -e MESOS_QOS_CONTROLLER=org_apache_mesos_LoadQoSController \
  -e MESOS_LOGGING_LEVEL=WARNING \
  -e JENKINS_UID=203 \
  -e JENKINS_GID=992 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /sys/fs/cgroup:/sys/fs/cgroup \
  -v /build/docker:/var/lib/docker \
  -v /build:/build \
  -v /build/log:/var/log \
  -v /etc/grid-security:/etc/grid-security:ro \
  -it --pid=host --name mesos-agent -it alisw/mesos-slave:1.0.1 /bin/bash

I also tried with `mesos-agent` as a name.

On 13 Jan 2017, 01:46 +0100, haosdent <haosd...@gmail.com>, wrote:
> Hi, what is the docker command you use to start the agents? I remember Mesos
> would try to recover containers whose names start with mesos-slave and kill
> them if it could not recover them successfully.
>
> On Jan 13, 2017 8:43 AM, "Giulio Eulisse" <giulio.euli...@gmail.com> wrote:
> > > Mmm... it seems to die after a long sequence of forks, and Mesos itself
> > > seems to be issuing the SIGKILL. I wonder if it's trying to do some
> > > cleanup and does not realise that one of the containers is the agent
> > > itself? Notice I do have
> > > `MESOS_DOCKER_MESOS_IMAGE=alisw/mesos-slave:1.0.1` set.
> > >
> > > On 13 Jan 2017, 01:23 +0100, Giulio Eulisse <giulio.euli...@gmail.com>,
> > > wrote:
> > > > Ciao,
> > > >
> > > > the only thing I could find is by running a parallel `docker events`:
> > > >
> > > > ```
> > > > 2017-01-13T01:18:20.766593692+01:00 network connect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)
> > > > 2017-01-13T01:18:20.846137793+01:00 container start 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> > > > 2017-01-13T01:18:20.847965921+01:00 container resize 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, height=16, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS, width=134)
> > > > 2017-01-13T01:18:21.610141857+01:00 container kill 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, signal=15, vendor=CentOS)
> > > > 2017-01-13T01:18:21.610491564+01:00 container kill 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, signal=9, vendor=CentOS)
> > > > 2017-01-13T01:18:21.646229213+01:00 container die 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, exitCode=143, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> > > > 2017-01-13T01:18:21.652894124+01:00 network disconnect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)
> > > > 2017-01-13T01:18:21.705874041+01:00 container stop 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> > > > ```
> > > >
> > > > Ciao,
> > > > Giulio
> > > >
> > > > On 13 Jan 2017, 01:06 +0100, haosdent <haosd...@gmail.com>, wrote:
> > > > > Hi, @Giulio. According to your log, it looks normal. Do you have any
> > > > > logs related to "SIGKILL"?
> > > > >
> > > > > > On Fri, Jan 13, 2017 at 8:00 AM, Giulio Eulisse
> > > > > > <giulio.euli...@gmail.com> wrote:
> > > > > > > Hi,
> > > > > > > I’ve a setup where I run Mesos in Docker which works perfectly
> > > > > > > when I use 0.28.2. I have now migrated to 1.0.1 (but it’s the
> > > > > > > same with 1.1.0 and 1.0.0) and it seems to receive a SIGKILL
> > > > > > > right after saying:
> > > > > > >
> > > > > > > WARNING: Logging before InitGoogleLogging() is written to STDERR
> > > > > > > I0112 23:22:09.889120 4934 main.cpp:243] Build: 2016-08-26 23:06:27 by centos
> > > > > > > I0112 23:22:09.889181 4934 main.cpp:244] Version: 1.0.1
> > > > > > > I0112 23:22:09.889184 4934 main.cpp:247] Git tag: 1.0.1
> > > > > > > I0112 23:22:09.889188 4934 main.cpp:251] Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> > > > > > > W0112 23:22:09.890808 4934 openssl.cpp:398] Failed SSL connections will be downgraded to a non-SSL socket
> > > > > > > W0112 23:22:09.891237 4934 process.cpp:881] Failed SSL connections will be downgraded to a non-SSL socket
> > > > > > > E0112 23:22:10.129096 4934 shell.hpp:106] Command 'hadoop version 2>&1' failed; this is the output:
> > > > > > > sh: hadoop: command not found
> > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
> > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@730: Client environment:host.name=XXXX.XXX.ch
> > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
> > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@738: Client environment:os.arch=3.10.0-229.14.1.el7.x86_64
> > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@739: Client environment:os.version=#1 SMP Tue Sep 15 15:05:51 UTC 2015
> > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
> > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@755: Client environment:user.home=/root
> > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@767: Client environment:user.dir=/
> > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=XXX1.YYY.ch:2181,XXX2.YYY.ch:2181,XXX3.YYY.ch:2181 sessionTimeout=10000 watcher=0x7f950ee20300 sessionId=0 sessionPasswd=<null> context=0x7f94f0000c60 flags=0
> > > > > > > 2017-01-12 23:22:10,134:4934(0x7f9501fd7700):ZOO_INFO@check_events@1728: initiated connection to server [XX.YY.ZZ.WW:2181]
> > > > > > > 2017-01-12 23:22:10,146:4934(0x7f9501fd7700):ZOO_INFO@check_events@1775: session establishment complete on server [XX.YY.ZZ.WW:2181], sessionId=0x35828ae70fb2065, negotiated timeout=10000
> > > > > > >
> > > > > > > Any idea of what might be going on? It looks like an OOM, but I
> > > > > > > do not see it in /var/log/messages and it also happens with
> > > > > > > --oom-kill-disable.
> > > > > > > --
> > > > > > > Ciao,
> > > > > > > Giulio
> > > > > > >
> > > > >
> > > > > --
> > > > > Best Regards,
> > > > > Haosdent Huang
>
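
A side note on reading the `docker events` output quoted in this thread: the `container die` event reports `exitCode=143`. Under the common shell convention (assumed here to be what Docker's reported exit code follows), a code above 128 means the process was terminated by signal `code - 128`, so 143 would correspond to signal 15 (SIGTERM), matching the first `container kill` event rather than the later signal 9. A minimal sketch for decoding such codes follows; `decode_exit_code` is a hypothetical helper name, not part of Docker or Mesos.

```shell
#!/bin/sh
# Hypothetical helper: interpret an exit code using the 128+N convention,
# where codes above 128 mean "terminated by signal N".
decode_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # `kill -l <number>` prints the signal name without the SIG prefix.
        echo "signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exit status $code"
    fi
}

# exitCode taken from the `container die` event in the thread
decode_exit_code 143   # prints: signal 15 (SIGTERM)
```

This only interprets the number; it does not tell you which process sent the signal, which is the open question in the thread.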