Actually, no: the Docker containers seem to be running just fine. It looks like
Mesos is not able to notice that. Did anything change in the way Mesos looks
them up? Note that I've both renamed my container to "agent" and set
MESOS_DOCKER_KILL_ORPHANS=false.
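The `docker inspect` error quoted below shows the name the agent expects for a task container: `mesos-`, then what appears to be the agent ID (ending in `-S5`), a `.`, and a container ID. Below is a minimal Python sketch that splits such a name. It assumes a `mesos-<agentID>.<containerID>` layout inferred from that log line, not taken from the Mesos source, and `parse_mesos_name` is a hypothetical helper.

```python
from typing import NamedTuple, Optional

PREFIX = "mesos-"   # prefix visible in the container names in the log
SEPARATOR = "."     # assumed separator between agent ID and container ID


class MesosName(NamedTuple):
    agent_id: str
    container_id: str


def parse_mesos_name(name: str) -> Optional[MesosName]:
    """Split a Docker container name of the assumed form mesos-<agentID>.<containerID>.

    Returns None for names that do not follow that layout.
    """
    name = name.lstrip("/")  # Docker often reports names with a leading slash
    if not name.startswith(PREFIX):
        return None
    rest = name[len(PREFIX):]
    agent_id, sep, container_id = rest.partition(SEPARATOR)
    if not sep or not agent_id or not container_id:
        return None  # this sketch rejects names without both parts
    return MesosName(agent_id, container_id)
```

For the name in the log, this yields `agent_id='498ff8de-782e-482a-9478-69d3faf5a853-S5'` and `container_id='a242fc24-0d32-46e6-af63-299cb82fc01c'`. The sketch returns `None` for names without a `.`, such as `mesos-slave`. How the real agent treats such names is not shown here.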



On 13 Jan 2017, 02:14 +0100, haosdent <haosd...@gmail.com>, wrote:
> Is it because your container riemann-elasticsearch could not start
> successfully?
>
> > On Fri, Jan 13, 2017 at 9:10 AM, Giulio Eulisse <giulio.euli...@gmail.com> 
> > wrote:
> > > Mmm... it improved things, but now I get a bunch of these:
> > >
> > > ```
> > > W0113 01:06:24.757287 17811 slave.cpp:5220] Failed to get resource statistics for executor 'riemann-elasticsearch.7fc1bc0b-d92c-11e6-9367-02426821a225' of framework 20150626-112246-2475462272-5050-5-0000: Failed to run 'docker -H unix:///var/run/docker.sock inspect mesos-498ff8de-782e-482a-9478-69d3faf5a853-S5.a242fc24-0d32-46e6-af63-299cb82fc01c': exited with status 1; stderr='Error: No such image, container or task: mesos-498ff8de-782e-482a-9478-69d3faf5a853-S5.a242fc24-0d32-46e6-af63-299cb82fc01c
> > > ```
> > >
> > > and then it leaves out a bunch of running containers.
> > >
> > > On 13 Jan 2017, 01:51 +0100, Joseph Wu <jos...@mesosphere.io>, wrote:
> > > > If Apache JIRA were up, I'd point you to a JIRA noting the problem with 
> > > > naming docker containers `mesos-*`, as Mesos reserves that prefix (and 
> > > > kills everything it considers "unknown").
> > > >
> > > > As a quick workaround, try setting this flag to false:
> > > > https://github.com/apache/mesos/blob/1.1.x/src/slave/flags.cpp#L590-L596
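Joseph's explanation can be illustrated with a simplified Python model. It is not the Mesos implementation. It assumes that on recovery the agent treats every Docker container whose name starts with `mesos-`, and which it has no record of, as an orphan, and kills it unless orphan killing is disabled. The linked flag appears to be the one Giulio later sets through `MESOS_DOCKER_KILL_ORPHANS=false`. `find_orphans` and its parameters are hypothetical names.

```python
from typing import Iterable, List, Set

RESERVED_PREFIX = "mesos-"  # prefix Mesos reserves for the containers it launches


def find_orphans(running: Iterable[str], known: Set[str], kill_orphans: bool = True) -> List[str]:
    """Return the running container names a simplified agent would kill.

    A container counts as an orphan if its name carries the reserved prefix
    but is not in `known` (the containers this agent believes it launched).
    With kill_orphans=False, which models the workaround flag, nothing is killed.
    """
    if not kill_orphans:
        return []
    orphans = []
    for name in running:
        name = name.lstrip("/")  # Docker often reports names with a leading slash
        if name.startswith(RESERVED_PREFIX) and name not in known:
            orphans.append(name)
    return orphans
```

In this model, an agent that itself runs in a container named `mesos-slave` is flagged as an orphan of its own cleanup. Renaming that container (for example to `agent`) or passing `kill_orphans=False` leaves it alone.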
> > > >
> > > > > On Thu, Jan 12, 2017 at 4:41 PM, Giulio Eulisse 
> > > > > <giulio.euli...@gmail.com> wrote:
> > > > > > Mmm... it seems to die after a long sequence of forks, and Mesos
> > > > > > itself seems to be issuing the SIGKILL. I wonder if it's trying to
> > > > > > do some cleanup and does not realise that one of the containers is
> > > > > > the agent itself. Note that I do have
> > > > > > `MESOS_DOCKER_MESOS_IMAGE=alisw/mesos-slave:1.0.1` set.
> > > > > >
> > > > > > On 13 Jan 2017, 01:23 +0100, Giulio Eulisse 
> > > > > > <giulio.euli...@gmail.com>, wrote:
> > > > > > > Ciao,
> > > > > > >
> > > > > > > the only thing I could find comes from running `docker events`
> > > > > > > in parallel:
> > > > > > >
> > > > > > > ```
> > > > > > > 2017-01-13T01:18:20.766593692+01:00 network connect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)
> > > > > > > 2017-01-13T01:18:20.846137793+01:00 container start 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> > > > > > > 2017-01-13T01:18:20.847965921+01:00 container resize 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, height=16, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS, width=134)
> > > > > > > 2017-01-13T01:18:21.610141857+01:00 container kill 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, signal=15, vendor=CentOS)
> > > > > > > 2017-01-13T01:18:21.610491564+01:00 container kill 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, signal=9, vendor=CentOS)
> > > > > > > 2017-01-13T01:18:21.646229213+01:00 container die 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, exitCode=143, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> > > > > > > 2017-01-13T01:18:21.652894124+01:00 network disconnect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)
> > > > > > > 2017-01-13T01:18:21.705874041+01:00 container stop 1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71 (build-date=20161214, image=alisw/mesos-slave:1.0.1, license=GPLv2, name=mesos-slave, vendor=CentOS)
> > > > > > > ```
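The `container die` event above reports `exitCode=143`. The usual convention is that a status above 128 means the process was terminated by signal number (status - 128). Here that gives 143 - 128 = 15, which is SIGTERM and matches the `signal=15` kill Docker logged first. A kernel OOM kill delivers SIGKILL and would normally surface as 137. A minimal Python sketch of that decoding follows; the helper name `decode_exit_code` is hypothetical.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            # signal.Signals maps a signal number to its name, e.g. 15 -> SIGTERM
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # no signal with that number; treat it as a plain exit status
    return f"exit status {code}"
```

On Linux, `decode_exit_code(143)` gives `"killed by SIGTERM"` and `decode_exit_code(137)` gives `"killed by SIGKILL"`.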
> > > > > > >
> > > > > > > Ciao,
> > > > > > > Giulio
> > > > > > >
> > > > > > > On 13 Jan 2017, 01:06 +0100, haosdent <haosd...@gmail.com>, wrote:
> > > > > > > > Hi @Giulio, according to your log it looks normal. Do you
> > > > > > > > have any logs related to "SIGKILL"?
> > > > > > > >
> > > > > > > > > On Fri, Jan 13, 2017 at 8:00 AM, Giulio Eulisse 
> > > > > > > > > <giulio.euli...@gmail.com> wrote:
> > > > > > > > > > Hi,
> > > > > > > > > > I have a setup where I run Mesos in Docker, which works
> > > > > > > > > > perfectly with 0.28.2. I have now migrated to 1.0.1 (but
> > > > > > > > > > it's the same with 1.1.0 and 1.0.0), and it seems to receive
> > > > > > > > > > a SIGKILL right after printing:
> > > > > > > > > >
> > > > > > > > > > WARNING: Logging before InitGoogleLogging() is written to STDERR
> > > > > > > > > > I0112 23:22:09.889120  4934 main.cpp:243] Build: 2016-08-26 23:06:27 by centos
> > > > > > > > > > I0112 23:22:09.889181  4934 main.cpp:244] Version: 1.0.1
> > > > > > > > > > I0112 23:22:09.889184  4934 main.cpp:247] Git tag: 1.0.1
> > > > > > > > > > I0112 23:22:09.889188  4934 main.cpp:251] Git SHA: 3611eb0b7eea8d144e9b2e840e0ba16f2f659ee3
> > > > > > > > > > W0112 23:22:09.890808  4934 openssl.cpp:398] Failed SSL connections will be downgraded to a non-SSL socket
> > > > > > > > > > W0112 23:22:09.891237  4934 process.cpp:881] Failed SSL connections will be downgraded to a non-SSL socket
> > > > > > > > > > E0112 23:22:10.129096  4934 shell.hpp:106] Command 'hadoop version 2>&1' failed; this is the output:
> > > > > > > > > > sh: hadoop: command not found
> > > > > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
> > > > > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@730: Client environment:host.name=XXXX.XXX.ch
> > > > > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
> > > > > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@738: Client environment:os.arch=3.10.0-229.14.1.el7.x86_64
> > > > > > > > > > 2017-01-12 23:22:10,130:4934(0x7f950503b700):ZOO_INFO@log_env@739: Client environment:os.version=#1 SMP Tue Sep 15 15:05:51 UTC 2015
> > > > > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
> > > > > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@755: Client environment:user.home=/root
> > > > > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@log_env@767: Client environment:user.dir=/
> > > > > > > > > > 2017-01-12 23:22:10,131:4934(0x7f950503b700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=XXX1.YYY.ch:2181,XXX2.YYY.ch:2181,XXX3.YYY.ch:2181 sessionTimeout=10000 watcher=0x7f950ee20300 sessionId=0 sessionPasswd=<null> context=0x7f94f0000c60 flags=0
> > > > > > > > > > 2017-01-12 23:22:10,134:4934(0x7f9501fd7700):ZOO_INFO@check_events@1728: initiated connection to server [XX.YY.ZZ.WW:2181]
> > > > > > > > > > 2017-01-12 23:22:10,146:4934(0x7f9501fd7700):ZOO_INFO@check_events@1775: session establishment complete on server [XX.YY.ZZ.WW:2181], sessionId=0x35828ae70fb2065, negotiated timeout=10000
> > > > > > > > > >
> > > > > > > > > > Any idea what might be going on? It looks like an OOM kill, but
> > > > > > > > > > I do not see it in /var/log/messages, and it also happens
> > > > > > > > > > with --oom-kill-disable.
> > > > > > > > > > --
> > > > > > > > > > Ciao,
> > > > > > > > > > Giulio
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best Regards,
> > > > > > > > Haosdent Huang
> > > >
>
>
>
> --
> Best Regards,
> Haosdent Huang
