Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-16 Thread haosdent
As the log shows, it failed when running the command below to find the container status. ``` docker -H unix:///var/run/docker.sock inspect mesos-498ff8de-782e-482a-9478-69d3faf5a853-S5.a242fc24-0d32-46e6-af63-299cb82fc01c ``` Have you mounted the sock file from the host into your agent container? On Fri, Ja
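
A minimal sketch of what mounting the socket could look like, assuming the agent image and `--net host` setting mentioned elsewhere in this thread. The helper name `socket_mount_cmd` is hypothetical; it only prints the command, it does not run it, and the real invocation would need the other options from the original `docker run` line.

```shell
# Hypothetical helper: print (do not run) a docker command that bind-mounts
# the host's Docker socket into the agent container at the same path.
socket_mount_cmd() {
  image="$1"
  sock="${2:-/var/run/docker.sock}"   # default socket path taken from the log above
  printf 'docker run --net host -v %s:%s %s\n' "$sock" "$sock" "$image"
}

# Image name as reported in this thread (MESOS_DOCKER_MESOS_IMAGE).
socket_mount_cmd alisw/mesos-slave:1.0.1
```

With the socket mounted at the same path, `docker -H unix:///var/run/docker.sock inspect ...` run inside the agent container talks to the host's Docker daemon.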

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-13 Thread Giulio Eulisse
Actually, no. The docker containers seem to be running just fine. It looks like mesos is not able to notice that. Did anything change in the way mesos looks them up? Notice I've both renamed my container to "agent" and added MESOS_DOCKER_KILL_ORPHANS=false. On 13 Jan 2017, 02:14 +0100, haos

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread haosdent
Is it because your container riemann-elasticsearch could not start successfully? On Fri, Jan 13, 2017 at 9:10 AM, Giulio Eulisse wrote: > MMm... it improved things, but now I get a bunch of: > > ``` > W0113 01:06:24.757287 17811 slave.cpp:5220] Failed to get resource > statistics for executor

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread Giulio Eulisse
MMm... it improved things, but now I get a bunch of: ``` W0113 01:06:24.757287 17811 slave.cpp:5220] Failed to get resource statistics for executor 'riemann-elasticsearch.7fc1bc0b-d92c-11e6-9367-02426821a225' of framework 20150626-112246-2475462272-5050-5-: Failed to run 'docker -H unix:///

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread haosdent
Yep, it was fixed in 1.1.0 https://www.mail-archive.com/issues@mesos.apache.org/msg33959.html On Fri, Jan 13, 2017 at 8:51 AM, Joseph Wu wrote: > If Apache JIRA were up, I'd point you to a JIRA noting the problem with > naming docker containers `mesos-*`, as Mesos reserves that prefix (and > kills e

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread Joseph Wu
If Apache JIRA were up, I'd point you to a JIRA noting the problem with naming docker containers `mesos-*`, as Mesos reserves that prefix (and kills everything it considers "unknown"). As a quick workaround, try setting this flag to false: https://github.com/apache/mesos/blob/1.1.x/src/slave/flags
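
A minimal sketch of the workaround discussed in this thread: pick a container name outside the `mesos-*` prefix and set `MESOS_DOCKER_KILL_ORPHANS=false`, as Giulio reports doing. The helper names `uses_reserved_prefix` and `safe_agent_cmd` are hypothetical, the prefix check is an assumption about how the reservation applies, and the command is only printed, not executed.

```shell
# Hypothetical check: does a container name fall under the prefix
# that Mesos reserves for the containers it manages?
uses_reserved_prefix() {
  case "$1" in
    mesos-*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Hypothetical helper: print (do not run) a docker command for the agent,
# refusing names that start with the reserved prefix.
safe_agent_cmd() {
  name="$1"
  image="$2"
  if uses_reserved_prefix "$name"; then
    echo "refusing container name '$name': it starts with mesos-" >&2
    return 1
  fi
  printf 'docker run --name %s --net host -e MESOS_DOCKER_KILL_ORPHANS=false %s\n' \
    "$name" "$image"
}

# Container name and image as reported in this thread.
safe_agent_cmd agent alisw/mesos-slave:1.0.1
```

Calling `safe_agent_cmd mesos-slave alisw/mesos-slave:1.0.1` prints a refusal on stderr and returns a non-zero status instead of emitting a command.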

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread Giulio Eulisse
docker rm mesos-slave
/usr/bin/docker run --pids-limit -1 --net host -m 0b --privileged \
    --oom-kill-disable \
    -e LIBPROCESS_SSL_KEY_FILE=/etc/grid-security/hostkey.pem \
    -e LIBPROCESS_SSL_CERT_FILE=/etc/grid-security/hostcert.pem \
    -e LIBPROCESS_SSL_VERIFY_CERT=false \

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread haosdent
Hi, what is the docker command you use to start agents? I remember mesos would try to recover containers whose names start with mesos-slave and kill them if it could not recover them successfully. On Jan 13, 2017 8:43 AM, "Giulio Eulisse" wrote: MMm... it seems to die after a long sequence of forks, and me

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread Giulio Eulisse
MMm... it seems to die after a long sequence of forks, and mesos itself seems to be issuing the sigkill. I wonder if it's trying to do some cleanup and it does not realise one of the containers is the agent itself??? Notice I do have `MESOS_DOCKER_MESOS_IMAGE=alisw/mesos-slave:1.0.1` set. On 13

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread Giulio Eulisse
Ciao, the only thing I could find is by running a parallel `docker events` ``` 2017-01-13T01:18:20.766593692+01:00 network connect 32441cb5f42b009580e104a8360e544beec7120bb6fff800f16dbee421454267 (container=1fddd8e8f956f4545c8b36b088eeca74d157eb1923867d28bf2d919d27babb71, name=host, type=host)

Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread Giulio Eulisse
Hi, I’ve a setup where I run mesos in docker which works perfectly when I use 0.28.2. I now migrated to 1.0.1 (but it’s the same with 1.1.0 and 1.0.0) and it seems to receive a sigkill right after saying: WARNING: Logging before InitGoogleLogging() is written to STDERR I0112 23:22:09.889120 4934

Re: Sigkill while running mesos agent (1.0.1) in docker

2017-01-12 Thread haosdent
Hi, @Giulio. According to your log, it looks normal. Do you have any logs related to "SIGKILL"? On Fri, Jan 13, 2017 at 8:00 AM, Giulio Eulisse wrote: > Hi, > > I’ve a setup where I run mesos in docker which works perfectly when I use > 0.28.2. I now migrated to 1.0.1 (but it’s the same with 1.1