
I used https://github.com/deric/mesos-deb-packaging to compile and package
the latest master (0.26.x) and deployed it to the cluster, and now health
checks are working as advertised in both Marathon and my own framework!
Not sure what was going on with health checks in 0.24.0.

Anyways, thanks again for your help Haosdent!


On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <outtat...@gmail.com> wrote:

> Hi Haosdent,
> Can you share your Marathon POST request that results in Mesos executing
> the health checks?
> Since we can reference the Marathon framework, I've been doing some
> digging around.
> Here are the details of my setup and findings:
> I put a few small hacks in Marathon:
> (1) Added com.googlecode.protobuf.format to Marathon's dependencies
> (2) Edited the following files so TaskInfo is dumped as JSON to /tmp/X, both
> in the TaskFactory and right before the task is sent to Mesos via
> driver.launchTasks:
> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala:
> $ git diff src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala
>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() (
>>      new TaskBuilder(app, taskIdUtil.newTaskId, config).buildIfMatches(offer, runningTasks).map {
>>        case (taskInfo, ports) =>
>> +        import com.googlecode.protobuf.format.JsonFormat
>> +        import java.io._
>> +        val bw = new BufferedWriter(new FileWriter(new File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue)))
>> +        bw.write(JsonFormat.printToString(taskInfo))
>> +        bw.write("\n")
>> +        bw.close()
>>          CreatedTask(
>>            taskInfo,
>>            MarathonTasks.makeTask(
> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala:
> $ git diff src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala
>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl(
>>    override def launchTasks(offerID: OfferID, taskInfos: Seq[TaskInfo]): Boolean = {
>>      val launched = withDriver(s"launchTasks($offerID)") { driver =>
>>        import scala.collection.JavaConverters._
>> +      var i = 0
>> +      for (i <- 0 to taskInfos.length - 1) {
>> +        import com.googlecode.protobuf.format.JsonFormat
>> +        import java.io._
>> +        val file = new File("/tmp/taskJson2-" + i.toString() + "-" + taskInfos(i).getTaskId.getValue)
>> +        val bw = new BufferedWriter(new FileWriter(file))
>> +        bw.write(JsonFormat.printToString(taskInfos(i)))
>> +        bw.write("\n")
>> +        bw.close()
>> +      }
>>        driver.launchTasks(Collections.singleton(offerID), taskInfos.asJava)
>>      }
> Then I built and deployed the hacked Marathon and restarted the marathon
> service.
> Next I created the app via the Marathon API ("hello app" is a container
> with a simple hello-world ruby app running on
> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: application/json' -d'
>> {
>>   "id": "/app-81-1-hello-app",
>>   "apps": [
>>     {
>>       "id": "/app-81-1-hello-app/web-v11",
>>       "container": {
>>         "type": "DOCKER",
>>         "docker": {
>>           "image":
>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>           "network": "BRIDGE",
>>           "portMappings": [
>>             {
>>               "containerPort": 8000,
>>               "hostPort": 0,
>>               "protocol": "tcp"
>>             }
>>           ]
>>         }
>>       },
>>       "env": {
>>       },
>>       "healthChecks": [
>>         {
>>           "protocol": "COMMAND",
>>           "command": {"value": "exit 1"},
>>           "gracePeriodSeconds": 10,
>>           "intervalSeconds": 10,
>>           "timeoutSeconds": 10,
>>           "maxConsecutiveFailures": 3
>>         }
>>       ],
>>       "instances": 1,
>>       "cpus": 1,
>>       "mem": 512
>>     }
>>   ]
>> }'
> $ ls /tmp/
>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
> Do they match?
> $ md5sum /tmp/task*
>> 1b5115997e78e2611654059249d99578  /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>> 1b5115997e78e2611654059249d99578  /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
> Yes, so I am confident this is the information being sent across the wire
> to Mesos.
> Do they contain any health-check information?
> $ cat /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
>> {
>>   "name":"web-v11.app-81-1-hello-app",
>>   "task_id":{
>>     "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>   },
>>   "slave_id":{
>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>   },
>>   "resources":[
>>     {
>>       "name":"cpus",
>>       "type":"SCALAR",
>>       "scalar":{
>>         "value":1.0
>>       },
>>       "role":"*"
>>     },
>>     {
>>       "name":"mem",
>>       "type":"SCALAR",
>>       "scalar":{
>>         "value":512.0
>>       },
>>       "role":"*"
>>     },
>>     {
>>       "name":"ports",
>>       "type":"RANGES",
>>       "ranges":{
>>         "range":[
>>           {
>>             "begin":31641,
>>             "end":31641
>>           }
>>         ]
>>       },
>>       "role":"*"
>>     }
>>   ],
>>   "command":{
>>     "environment":{
>>       "variables":[
>>         {
>>           "name":"PORT_8000",
>>           "value":"31641"
>>         },
>>         {
>>           "name":"MARATHON_APP_VERSION",
>>           "value":"2015-10-07T19:35:08.386Z"
>>         },
>>         {
>>           "name":"HOST",
>>           "value":"mesos-worker1a"
>>         },
>>         {
>>           "name":"MARATHON_APP_DOCKER_IMAGE",
>>           "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966"
>>         },
>>         {
>>           "name":"MESOS_TASK_ID",
>>           "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0"
>>         },
>>         {
>>           "name":"PORT",
>>           "value":"31641"
>>         },
>>         {
>>           "name":"PORTS",
>>           "value":"31641"
>>         },
>>         {
>>           "name":"MARATHON_APP_ID",
>>           "value":"/app-81-1-hello-app/web-v11"
>>         },
>>         {
>>           "name":"PORT0",
>>           "value":"31641"
>>         }
>>       ]
>>     },
>>     "shell":false
>>   },
>>   "container":{
>>     "type":"DOCKER",
>>     "docker":{
>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966",
>>       "network":"BRIDGE",
>>       "port_mappings":[
>>         {
>>           "host_port":31641,
>>           "container_port":8000,
>>           "protocol":"tcp"
>>         }
>>       ],
>>       "privileged":false,
>>       "force_pull_image":false
>>     }
>>   }
>> }
> No, I don't see anything about any health check.
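
The same check can be scripted instead of eyeballed. This is only a minimal sketch in Python: it assumes the dump files hold protobuf-as-JSON like the output above, and that a health check, when present, sits under the top-level `health_check` key (as it does in the TaskInfo from the custom framework quoted further down). The function name `find_health_check` and the trimmed sample strings are made up for illustration.

```python
import json


def find_health_check(task_info_json):
    """Return the health_check object from a TaskInfo JSON dump, or None if absent."""
    task_info = json.loads(task_info_json)
    return task_info.get("health_check")


# Trimmed-down stand-ins for the two TaskInfo dumps in this thread (assumed shape).
marathon_dump = '{"name": "web-v11.app-81-1-hello-app", "container": {"type": "DOCKER"}}'
framework_dump = (
    '{"name": "hello-app.web.v3", '
    '"health_check": {"interval_seconds": 10, "command": {"value": "sleep 5"}}}'
)

print(find_health_check(marathon_dump))   # None: no health check in the task
print(find_health_check(framework_dump))  # the health_check object
```

Pointing it at the real files would just mean passing `open(path).read()` for each /tmp/taskjson* file.
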
> Mesos STDOUT for the launched task:
> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --stop_timeout="0ns"
>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --docker="docker" --help="false" --initialize_driver_logging="true"
>> --logbufsecs="0" --logging_level="INFO"
>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false"
>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da"
>> --stop_timeout="0ns"
>> Registered docker executor on mesos-worker1a
>> Starting task
>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
> I1007 19:35:08.790743  4612 exec.cpp:134] Version: 0.24.0
>> I1007 19:35:08.793416  4619 exec.cpp:208] Executor registered on slave
>> 20150924-210922-1608624320-5050-1792-S1
>> WARNING: Your kernel does not support swap limit capabilities, memory
>> limited without swap.
> Again, nothing about any health checks.
> Any ideas of other things to try or what I could be missing?  Can't say
> either way about the Mesos health-check system working or not if Marathon
> won't put the health-check into the task it sends to Mesos.
> Thanks for all your help!
> Best,
> Jay
> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <haosd...@gmail.com> wrote:
>> Maybe you could post your executor stdout/stderr so that we can tell
>> whether the health check is running or not.
>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <haosd...@gmail.com> wrote:
>>> Marathon also uses Mesos health checks. When I use a health check, I can
>>> see log output like this in the executor stdout.
>>> ```
>>> Registered docker executor on xxxxx
>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
>>> Launching health check process:
>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
>>> Health check process launched at pid: 9895
>>> Received task health update, healthy: true
>>> ```
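
A quick way to test an executor stdout for these lines is sketched below in Python. The marker strings are copied from the log sample above; whether other Mesos versions print exactly the same wording is an assumption, and `health_check_lines` is a hypothetical helper name.

```python
# Marker strings taken from the executor stdout sample in this thread.
HEALTH_MARKERS = (
    "Launching health check process",
    "Health check process launched at pid",
    "Received task health update",
)


def health_check_lines(stdout_text):
    """Return the stdout lines that mention health-check activity."""
    return [
        line
        for line in stdout_text.splitlines()
        if any(marker in line for marker in HEALTH_MARKERS)
    ]


# Sample copied from the log above (health check running).
sample_with = """Registered docker executor on xxxxx
Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000
Launching health check process:
/home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx
Health check process launched at pid: 9895
Received task health update, healthy: true
"""

# Sample resembling the 0.24.0 stdout in this thread (no health check).
sample_without = """Registered docker executor on mesos-worker1a
Starting task app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0
"""

for line in health_check_lines(sample_with):
    print(line)
print(health_check_lines(sample_without))  # [] means no health check was launched
```

An empty result for a task that defines a health check suggests the executor never launched it.
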
>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <outtat...@gmail.com> wrote:
>>>> I am using my own framework, and the full task info I'm using is posted
>>>> earlier in this thread.  Do you happen to know if Marathon uses Mesos's
>>>> health checks for its health check system?
>>>> On Oct 6, 2015, at 9:01 PM, haosdent <haosd...@gmail.com> wrote:
>>>> Yes, the health check is launched through its definition in the TaskInfo.
>>>> Do you launch your task through Marathon? I could test it on my side.
>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <outtat...@gmail.com>
>>>> wrote:
>>>>> Precisely, and there are none of those statements.  Are you or others
>>>>> confident health-checks are part of the code path when defined via task
>>>>> info for docker container tasks?  Going through the code, I wasn't able to
>>>>> find the linkage for anything other than health-checks triggered through a
>>>>> custom executor.
>>>>> With that being said, it is a pretty good-sized code base and I'm not
>>>>> very familiar with it, so my analysis thus far has by no means been
>>>>> exhaustive.
>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <haosd...@gmail.com> wrote:
>>>>> When the health check launches, there will be a log line like this in
>>>>> your executor stdout:
>>>>> ```
>>>>> Health check process launched at pid xxx
>>>>> ```
>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <outtat...@gmail.com>
>>>>> wrote:
>>>>>> I'm happy to try this, however wouldn't there be output in the logs
>>>>>> with the string "health" or "Health" if the health-check were active?
>>>>>> None of my master or slave logs contain the string.
>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <haosd...@gmail.com> wrote:
>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether an
>>>>>> unhealthy status shows up in your task stdout/stderr?
>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <outtat...@gmail.com>
>>>>>> wrote:
>>>>>>> My current version is 0.24.1.
>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <haosd...@gmail.com> wrote:
>>>>>>>> Yes, Adam also helped commit it to 0.23.1 and 0.24.1:
>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0
>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7
>>>>>>>> Are you using one of these versions?
>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <haosd...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> As I remember, 0.23.1 and 0.24.1 contain this backport; let me
>>>>>>>>> double-check.
>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor <outtat...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Oops- Now I see you already said it's in master.  I'll look there
>>>>>>>>>> :)
>>>>>>>>>> Thanks again!
>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor <j...@jaytaylor.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Great, thanks for the quick reply Tim!
>>>>>>>>>>> Do you know if there is a branch I can checkout to test it out?
>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen <t...@mesosphere.io>
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi Jay,
>>>>>>>>>>>> We just added health check support for docker tasks that's in
>>>>>>>>>>>> master but not yet released. It will run docker exec with the
>>>>>>>>>>>> command you provided as health checks.
>>>>>>>>>>>> It should be in the next release.
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Tim
>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor <outtat...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> Does Mesos support health checks for docker image tasks?  Mesos
>>>>>>>>>>>> seems to be ignoring the TaskInfo.HealthCheck field for me.
>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos:
>>>>>>>>>>>> {
>>>>>>>>>>>>>   "name":"hello-app.web.v3",
>>>>>>>>>>>>>   "task_id":{
>>>>>>>>>>>>>     "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec"
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "slave_id":{
>>>>>>>>>>>>>     "value":"20150924-210922-1608624320-5050-1792-S1"
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "resources":[
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name":"cpus",
>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>         "value":0.1
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name":"mem",
>>>>>>>>>>>>>       "type":0,
>>>>>>>>>>>>>       "scalar":{
>>>>>>>>>>>>>         "value":256
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>     {
>>>>>>>>>>>>>       "name":"ports",
>>>>>>>>>>>>>       "type":1,
>>>>>>>>>>>>>       "ranges":{
>>>>>>>>>>>>>         "range":[
>>>>>>>>>>>>>           {
>>>>>>>>>>>>>             "begin":31002,
>>>>>>>>>>>>>             "end":31002
>>>>>>>>>>>>>           }
>>>>>>>>>>>>>         ]
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   ],
>>>>>>>>>>>>>   "command":{
>>>>>>>>>>>>>     "container":{
>>>>>>>>>>>>>       "image":"docker-services1a:5000/test/app-81-1-hello-app-103"
>>>>>>>>>>>>>     },
>>>>>>>>>>>>>     "shell":false
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "container":{
>>>>>>>>>>>>>     "type":1,
>>>>>>>>>>>>>     "docker":{
>>>>>>>>>>>>>       "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103",
>>>>>>>>>>>>>       "network":2,
>>>>>>>>>>>>>       "port_mappings":[
>>>>>>>>>>>>>         {
>>>>>>>>>>>>>           "host_port":31002,
>>>>>>>>>>>>>           "container_port":8000,
>>>>>>>>>>>>>           "protocol":"tcp"
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>       ],
>>>>>>>>>>>>>       "privileged":false,
>>>>>>>>>>>>>       "parameters":[],
>>>>>>>>>>>>>       "force_pull_image":false
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   },
>>>>>>>>>>>>>   "health_check":{
>>>>>>>>>>>>>     "delay_seconds":5,
>>>>>>>>>>>>>     "interval_seconds":10,
>>>>>>>>>>>>>     "timeout_seconds":10,
>>>>>>>>>>>>>     "consecutive_failures":3,
>>>>>>>>>>>>>     "grace_period_seconds":0,
>>>>>>>>>>>>>     "command":{
>>>>>>>>>>>>>       "shell":true,
>>>>>>>>>>>>>       "value":"sleep 5",
>>>>>>>>>>>>>       "user":"root"
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>   }
>>>>>>>>>>>>> }
>>>>>>>>>>>> I have searched all machines and containers to see if they ever
>>>>>>>>>>>> run the command (in this case `sleep 5`), but have not found any
>>>>>>>>>>>> indication that it is being executed.
>>>>>>>>>>>> In the mesos src code the health-checks are invoked from
>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask.
>>>>>>>>>>>> Does this mean that health-checks are only supported for custom
>>>>>>>>>>>> executors and not for docker tasks?
>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero
>>>>>>>>>>>> exit-status of a health-check command translate to task health.
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Jay
>>>>>>>>> --
>>>>>>>>> Best Regards,
>>>>>>>>> Haosdent Huang
