Yes. The related code is located at https://github.com/apache/mesos/blob/master/src/slave/main.cpp#L123
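As a rough standalone illustration of that flag-loading mechanism (this is a sketch, not the actual stout code linked below): environment variables carrying a given prefix are stripped of the prefix, lowercased, and stored as flag values, which is how MESOS_LAUNCHER_DIR becomes --launcher_dir.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <map>
#include <string>

extern char** environ;

// Sketch of prefix-based flag loading: for every environment variable
// starting with `prefix`, strip the prefix, lowercase the remainder, and
// record it as a flag value (MESOS_LAUNCHER_DIR -> "launcher_dir").
std::map<std::string, std::string> loadPrefixedEnv(const std::string& prefix)
{
  std::map<std::string, std::string> flags;
  for (char** env = environ; *env != nullptr; ++env) {
    const std::string entry(*env);
    const size_t eq = entry.find('=');
    if (eq == std::string::npos ||
        entry.compare(0, prefix.size(), prefix) != 0) {
      continue;
    }
    std::string name = entry.substr(prefix.size(), eq - prefix.size());
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    flags[name] = entry.substr(eq + 1);
  }
  return flags;
}
```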
In fact, environment variables starting with MESOS_ are loaded as flag values. https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/flags/flags.hpp#L52 On Fri, Oct 9, 2015 at 11:33 AM, Jay Taylor <outtat...@gmail.com> wrote: > One question for you haosdent- > > You mentioned that the flags.launcher_dir should propagate to the docker > executor all the way up the chain. Can you show me where this logic is in > the codebase? I didn't see where that was happening and would like to > understand the mechanism. > > Thanks! > Jay > > > > On Oct 8, 2015, at 8:29 PM, Jay Taylor <outtat...@gmail.com> wrote: > > Maybe tomorrow I will build a fresh cluster from scratch to see if the > broken behavior experienced today still persists. > > On Oct 8, 2015, at 7:52 PM, haosdent <haosd...@gmail.com> wrote: > > As far as I know, MESOS_LAUNCHER_DIR works by setting flags.launcher_dir, > which is where mesos-docker-executor and mesos-health-check are looked up. > Although the env var itself is not propagated, MESOS_LAUNCHER_DIR still > works because flags.launcher_dir is read from it. > > For example, I ran > ``` > export MESOS_LAUNCHER_DIR=/tmp > ``` > before starting mesos-slave. So when I launch the slave, I can find this > line in the slave log: > ``` > I1009 10:27:26.594599 1416 slave.cpp:203] Flags at startup: > xxxxx --launcher_dir="/tmp" > ``` > > From your log, I am not sure why your MESOS_LAUNCHER_DIR became the > sandbox dir. Could MESOS_LAUNCHER_DIR be overridden in one of your other > scripts? > > > On Fri, Oct 9, 2015 at 1:56 AM, Jay Taylor <outtat...@gmail.com> wrote: > >> I haven't ever changed MESOS_LAUNCHER_DIR/--launcher_dir before. >> >> I just tried setting both the env var and flag on the slaves, and have >> determined that the env var is not present when it is being checked in >> src/docker/executor.cpp @ line 573: >> >> const Option<string> envPath = os::getenv("MESOS_LAUNCHER_DIR"); >>> string path = >>> envPath.isSome() ? 
envPath.get() >>> : os::realpath(Path(argv[0]).dirname()).get(); >>> cout << "MESOS_LAUNCHER_DIR: envpath.isSome()->" << (envPath.isSome() >>> ? "yes" : "no") << endl; >>> cout << "MESOS_LAUNCHER_DIR: path='" << path << "'" << endl; >> >> >> Exported MESOS_LAUNCHER_DIR env var (and verified it is correctly >> propagated along up to the point of mesos-slave launch): >> >> $ cat /etc/default/mesos-slave >>> export >>> MESOS_MASTER="zk://mesos-primary1a:2181,mesos-primary2a:2181,mesos-primary3a:2181/mesos" >>> export MESOS_CONTAINERIZERS="mesos,docker" >>> export MESOS_EXECUTOR_REGISTRATION_TIMEOUT="5mins" >>> export MESOS_PORT="5050" >>> export MESOS_LAUNCHER_DIR="/usr/libexec/mesos" >> >> >> TASK OUTPUT: >> >> >>> *MESOS_LAUNCHER_DIR: envpath.isSome()->no**MESOS_LAUNCHER_DIR: >>> path='/tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad'* >>> Registered docker executor on mesos-worker2a >>> Starting task hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>> Launching health check process: >>> /tmp/mesos/slaves/61373c0e-7349-4173-ab8d-9d7b260e8a30-S1/frameworks/20150924-210922-1608624320-5050-1792-0020/executors/hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253/runs/41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad/mesos-health-check >>> --executor=(1)@192.168.225.59:44523 >>> --health_check_json={"command":{"shell":true,"value":"docker exec >>> mesos-61373c0e-7349-4173-ab8d-9d7b260e8a30-S1.41f8eed6-ec6c-4e6f-b1aa-0a2817a600ad >>> sh -c \" \/bin\/bash >>> \""},"consecutive_failures":3,"delay_seconds":5.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>> --task_id=hello-app_web-v3.22f9c7e4-2109-48a9-998e-e116141ec253 >>> Health check process launched at pid: 2519 >> >> >> The env var is not propagated when the docker executor is launched >> in src/slave/containerizer/docker.cpp 
around line 903: >> >> vector<string> argv; >>> argv.push_back("mesos-docker-executor"); >>> // Construct the mesos-docker-executor using the "name" we gave the >>> // container (to distinguish it from Docker containers not created >>> // by Mesos). >>> Try<Subprocess> s = subprocess( >>> path::join(flags.launcher_dir, "mesos-docker-executor"), >>> argv, >>> Subprocess::PIPE(), >>> Subprocess::PATH(path::join(container->directory, "stdout")), >>> Subprocess::PATH(path::join(container->directory, "stderr")), >>> dockerFlags(flags, container->name(), container->directory), >>> environment, >>> lambda::bind(&setup, container->directory)); >> >> A little way above we can see that the environment is set up with the env >> vars defined by the container task. >> >> See src/slave/containerizer/docker.cpp around line 871: >> >> // Include any environment variables from ExecutorInfo. >>> foreach (const Environment::Variable& variable, >>> container->executor.command().environment().variables()) { >>> environment[variable.name()] = variable.value(); >>> } >> >> Should I file a JIRA for this? Have I overlooked anything? >> >> >> On Wed, Oct 7, 2015 at 8:11 PM, haosdent <haosd...@gmail.com> wrote: >> >>> >Not sure what was going on with health-checks in 0.24.0. >>> 0.24.1 should work. >>> >>> >Do any of you know which host the path >>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>> should exist on? It definitely doesn't exist on the slave, hence execution >>> failing. >>> >>> Did you set MESOS_LAUNCHER_DIR/--launcher_dir incorrectly before? We >>> get mesos-health-check from MESOS_LAUNCHER_DIR/--launcher_dir, or use the >>> same dir as mesos-docker-executor. >>> >>> On Thu, Oct 8, 2015 at 10:46 AM, Jay Taylor <outtat...@gmail.com> wrote: >>>> Maybe I spoke too soon. 
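The executor.cpp fallback quoted earlier in this thread can be sketched standalone like this (illustrative only: std::getenv stands in for stout's os::getenv, and a simple rfind on '/' stands in for os::realpath(Path(argv[0]).dirname())):

```cpp
#include <cstdlib>
#include <string>

// Prefer MESOS_LAUNCHER_DIR from the environment; otherwise fall back to
// the directory containing argv[0]. This mirrors the logic quoted from
// src/docker/executor.cpp, which explains why an un-propagated env var
// silently degrades to the executor's own (sandbox) directory.
std::string launcherDir(const char* argv0)
{
  const char* env = std::getenv("MESOS_LAUNCHER_DIR");
  if (env != nullptr) {
    return env;
  }
  const std::string path(argv0);
  const size_t slash = path.rfind('/');
  return slash == std::string::npos ? "." : path.substr(0, slash);
}
```

This matches the observed behavior: with the env var absent in the executor's environment, the returned path is the sandbox run directory where mesos-health-check does not exist.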
>>>> >>>> Now the checks are attempting to run, however the STDERR is not looking >>>> good. I've added some debugging to the error message output to show the >>>> path, argv, and envp variables: >>>> >>>> STDOUT: >>>> >>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" >>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>> --stop_timeout="0ns" >>>>> --container="mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>> --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" >>>>> --initialize_driver_logging="true" --logbufsecs="0" --logging_level="INFO" >>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>> --sandbox_directory="/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc" >>>>> --stop_timeout="0ns" >>>>> Registered docker executor on mesos-worker2a >>>>> Starting task >>>>> app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>> Launching health check process: >>>>> /tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check >>>>> --executor=(1)@192.168.225.59:43917 >>>>> --health_check_json={"command":{"shell":true,"value":"docker exec >>>>> 
mesos-16b49e90-6852-4c91-8e70-d89c54f25668-S1.73dbfe88-1dbb-4f61-9a52-c365558cdbfc >>>>> sh -c \" exit 1 >>>>> \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":10.0,"interval_seconds":10.0,"timeout_seconds":10.0} >>>>> --task_id=app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0 >>>>> Health check process launched at pid: 3012 >>>> >>>> >>>> STDERR: >>>> >>>> I1008 02:17:28.870434 2770 exec.cpp:134] Version: 0.26.0 >>>>> I1008 02:17:28.871860 2778 exec.cpp:208] Executor registered on slave >>>>> 16b49e90-6852-4c91-8e70-d89c54f25668-S1 >>>>> WARNING: Your kernel does not support swap limit capabilities, memory >>>>> limited without swap. >>>>> ABORT: (src/subprocess.cpp:180): Failed to os::execvpe in childMain >>>>> (path.c_str()='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>> argv='/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check', >>>>> envp=''): No such file or directory*** Aborted at 1444270649 (unix time) >>>>> try "date -d @1444270649" if you are using GNU date *** >>>>> PC: @ 0x7f4a37ec6cc9 (unknown) >>>>> *** SIGABRT (@0xbc4) received by PID 3012 (TID 0x7f4a2f9f6700) from >>>>> PID 3012; stack trace: *** >>>>> @ 0x7f4a38265340 (unknown) >>>>> @ 0x7f4a37ec6cc9 (unknown) >>>>> @ 0x7f4a37eca0d8 (unknown) >>>>> @ 0x4191e2 _Abort() >>>>> @ 0x41921c _Abort() >>>>> @ 0x7f4a39dc2768 process::childMain() >>>>> @ 0x7f4a39dc4f59 std::_Function_handler<>::_M_invoke() >>>>> @ 0x7f4a39dc24fc process::defaultClone() >>>>> @ 0x7f4a39dc34fb process::subprocess() >>>>> @ 0x43cc9c >>>>> 
mesos::internal::docker::DockerExecutorProcess::launchHealthCheck() >>>>> @ 0x7f4a39d924f4 process::ProcessManager::resume() >>>>> @ 0x7f4a39d92827 >>>>> _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv >>>>> @ 0x7f4a38a47e40 (unknown) >>>>> @ 0x7f4a3825d182 start_thread >>>>> @ 0x7f4a37f8a47d (unknown) >>>> >>>> >>>> Do any of you know which host the path >>>> "/tmp/mesos/slaves/16b49e90-6852-4c91-8e70-d89c54f25668-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.b89f2ceb-6d62-11e5-9827-080027477de0/runs/73dbfe88-1dbb-4f61-9a52-c365558cdbfc/mesos-health-check" >>>> should exist on? It definitely doesn't exist on the slave, hence >>>> execution failing. >>>> >>>> This is with current master, git hash >>>> 5058fac1083dc91bca54d33c26c810c17ad95dd1. >>>> >>>> commit 5058fac1083dc91bca54d33c26c810c17ad95dd1 >>>>> Author: Anand Mazumdar <mazumdar.an...@gmail.com> >>>>> Date: Tue Oct 6 17:37:41 2015 -0700 >>>> >>>> >>>> -Jay >>>> >>>> On Wed, Oct 7, 2015 at 5:23 PM, Jay Taylor <outtat...@gmail.com> wrote: >>>> >>>>> Update: >>>>> >>>>> I used https://github.com/deric/mesos-deb-packaging to compile and >>>>> package the latest master (0.26.x) and deployed it to the cluster, and now >>>>> health checks are working as advertised in both Marathon and my own >>>>> framework! Not sure what was going on with health-checks in 0.24.0.. >>>>> >>>>> Anyways, thanks again for your help Haosdent! >>>>> >>>>> Cheers, >>>>> Jay >>>>> >>>>> On Wed, Oct 7, 2015 at 12:53 PM, Jay Taylor <outtat...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi Haosdent, >>>>>> >>>>>> Can you share your Marathon POST request that results in Mesos >>>>>> executing the health checks? >>>>>> >>>>>> Since we can reference the Marathon framework, I've been doing some >>>>>> digging around. 
>>>>>> >>>>>> Here are the details of my setup and findings: >>>>>> >>>>>> I put a few small hacks in Marathon: >>>>>> >>>>>> (1) Added com.googlecode.protobuf.format to Marathon's dependencies >>>>>> >>>>>> (2) Edited the following files so TaskInfo is dumped as JSON to >>>>>> /tmp/X in both the TaskFactory as well an right before the task is sent >>>>>> to >>>>>> Mesos via driver.launchTasks: >>>>>> >>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala: >>>>>> >>>>>> $ git diff >>>>>>> src/main/scala/mesosphere/marathon/tasks/DefaultTaskFactory.scala >>>>>>> @@ -25,6 +25,12 @@ class DefaultTaskFactory @Inject() ( >>>>>>> >>>>>>> new TaskBuilder(app, taskIdUtil.newTaskId, >>>>>>> config).buildIfMatches(offer, runningTasks).map { >>>>>>> case (taskInfo, ports) => >>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>> + import java.io._ >>>>>>> + val bw = new BufferedWriter(new FileWriter(new >>>>>>> File("/tmp/taskjson1-" + taskInfo.getTaskId.getValue))) >>>>>>> + bw.write(JsonFormat.printToString(taskInfo)) >>>>>>> + bw.write("\n") >>>>>>> + bw.close() >>>>>>> CreatedTask( >>>>>>> taskInfo, >>>>>>> MarathonTasks.makeTask( >>>>>> >>>>>> >>>>>> >>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala: >>>>>> >>>>>> $ git diff >>>>>>> src/main/scala/mesosphere/marathon/core/launcher/impl/TaskLauncherImpl.scala >>>>>>> @@ -24,6 +24,16 @@ private[launcher] class TaskLauncherImpl( >>>>>>> override def launchTasks(offerID: OfferID, taskInfos: >>>>>>> Seq[TaskInfo]): Boolean = { >>>>>>> val launched = withDriver(s"launchTasks($offerID)") { driver => >>>>>>> import scala.collection.JavaConverters._ >>>>>>> + var i = 0 >>>>>>> + for (i <- 0 to taskInfos.length - 1) { >>>>>>> + import com.googlecode.protobuf.format.JsonFormat >>>>>>> + import java.io._ >>>>>>> + val file = new File("/tmp/taskJson2-" + i.toString() + "-" >>>>>>> + taskInfos(i).getTaskId.getValue) >>>>>>> + val bw = new BufferedWriter(new 
FileWriter(file)) >>>>>>> + bw.write(JsonFormat.printToString(taskInfos(i))) >>>>>>> + bw.write("\n") >>>>>>> + bw.close() >>>>>>> + } >>>>>>> driver.launchTasks(Collections.singleton(offerID), >>>>>>> taskInfos.asJava) >>>>>>> } >>>>>> >>>>>> >>>>>> Then I built and deployed the hacked Marathon and restarted the >>>>>> marathon service. >>>>>> >>>>>> Next I created the app via the Marathon API ("hello app" is a >>>>>> container with a simple hello-world ruby app running on 0.0.0.0:8000) >>>>>> >>>>>> curl http://mesos-primary1a:8080/v2/groups -XPOST -H'Content-Type: >>>>>>> application/json' -d' >>>>>>> { >>>>>>> "id": "/app-81-1-hello-app", >>>>>>> "apps": [ >>>>>>> { >>>>>>> "id": "/app-81-1-hello-app/web-v11", >>>>>>> "container": { >>>>>>> "type": "DOCKER", >>>>>>> "docker": { >>>>>>> "image": >>>>>>> "docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>> "network": "BRIDGE", >>>>>>> "portMappings": [ >>>>>>> { >>>>>>> "containerPort": 8000, >>>>>>> "hostPort": 0, >>>>>>> "protocol": "tcp" >>>>>>> } >>>>>>> ] >>>>>>> } >>>>>>> }, >>>>>>> "env": { >>>>>>> >>>>>>> }, >>>>>>> "healthChecks": [ >>>>>>> { >>>>>>> "protocol": "COMMAND", >>>>>>> "command": {"value": "exit 1"}, >>>>>>> "gracePeriodSeconds": 10, >>>>>>> "intervalSeconds": 10, >>>>>>> "timeoutSeconds": 10, >>>>>>> "maxConsecutiveFailures": 3 >>>>>>> } >>>>>>> ], >>>>>>> "instances": 1, >>>>>>> "cpus": 1, >>>>>>> "mem": 512 >>>>>>> } >>>>>>> ] >>>>>>> } >>>>>> >>>>>> >>>>>> $ ls /tmp/ >>>>>>> >>>>>>> taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>> >>>>>>> taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>> >>>>>> >>>>>> Do they match? 
>>>>>> >>>>>> $ md5sum /tmp/task* >>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>> >>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>> 1b5115997e78e2611654059249d99578 >>>>>>> >>>>>>> /tmp/taskJson2-0-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>> >>>>>> >>>>>> Yes, so I am confident this is the information being sent across the >>>>>> wire to Mesos. >>>>>> >>>>>> Do they contain any health-check information? >>>>>> >>>>>> $ cat >>>>>>> /tmp/taskjson1-app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>>> { >>>>>>> "name":"web-v11.app-81-1-hello-app", >>>>>>> "task_id":{ >>>>>>> >>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>> }, >>>>>>> "slave_id":{ >>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>> }, >>>>>>> "resources":[ >>>>>>> { >>>>>>> "name":"cpus", >>>>>>> "type":"SCALAR", >>>>>>> "scalar":{ >>>>>>> "value":1.0 >>>>>>> }, >>>>>>> "role":"*" >>>>>>> }, >>>>>>> { >>>>>>> "name":"mem", >>>>>>> "type":"SCALAR", >>>>>>> "scalar":{ >>>>>>> "value":512.0 >>>>>>> }, >>>>>>> "role":"*" >>>>>>> }, >>>>>>> { >>>>>>> "name":"ports", >>>>>>> "type":"RANGES", >>>>>>> "ranges":{ >>>>>>> "range":[ >>>>>>> { >>>>>>> "begin":31641, >>>>>>> "end":31641 >>>>>>> } >>>>>>> ] >>>>>>> }, >>>>>>> "role":"*" >>>>>>> } >>>>>>> ], >>>>>>> "command":{ >>>>>>> "environment":{ >>>>>>> "variables":[ >>>>>>> { >>>>>>> "name":"PORT_8000", >>>>>>> "value":"31641" >>>>>>> }, >>>>>>> { >>>>>>> "name":"MARATHON_APP_VERSION", >>>>>>> "value":"2015-10-07T19:35:08.386Z" >>>>>>> }, >>>>>>> { >>>>>>> "name":"HOST", >>>>>>> "value":"mesos-worker1a" >>>>>>> }, >>>>>>> { >>>>>>> "name":"MARATHON_APP_DOCKER_IMAGE", >>>>>>> >>>>>>> "value":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966" >>>>>>> }, >>>>>>> { >>>>>>> "name":"MESOS_TASK_ID", >>>>>>> >>>>>>> "value":"app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0" >>>>>>> }, >>>>>>> { 
>>>>>>> "name":"PORT", >>>>>>> "value":"31641" >>>>>>> }, >>>>>>> { >>>>>>> "name":"PORTS", >>>>>>> "value":"31641" >>>>>>> }, >>>>>>> { >>>>>>> "name":"MARATHON_APP_ID", >>>>>>> "value":"/app-81-1-hello-app/web-v11" >>>>>>> }, >>>>>>> { >>>>>>> "name":"PORT0", >>>>>>> "value":"31641" >>>>>>> } >>>>>>> ] >>>>>>> }, >>>>>>> "shell":false >>>>>>> }, >>>>>>> "container":{ >>>>>>> "type":"DOCKER", >>>>>>> "docker":{ >>>>>>> >>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-1444240966", >>>>>>> "network":"BRIDGE", >>>>>>> "port_mappings":[ >>>>>>> { >>>>>>> "host_port":31641, >>>>>>> "container_port":8000, >>>>>>> "protocol":"tcp" >>>>>>> } >>>>>>> ], >>>>>>> "privileged":false, >>>>>>> "force_pull_image":false >>>>>>> } >>>>>>> } >>>>>>> } >>>>>> >>>>>> >>>>>> No, I don't see anything about any health check. >>>>>> >>>>>> Mesos STDOUT for the launched task: >>>>>> >>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>> --sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>> --stop_timeout="0ns" >>>>>>> --container="mesos-20150924-210922-1608624320-5050-1792-S1.14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>> --docker="docker" --help="false" --initialize_driver_logging="true" >>>>>>> --logbufsecs="0" --logging_level="INFO" >>>>>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" >>>>>>> 
--sandbox_directory="/tmp/mesos/slaves/20150924-210922-1608624320-5050-1792-S1/frameworks/20150821-214332-1407297728-5050-18973-0000/executors/app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0/runs/14335f1f-3774-4862-a55b-e9c76cd0f2da" >>>>>>> --stop_timeout="0ns" >>>>>>> Registered docker executor on mesos-worker1a >>>>>>> Starting task >>>>>>> app-81-1-hello-app_web-v11.84c0f441-6d2a-11e5-98ba-080027477de0 >>>>>> >>>>>> >>>>>> And STDERR: >>>>>> >>>>>> I1007 19:35:08.790743 4612 exec.cpp:134] Version: 0.24.0 >>>>>>> I1007 19:35:08.793416 4619 exec.cpp:208] Executor registered on >>>>>>> slave 20150924-210922-1608624320-5050-1792-S1 >>>>>>> WARNING: Your kernel does not support swap limit capabilities, >>>>>>> memory limited without swap. >>>>>> >>>>>> >>>>>> Again, nothing about any health checks. >>>>>> >>>>>> Any ideas of other things to try or what I could be missing? I can't >>>>>> say either way whether the Mesos health-check system works if >>>>>> Marathon won't put the health check into the task it sends to Mesos. >>>>>> >>>>>> Thanks for all your help! >>>>>> >>>>>> Best, >>>>>> Jay >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>> On Tue, Oct 6, 2015 at 11:24 PM, haosdent <haosd...@gmail.com> wrote: >>>>>> >>>>>>> Maybe you could post your executor stdout/stderr so that we can >>>>>>> see whether the health check is running or not. >>>>>>> >>>>>>> On Wed, Oct 7, 2015 at 2:15 PM, haosdent <haosd...@gmail.com> wrote: >>>>>>> >>>>>>>> Marathon also uses the Mesos health check. When I use a health check, I >>>>>>>> can see a log like this in the executor stdout. 
>>>>>>>> >>>>>>>> ``` >>>>>>>> Registered docker executor on xxxxx >>>>>>>> Starting task test-health-check.822a5fd2-6cba-11e5-b5ce-0a0027000000 >>>>>>>> Launching health check process: >>>>>>>> /home/haosdent/mesos/build/src/.libs/mesos-health-check --executor=xxxx >>>>>>>> Health check process launched at pid: 9895 >>>>>>>> Received task health update, healthy: true >>>>>>>> ``` >>>>>>>> >>>>>>>> On Wed, Oct 7, 2015 at 12:51 PM, Jay Taylor <outtat...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> I am using my own framework, and the full task info I'm using is >>>>>>>>> posted earlier in this thread. Do you happen to know if Marathon uses >>>>>>>>> Mesos's health checks for its health check system? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Oct 6, 2015, at 9:01 PM, haosdent <haosd...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Yes, the health check is launched through its definition in the >>>>>>>>> TaskInfo. Do you launch your task through Marathon? I could test it on >>>>>>>>> my side. >>>>>>>>> >>>>>>>>> On Wed, Oct 7, 2015 at 11:56 AM, Jay Taylor <outtat...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Precisely, and there are none of those statements. Are you or >>>>>>>>>> others confident health-checks are part of the code path when >>>>>>>>>> defined via >>>>>>>>>> task info for docker container tasks? Going through the code, I >>>>>>>>>> wasn't >>>>>>>>>> able to find the linkage for anything other than health-checks >>>>>>>>>> triggered >>>>>>>>>> through a custom executor. >>>>>>>>>> >>>>>>>>>> With that being said, it is a pretty good-sized code base and I'm >>>>>>>>>> not very familiar with it, so my analysis thus far has by no means >>>>>>>>>> been >>>>>>>>>> exhaustive. 
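For docker tasks, the "Launching health check process" log lines earlier in this thread show that the COMMAND health check is wrapped in a `docker exec` against the task's container. A minimal sketch of that wrapping (a hypothetical helper matching the quoting seen in the logs, not the literal Mesos code):

```cpp
#include <string>

// Wrap a user-supplied shell command so it runs inside the task's docker
// container, mirroring the shape seen in the executor logs:
//   docker exec <container> sh -c " <command> "
std::string wrapHealthCheck(const std::string& containerName,
                            const std::string& command)
{
  return "docker exec " + containerName + " sh -c \" " + command + " \"";
}
```

The mesos-health-check helper then runs this wrapped command on its interval and translates the zero/non-zero exit status into task health updates.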
>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Oct 6, 2015, at 8:41 PM, haosdent <haosd...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> When the health check launches, there will be a log line like this in >>>>>>>>>> your executor stdout: >>>>>>>>>> ``` >>>>>>>>>> Health check process launched at pid xxx >>>>>>>>>> ``` >>>>>>>>>> >>>>>>>>>> On Wed, Oct 7, 2015 at 11:37 AM, Jay Taylor <outtat...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> I'm happy to try this; however, wouldn't there be output in the >>>>>>>>>>> logs with the string "health" or "Health" if the health-check were active? >>>>>>>>>>> None of my master or slave logs contain the string. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Oct 6, 2015, at 7:45 PM, haosdent <haosd...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Could you use "exit 1" instead of "sleep 5" to see whether you can >>>>>>>>>>> see an unhealthy status in your task stdout/stderr? >>>>>>>>>>> >>>>>>>>>>> On Wed, Oct 7, 2015 at 10:38 AM, Jay Taylor <outtat...@gmail.com >>>>>>>>>>> > wrote: >>>>>>>>>>> >>>>>>>>>>>> My current version is 0.24.1. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Oct 6, 2015 at 7:30 PM, haosdent <haosd...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Yes, adam also helped commit it to 0.23.1 and 0.24.1: >>>>>>>>>>>>> https://github.com/apache/mesos/commit/8c0ed92de3925d4312429bfba01b9b1ccbcbbef0 >>>>>>>>>>>>> >>>>>>>>>>>>> https://github.com/apache/mesos/commit/09e367cd69aa39c156c9326d44f4a7b829ba3db7 >>>>>>>>>>>>> Are you using one of these versions? >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:26 AM, haosdent <haosd...@gmail.com> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I remember 0.23.1 and 0.24.1 contain this backport; let me >>>>>>>>>>>>>> double-check. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Oct 7, 2015 at 10:01 AM, Jay Taylor < >>>>>>>>>>>>>> outtat...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Oops- Now I see you already said it's in master. 
I'll look >>>>>>>>>>>>>>> there :) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks again! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:59 PM, Jay Taylor < >>>>>>>>>>>>>>> j...@jaytaylor.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Great, thanks for the quick reply Tim! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Do you know if there is a branch I can checkout to test it >>>>>>>>>>>>>>>> out? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Oct 6, 2015 at 6:54 PM, Timothy Chen < >>>>>>>>>>>>>>>> t...@mesosphere.io> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Jay, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We just added health check support for docker tasks that's >>>>>>>>>>>>>>>>> in master but not yet released. It will run docker exec with >>>>>>>>>>>>>>>>> the command >>>>>>>>>>>>>>>>> you provided as health checks. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It should be in the next release. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Tim >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Oct 6, 2015, at 6:49 PM, Jay Taylor < >>>>>>>>>>>>>>>>> outtat...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Does Mesos support health checks for docker image tasks? >>>>>>>>>>>>>>>>> Mesos seems to be ignoring the TaskInfo.HealthCheck field for >>>>>>>>>>>>>>>>> me. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Example TaskInfo JSON received back from Mesos: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "name":"hello-app.web.v3", >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "task_id":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "value":"hello-app_web-v3.fc05a1a5-1e06-4e61-9879-be0d97cd3eec" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "slave_id":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "value":"20150924-210922-1608624320-5050-1792-S1" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "resources":[ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "name":"cpus", >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "value":0.1 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "name":"mem", >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "type":0, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "scalar":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "value":256 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "name":"ports", >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "ranges":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "range":[ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "begin":31002, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "end":31002 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "command":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/test/app-81-1-hello-app-103" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "shell":false >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "container":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "type":1, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "docker":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "image":"docker-services1a:5000/gig1/app-81-1-hello-app-103", >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "network":2, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "port_mappings":[ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "host_port":31002, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "container_port":8000, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "protocol":"tcp" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ], >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "privileged":false, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "parameters":[], >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "force_pull_image":false >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "health_check":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "delay_seconds":5, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "interval_seconds":10, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "timeout_seconds":10, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "consecutive_failures":3, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "grace_period_seconds":0, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "command":{ >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "shell":true, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "value":"sleep 5", >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "user":"root" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have searched all machines and containers to see if they >>>>>>>>>>>>>>>>> ever run the command (in this 
case `sleep 5`), but have not >>>>>>>>>>>>>>>>> found any >>>>>>>>>>>>>>>>> indication that it is being executed. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In the mesos src code the health-checks are invoked from >>>>>>>>>>>>>>>>> src/launcher/executor.cpp CommandExecutorProcess::launchTask. >>>>>>>>>>>>>>>>> Does this >>>>>>>>>>>>>>>>> mean that health-checks are only supported for custom >>>>>>>>>>>>>>>>> executors and not for >>>>>>>>>>>>>>>>> docker tasks? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What I am trying to accomplish is to have the 0/non-zero >>>>>>>>>>>>>>>>> exit-status of a health-check command translate to task >>>>>>>>>>>>>>>>> health. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>> Jay >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>> Haosdent Huang >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> Best Regards, >>>>>>>>>>> Haosdent Huang >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best Regards, >>>>>>>>>> Haosdent Huang >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best Regards, >>>>>>>>> Haosdent Huang >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best Regards, >>>>>>>> Haosdent Huang >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Best Regards, >>>>>>> Haosdent Huang >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >>> >>> -- >>> Best Regards, >>> Haosdent Huang >>> >> >> > > > -- > Best Regards, > Haosdent Huang > > -- Best Regards, Haosdent Huang
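A closing note on the propagation issue discussed above: docker.cpp passes an explicit `environment` map to subprocess(), so the executor child sees only what that map contains. A sketch of the consequence (the helper name is hypothetical; the map stands in for the `environment` built from ExecutorInfo):

```cpp
#include <map>
#include <string>
#include <vector>

// Build an envp-style list from an explicit environment map. Because the
// child process receives only these entries, anything set in the parent
// (slave) process's own environment -- e.g. MESOS_LAUNCHER_DIR -- is lost
// unless it is explicitly copied into the map before launching.
std::vector<std::string> buildEnvp(const std::map<std::string, std::string>& env)
{
  std::vector<std::string> envp;
  for (const auto& entry : env) {
    envp.push_back(entry.first + "=" + entry.second);
  }
  return envp;
}
```

This is consistent with the debugging output earlier in the thread: the slave itself logged --launcher_dir correctly, yet os::getenv("MESOS_LAUNCHER_DIR") in the docker executor returned nothing.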