Ian Downes created MESOS-1149:
---------------------------------

             Summary: SlaveRecovery.Reboot test doesn't reap executor
                 Key: MESOS-1149
                 URL: https://issues.apache.org/jira/browse/MESOS-1149
             Project: Mesos
          Issue Type: Bug
          Components: test
    Affects Versions: 0.18.0
            Reporter: Ian Downes

To simulate a host reboot correctly, the executor pid should be reaped after the slave is "rebooted" and before the next slave is started. Otherwise there's a race, and the executor may still be present (as a zombie) when the test completes.

{noformat}
# ./bin/mesos-tests.sh --gtest_filter="SlaveRecovery*Reboot" --gtest_repeat=100 --gtest_break_on_failure=1
Source directory: /home/idownes/workspace/mesos
Build directory: /home/idownes/workspace/mesos/build
-------------------------------------------------------------
We cannot run any cgroups tests that require mounting
hierarchies because you have the following hierarchies mounted:
/sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, /sys/fs/cgroup/freezer, /sys/fs/cgroup/memory
We'll disable the CgroupsNoHierarchyTest test fixture for now.
-------------------------------------------------------------
Repeating all tests (iteration 1) . . .
Note: Google Test filter = SlaveRecovery*Reboot-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
[ RUN      ] SlaveRecoveryTest/0.Reboot
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0326 22:44:23.032676 34814 exec.cpp:131] Version: 0.19.0
I0326 22:44:23.035573 34835 exec.cpp:205] Executor registered on slave 20140326-224421-1828659978-41066-34759-0
Registered executor on smfd-atr-11-sr1.devel.twitter.com
Starting task f503d996-3e82-43f6-861b-38bacd5e4855
sh -c 'sleep 1000'
Forked command at 34854
I0326 22:44:23.263057 34852 exec.cpp:378] Executor asked to shutdown
Shutting down
Killing process tree at pid 34854
Killed the following process trees:
[
--- 34854 sleep 1000
]
[       OK ] SlaveRecoveryTest/0.Reboot (1997 ms)
[----------] 1 test from SlaveRecoveryTest/0 (1998 ms total)
[----------] Global test environment tear-down
../../src/tests/environment.cpp:244: Failure
Failed
Tests completed with child processes remaining:
-+- 34759 /home/idownes/workspace/mesos/build/src/.libs/lt-mesos-tests --gtest_filter=SlaveRecovery*Reboot --gtest_repeat=100 --gtest_break_on_failure=1
 \--- 34814 ()
{noformat}

34814 () is the zombied executor.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
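
For illustration, the kill-then-reap ordering described above can be sketched with plain POSIX calls. This is not Mesos code and not necessarily how the test should be fixed; the helper names `spawnSleeper`, `killAndReap` and `isUnreapedChild` are hypothetical, and the child process only stands in for the executor.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for the executor: a child that blocks until signalled.
pid_t spawnSleeper() {
  pid_t pid = fork();
  if (pid == 0) {
    for (;;) pause();  // Child: wait here until SIGKILL arrives.
  }
  return pid;  // Parent: the child's pid, or -1 if fork() failed.
}

// Kill the child, then block until its exit status has been collected.
// Without the waitpid() call the child lingers as a zombie, like 34814 above.
bool killAndReap(pid_t pid) {
  assert(pid > 0);  // kill() with pid <= 0 would signal a whole process group.
  if (kill(pid, SIGKILL) != 0) return false;
  int status = 0;
  pid_t result;
  do {
    result = waitpid(pid, &status, 0);
  } while (result == -1 && errno == EINTR);  // Retry if interrupted by a signal.
  return result == pid && WIFSIGNALED(status);
}

// True while pid is still our child, whether running or a zombie.
// WNOWAIT leaves the child's state untouched, so this check does not reap.
bool isUnreapedChild(pid_t pid) {
  siginfo_t info;
  return waitid(P_PID, pid, &info, WEXITED | WNOHANG | WNOWAIT) == 0;
}
```

In the test, the equivalent step would presumably sit between stopping the first slave and starting the next one, so that no child is left behind when the environment tear-down checks for remaining processes.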