[ https://issues.apache.org/jira/browse/MESOS-7517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013304#comment-16013304 ]
Neil Conway commented on MESOS-7517:
------------------------------------

cc [~bmahler]

> HealthCheckTest.ConsecutiveFailures is flaky
> --------------------------------------------
>
>                 Key: MESOS-7517
>                 URL: https://issues.apache.org/jira/browse/MESOS-7517
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Neil Conway
>              Labels: mesosphere
>
> {noformat}
> [ RUN ] HealthCheckTest.ConsecutiveFailures
> I0516 17:12:44.380421 28941 cluster.cpp:162] Creating default 'local'
> authorizer
> I0516 17:12:44.389566 28996 master.cpp:436] Master
> 2b745611-28cc-491b-80ea-2b6e94a9cab8 (core-dev) started on 10.0.49.2:37598
> I0516 17:12:44.389619 28996 master.cpp:438] Flags at startup: --acls=""
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins"
> --allocation_interval="1secs" --allocator="HierarchicalDRF"
> --authenticate_agents="true" --authenticate_frameworks="true"
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true"
> --authenticate_http_readwrite="true" --authenticators="crammd5"
> --authorizers="local" --credentials="/tmp/kYELQI/credentials"
> --framework_sorter="drf" --help="false" --hostname_lookup="true"
> --http_authenticators="basic" --http_framework_authenticators="basic"
> --initialize_driver_logging="true" --log_auto_initialize="true"
> --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5"
> --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000"
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false"
> --recovery_agent_removal_limit="100%" --registry="in_memory"
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins"
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400"
> --registry_store_timeout="100secs" --registry_strict="false"
> --root_submissions="true" --user_sorter="drf" --version="false"
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/kYELQI/master"
> --zk_session_timeout="10secs"
> I0516 17:12:44.389943 28996 master.cpp:488] Master only allowing
> authenticated frameworks to register
> I0516 17:12:44.389971 28996 master.cpp:502] Master only allowing
> authenticated agents to register
> I0516 17:12:44.389988 28996 master.cpp:515] Master only allowing
> authenticated HTTP frameworks to register
> I0516 17:12:44.390012 28996 credentials.hpp:37] Loading credentials for
> authentication from '/tmp/kYELQI/credentials'
> I0516 17:12:44.390353 28996 master.cpp:560] Using default 'crammd5'
> authenticator
> I0516 17:12:44.390504 28996 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-readonly'
> I0516 17:12:44.390661 28996 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-readwrite'
> I0516 17:12:44.390993 28996 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-master-scheduler'
> I0516 17:12:44.391158 28996 master.cpp:640] Authorization enabled
> I0516 17:12:44.393784 28958 master.cpp:2161] Elected as the leading master!
> I0516 17:12:44.393831 28958 master.cpp:1700] Recovering from registrar
> I0516 17:12:44.394521 28969 registrar.cpp:389] Successfully fetched the
> registry (0B) in 536064ns
> I0516 17:12:44.394621 28969 registrar.cpp:493] Applied 1 operations in
> 16653ns; attempting to update the registry
> I0516 17:12:44.395346 28969 registrar.cpp:550] Successfully updated the
> registry in 664832ns
> I0516 17:12:44.395448 28969 registrar.cpp:422] Successfully recovered
> registrar
> I0516 17:12:44.395992 28958 master.cpp:1799] Recovered 0 agents from the
> registry (119B); allowing 10mins for agents to re-register
> I0516 17:12:44.404881 28941 containerizer.cpp:221] Using isolation:
> posix/cpu,posix/mem,filesystem/posix,network/cni
> W0516 17:12:44.405333 28941 backend.cpp:76] Failed to create 'overlay'
> backend: OverlayBackend requires root privileges
> W0516 17:12:44.405426 28941 backend.cpp:76] Failed to create 'bind' backend:
> BindBackend requires root privileges
> I0516 17:12:44.405462 28941 provisioner.cpp:249] Using default backend 'copy'
> I0516 17:12:44.406657 28941 cluster.cpp:448] Creating default 'local'
> authorizer
> I0516 17:12:44.407929 28989 slave.cpp:225] Mesos agent started on
> (203)@10.0.49.2:37598
> I0516 17:12:44.407973 28989 slave.cpp:226] Flags at startup: --acls=""
> --appc_simple_discovery_uri_prefix="http://"
> --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="true"
> --authenticate_http_readwrite="true" --authenticatee="crammd5"
> --authentication_backoff_factor="1secs" --authorizer="local"
> --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false"
> --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
> --cgroups_root="mesos" --container_disk_watch_interval="15secs"
> --containerizers="mesos"
> --credential="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential"
> --default_role="*" --disk_watch_interval="1mins" --docker="docker"
> --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io"
> --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock"
> --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker"
> --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume"
> --enforce_container_disk_quota="false"
> --executor_registration_timeout="1mins"
> --executor_shutdown_grace_period="5secs"
> --fetcher_cache_dir="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/fetch"
> --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks"
> --gc_disk_headroom="0.1" --hadoop_home="" --help="false"
> --hostname_lookup="true" --http_command_executor="false"
> --http_credentials="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/http_credentials"
> --http_heartbeat_interval="30secs" --initialize_driver_logging="true"
> --isolation="posix/cpu,posix/mem" --launcher="posix"
> --launcher_dir="/home/nrc/build-mesos-default-opts/src" --logbufsecs="0"
> --logging_level="INFO" --max_completed_executors_per_framework="150"
> --oversubscribed_resources_interval="15secs" --perf_duration="10secs"
> --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns"
> --quiet="false" --recover="reconnect" --recovery_timeout="15mins"
> --registration_backoff_factor="10ms"
> --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]"
> --revocable_cpu_low_priority="true"
> --runtime_dir="/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH"
> --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true"
> --systemd_enable_support="true"
> --systemd_runtime_directory="/run/systemd/system" --version="false"
> --work_dir="/tmp/HealthCheckTest_ConsecutiveFailures_WXsqod"
> I0516 17:12:44.408372 28989 credentials.hpp:86] Loading credential for
> authentication from
> '/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/credential'
> I0516 17:12:44.408543 28989 slave.cpp:258] Agent using credential for:
> test-principal
> I0516 17:12:44.408593 28989 credentials.hpp:37] Loading credentials for
> authentication from
> '/tmp/HealthCheckTest_ConsecutiveFailures_wNp6VH/http_credentials'
> I0516 17:12:44.408852 28989 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-agent-readonly'
> I0516 17:12:44.409008 28989 http.cpp:975] Creating default 'basic' HTTP
> authenticator for realm 'mesos-agent-readwrite'
> I0516 17:12:44.414839 28989 slave.cpp:529] Agent resources: cpus(*):2;
> mem(*):1024; disk(*):1024; ports(*):[31000-32000]
> I0516 17:12:44.414953 28989 slave.cpp:537] Agent attributes: [ ]
> I0516 17:12:44.414980 28989 slave.cpp:542] Agent hostname: core-dev
> I0516 17:12:44.415108 28961 status_update_manager.cpp:177] Pausing sending
> status updates
> I0516 17:12:44.416466 28961 state.cpp:62] Recovering state from
> '/tmp/HealthCheckTest_ConsecutiveFailures_WXsqod/meta'
> I0516 17:12:44.416718 28958 status_update_manager.cpp:203] Recovering status
> update manager
> I0516 17:12:44.417064 28960 containerizer.cpp:608] Recovering containerizer
> I0516 17:12:44.419234 28976 provisioner.cpp:410] Provisioner recovery complete
> I0516 17:12:44.419749 28986 slave.cpp:5974] Finished recovery
> I0516 17:12:44.420372 28998 status_update_manager.cpp:177] Pausing sending
> status updates
> I0516 17:12:44.420370 28986 slave.cpp:922] New master detected at
> master@10.0.49.2:37598
> I0516 17:12:44.420516 28986 slave.cpp:957] Detecting new master
> I0516 17:12:44.424572 28941 sched.cpp:232] Version: 1.4.0
> I0516 17:12:44.425042 28995 sched.cpp:336] New master detected at
> master@10.0.49.2:37598
> I0516 17:12:44.425138 28995 sched.cpp:407] Authenticating with master
> master@10.0.49.2:37598
> I0516 17:12:44.425168 28995 sched.cpp:414] Using default CRAM-MD5
> authenticatee
> I0516 17:12:44.425364 28958 authenticatee.cpp:121] Creating new client SASL
> connection
> I0516 17:12:44.429754 28999 slave.cpp:984] Authenticating with master
> master@10.0.49.2:37598
> I0516 17:12:44.429811 28999 slave.cpp:995] Using default CRAM-MD5
> authenticatee
> I0516 17:12:44.429942 28955 authenticatee.cpp:121] Creating new client SASL
> connection
> I0516 17:12:44.437100 28984 master.cpp:7475] Authenticating
> slave(203)@10.0.49.2:37598
> I0516 17:12:44.437371 28965 authenticator.cpp:98] Creating new server SASL
> connection
> W0516 17:12:49.426436 28956 sched.cpp:537] Authentication timed out
> W0516 17:12:49.430752 28985 slave.cpp:1098] Authentication timed out
> W0516 17:12:49.431509 28973 slave.cpp:1043] Failed to authenticate with
> master master@10.0.49.2:37598: Authentication discarded
> W0516 17:12:49.437960 29000 master.cpp:7522] Authentication timed out
> I0516 17:12:49.442778 28996 master.cpp:7475] Authenticating
> scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:49.443080 28995 authenticator.cpp:98] Creating new server SASL
> connection
> I0516 17:12:49.443548 28966 sched.cpp:477] Failed to authenticate with master
> master@10.0.49.2:37598: Authentication discarded
> W0516 17:12:49.449880 28964 master.cpp:7502] Failed to authenticate
> scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598: Failed to
> communicate with authenticatee
> I0516 17:12:49.888478 29000 slave.cpp:984] Authenticating with master
> master@10.0.49.2:37598
> I0516 17:12:49.888593 29000 slave.cpp:995] Using default CRAM-MD5
> authenticatee
> I0516 17:12:49.888759 28995 authenticatee.cpp:121] Creating new client SASL
> connection
> I0516 17:12:49.896517 28995 master.cpp:7461] Queuing up authentication
> request from slave(203)@10.0.49.2:37598 because authentication is still in
> progress
> I0516 17:12:51.343961 28977 sched.cpp:407] Authenticating with master
> master@10.0.49.2:37598
> I0516 17:12:51.344002 28977 sched.cpp:414] Using default CRAM-MD5
> authenticatee
> I0516 17:12:51.344451 29000 authenticatee.cpp:121] Creating new client SASL
> connection
> I0516 17:12:51.373108 29001 master.cpp:7475] Authenticating
> scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:51.373463 28975 authenticator.cpp:98] Creating new server SASL
> connection
> I0516 17:12:51.415412 28957 authenticatee.cpp:213] Received SASL
> authentication mechanisms: CRAM-MD5
> I0516 17:12:51.415469 28957 authenticatee.cpp:239] Attempting to authenticate
> with mechanism 'CRAM-MD5'
> I0516 17:12:51.415738 28978 authenticator.cpp:204] Received SASL
> authentication start
> I0516 17:12:51.415832 28978 authenticator.cpp:326] Authentication requires
> more steps
> I0516 17:12:51.415956 28969 authenticatee.cpp:259] Received SASL
> authentication step
> I0516 17:12:51.416134 28996 authenticator.cpp:232] Received SASL
> authentication step
> I0516 17:12:51.416249 28996 authenticator.cpp:318] Authentication success
> I0516 17:12:51.416415 28970 master.cpp:7505] Successfully authenticated
> principal 'test-principal' at
> scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:51.416525 28964 authenticatee.cpp:299] Authentication success
> I0516 17:12:51.416913 28980 sched.cpp:513] Successfully authenticated with
> master master@10.0.49.2:37598
> I0516 17:12:51.417172 28987 master.cpp:2813] Received SUBSCRIBE call for
> framework 'default' at
> scheduler-ea29d6eb-1bfe-4cc9-b196-e3535ed6dd4e@10.0.49.2:37598
> I0516 17:12:51.417279 28987 master.cpp:2197] Authorizing framework principal
> 'test-principal' to receive offers for roles '{ * }'
> I0516 17:12:51.417778 29001 master.cpp:2890] Subscribing framework default
> with checkpointing disabled and capabilities [ ]
> I0516 17:12:51.418303 29002 sched.cpp:759] Framework registered with
> 2b745611-28cc-491b-80ea-2b6e94a9cab8-0000
> I0516 17:12:51.418393 28958 hierarchical.cpp:273] Added framework
> 2b745611-28cc-491b-80ea-2b6e94a9cab8-0000
> W0516 17:12:54.888931 28985 slave.cpp:1098] Authentication timed out
> W0516 17:12:54.889354 28985 slave.cpp:1043] Failed to authenticate with
> master master@10.0.49.2:37598: Authentication discarded
> I0516 17:12:55.118023 28973 slave.cpp:984] Authenticating with master
> master@10.0.49.2:37598
> I0516 17:12:55.118098 28973 slave.cpp:995] Using default CRAM-MD5
> authenticatee
> I0516 17:12:55.118614 28967 authenticatee.cpp:121] Creating new client SASL
> connection
> ../../mesos/src/tests/health_check_tests.cpp:957: Failure
> Failed to wait 15secs for offers
> *** Aborted at 1494979979 (unix time) try "date -d @1494979979" if you are
> using GNU date ***
> PC: @ 0x2011328 testing::UnitTest::AddTestPartResult()
> *** SIGSEGV (@0x0) received by PID 28941 (TID 0x7f3981a4a8c0) from PID 0;
> stack trace: ***
> @ 0x7f3978acc370 (unknown)
> W0516 17:12:59.454641 28978 master.cpp:7502] Failed to authenticate
> slave(203)@10.0.49.2:37598: Failed to communicate with authenticatee
> I0516 17:12:59.454766 28978 master.cpp:7475] Authenticating
> slave(203)@10.0.49.2:37598
> W0516 17:12:59.455497 28958 master.cpp:7502] Failed to authenticate
> slave(203)@10.0.49.2:37598: Failed to communicate with authenticatee
> @ 0x2011328 testing::UnitTest::AddTestPartResult()
> @ 0x2004467 testing::internal::AssertHelper::operator=()
> @ 0x11ca5d0
> mesos::internal::tests::HealthCheckTest_ConsecutiveFailures_Test::TestBody()
> @ 0x2030820
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x202ae80
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x200b04d testing::Test::Run()
> @ 0x200b866 testing::TestInfo::Run()
> @ 0x200beac testing::TestCase::Run()
> @ 0x2012800 testing::internal::UnitTestImpl::RunAllTests()
> @ 0x2031445
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @ 0x202b9fe
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @ 0x2011546 testing::UnitTest::Run()
> @ 0x138ca1b RUN_ALL_TESTS()
> @ 0x138c4ec main
> @ 0x7f39778dab35 __libc_start_main
> @ 0xb0a049 (unknown)
> zsh: segmentation fault (core dumped) ./src/mesos-tests
> --gtest_filter="HealthCheckTest.ConsecutiveFailures"
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)