James Peach created MESOS-9361:
----------------------------------

Summary: CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively always fails.
Key: MESOS-9361
URL: https://issues.apache.org/jira/browse/MESOS-9361
Project: Mesos
Issue Type: Bug
Components: flaky, test
Reporter: James Peach
On Fedora 28:

{noformat}
[ RUN ] CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively
I1029 09:38:31.866564 31397 cgroups.cpp:2838] Freezing cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce
I1029 09:38:31.867048 31398 cgroups.cpp:1229] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce after 359936ns
I1029 09:38:31.869033 31397 cgroups.cpp:2856] Thawing cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce
I1029 09:38:31.869357 31403 cgroups.cpp:1258] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos_test_62e0c540-832e-4601-8658-7faa25c427ce after 261888ns
I1029 09:38:31.884752 31382 cluster.cpp:173] Creating default 'local' authorizer
I1029 09:38:31.892966 31397 master.cpp:413] Master 0b04a175-fe62-41a1-a387-8d679d1d9609 (jpeach.scv.apple.com) started on 17.228.8.72:42153
I1029 09:38:31.892992 31397 master.cpp:416] Flags at startup: --acls="" --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="hierarchical" --authenticate_agents="true" --authenticate_frameworks="true" --authenticate_http_frameworks="true" --authenticate_http_readonly="true" --authenticate_http_readwrite="true" --authentication_v0_timeout="15secs" --authenticators="crammd5" --authorizers="local" --credentials="/tmp/mFB69h/credentials" --filter_gpu_resources="true" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_framework_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --memory_profiling="false" --min_allocatable_resources="cpus:0.01|mem:32" --port="5050" --publish_per_framework_metrics="true" --quiet="false" --recovery_agent_removal_limit="100%" --registry="in_memory" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="100secs" --registry_strict="false" --require_agent_domain="false" --role_sorter="drf" --root_submissions="true" --version="false" --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/mFB69h/master" --zk_session_timeout="10secs"
I1029 09:38:31.893931 31397 master.cpp:465] Master only allowing authenticated frameworks to register
I1029 09:38:31.893942 31397 master.cpp:471] Master only allowing authenticated agents to register
I1029 09:38:31.893951 31397 master.cpp:477] Master only allowing authenticated HTTP frameworks to register
I1029 09:38:31.893962 31397 credentials.hpp:37] Loading credentials for authentication from '/tmp/mFB69h/credentials'
I1029 09:38:31.894204 31397 master.cpp:521] Using default 'crammd5' authenticator
I1029 09:38:31.894359 31397 authenticator.cpp:520] Initializing server SASL
I1029 09:38:31.898878 31397 auxprop.cpp:73] Initialized in-memory auxiliary property plugin
I1029 09:38:31.898983 31397 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readonly'
I1029 09:38:31.899279 31397 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-master-readwrite'
I1029 09:38:31.899395 31397 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-master-scheduler'
I1029 09:38:31.899507 31397 master.cpp:602] Authorization enabled
I1029 09:38:31.900339 31406 whitelist_watcher.cpp:77] No whitelist given
I1029 09:38:31.900434 31400 hierarchical.cpp:175] Initialized hierarchical allocator process
I1029 09:38:31.908254 31403 master.cpp:2105] Elected as the leading master!
I1029 09:38:31.908313 31403 master.cpp:1660] Recovering from registrar
I1029 09:38:31.908717 31404 registrar.cpp:339] Recovering registrar
I1029 09:38:31.910310 31400 registrar.cpp:383] Successfully fetched the registry (0B) in 1.547776ms
I1029 09:38:31.910684 31400 registrar.cpp:487] Applied 1 operations in 150793ns; attempting to update the registry
I1029 09:38:31.913811 31400 registrar.cpp:544] Successfully updated the registry in 2.979072ms
I1029 09:38:31.914028 31400 registrar.cpp:416] Successfully recovered registrar
I1029 09:38:31.914872 31398 master.cpp:1774] Recovered 0 agents from the registry (154B); allowing 10mins for agents to reregister
I1029 09:38:31.914912 31406 hierarchical.cpp:215] Skipping recovery of hierarchical allocator: nothing to recover
I1029 09:38:31.920753 31382 containerizer.cpp:305] Using isolation { network/cni, filesystem/posix, environment_secret, cgroups/mem }
I1029 09:38:31.926185 31382 linux_launcher.cpp:144] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1029 09:38:31.927129 31382 provisioner.cpp:298] Using default backend 'overlay'
W1029 09:38:31.942937 31382 process.cpp:2829] Attempted to spawn already running process files@17.228.8.72:42153
I1029 09:38:31.943821 31382 cluster.cpp:485] Creating default 'local' authorizer
I1029 09:38:31.946377 31402 slave.cpp:267] Mesos agent started on (1)@17.228.8.72:42153
I1029 09:38:31.946439 31402 slave.cpp:268] Flags at startup: --acls="" --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/store/appc" --authenticate_http_readonly="true" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authentication_timeout_max="1mins" --authentication_timeout_min="5secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_destroy_timeout="1mins" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270" --container_disk_watch_interval="15secs" --containerizers="mesos" --credential="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/credential" --default_role="*" --disallow_sharing_agent_pid_namespace="false" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_reregistration_timeout="2secs" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/fetch" --fetcher_cache_size="2GB" --fetcher_stall_timeout="1mins" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --gc_non_executor_container_sandboxes="false" --help="false" --hostname_lookup="true" --http_command_executor="false" --http_credentials="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/http_credentials" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="cgroups/mem" --launcher="linux" --launcher_dir="/home/jpeach/upstream/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --memory_profiling="false" --network_cni_metrics="true" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --reconfiguration_policy="equal" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="10ms" --resources="cpus:2;gpus:0;mem:1024;disk:1024;ports:[31000-32000]" --revocable_cpu_low_priority="true" --runtime_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x" --xfs_kill_containers="false" --xfs_project_range="[5000-10000]" --zk_session_timeout="10secs"
I1029 09:38:31.946923 31402 credentials.hpp:86] Loading credential for authentication from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/credential'
I1029 09:38:31.947036 31402 slave.cpp:300] Agent using credential for: test-principal
I1029 09:38:31.947049 31402 credentials.hpp:37] Loading credentials for authentication from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_VQQM0N/http_credentials'
I1029 09:38:31.947134 31402 http.cpp:1038] Creating default 'basic' HTTP authenticator for realm 'mesos-agent-readonly'
I1029 09:38:31.947352 31402 disk_profile_adaptor.cpp:80] Creating default disk profile adaptor module
I1029 09:38:31.949756 31402 slave.cpp:615] Agent resources: [{"name":"cpus","scalar":{"value":2.0},"type":"SCALAR"},{"name":"mem","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"disk","scalar":{"value":1024.0},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"type":"RANGES"}]
I1029 09:38:31.949865 31402 slave.cpp:623] Agent attributes: [ ]
I1029 09:38:31.949882 31402 slave.cpp:632] Agent hostname: jpeach.scv.apple.com
I1029 09:38:31.950031 31401 task_status_update_manager.cpp:181] Pausing sending task status updates
I1029 09:38:31.951694 31403 state.cpp:66] Recovering state from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta'
I1029 09:38:31.951923 31402 slave.cpp:6915] Finished recovering checkpointed state from '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta', beginning agent recovery
I1029 09:38:31.952055 31404 task_status_update_manager.cpp:207] Recovering task status update manager
I1029 09:38:31.952318 31405 containerizer.cpp:727] Recovering Mesos containers
I1029 09:38:31.952589 31401 linux_launcher.cpp:286] Recovering Linux launcher
I1029 09:38:31.953091 31396 containerizer.cpp:1053] Recovering isolators
E1029 09:38:31.954258 31405 slave.cpp:7275] EXIT with status 1: Failed to perform recovery: Collect failed: Collect failed: Failed to list cgroups under '/sys/fs/cgroup/memory': Failed to determine canonical path of '/sys/fs/cgroup/memory/mesos_test_0ace7d1c-d155-43db-b4f8-df1d91ce4270': No such file or directory
If recovery failed due to a change in configuration and you want to keep the current agent id, you might want to change the `--reconfiguration_policy` flag to a more permissive value.

To restart this agent with a new agent id instead, do as follows:
rm -f /tmp/CgroupsIsolatorTest_ROOT_CGROUPS_CreateRecursively_BEPq5x/meta/slaves/latest
This ensures that the agent does not recover old live executors.

If you use the Docker containerizer and think that the Docker daemon state is broken, you can try to clear it. But be careful: these commands will erase all containers and images from this host, not just those started by Mesos!
docker kill $(docker ps -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)

Finally, restart the agent.
../../3rdparty/libprocess/include/process/gmock.hpp:247: ERROR: this mock object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never is. Its address is @0x56491dd05de8.
../../src/tests/mock_registrar.cpp:54: ERROR: this mock object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never is. Its address is @0x56491e267c00.
../../src/tests/containerizer/cgroups_isolator_tests.cpp:737: ERROR: this mock object (used in test CgroupsIsolatorTest.ROOT_CGROUPS_CreateRecursively) should be deleted but never is. Its address is @0x7ffca24acd10.
ERROR: 3 leaked mock objects found at program exit.
{noformat}