Hello,

I deployed Mesos 1.0.0 on CentOS (kernel 3.14.73). This is my master config:

export MESOS_log_dir=/apps/mesos/logs/
export MESOS_ip=0.0.0.0
export MESOS_hostname=`hostname`
export MESOS_logging_level=INFO
export MESOS_quorum=2
export MESOS_work_dir=/apps/mesos/master
export MESOS_zk=zk://zk1:2181,zk2:2181,zk3:2181/oss-mesos
export MESOS_allocator=HierarchicalDRF
export MESOS_cluster=oss-mesos
export MESOS_credentials=/apps/mesos/etc/mesos/credentials.txt
export MESOS_registry=replicated_log
export MESOS_webui_dir=/apps/mesos/share/mesos/webui
export MESOS_zk_session_timeout=90secs
export MESOS_max_executors_per_slave=10
export MESOS_registry_fetch_timeout=2mins

I started two master nodes, but the masters crash after a few minutes. The log output is:

I0719 11:50:22.673280 5376 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (287)@10.10.186.76:5050
I0719 11:50:23.154119 5381 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (504)@10.10.179.252:5050
I0719 11:50:23.154749 5376 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:23.156838 5378 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:23.563072 5382 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (289)@10.10.186.76:5050
I0719 11:50:23.883855 5376 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (507)@10.10.179.252:5050
I0719 11:50:23.884414 5380 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:23.886569 5375 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:24.163056 5379 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (291)@10.10.186.76:5050
I0719 11:50:24.425379 5378 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (510)@10.10.179.252:5050
I0719 11:50:24.425864 5379 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:24.428951 5375 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:24.935673 5379 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (293)@10.10.186.76:5050
F0719 11:50:25.262277 5381 master.cpp:1662] Recovery failed: Failed to recover registrar: Failed to perform fetch within 2mins
*** Check failure stack trace: ***
    @ 0x7fe6fa0ac37c google::LogMessage::Fail()
    @ 0x7fe6fa0ac2d8 google::LogMessage::SendToLog()
    @ 0x7fe6fa0abcce google::LogMessage::Flush()
    @ 0x7fe6fa0aea88 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fe6f900a64c mesos::internal::master::fail()
    @ 0x7fe6f90deffb _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEE6__callIvJS1_EJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
    @ 0x7fe6f90b98df _ZNSt5_BindIFPFvRKSsS1_EPKcSt12_PlaceholderILi1EEEEclIJS1_EvEET0_DpOT_
    @ 0x7fe6f9086783 _ZZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvRKSsS6_EPKcSt12_PlaceholderILi1EEEEvEERKS2_OT_NS2_6PreferEENUlS6_E_clES6_
    @ 0x7fe6f90df0cd _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureI7NothingE8onFailedISt5_BindIFPFvS1_S1_EPKcSt12_PlaceholderILi1EEEEvEERKS6_OT_NS6_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
    @ 0x4a4833 std::function<>::operator()()
    @ 0x49f0eb _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
    @ 0x4997c2 process::Future<>::fail()
    @ 0x7fe6f8ccfa22 process::Promise<>::fail()
    @ 0x7fe6f90dc4f0 process::internal::thenf<>()
    @ 0x7fe6f9120bd9 _ZNSt5_BindIFPFvRKSt8functionIFN7process6FutureI7NothingEERKN5mesos8internal8RegistryEEERKSt10shared_ptrINS1_7PromiseIS3_EEERKNS2_IS7_EEESB_SH_St12_PlaceholderILi1EEEE6__callIvISM_EILm0ELm1ELm2EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
    @ 0x7fe6f91178cd std::_Bind<>::operator()<>()
    @ 0x7fe6f90fe821 std::_Function_handler<>::_M_invoke()
    @ 0x7fe6f9117aff std::function<>::operator()()
    @ 0x7fe6f90fe955 _ZZNK7process6FutureIN5mesos8internal8RegistryEE5onAnyIRSt8functionIFvRKS4_EEvEES8_OT_NS4_6PreferEENUlS8_E_clES8_
    @ 0x7fe6f9120c85 _ZNSt17_Function_handlerIFvRKN7process6FutureIN5mesos8internal8RegistryEEEEZNKS5_5onAnyIRSt8functionIS8_EvEES7_OT_NS5_6PreferEEUlS7_E_E9_M_invokeERKSt9_Any_dataS7_
    @ 0x7fe6f9117aff std::function<>::operator()()
    @ 0x7fe6f91807c4 process::internal::run<>()
    @ 0x7fe6f9176ef4 process::Future<>::fail()
    @ 0x7fe6f91b12de std::_Mem_fn<>::operator()<>()
I0719 11:50:25.414069 5382 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (513)@10.10.179.252:5050
I0719 11:50:25.414718 5376 recover.cpp:197] Received a recover response from a replica in EMPTY status
    @ 0x7fe6f91ac6c7 _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEE6__callIbIS8_EILm0ELm1EEEET_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
I0719 11:50:25.416431 5377 recover.cpp:197] Received a recover response from a replica in EMPTY status
I0719 11:50:25.418115 5379 http.cpp:381] HTTP GET for /master/state from 10.10.159.106:3363 with User-Agent='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36'
    @ 0x7fe6f91a4d23 _ZNSt5_BindIFSt7_Mem_fnIMN7process6FutureIN5mesos8internal8RegistryEEEFbRKSsEES6_St12_PlaceholderILi1EEEEclIJS8_EbEET0_DpOT_
    @ 0x7fe6f919ac63 _ZZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS4_FbRKSsEES4_St12_PlaceholderILi1EEEEbEERKS4_OT_NS4_6PreferEENUlS9_E_clES9_
    @ 0x7fe6f91ac752 _ZNSt17_Function_handlerIFvRKSsEZNK7process6FutureIN5mesos8internal8RegistryEE8onFailedISt5_BindIFSt7_Mem_fnIMS8_FbS1_EES8_St12_PlaceholderILi1EEEEbEERKS8_OT_NS8_6PreferEEUlS1_E_E9_M_invokeERKSt9_Any_dataS1_
    @ 0x4a4833 std::function<>::operator()()
    @ 0x49f0eb _ZN7process8internal3runISt8functionIFvRKSsEEJS4_EEEvRKSt6vectorIT_SaIS8_EEDpOT0_
    @ 0x7fe6f9176ecc process::Future<>::fail()
    @ 0x7fe6f916feac process::Promise<>::fail()

The error log is:

Log file created at: 2016/07/19 11:50:25
Running on machine: oss-mesos-master-bjc-001
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
F0719 11:50:25.262277 5381 master.cpp:1662] Recovery failed: Failed to recover registrar: Failed to perform fetch within 2mins

Can you help me? Thanks.