Benno Evers created MESOS-9657:
----------------------------------

             Summary: Launching a command task twice can crash the agent
                 Key: MESOS-9657
                 URL: https://issues.apache.org/jira/browse/MESOS-9657
             Project: Mesos
          Issue Type: Bug
            Reporter: Benno Evers

When launching a command task, we verify that the framework has no existing executor for that task:

{noformat}
// We are dealing with command task; a new command executor will be
// launched.
CHECK(executor == nullptr);
{noformat}

and afterwards an executor is created with the same executor id as the task id:

{noformat}
// (slave.cpp)
// Either the master explicitly requests launching a new executor
// or we are in the legacy case of launching one if there wasn't
// one already. Either way, let's launch executor now.
if (executor == nullptr) {
  Try<Executor*> added = framework->addExecutor(executorInfo);
  [...]
{noformat}

This means that if we relaunch the task with the same task id before the executor is removed, it will crash the agent:

{noformat}
F0315 16:39:32.822818 38112 slave.cpp:2865] Check failed: executor == nullptr
*** Check failure stack trace: ***
    @ 0x7feb29a407af google::LogMessage::Flush()
    @ 0x7feb29a43c3f google::LogMessageFatal::~LogMessageFatal()
    @ 0x7feb28a5a886 mesos::internal::slave::Slave::__run()
    @ 0x7feb28af4f0e _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSK_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaISU_EERKSK_IbESG_SJ_SO_SS_SY_S11_EEvRKNS1_3PIDIT_EEMS13_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSE_OSH_OSM_OSQ_OSW_OSZ_S3_E_JSE_SH_SM_SQ_SW_SZ_St12_PlaceholderILi1EEEEEEclEOS3_
    @ 0x7feb2998a620 process::ProcessBase::consume()
    @ 0x7feb29987675 process::ProcessManager::resume()
    @ 0x7feb299a2d2b _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvE3$_8EEEEE6_M_runEv
    @ 0x7feb2632f523 (unknown)
    @ 0x7feb25e40594 start_thread
    @ 0x7feb25b73e6f __GI___clone
Aborted (core dumped)
{noformat}

Instead of crashing, the agent should just drop the task with an appropriate error in this case.

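The proposed behavior (drop the task instead of aborting) could look roughly like the sketch below. This is a toy model, not the agent's actual code: `Framework`, `Executor`, `LaunchResult`, and `launchCommandTask` are hypothetical stand-ins invented for illustration, and the only thing carried over from the report is that a command task's executor id equals its task id. The point is that the duplicate case is handled as an ordinary error path rather than asserted away.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the agent's bookkeeping; these are NOT the
// real Mesos types.
struct Executor {
  std::string id;
};

struct Framework {
  std::map<std::string, Executor> executors;

  // Returns the executor with the given id, or nullptr if there is none.
  Executor* getExecutor(const std::string& id) {
    auto it = executors.find(id);
    return it == executors.end() ? nullptr : &it->second;
  }
};

enum class LaunchResult { LAUNCHED, DROPPED };

// Launches a command task. As in the report, the command executor reuses
// the task id as its executor id. If such an executor still exists, the
// task is dropped with an error message instead of failing a CHECK.
LaunchResult launchCommandTask(
    Framework& framework,
    const std::string& taskId,
    std::string* error) {
  if (framework.getExecutor(taskId) != nullptr) {
    *error = "Executor '" + taskId + "' already exists; dropping task";
    return LaunchResult::DROPPED;
  }

  // No executor with this id was found above, so the insertion must succeed.
  const bool inserted =
      framework.executors.emplace(taskId, Executor{taskId}).second;
  assert(inserted);
  (void)inserted;

  return LaunchResult::LAUNCHED;
}
```

With this shape, a second launch of the same task id while the first executor is still registered returns `LaunchResult::DROPPED` and fills in `error`, leaving the existing executor untouched. How the real agent would report the dropped task back to the framework is outside the scope of this sketch.
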
-- This message was sent by Atlassian JIRA (v7.6.3#76005)