Benno Evers created MESOS-9657:
----------------------------------

             Summary: Launching a command task twice can crash the agent
                 Key: MESOS-9657
                 URL: https://issues.apache.org/jira/browse/MESOS-9657
             Project: Mesos
          Issue Type: Bug
            Reporter: Benno Evers


When launching a command task, we verify that the framework has no existing 
executor for that task:
{noformat}
      // We are dealing with command task; a new command executor will be
      // launched.
      CHECK(executor == nullptr);
{noformat}
and afterwards an executor is created with the same executor id as the task id:
{noformat}
  // (slave.cpp)
  // Either the master explicitly requests launching a new executor
  // or we are in the legacy case of launching one if there wasn't
  // one already. Either way, let's launch executor now.
  if (executor == nullptr) {
    Try<Executor*> added = framework->addExecutor(executorInfo);
  [...]
{noformat}

This means that if a task with the same task id is launched again before the 
executor from the previous launch has been removed, the CHECK above fails and 
the agent crashes:
{noformat}
F0315 16:39:32.822818 38112 slave.cpp:2865] Check failed: executor == nullptr 
*** Check failure stack trace: ***
    @     0x7feb29a407af  google::LogMessage::Flush()
    @     0x7feb29a43c3f  google::LogMessageFatal::~LogMessageFatal()
    @     0x7feb28a5a886  mesos::internal::slave::Slave::__run()
    @     0x7feb28af4f0e  _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchIN5mesos8internal5slave5SlaveERKNSA_13FrameworkInfoERKNSA_12ExecutorInfoERK6OptionINSA_8TaskInfoEERKSK_INSA_13TaskGroupInfoEERKSt6vectorINSB_19ResourceVersionUUIDESaISU_EERKSK_IbESG_SJ_SO_SS_SY_S11_EEvRKNS1_3PIDIT_EEMS13_FvT0_T1_T2_T3_T4_T5_EOT6_OT7_OT8_OT9_OT10_OT11_EUlOSE_OSH_OSM_OSQ_OSW_OSZ_S3_E_JSE_SH_SM_SQ_SW_SZ_St12_PlaceholderILi1EEEEEEclEOS3_
    @     0x7feb2998a620  process::ProcessBase::consume()
    @     0x7feb29987675  process::ProcessManager::resume()
    @     0x7feb299a2d2b  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN7process14ProcessManager12init_threadsEvE3$_8EEEEE6_M_runEv
    @     0x7feb2632f523  (unknown)
    @     0x7feb25e40594  start_thread
    @     0x7feb25b73e6f  __GI___clone
Aborted (core dumped)
{noformat}

Instead of crashing, the agent should just drop the task with an appropriate 
error in this case.
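
A minimal sketch of the proposed behaviour, using simplified stand-in types 
rather than the real agent code: `Framework`, `Executor` and 
`launchCommandTask` below are hypothetical and only model the executor lookup. 
The point is that the existing-executor case returns an error to the caller 
instead of hitting the CHECK.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the agent's types; this is not
// the Mesos API, only a model of the executor bookkeeping.
struct Executor {
  std::string id;
};

struct Framework {
  std::map<std::string, std::unique_ptr<Executor>> executors;

  // Returns the executor with the given id, or nullptr if there is none.
  Executor* getExecutor(const std::string& executorId) const {
    auto it = executors.find(executorId);
    return it == executors.end() ? nullptr : it->second.get();
  }

  // Creates (or replaces) the executor with the given id.
  Executor* addExecutor(const std::string& executorId) {
    auto& slot = executors[executorId];
    slot = std::make_unique<Executor>(Executor{executorId});
    return slot.get();
  }
};

// Returns an error message if the task had to be dropped, or std::nullopt
// if a new command executor was created for it.
std::optional<std::string> launchCommandTask(
    Framework& framework, const std::string& taskId)
{
  // For a command task, the executor id is the same as the task id.
  const std::string& executorId = taskId;

  if (framework.getExecutor(executorId) != nullptr) {
    // This is the case that currently trips CHECK(executor == nullptr).
    // Report an error to the caller instead of aborting the process.
    return "Dropping task '" + taskId +
           "': an executor with the same id still exists";
  }

  framework.addExecutor(executorId);
  return std::nullopt;
}
```

In this model, a second launch of the same task id yields an error message 
while the first executor is still registered, and succeeds again once that 
executor has been removed from the map. How the real agent would report the 
dropped task to the framework is left open here.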



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
