I am trying to understand the lifecycle of an RPCEndpoint.

Here is my understanding: After negotiating containers from the
ClusterManager, the master starts the CoarseGrainedExecutorBackend on the
worker. The backend connects back to the CoarseGrainedSchedulerBackend's
DriverEndpoint, which then sends requests/messages to the
CoarseGrainedExecutorBackend.

*Q1: My inference is that the lifecycle of CoarseGrainedExecutorBackend is:
onConnected() -> onStart() -> receive -> onStop(). The receive() method
keeps taking requests/messages and executing them, meaning that receive()
is called multiple times throughout the endpoint's lifecycle. Is my
understanding right?*
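To make Q1 concrete, here is a minimal toy sketch of the call order I have in mind. It is not Spark code: ToyEndpoint, RecordingEndpoint and ToyDispatcher are hypothetical names made up for illustration, and the sketch models only onStart, receive and onStop (it does not model onConnected). It only shows what "receive is called once per message between a single onStart and a single onStop" would look like.

```scala
// Toy model only: these names are hypothetical and are NOT Spark's RpcEndpoint API.
import scala.collection.mutable.ArrayBuffer

trait ToyEndpoint {
  def onStart(): Unit
  def receive: PartialFunction[Any, Unit]
  def onStop(): Unit
}

// Records each lifecycle callback so the call order can be inspected afterwards.
class RecordingEndpoint extends ToyEndpoint {
  val events = ArrayBuffer.empty[String]
  override def onStart(): Unit = events += "onStart"
  override def receive: PartialFunction[Any, Unit] = {
    case msg: String => events += s"receive($msg)"
  }
  override def onStop(): Unit = events += "onStop"
}

object ToyDispatcher {
  // Drives one endpoint: start once, deliver every message in order, stop once.
  def run(endpoint: ToyEndpoint, inbox: Seq[Any]): Unit = {
    endpoint.onStart()
    inbox.foreach { msg =>
      // Messages the endpoint does not handle are silently skipped in this toy.
      if (endpoint.receive.isDefinedAt(msg)) endpoint.receive(msg)
    }
    endpoint.onStop()
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val endpoint = new RecordingEndpoint
    ToyDispatcher.run(endpoint, Seq("m1", "m2", "m3"))
    // prints: onStart -> receive(m1) -> receive(m2) -> receive(m3) -> onStop
    println(endpoint.events.mkString(" -> "))
  }
}
```

In this toy, the same endpoint instance handles all three messages. Whether the real CoarseGrainedExecutorBackend behaves this way is exactly what I am asking.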

*Q2: The receive method executes "messages/requests" as per the source code
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/RpcEndpoint.scala>.
What exactly are these messages/requests? Do they refer to the "set of
tasks assigned to this particular RPCEndpoint" from a stage of a Spark
RDD, running on its individual partitions?*

*Q3: If the receive method is indeed called multiple times over the
course of a Spark job, where each request refers to the set of task(s) of a
stage, does this mean that a new Executor is instantiated each time
receive() is called (as the code suggests in line 129
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala>)?
That is, does this happen every time a stage is executed and a set of tasks
is sent to a particular RPCEndpoint (CoarseGrainedExecutorBackend), after
shutting down the executor from the previous stage?*



I have also posted this question on Stack Overflow:
https://stackoverflow.com/questions/59388700/understanding-the-lifecycle-of-and-rpcendpoint-coarsegrainedexecutorbackend


I would appreciate it if someone could elaborate and shed light on these
questions.
