Hi Sourav,

Let me add a bit more clarification to your last comment.

In the sense of "where" the driver program is executed, then yes: the
Flink driver program runs in a mode similar to Spark's YARN-client mode.

However, the "role" of the driver program, and the work it is
responsible for, is quite different between Flink and Spark. In Spark,
the driver program is in charge of coordinating the Spark workers
(executors) and must listen for and accept incoming connections from
them throughout the job's lifetime. Therefore, in Spark's YARN-client
mode, you must keep the driver process alive; otherwise, the job will be
shut down.
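
To make this concrete, here is a minimal sketch of such a driver using
Spark's Java API (the class and app names are made up, and it assumes
submission with --master yarn --deploy-mode client). The JVM running
main() is the driver, so it has to stay up for as long as the job runs:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class DriverStaysAlive {
        public static void main(String[] args) {
            // In YARN-client mode this JVM *is* the driver: it schedules
            // tasks and keeps talking to the executors on the cluster.
            SparkConf conf = new SparkConf().setAppName("driver-stays-alive");
            JavaSparkContext sc = new JavaSparkContext(conf);

            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
            System.out.println("count = " + count);

            // Killing this process before stop() would tear the job down.
            sc.stop();
        }
    }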

In Flink, by contrast, coordinating the TaskManagers to complete a job
is handled by Flink's JobManager once the client inside the driver
program has submitted the job. The driver program is used solely for job
submission and can disconnect afterwards.
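
As a rough sketch of what that looks like in code (assuming a recent
Flink release where executeAsync() is available; the class and job names
are made up), the client translates the dataflow into a job graph, ships
it to the JobManager, and is then free to go away:

    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.DiscardingSink;

    public class FireAndForget {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            env.fromElements(1, 2, 3, 4)
               .filter(i -> i % 2 == 0)
               .addSink(new DiscardingSink<>());

            // Hands the job graph to the JobManager and returns without
            // blocking; from here on the JobManager coordinates the
            // TaskManagers, and this client process is free to exit.
            JobClient client = env.executeAsync("fire-and-forget");
            System.out.println("submitted job " + client.getJobID());
        }
    }

The command-line client behaves the same way if you submit with
bin/flink run -d (detached mode).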

As Stephan explained, if the user-defined dataflow retrieves any
intermediate results via collect() or print(), those results are
transmitted through the JobManager, and only then does the driver program
need to stay connected. Note that even in this case the driver needs a
connection only to the JobManager, never to the workers (Flink
TaskManagers).
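
For example (a minimal sketch with the batch DataSet API; the names are
made up), a collect() call like this keeps the client attached until the
JobManager has forwarded the result set back to it:

    import java.util.List;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class CollectExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env =
                ExecutionEnvironment.getExecutionEnvironment();

            // collect() triggers execution and pulls the result back to
            // this client, routed through the JobManager; at no point
            // does the client talk to a TaskManager directly.
            List<Integer> doubled = env
                .fromElements(1, 2, 3)
                .map(new MapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer value) {
                        return value * 2;
                    }
                })
                .collect();

            System.out.println(doubled);  // prints [2, 4, 6]
        }
    }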

Cheers,
Gordon


