Thanks for asking this.

I've had this issue with pyspark on YARN too, 100% of the time: I quit out of pyspark and, while my Unix shell prompt returns, a 'yarn application -list' always shows (as does the UI) that the application is still running (or at least not totally dead). When I then log onto the nodemanagers, I see orphaned/defunct UNIX processes.


I don't know if those are caused simply by exiting pyspark, or by my having to do a 'yarn application -kill <appID>' to kill applications that should have terminated gracefully (but didn't).
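In case it helps, here is roughly what I run to confirm the symptom; the application ID and the process name are just placeholders/examples, so adjust them for your cluster:

    # On the client node: see what YARN still thinks is alive after quitting pyspark
    yarn application -list

    # Force-kill an application that should have gone away on its own
    yarn application -kill application_1409876543210_0001

    # On a nodemanager: look for executor JVMs that got reparented to init (PPID 1)
    ps -eo pid,ppid,cmd | grep CoarseGrainedExecutorBackend | awk '$2 == 1'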


I noticed this problem about a week ago and haven't gotten back to it (it looks like today I will =:)), but yes, I see the same issue.

I am using Spark 1.0.0 (latest from Cloudera), and the latest YARN from Cloudera as well (I forget the exact version at the moment).



Sincerely yours,
Team Dimension Data



On September 4, 2014 6:27:04 AM Hemanth Yamijala <yhema...@gmail.com> wrote:

Hi,

I launched a Spark Streaming job under YARN using the default configuration for
Spark, via spark-submit with the master set to yarn-cluster. It launched an
ApplicationMaster and 2 CoarseGrainedExecutorBackend processes.

Everything ran fine; then I killed the application using yarn application
-kill <appid>.
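(For reference, the sequence was roughly the following; the class and jar names
here are just placeholders, not the actual job.)

    # Submit the streaming job with the YARN cluster master (Spark 1.0.x syntax)
    spark-submit --class com.example.MyStreamingApp --master yarn-cluster my-streaming-app.jar

    # Later, kill it through YARN
    yarn application -kill <appid>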

On doing this, I noticed that it killed only the shell processes that
launched the Spark AM and the other processes, but the Java processes were left
alone. They became orphaned and their PPID changed to 1.

Is this a bug in Spark or YARN? I am using Spark 1.0.2 and Hadoop 2.4.1.
The cluster is a single-node setup in pseudo-distributed mode.

Thanks
hemanth
