Hi

I just observed some slightly weird behavior:

I ran a PySpark job, a very simple one:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf()
conf.setAppName("Historical Meter Load")
conf.set("spark.yarn.queue", "root.Applications")
conf.set("spark.executor.instances", "50")
conf.set("spark.executor.memory", "10g")
conf.set("spark.yarn.executor.memoryOverhead", "2048")
conf.set("spark.sql.shuffle.partitions", "1000")
conf.set("spark.executor.cores", "4")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

df = sqlContext.sql("some sql")

# force evaluation of the query
c = df.count()

# keep only the top-ranked rows and overwrite the target Hive table
df.filter(df["RNK"] == 1).write.mode("overwrite").saveAsTable("some table")

sc.stop()

I am running this on a CDH 5.7 cluster with Spark 1.6.0.

Behavior observed: after a few hours of running (definitely over 12 hours,
though I am not sure exactly when), YARN reported the job as Completed,
finished successfully, whereas the job actually kept running for 22 hours
(I can see this from the Application Master link). The runtime of the job
is expected; the behavior of YARN is not.
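
In case it helps narrow things down, I was also thinking of polling the
ResourceManager REST API to compare what YARN itself reports with what the
AM UI shows. This is just a rough sketch; the ResourceManager host/port and
application id below are placeholders:

import json
import urllib2  # Python 2, matching this CDH 5.7 / Spark 1.6 environment

# Placeholders: substitute the real ResourceManager address and application id
RM = "http://resourcemanager.example.com:8088"
APP_ID = "application_0000000000000_0000"

# The RM REST API returns per-application info, including state and finalStatus
resp = urllib2.urlopen("{0}/ws/v1/cluster/apps/{1}".format(RM, APP_ID))
app = json.load(resp)["app"]

print("state=%s finalStatus=%s progress=%s"
      % (app["state"], app["finalStatus"], app["progress"]))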

Is this a known issue? Is it PySpark-specific, or does the same thing happen
with Scala as well?


-- 
Best Regards,
Ayan Guha
