Hi,
I am facing an issue with PySpark in YARN cluster mode.
Here is my code:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Settings I expect YARN to honour
conf = SparkConf()
conf.setAppName("Spark Ingestion")
conf.set("spark.yarn.queue", "root.Applications")
conf.set("spark.executor.instances", "50")
conf.set("spark.executor.memory", "22g")
conf.set("spark.yarn.executor.memoryOverhead", "4096")
conf.set("spark.executor.cores", "4")
conf.set("spark.sql.hive.convertMetastoreParquet", "false")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
# Trivial job, just to check that the application runs
r = sc.parallelize(range(1, 10000))
print(r.count())
sc.stop()
The problem is that none of my config settings are passed on to YARN when I submit the job like this:
spark-submit --master yarn --deploy-mode cluster ayan_test.py
I tried the same code with --deploy-mode client, and all the config settings
are passed through fine.
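One variant that might be worth trying is to move the settings out of the script and onto the command line as --conf flags. The sketch below only builds that command from the settings listed above. It rests on an unconfirmed assumption, namely that in cluster mode the settings have to be known at submit time, and build_submit_command is a hypothetical helper name, not part of Spark.

```python
# Hypothetical helper (not part of Spark): turn the settings from the script
# above into a spark-submit command line that uses --conf flags.
settings = [
    ("spark.yarn.queue", "root.Applications"),
    ("spark.executor.instances", "50"),
    ("spark.executor.memory", "22g"),
    ("spark.yarn.executor.memoryOverhead", "4096"),
    ("spark.executor.cores", "4"),
    ("spark.sql.hive.convertMetastoreParquet", "false"),
]


def build_submit_command(script, settings, deploy_mode="cluster"):
    """Return the spark-submit argument list for the given settings."""
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", deploy_mode]
    for key, value in settings:
        # One --conf key=value pair per setting
        cmd += ["--conf", "%s=%s" % (key, value)]
    cmd.append(script)
    return cmd


command = build_submit_command("ayan_test.py", settings)
print(" ".join(command))
```

The printed line can be pasted into a shell as-is, since none of the values contain spaces.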
Am I missing something? Would using --properties-file help?
Can anybody share a working example?
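To make the properties-file idea concrete, here is a minimal sketch that writes the same settings to a file in the whitespace-separated "key value" layout used by spark-defaults.conf. The file name spark-ingestion.conf and the helper to_properties are made up for illustration, and whether this actually fixes the cluster-mode behaviour is exactly what I am unsure about. Note that the option is spelled --properties-file in spark-submit, as far as I know.

```python
import os
import tempfile

# Same settings as in the script, kept as plain strings.
settings = {
    "spark.yarn.queue": "root.Applications",
    "spark.executor.instances": "50",
    "spark.executor.memory": "22g",
    "spark.yarn.executor.memoryOverhead": "4096",
    "spark.executor.cores": "4",
    "spark.sql.hive.convertMetastoreParquet": "false",
}


def to_properties(settings):
    """Render settings as 'key value' lines, sorted by key."""
    return "".join("%s %s\n" % (k, v) for k, v in sorted(settings.items()))


properties_text = to_properties(settings)

# The location is arbitrary; the temp directory is used here only so the
# sketch runs anywhere. The file name is hypothetical.
properties_path = os.path.join(tempfile.gettempdir(), "spark-ingestion.conf")
with open(properties_path, "w") as f:
    f.write(properties_text)

print("spark-submit --master yarn --deploy-mode cluster "
      "--properties-file %s ayan_test.py" % properties_path)
```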
Best Regards,
Ayan Guha