Hi,

I have a client application running on host0 that is launching multiple
drivers on multiple remote standalone Spark clusters (each cluster is
running on a single machine):

«
...

List("host1", "host2", "host3").foreach(host => {

  val sparkConf = new SparkConf()
  sparkConf.setAppName("App")

  sparkConf.set("spark.driver.memory", "4g")
  sparkConf.set("spark.executor.memory", "4g")
  sparkConf.set("spark.driver.maxResultSize", "4g")
  sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  sparkConf.set("spark.executor.extraJavaOptions",
    "-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
    "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300")

  // Point each iteration at a different standalone cluster.
  sparkConf.setMaster(s"spark://$host:7077")

  // Monitor the same directory several times and union the resulting streams.
  val rawStreams = (1 to source.parallelism).map(_ =>
    ssc.textFileStream("/home/user/data/")).toArray
  val rawStream = ssc.union(rawStreams)
  rawStream.count().map(c => s"Received $c records.").print()

})
...

»
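
For completeness, `ssc` is created in the elided part; a minimal sketch of
roughly how it is set up (the 1-second batch interval and building it from
the same `sparkConf` are assumptions, not the exact elided code):

«
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only: a StreamingContext built from the conf above, with an
// assumed 1-second batch interval.
val ssc = new StreamingContext(sparkConf, Seconds(1))
// ... DStream setup as shown above ...
ssc.start()
ssc.awaitTermination()
»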

The problem is that I'm getting an error message saying that the
directory "/home/user/data/" does not exist.
In fact, this directory only exists on host1, host2 and host3, not on host0.
But since I'm launching the drivers on host1..3, I thought the data would
be read from those machines.

I'm also trying to avoid the spark-submit script, but I couldn't find a
configuration parameter that reliably sets the deploy mode.

Is there any way to specify the deploy mode through a configuration parameter?
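
Something like the following is what I had in mind; `spark.submit.deployMode`
appears in the configuration docs, but I'm not sure it is honored when the
SparkContext is created in-process rather than through spark-submit:

«
// Assumption: spark.submit.deployMode ("client" or "cluster") is documented,
// but it may only take effect for applications launched via spark-submit.
sparkConf.set("spark.submit.deployMode", "cluster")
»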


Thanks.
