Hi, I have a client application running on host0 that launches multiple drivers on multiple remote standalone Spark clusters (each cluster running on a single machine):
    ...
    List("host1", "host2", "host3").foreach(host => {
      val sparkConf = new SparkConf()
      sparkConf.setAppName("App")
      sparkConf.set("spark.driver.memory", "4g")
      sparkConf.set("spark.executor.memory", "4g")
      sparkConf.set("spark.driver.maxResultSize", "4g")
      sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      sparkConf.set("spark.executor.extraJavaOptions",
        " -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC " +
        "-XX:+AggressiveOpts -XX:FreqInlineSize=300 -XX:MaxInlineSize=300 ")
      sparkConf.setMaster(s"spark://$host:7077")

      val rawStreams = (1 to source.parallelism).map(_ =>
        ssc.textFileStream("/home/user/data/")).toArray
      val rawStream = ssc.union(rawStreams)
      rawStream.count.map(c => s"Received $c records.").print()
    })
    ...

The problem is that I'm getting an error message saying that the directory "/home/user/data/" does not exist. In fact, this directory exists only on host1, host2, and host3, not on host0. Since I'm launching the drivers on host1..3, I expected the data to be read from those machines.

I'm also trying to avoid using the spark-submit script, and I couldn't find a configuration parameter for specifying the deploy mode. Is there any way to specify the deploy mode through a configuration parameter? Thanks.
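For reference, this is roughly the kind of spark-submit invocation I'm trying to avoid, in favor of setting everything through SparkConf (the jar path and main class below are just placeholders):

```
# Hypothetical spark-submit call I'd like to replace with pure SparkConf settings;
# the jar path and main class are placeholders, not my actual application.
spark-submit \
  --master spark://host1:7077 \
  --deploy-mode cluster \
  --class com.example.App \
  /path/to/app.jar
```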