> One part is passing the command line options, like “--master”, from the
> JVM launched by spark-submit to the JVM where SparkContext resides

Since I have full control over both the JVM and Julia parts, I can pass whatever options I need to both. But what exactly should be passed? Currently the pipeline looks like this:

    spark-submit JVM -> JuliaRunner -> julia process -> new JVM with SparkContext

I want the last JVM's SparkContext to understand that it should run on YARN. Obviously, I can't pass the `--master yarn` option to the JVM itself. Instead, I can pass the system property "spark.master" = "yarn-client", but this results in an error:

    Retrying connect to server: 0.0.0.0/0.0.0.0:8032

So it's definitely not enough. I tried manually setting all the system properties that `spark-submit` adds to the JVM (including "spark-submit=true", "spark.submit.deployMode=client", etc.), but that didn't help either.

Source code is always good, but for a stranger like me it's a little hard to grasp the control flow in the SparkSubmit class.

> For pySpark & SparkR, when running scripts in client deployment modes
> (standalone client and yarn client), the JVM is the same (py4j/RBackend
> running as a thread in the JVM launched by spark-submit)

Can you elaborate on this? Does it mean that `spark-submit` creates a new Python/R process that connects back to that same JVM and creates a SparkContext in it?

On Tue, Apr 12, 2016 at 2:04 PM, Sun, Rui <rui....@intel.com> wrote:

> There is much deployment preparation work handling different deployment
> modes for pyspark and SparkR in SparkSubmit. It is difficult to summarize
> it briefly, you had better refer to the source code.
>
> Supporting running Julia scripts in SparkSubmit is more than implementing
> a ‘JuliaRunner’. One part is passing the command line options, like
> “--master”, from the JVM launched by spark-submit to the JVM where
> SparkContext resides, in the case that the two JVMs are not the same.
> For pySpark & SparkR, when running scripts in client deployment modes
> (standalone client and yarn client), the JVM is the same (py4j/RBackend
> running as a thread in the JVM launched by spark-submit), so no need to
> pass the command line options around. However, in your case, Julia
> interpreter launches an in-process JVM for SparkContext, which is a
> separate JVM from the one launched by spark-submit. So you need a way,
> typically an environment variable, like “SPARKR_SUBMIT_ARGS” for SparkR
> or “PYSPARK_SUBMIT_ARGS” for pyspark, to pass command line args to the
> in-process JVM in the Julia interpreter so that SparkConf can pick up
> the options.
>
> *From:* Andrei [mailto:faithlessfri...@gmail.com]
> *Sent:* Tuesday, April 12, 2016 3:48 AM
> *To:* user <user@spark.apache.org>
> *Subject:* How does spark-submit handle Python scripts (and how to repeat
> it)?
>
> I'm working on a wrapper [1] around Spark for the Julia programming
> language [2] similar to PySpark. I've got it working with Spark Standalone
> server by creating local JVM and setting master programmatically. However,
> this approach doesn't work with YARN (and probably Mesos), which require
> running via `spark-submit`.
>
> In `SparkSubmit` class I see that for Python a special class
> `PythonRunner` is launched, so I tried to do similar `JuliaRunner`, which
> essentially does the following:
>
>     val pb = new ProcessBuilder(Seq("julia", juliaScript))
>     val process = pb.start()
>     process.waitFor()
>
> where `juliaScript` itself creates new JVM and `SparkContext` inside it
> WITHOUT setting master URL. I then tried to launch this class using
>
>     spark-submit --master yarn \
>       --class o.a.s.a.j.JuliaRunner \
>       project.jar my_script.jl
>
> I expected that `spark-submit` would set environment variables or
> something that SparkContext would then read and connect to appropriate
> master.
> This didn't happen, however, and process failed while trying to
> instantiate `SparkContext`, saying that master is not specified.
>
> So what am I missing? How can I use `spark-submit` to run driver in a
> non-JVM language?
>
> [1]: https://github.com/dfdx/Sparta.jl
> [2]: http://julialang.org/
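
For illustration, the environment-variable approach suggested in the quoted reply might be sketched roughly as below. This is only a sketch under assumptions: the variable name `JULIA_SUBMIT_ARGS` is hypothetical (modeled on `PYSPARK_SUBMIT_ARGS` and `SPARKR_SUBMIT_ARGS`), the object and method names are made up for the example, and it assumes the options of interest are visible as `spark.*` system properties in the JVM launched by `spark-submit`. It shows only how options could be forwarded to the child process; it does not by itself address the YARN connection error mentioned above.

```scala
object JuliaRunnerSketch {
  // Hypothetical variable name, modeled on PYSPARK_SUBMIT_ARGS / SPARKR_SUBMIT_ARGS.
  val EnvVar = "JULIA_SUBMIT_ARGS"

  /** Render every "spark.*" property as a "--conf key=value" pair, sorted by key.
    * Values containing spaces are not escaped in this sketch. */
  def toSubmitArgs(props: Map[String, String]): String =
    props.toSeq
      .filter { case (key, _) => key.startsWith("spark.") }
      .sortBy { case (key, _) => key }
      .map { case (key, value) => s"--conf $key=$value" }
      .mkString(" ")

  /** Build (but do not start) the julia child process, exporting the
    * collected options through the child's environment. */
  def buildProcess(juliaScript: String, props: Map[String, String]): ProcessBuilder = {
    val pb = new ProcessBuilder("julia", juliaScript)
    pb.environment().put(EnvVar, toSubmitArgs(props))
    pb.inheritIO() // let the child share this JVM's stdin/stdout/stderr
    pb
  }

  def main(args: Array[String]): Unit = {
    // args(0) is the path to the Julia script, as in the spark-submit call above.
    val pb = buildProcess(args(0), sys.props.toMap)
    sys.exit(pb.start().waitFor())
  }
}
```

On the Julia side, the script would then have to read this variable and apply the options when it creates its in-process JVM and `SparkContext`; that part is not shown here.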