Hi Andrew,

Thanks a lot for your response. I am aware of the '--master' flag in the
spark-submit command. However, I would like to create the SparkContext
inside my own code.

Maybe I should elaborate a little further: I would like to reuse the
result of a Spark computation, for example, directly in my code.

Here is the SparkPi example:

String[] jars = new String[1];
jars[0] = System.getProperty("user.dir") + "/target/SparkPi-1.0-SNAPSHOT.jar";

SparkConf conf = new SparkConf()
    .setAppName("JavaSparkPi")
    .setMaster("spark://SPARK_HOST:7077")
    .setJars(jars);
JavaSparkContext sc = new JavaSparkContext(conf);

int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
int n = 1000000 * slices;
List<Integer> l = new ArrayList<Integer>(n);
for (int i = 0; i < n; i++) {
  l.add(i);
}

JavaRDD<Integer> dataSet = sc.parallelize(l, slices);

int count = dataSet.map(new Function<Integer, Integer>() {
  @Override
  public Integer call(Integer integer) {
    double x = Math.random() * 2 - 1;
    double y = Math.random() * 2 - 1;
    return (x * x + y * y < 1) ? 1 : 0;
  }
}).reduce(new Function2<Integer, Integer, Integer>() {
  @Override
  public Integer call(Integer integer, Integer integer2) {
    return integer + integer2;
  }
});

System.out.println("Pi is roughly " + 4.0 * count / n);

sc.stop();

As you can see, I can reuse the result (count) directly in my code.

So my goal would be to reuse this kind of implementation in YARN mode
(client or cluster mode). However, I haven't found a way to do that so
far, since I always have to submit my Spark code via spark-submit.

What if I want to run this code as part of a web application which renders
the result as a web page?
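
Just to illustrate what I have in mind, here is a sketch of what I was
hoping would work. It assumes yarn-client mode, that HADOOP_CONF_DIR is
visible to the web application's JVM, and the class/method names
(EmbeddedSparkPi, estimatePi) are only made up for the example; I don't
know whether this is actually supported:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

public class EmbeddedSparkPi {

  // Called from a web controller; returns the value to render on the page.
  public static double estimatePi(int slices) {
    String[] jars = new String[] {
        System.getProperty("user.dir") + "/target/SparkPi-1.0-SNAPSHOT.jar" };

    SparkConf conf = new SparkConf()
        .setAppName("JavaSparkPi")
        .setMaster("yarn-client")   // instead of spark://SPARK_HOST:7077
        .setJars(jars);
    JavaSparkContext sc = new JavaSparkContext(conf);

    int n = 1000000 * slices;
    List<Integer> l = new ArrayList<Integer>(n);
    for (int i = 0; i < n; i++) {
      l.add(i);
    }

    int count = sc.parallelize(l, slices).map(new Function<Integer, Integer>() {
      @Override
      public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
      }
    }).reduce(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer a, Integer b) {
        return a + b;
      }
    });

    sc.stop();
    return 4.0 * count / n;   // reuse the result outside of Spark
  }
}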

-- Andreas

On Tue, Aug 18, 2015 at 10:50 PM, Andrew Or <and...@databricks.com> wrote:

> Hi Andreas,
>
> I believe the distinction is not between standalone and YARN mode, but
> between client and cluster mode.
>
> In client mode, your Spark submit JVM runs your driver code. In cluster
> mode, one of the workers (or NodeManagers if you're using YARN) in the
> cluster runs your driver code. In the latter case, it doesn't really make
> sense to call `setMaster` in your driver because Spark needs to know which
> cluster you're submitting the application to.
>
> Instead, the recommended way is to set the master through the `--master`
> flag in the command line, e.g.
>
> $ bin/spark-submit
>     --master spark://1.2.3.4:7077
>     --class some.user.Clazz
>     --name "My app name"
>     --jars lib1.jar,lib2.jar
>     --deploy-mode cluster
>     app.jar
>
> Both YARN and standalone modes support client and cluster modes, and the
> spark-submit script is the common interface through which you can launch
> your application. In other words, you shouldn't have to do anything more
> than providing a different value to `--master` to use YARN.
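>
> For example, assuming HADOOP_CONF_DIR (or YARN_CONF_DIR) points to your
> cluster configuration, the same submission against YARN would look
> something like:
>
> $ bin/spark-submit
>     --master yarn-cluster
>     --class some.user.Clazz
>     --name "My app name"
>     --jars lib1.jar,lib2.jar
>     app.jar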
>
> -Andrew
>
> 2015-08-17 0:34 GMT-07:00 Andreas Fritzler <andreas.fritz...@gmail.com>:
>
>> Hi all,
>>
>> when running the Spark cluster in standalone mode I am able to create the
>> Spark context from Java via the following code snippet:
>>
>> SparkConf conf = new SparkConf()
>>    .setAppName("MySparkApp")
>>    .setMaster("spark://SPARK_MASTER:7077")
>>    .setJars(jars);
>> JavaSparkContext sc = new JavaSparkContext(conf);
>>
>>
>> As soon as I'm done with my processing, I can just close it via
>>
>> sc.stop();
>>
>> Now my question: Is the same also possible when running Spark on YARN? I
>> currently don't see how this should be possible without submitting your
>> application as a packaged jar file. Is there a way to get this kind of
>> interactivity from within your Scala/Java code?
>>
>> Regards,
>> Andreas
>>
>
>
