I am working on a web application and want to submit a Spark job to Hadoop YARN. I have already built my own assembly and can run it from the command line with the following script:

    export YARN_CONF_DIR=/home/gpadmin/clusterConfDir/yarn
    export SPARK_JAR=./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar
    ./spark-class org.apache.spark.deploy.yarn.Client \
      --jar ./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar \
      --class org.apache.spark.examples.SparkPi \
      --args yarn-standalone \
      --num-workers 3 \
      --master-memory 1g \
      --worker-memory 512m \
      --worker-cores 1

It works fine. Then I realized that it is hard to submit the job from a web application. It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar and spark-examples-assembly-0.8.1-incubating.jar are both really big jars. I believe they contain everything.

So my questions are:

1) When I run the above script, which jar is being submitted to the YARN server?

2) It looks like spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar plays the client-side role, while spark-examples-assembly-0.8.1-incubating.jar carries the Spark runtime and the examples that will run in YARN. Am I right?

3) Does anyone have similar experience? I have done a lot of Hadoop MR work and want to follow the same logic to submit Spark jobs. So far I can only find the command-line way to submit a Spark job to YARN. I believe there is an easy way to integrate Spark into a web application.

Thanks.
John
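
For reference, one possible stopgap is to have the web application launch the same spark-class command as an external process. The following is only a minimal sketch, assuming the web application runs on the JVM and has a local Spark checkout available. The class YarnSubmitSketch, the method buildSubmitCommand and all of its parameters are hypothetical names made up for illustration; they are not part of Spark. The sketch mirrors the script above flag for flag and does not address the jar size or which jar gets uploaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Spark API): wraps the spark-class command line
// from the script above so that JVM code can launch it as a child process.
final class YarnSubmitSketch {

    // Builds, but does not start, the process. All parameters are
    // illustrative and are supplied by the caller.
    static ProcessBuilder buildSubmitCommand(File sparkHome,
                                             String yarnConfDir,
                                             String sparkJar,
                                             String appJar,
                                             String mainClass) {
        List<String> cmd = new ArrayList<>();
        // Absolute path to the launcher script inside the Spark checkout.
        cmd.add(new File(sparkHome, "spark-class").getAbsolutePath());
        cmd.add("org.apache.spark.deploy.yarn.Client");
        // Same flags and values as in the shell script.
        cmd.addAll(Arrays.asList(
                "--jar", appJar,
                "--class", mainClass,
                "--args", "yarn-standalone",
                "--num-workers", "3",
                "--master-memory", "1g",
                "--worker-memory", "512m",
                "--worker-cores", "1"));

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // Run from the Spark directory so relative jar paths resolve as in the script.
        pb.directory(sparkHome);
        // Equivalent of the two "export" lines.
        Map<String, String> env = pb.environment();
        env.put("YARN_CONF_DIR", yarnConfDir);
        env.put("SPARK_JAR", sparkJar);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = buildSubmitCommand(
                new File("."),
                "/home/gpadmin/clusterConfDir/yarn",
                "./assembly/target/scala-2.9.3/spark-assembly-0.8.1-incubating-hadoop2.0.5-alpha.jar",
                "./examples/target/scala-2.9.3/spark-examples-assembly-0.8.1-incubating.jar",
                "org.apache.spark.examples.SparkPi");
        // Only print the command here; nothing is launched.
        System.out.println(String.join(" ", pb.command()));
    }
}
```

To actually submit, the caller would invoke pb.inheritIO() (or read the process output itself), then pb.start() and waitFor() on the returned Process, and check the exit status. This keeps the web application decoupled from Spark's internal classes, at the cost of spawning a separate JVM for every submission.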