Re: How to submit a job to Spark cluster?

Mayur Rustagi Thu, 20 Feb 2014 17:26:23 -0800

You need a driver to manage the execution of the jar. You can use the Spark
shell for this: start the Spark shell with your jar on the classpath, then
call your function, passing sc as the Spark context. The shell acts as the
driver and manages the execution for you.
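
As a rough illustration of the shell approach, here is a minimal sketch.
HelloCounter, containsHello and countHello are hypothetical names, not part of
the original application. It assumes the object has been packaged into your
jar and that the jar is on the Spark shell's classpath, so that the shell's
built-in sc can be passed in.

```scala
import org.apache.spark.SparkContext

// Hypothetical helper object, assumed to be packaged into the application jar.
object HelloCounter {
  // Pure predicate, kept separate so it can be checked without a cluster.
  def containsHello(line: String): Boolean = line.contains("hello")

  // Counts the lines containing "hello", using a SparkContext supplied by the
  // caller (for example the `sc` that the Spark shell creates).
  def countHello(sc: SparkContext, path: String): Long =
    sc.textFile(path).filter(line => containsHello(line)).count()
}
```

With that in place, the call from the Spark shell would look something like
HelloCounter.countHello(sc, "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"),
with the shell acting as the driver. Because the function takes the context as
a parameter, no master URL has to be hard-coded in the jar.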


Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Thu, Feb 20, 2014 at 5:10 PM, Nan Zhu <[email protected]> wrote:

> I think this is a confusing part of the current web UI: even when your
> standalone app finishes without any error, the status is still shown as
> KILLED.
>
> In Spark, in most cases you don't need to rely on a script to submit jobs;
> you only need to specify the master address when constructing a SparkContext
> object.
>
> But if you want to launch the driver inside the cluster, you will need
> bin/spark-class:
> http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
>
> Best,
>
> --
> Nan Zhu
>
> On Thursday, February 20, 2014 at 8:02 PM, Tao Xiao wrote:
>
> In a Hadoop cluster, the following command is the general way to submit a
> job:
>        bin/hadoop jar <job-jar> <arguments>
>
>
> Is there a similar general way to submit a job to a Spark cluster?
>
> Besides, my job finished successfully, and the Spark Web UI shows that
> this application's state is *FINISHED*, but each executor's state is
> *KILLED*. I can see that the application produced the expected result, so
> why is each executor's state reported as *KILLED*?
>
> Completed Applications
>
> ID                       Name                  Cores  Memory per Node  Submitted Time       User  State     Duration
> app-20140220173957-0001  SimpleDistributedApp  12     1024.0 MB        2014/02/20 17:39:57  root  FINISHED  13 s
>
> Executor Summary
>
> ExecutorID  Worker                                           Cores  Memory  State   Logs
> 2           worker-20140220162542-hadoop-2.certus.com-49805  4      1024    KILLED  stdout stderr
> 1           worker-20140220162542-hadoop-4.certus.com-40528  4      1024    KILLED  stdout stderr
> 0           worker-20140220162542-hadoop-3.certus.com-47386  4      1024    KILLED  stdout stderr
>
>
> Thanks
> Tao
>
>
> 2014-02-21 0:00 GMT+08:00 Mayur Rustagi <[email protected]>:
>
> You are specifying the Spark master inside the jar:
>  .setMaster("spark://hadoop-1.certus.com:7077")
> so "sbt run" connects to that master, ships the jar to the cluster, and
> runs it there.
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Thu, Feb 20, 2014 at 7:22 AM, Nan Zhu <[email protected]> wrote:
>
> I'm not sure I understand your question correctly.
>
> Do you mean you didn't see the application information in the Spark Web UI
> even though it generates the expected results?
>
> Best,
>
> --
> Nan Zhu
>
> On Thursday, February 20, 2014 at 10:13 AM, Tao Xiao wrote:
>
> My application source file,  *SimpleDistributedApp.scala*, is as  follows:
>
> __________________________________________________________________
> import org.apache.spark.{SparkConf, SparkContext}
>
> object SimpleDistributedApp {
>     def main(args: Array[String]) = {
>         val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"
>
>         val conf = new SparkConf()
>                     .setMaster("spark://hadoop-1.certus.com:7077")
>                     .setAppName("SimpleDistributedApp")
>                     .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
>                     .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
>                     .set("spark.executor.memory", "1g")
>
>         val sc = new SparkContext(conf)
>         val text = sc.textFile(filepath, 3)
>
>         val numOfHello = text.filter(line => line.contains("hello")).count()
>
>         println("number of lines containing 'hello' is " + numOfHello)
>         println("down")
>     }
> }
> ______________________________________________________________________
>
>
>
> The corresponding sbt file, *$SPARK_HOME/simple.sbt*,  is as follows:
> _________________________________________________________________
>
> name := "Simple Distributed App"
>
> version := "1.0"
>
> scalaVersion := "2.10.3"
>
> libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
>
> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
> _________________________________________________________________
>
>
> I built the application into
> *$SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar*,
> using the command
>         SPARK_HADOOP_VERSION=1.2.1   sbt/sbt   package
>
> I ran it using the command "sbt/sbt run" and it finished running
> successfully.
>
> But I'm not sure what the correct, general way to submit and run a job in
> a Spark cluster is. To be specific, after having built a job into a JAR
> file, say *simpleApp.jar*, where should I put it and how should I submit
> it to the Spark cluster?
