In a Hadoop cluster, the following command is the general way to submit a
job:
bin/hadoop jar <job-jar> <arguments>
Is there such a general way to submit a job into a Spark cluster?
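(For reference, a hedged sketch: the 0.9.0-incubating release discussed here has no single submission script, but Spark 1.0 and later ship bin/spark-submit, which plays the same role as bin/hadoop jar. The master URL, class name, and jar path below are taken from this thread's example and assume a 1.0+ installation.)

```shell
# Sketch only: spark-submit ships with Spark 1.0 and later,
# not with the 0.9.0-incubating release discussed in this thread.
bin/spark-submit \
  --class SimpleDistributedApp \
  --master spark://hadoop-1.certus.com:7077 \
  --executor-memory 1g \
  target/scala-2.10/simple-distributed-app_2.10-1.0.jar
```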
Besides, my job finished successfully, and the Spark Web UI shows this
application's state as *FINISHED*, but each executor's state as *KILLED*. I
can see that the application produced the expected result, so why is each
executor's state reported as *KILLED*?
Completed Applications

  ID:              app-20140220173957-0001 <http://hadoop-1.certus.com:8080/app?appId=app-20140220173957-0001>
  Name:            SimpleDistributedApp <http://hadoop-1.certus.com:4040/>
  Cores:           12
  Memory per Node: 1024.0 MB
  Submitted Time:  2014/02/20 17:39:57
  User:            root
  State:           FINISHED
  Duration:        13 s
Executor Summary

  ExecutorID: 2
  Worker:     worker-20140220162542-hadoop-2.certus.com-49805 <http://hadoop-2.certus.com:8081/>
  Cores:      4
  Memory:     1024
  State:      KILLED
  Logs:       stdout <http://hadoop-2.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=2&logType=stdout>
              stderr <http://hadoop-2.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=2&logType=stderr>

  ExecutorID: 1
  Worker:     worker-20140220162542-hadoop-4.certus.com-40528 <http://hadoop-4.certus.com:8081/>
  Cores:      4
  Memory:     1024
  State:      KILLED
  Logs:       stdout <http://hadoop-4.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=1&logType=stdout>
              stderr <http://hadoop-4.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=1&logType=stderr>

  ExecutorID: 0
  Worker:     worker-20140220162542-hadoop-3.certus.com-47386 <http://hadoop-3.certus.com:8081/>
  Cores:      4
  Memory:     1024
  State:      KILLED
  Logs:       stdout <http://hadoop-3.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=0&logType=stdout>
              stderr <http://hadoop-3.certus.com:8081/logPage?appId=app-20140220173957-0001&executorId=0&logType=stderr>
Thanks
Tao
2014-02-21 0:00 GMT+08:00 Mayur Rustagi <[email protected]>:
> You are specifying the Spark master in the jar:
> .setMaster("spark://hadoop-1.certus.com:7077")
> so "sbt run" deploys the jar to that cluster's master and runs it.
> Regards
> Mayur
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
>
> On Thu, Feb 20, 2014 at 7:22 AM, Nan Zhu <[email protected]> wrote:
>
>> I'm not sure I understand your question correctly.
>>
>> Do you mean you didn't see the application information in the Spark Web UI
>> even though it generated the expected results?
>>
>> Best,
>>
>> --
>> Nan Zhu
>>
>> On Thursday, February 20, 2014 at 10:13 AM, Tao Xiao wrote:
>>
>> My application source file, *SimpleDistributedApp.scala*, is as
>> follows:
>>
>> __________________________________________________________________
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> object SimpleDistributedApp {
>>   def main(args: Array[String]) = {
>>     val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"
>>
>>     val conf = new SparkConf()
>>       .setMaster("spark://hadoop-1.certus.com:7077")
>>       .setAppName("SimpleDistributedApp")
>>       .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
>>       .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
>>       .set("spark.executor.memory", "1g")
>>
>>     val sc = new SparkContext(conf)
>>     val text = sc.textFile(filepath, 3)
>>
>>     val numOfHello = text.filter(line => line.contains("hello")).count()
>>
>>     println("number of lines containing 'hello' is " + numOfHello)
>>     println("down")
>>   }
>> }
>> ______________________________________________________________________
>>
>>
>>
>> The corresponding sbt file, *$SPARK_HOME/simple.sbt*, is as follows:
>> _________________________________________________________________
>>
>> name := "Simple Distributed App"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.3"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" %
>> "0.9.0-incubating"
>>
>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>> _________________________________________________________________
>>
>>
>> I built the application into
>> *$SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar*,
>> using the command
>> SPARK_HADOOP_VERSION=1.2.1 sbt/sbt package
>>
>> I ran it using the command "sbt/sbt run" and it finished running
>> successfully.
>>
>> But I'm not sure about the correct and general way to submit and run a
>> job in a Spark cluster. To be specific, after having built a job into a JAR
>> file, say *simpleApp.jar*, where should I put it and how should I submit
>> it to the Spark cluster?