Nan & Mayur,

Thanks, I got it.

Best,


2014-02-21 9:24 GMT+08:00 Mayur Rustagi <[email protected]>:

> You need a driver to manage execution of the jar. You can use the Spark
> shell to launch the jar, and it will manage the execution for you: start
> the Spark shell, add your jar to the classpath, and call your function
> with sc as the Spark context.
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
> On Thu, Feb 20, 2014 at 5:10 PM, Nan Zhu <[email protected]> wrote:
>
>> I think this is a confusing part of the current web UI: even when your
>> standalone app finishes without any error, the executors' status is
>> still shown as KILLED.
>>
>> In Spark, in most cases you don't need to rely on a script to submit
>> jobs; you only need to specify the master address when constructing a
>> SparkContext object.
>>
>> But if you want to submit an in-cluster driver, you will need
>> bin/spark-class:
>> http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
>>
>> Best,
>>
>> --
>> Nan Zhu
>>
>> On Thursday, February 20, 2014 at 8:02 PM, Tao Xiao wrote:
>>
>> In a Hadoop cluster, the following command is the general way to submit
>> a job:
>>
>>     bin/hadoop jar <job-jar> <arguments>
>>
>> Is there such a general way to submit a job to a Spark cluster?
>>
>> Besides, my job finished successfully, and the Spark web UI shows that
>> this application's state is *FINISHED*, but each executor's state is
>> *KILLED*. I can see that this application produced the expected result,
>> so why is each executor's state reported as *KILLED*?
>>
>> Completed Applications:
>>
>>     ID:               app-20140220173957-0001
>>     Name:             SimpleDistributedApp
>>     Cores:            12
>>     Memory per Node:  1024.0 MB
>>     Submitted Time:   2014/02/20 17:39:57
>>     User:             root
>>     State:            FINISHED
>>     Duration:         13 s
>>
>> Executor Summary:
>>
>>     ExecutorID  Worker                                           Cores  Memory  State
>>     2           worker-20140220162542-hadoop-2.certus.com-49805  4      1024    KILLED
>>     1           worker-20140220162542-hadoop-4.certus.com-40528  4      1024    KILLED
>>     0           worker-20140220162542-hadoop-3.certus.com-47386  4      1024    KILLED
>>
>>     (stdout/stderr log links omitted)
>>
>> Thanks
>> Tao
>>
>>
>> 2014-02-21 0:00 GMT+08:00 Mayur Rustagi <[email protected]>:
>>
>> You are specifying the Spark master in the jar:
>>
>>     .setMaster("spark://hadoop-1.certus.com:7077")
>>
>> so sbt run is deploying the jar to the master cluster and running it.
>> Regards,
>> Mayur
>>
>> Mayur Rustagi
>> Ph: +919632149971
>> http://www.sigmoidanalytics.com
>> https://twitter.com/mayur_rustagi
>>
>>
>> On Thu, Feb 20, 2014 at 7:22 AM, Nan Zhu <[email protected]> wrote:
>>
>> I'm not sure I understand your question correctly. Do you mean that you
>> didn't see the application information in the Spark web UI even though
>> it generated the expected results?
>>
>> Best,
>>
>> --
>> Nan Zhu
>>
>> On Thursday, February 20, 2014 at 10:13 AM, Tao Xiao wrote:
>>
>> My application source file, SimpleDistributedApp.scala, is as follows:
>>
>> __________________________________________________________________
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> object SimpleDistributedApp {
>>   def main(args: Array[String]): Unit = {
>>     val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"
>>
>>     val conf = new SparkConf()
>>       .setMaster("spark://hadoop-1.certus.com:7077")
>>       .setAppName("SimpleDistributedApp")
>>       .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
>>       .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
>>       .set("spark.executor.memory", "1g")
>>
>>     val sc = new SparkContext(conf)
>>     val text = sc.textFile(filepath, 3)
>>
>>     // count the lines that contain "hello"
>>     val numOfHello = text.filter(line => line.contains("hello")).count()
>>
>>     println("number of lines containing 'hello' is " + numOfHello)
>>     println("done")
>>   }
>> }
>> ______________________________________________________________________
>>
>>
>> The corresponding sbt file, $SPARK_HOME/simple.sbt, is as follows:
>> _________________________________________________________________
>> name := "Simple Distributed App"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.3"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
>>
>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>> _________________________________________________________________
>>
>>
>> I built the application into
>> $SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar
>> using the command:
>>
>>     SPARK_HADOOP_VERSION=1.2.1 sbt/sbt package
>>
>> I ran it using the command "sbt/sbt run" and it finished successfully.
>>
>> But I'm not sure what the correct and general way is to submit and run
>> a job in a Spark cluster. To be specific, after having built a job into
>> a JAR file, say simpleApp.jar, where should I put it and how should I
>> submit it to the Spark cluster?
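[Editor's note] The bin/spark-class route that Nan points to can be sketched as a shell command. This is only a sketch based on the Spark 0.9 standalone-mode documentation linked above; the master URL, jar path, and main class are taken from this thread, and the exact Client arguments may differ in other Spark versions, so treat them as assumptions. The script echoes the command rather than executing it, so it can be inspected without a running cluster:

```shell
# Sketch: launching a driver inside a standalone cluster via
# org.apache.spark.deploy.Client, per the Spark 0.9 standalone docs.
# Values below come from the thread; adjust them for your own cluster.
SPARK_MASTER="spark://hadoop-1.certus.com:7077"
APP_JAR="target/scala-2.10/simple-distributed-app_2.10-1.0.jar"
MAIN_CLASS="SimpleDistributedApp"

# Echo instead of execute, so this is safe to run without a cluster:
echo ./bin/spark-class org.apache.spark.deploy.Client launch \
  "$SPARK_MASTER" "$APP_JAR" "$MAIN_CLASS"
```

The alternative described earlier in the thread, setting the master directly in SparkConf and running the driver process yourself (e.g. via sbt run), needs no launch script at all; the cluster only runs the executors in that case.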
