Nan & Mayur,

Thanks, I got it.

Best,


2014-02-21 9:24 GMT+08:00 Mayur Rustagi <[email protected]>:

> You need a driver to manage execution of the jar. You can use the Spark
> shell to launch the jar, and it will manage the execution for you: start
> the Spark shell, add your jar to the classpath, and call your function
> with sc as the Spark context.
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
> On Thu, Feb 20, 2014 at 5:10 PM, Nan Zhu <[email protected]> wrote:
>
>> I think this is a confusing part of the current web UI: even when your
>> standalone app finishes without any error, the executors' status is
>> still shown as KILLED.
>>
>> In Spark, in most cases you don't need to rely on a script to submit
>> jobs; you only need to specify the master address when constructing a
>> SparkContext object.
>>
>> But if you want to submit an in-cluster driver, you will need
>> bin/spark-class:
>> http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster
>>
>> Best,
>>
>> --
>> Nan Zhu
>>
>> On Thursday, February 20, 2014 at 8:02 PM, Tao Xiao wrote:
>>
>> In a Hadoop cluster, the following command is the general way to submit
>> a job:
>>
>>     bin/hadoop jar <job-jar> <arguments>
>>
>> Is there such a general way to submit a job to a Spark cluster?
>>
>> Besides, my job finished successfully, and the Spark web UI shows that
>> this application's state is *FINISHED*, but each executor's state is
>> *KILLED*. I can see that this application produced the expected result,
>> so why is each executor's state reported as *KILLED*?
>>
>> Completed Applications:
>>
>>     ID:               app-20140220173957-0001
>>     Name:             SimpleDistributedApp
>>     Cores:            12
>>     Memory per Node:  1024.0 MB
>>     Submitted Time:   2014/02/20 17:39:57
>>     User:             root
>>     State:            FINISHED
>>     Duration:         13 s
>>
>> Executor Summary:
>>
>>     ExecutorID  Worker                                           Cores  Memory  State
>>     2           worker-20140220162542-hadoop-2.certus.com-49805  4      1024    KILLED
>>     1           worker-20140220162542-hadoop-4.certus.com-40528  4      1024    KILLED
>>     0           worker-20140220162542-hadoop-3.certus.com-47386  4      1024    KILLED
>>
>>     (stdout/stderr log links omitted)
>>
>> Thanks
>> Tao
>>
>>
>> 2014-02-21 0:00 GMT+08:00 Mayur Rustagi <[email protected]>:
>>
>> You are specifying the Spark master in the jar:
>>
>>     .setMaster("spark://hadoop-1.certus.com:7077")
>>
>> so sbt run is deploying the jar to the master cluster and running it.
>> Regards,
>> Mayur
>>
>> Mayur Rustagi
>> Ph: +919632149971
>> http://www.sigmoidanalytics.com
>> https://twitter.com/mayur_rustagi
>>
>>
>> On Thu, Feb 20, 2014 at 7:22 AM, Nan Zhu <[email protected]> wrote:
>>
>> I'm not sure I understand your question correctly. Do you mean that you
>> didn't see the application information in the Spark web UI even though
>> it generated the expected results?
>>
>> Best,
>>
>> --
>> Nan Zhu
>>
>> On Thursday, February 20, 2014 at 10:13 AM, Tao Xiao wrote:
>>
>> My application source file, SimpleDistributedApp.scala, is as follows:
>>
>> __________________________________________________________________
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> object SimpleDistributedApp {
>>   def main(args: Array[String]): Unit = {
>>     val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"
>>
>>     val conf = new SparkConf()
>>       .setMaster("spark://hadoop-1.certus.com:7077")
>>       .setAppName("SimpleDistributedApp")
>>       .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
>>       .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
>>       .set("spark.executor.memory", "1g")
>>
>>     val sc = new SparkContext(conf)
>>     val text = sc.textFile(filepath, 3)
>>
>>     // count the lines that contain "hello"
>>     val numOfHello = text.filter(line => line.contains("hello")).count()
>>
>>     println("number of lines containing 'hello' is " + numOfHello)
>>     println("done")
>>   }
>> }
>> ______________________________________________________________________
>>
>>
>> The corresponding sbt file, $SPARK_HOME/simple.sbt, is as follows:
>> _________________________________________________________________
>> name := "Simple Distributed App"
>>
>> version := "1.0"
>>
>> scalaVersion := "2.10.3"
>>
>> libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
>>
>> resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>> _________________________________________________________________
>>
>>
>> I built the application into
>> $SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar
>> using the command:
>>
>>     SPARK_HADOOP_VERSION=1.2.1 sbt/sbt package
>>
>> I ran it using the command "sbt/sbt run" and it finished successfully.
>>
>> But I'm not sure what the correct and general way is to submit and run
>> a job in a Spark cluster. To be specific, after having built a job into
>> a JAR file, say simpleApp.jar, where should I put it and how should I
>> submit it to the Spark cluster?
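[Editor's note] The bin/spark-class route that Nan points to can be sketched as a shell command. This is only a sketch based on the Spark 0.9 standalone-mode documentation linked above; the master URL, jar path, and main class are taken from this thread, and the exact Client arguments may differ in other Spark versions, so treat them as assumptions. The script echoes the command rather than executing it, so it can be inspected without a running cluster:

```shell
# Sketch: launching a driver inside a standalone cluster via
# org.apache.spark.deploy.Client, per the Spark 0.9 standalone docs.
# Values below come from the thread; adjust them for your own cluster.
SPARK_MASTER="spark://hadoop-1.certus.com:7077"
APP_JAR="target/scala-2.10/simple-distributed-app_2.10-1.0.jar"
MAIN_CLASS="SimpleDistributedApp"

# Echo instead of execute, so this is safe to run without a cluster:
echo ./bin/spark-class org.apache.spark.deploy.Client launch \
  "$SPARK_MASTER" "$APP_JAR" "$MAIN_CLASS"
```

The alternative described earlier in the thread, setting the master directly in SparkConf and running the driver process yourself (e.g. via sbt run), needs no launch script at all; the cluster only runs the executors in that case.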
