My application source file, *SimpleDistributedApp.scala*, is as follows:
__________________________________________________________________
import org.apache.spark.{SparkConf, SparkContext}

object SimpleDistributedApp {
  def main(args: Array[String]): Unit = {
    val filepath = "hdfs://hadoop-1.certus.com:54310/user/root/samples/data"
    val conf = new SparkConf()
      .setMaster("spark://hadoop-1.certus.com:7077")
      .setAppName("SimpleDistributedApp")
      .setSparkHome("/home/xt/soft/spark-0.9.0-incubating-bin-hadoop1")
      .setJars(Array("target/scala-2.10/simple-distributed-app_2.10-1.0.jar"))
      .set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
    // Read the HDFS file with a minimum of 3 partitions
    val text = sc.textFile(filepath, 3)
    val numOfHello = text.filter(line => line.contains("hello")).count()
    println("number of lines containing 'hello' is " + numOfHello)
    println("done")
  }
}
______________________________________________________________________
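The filter-and-count step is plain Scala, so it can be sanity-checked without a cluster; a minimal sketch, using a hypothetical in-memory `Seq` of lines in place of the RDD:

```scala
// Same filter/count logic as the Spark job, applied to an in-memory
// collection so the result can be verified locally.
object FilterCountSketch {
  def main(args: Array[String]): Unit = {
    val lines = Seq("hello world", "no match here", "another hello line")
    // count the lines containing the substring "hello"
    val numOfHello = lines.count(line => line.contains("hello"))
    println("number of lines containing 'hello' is " + numOfHello) // prints 2
  }
}
```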
The corresponding sbt file, *$SPARK_HOME/simple.sbt*, is as follows:
_________________________________________________________________
name := "Simple Distributed App"
version := "1.0"
scalaVersion := "2.10.3"
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
_________________________________________________________________
I built the application into
*$SPARK_HOME/target/scala-2.10/simple-distributed-app_2.10-1.0.jar*, using
the command
SPARK_HADOOP_VERSION=1.2.1 sbt/sbt package
I ran it using the command "sbt/sbt run" and it finished running
successfully.
But I'm not sure about the correct and general way to submit and run a job
on a Spark cluster. To be specific: after building a job into a JAR file,
say *simpleApp.jar*, where should I put it, and how should I submit it to
the Spark cluster?
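(For context: I'm aware that Spark releases from 1.0 onward ship a dedicated `bin/spark-submit` script as the general submission mechanism. A sketch of what that would look like, assuming the JAR path and standalone master URL from above, is:

```shell
bin/spark-submit \
  --class SimpleDistributedApp \
  --master spark://hadoop-1.certus.com:7077 \
  --executor-memory 1g \
  target/scala-2.10/simple-distributed-app_2.10-1.0.jar
```

But I'd still like to know the recommended approach on 0.9.x, and where the JAR should live relative to the cluster nodes.)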