Hi,
I am trying to write and debug Spark applications in Scala IDE with
Maven, and my code targets a Spark instance at spark://xxx:
import org.apache.spark.{SparkConf, SparkContext}

object App {
  def main(args: Array[String]) {
    println("Hello World!")
    val sparkConf = new SparkConf()
      .setMaster("spark://xxx:7077")
      .setAppName("WordCount")
    val spark = new SparkContext(sparkConf)
    val file = spark.textFile("hdfs://xxx:9000/wcinput/pg1184.txt")
    val counts = file.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://flex05.watson.ibm.com:9000/wcoutput")
  }
}
I added spark-core and hadoop-client as Maven dependencies, so the code
compiles fine.
When I click Run in Eclipse, I get this error:
14/06/06 20:52:18 WARN scheduler.TaskSetManager: Loss was due to
java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: samples.App$$anonfun$2
I googled this error, and it seems that I need to package my code into a
jar file and ship it to the Spark nodes. But since I am debugging the
code, it would be handy if I could quickly see results without packaging
and uploading jars.
What is the best practice for writing a Spark application (like
WordCount) and debugging it quickly against a remote Spark instance?
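From what I have found so far, there seem to be two options, though I am not sure either is best practice: switch the master to "local[*]" while debugging so everything runs inside the Eclipse JVM, or keep the remote master and point setJars at the packaged jar so the workers can load the anonymous function classes. A sketch of both (the master URL and jar path below are just placeholders):

```scala
import org.apache.spark.SparkConf

object DebugConf {
  def build(localDebug: Boolean): SparkConf = {
    if (localDebug) {
      // Run inside the local JVM: no jar shipping needed, and
      // breakpoints in Eclipse work.
      new SparkConf().setMaster("local[*]").setAppName("WordCount")
    } else {
      // Run on the remote master, shipping the jar built by
      // `mvn package` so executors can find samples.App$$anonfun$...
      new SparkConf()
        .setMaster("spark://xxx:7077")
        .setAppName("WordCount")
        .setJars(Seq("target/wordcount-0.0.1-SNAPSHOT.jar"))
    }
  }
}
```

But even with setJars I still have to rebuild the jar after every change, which is what I was hoping to avoid.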
Thanks!
Wei
---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan