Hi all, I have an issue where I'm able to run my code in standalone mode but not on my cluster. I've isolated it to a few things, but I'm at a loss as to how to debug it. Below are the details and the code. Any suggestions would be much appreciated.
Thanks!

1) RDD size is causing the problem. The code below fails as written, but if I swap in smallSample for samples, it runs end to end both on the cluster and in standalone mode.

2) The error I get is:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 3.0:1 failed 4 times, most recent failure: TID 12 on host ip-10-251-14-74.us-west-2.compute.internal failed for unknown reason
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)

Code:

val smallSample = sc.parallelize(Array("foo word", "bar word", "baz word"))
val samples = sc.textFile("s3n://geonames") // 64 MB, 2,849,439 lines of short strings

// Builds the count map on the driver: toArray collects the entire RDD locally.
val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0)
samples.toArray.foreach(counts(_) += 1)

// The map closure below references the driver-side counts map.
val result = samples.map(l => (l, counts.get(l)))
result.count

Settings (with or without Kryo doesn't matter):

export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g

spark.akka.frameSize 40
#spark.serializer org.apache.spark.serializer.KryoSerializer
#spark.kryoserializer.buffer.mb 1000
spark.executor.memory 58315m
spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
spark.executor.extraClassPath /root/ephemeral-hdfs/conf
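For what it's worth, one thing I noticed while isolating this: because the map closure captures the driver-side counts HashMap, Spark has to serialize the whole map into every task, and with the full dataset that map would be large. I suspect that's why only the big input fails, though I haven't confirmed it. Here is a fully distributed rewrite I'm considering (just a sketch, untested at scale; lineCounts and keyed are my own names, and it also sidesteps the fact that counts.get returns an Option):

// Count occurrences on the executors instead of the driver.
val lineCounts = samples.map(l => (l, 1)).reduceByKey(_ + _)

// Pair every line with its count via a join rather than a captured HashMap.
val keyed = samples.map(l => (l, ()))
val result = keyed.join(lineCounts).mapValues { case (_, n) => n }
result.count

Alternatively, if the map really has to be materialized up front, I could broadcast it once instead of letting every task closure drag it along:

val countsMap = sc.broadcast(lineCounts.collectAsMap())
val result2 = samples.map(l => (l, countsMap.value.getOrElse(l, 0)))

Would either of these avoid the failure, or am I chasing the wrong cause?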