Version: Spark 1.5.2 *(built with Hive)*

git clone git://github.com/apache/spark.git
./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver
*Input:*

-sh-4.1$ hadoop fs -du -h /user/dvasthimal/poc_success_spark/data/input
2.5 G  /user/dvasthimal/poc_success_spark/data/input/dw_bid_1231.seq
2.5 G  /user/dvasthimal/poc_success_spark/data/input/dw_mao_item_best_offr_1231.seq
*5.9 G  /user/dvasthimal/poc_success_spark/data/input/expt_session_1231.json*
-sh-4.1$

*Spark Shell:*

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/apache/hadoop/lib/native/
export SPARK_HOME=/home/dvasthimal/spark-1.5.2-bin-2.4.0
export SPARK_JAR=$SPARK_HOME/lib/spark-assembly-1.4.0-hadoop2.4.0.jar
export HADOOP_CONF_DIR=/apache/hadoop/conf
cd $SPARK_HOME
export SPARK_CLASSPATH=/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-21.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-EBAY-21/share/hadoop/yarn/lib/guava-11.0.2.jar:/apache/hadoop-2.4.1-EBAY-21/share/hadoop/hdfs/hadoop-hdfs-2.4.1-EBAY-21.jar:/home/dvasthimal/pig_jars/sojourner-common-0.1.3-hadoop2.jar:/home/dvasthimal/pig_jars/jackson-mapper-asl-1.8.5.jar:/home/dvasthimal/pig_jars/experimentation-reporting-common-0.0.1-SNAPSHOT.jar:/apache/hadoop/share/hadoop/common/lib/hadoop-ebay-2.4.1-EBAY-11.jar
./bin/spark-shell

Then in the shell (note: sqlContext has to be defined before its implicits can be imported):

import org.apache.hadoop.io.Text
import org.codehaus.jackson.map.ObjectMapper
import com.ebay.hadoop.platform.model.SessionContainer
import scala.collection.JavaConversions._
import com.ebay.globalenv.sojourner.TrackingProperty
import java.net.URLDecoder
import com.ebay.ep.reporting.common.util.TagsUtil
import org.apache.hadoop.conf.Configuration

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext.read.json("/user/dvasthimal/poc_success_spark/data/input/expt_session_1231.json")

*Errors:*

1.
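For reference, here is how the shell could be launched with an explicit master, memory settings, and the native-library path handed to the driver JVM. This is only a sketch, not something I have verified on this cluster; the sizes and executor count are placeholders, and the native path is the same one exported above:

```shell
./bin/spark-shell \
  --master yarn-client \
  --driver-memory 4g \
  --executor-memory 4g \
  --num-executors 8 \
  --driver-library-path /apache/hadoop/lib/native/
```

The task logs below mention localhost, which suggests the shell ran in local mode rather than on YARN; that may be relevant to why no application shows up in the ResourceManager UI.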
16/01/01 18:36:12 INFO json.JSONRelation: Listing hdfs://apollo-phx-nn-ha/user/dvasthimal/poc_success_spark/data/input/expt_session_1231.json on driver
16/01/01 18:36:12 INFO storage.MemoryStore: ensureFreeSpace(268744) called with curMem=0, maxMem=556038881
16/01/01 18:36:12 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 262.4 KB, free 530.0 MB)
16/01/01 18:36:12 INFO storage.MemoryStore: ensureFreeSpace(24028) called with curMem=268744, maxMem=556038881
16/01/01 18:36:12 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 23.5 KB, free 530.0 MB)
16/01/01 18:36:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:59605 (size: 23.5 KB, free: 530.3 MB)
16/01/01 18:36:12 INFO spark.SparkContext: Created broadcast 0 from json at <console>:36
16/01/01 18:36:12 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1886)
        at java.lang.Runtime.loadLibrary0(Runtime.java:849)
        at java.lang.System.loadLibrary(System.java:1088)
        at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
        at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)

2.

16/01/01 18:36:44 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3).
2256 bytes result sent to driver
16/01/01 18:36:44 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 32082 ms on localhost (2/24)
16/01/01 18:36:54 ERROR executor.Executor: Exception in task 9.0 in stage 0.0 (TID 9)
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at org.apache.hadoop.io.Text.setCapacity(Text.java:266)
        at org.apache.hadoop.io.Text.append(Text.java:236)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:243)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:209)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:47)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:248)

3.

TypeRef(TypeSymbol(class $read extends Serializable))
uncaught exception during compilation: java.lang.AssertionError
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 0.0 failed 1 times, most recent failure: Lost task 9.0 in stage 0.0 (TID 9, localhost): java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at org.apache.hadoop.io.Text.setCapacity(Text.java:266)
        at org.apache.hadoop.io.Text.append(Text.java:236)
        at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:243)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)

4.
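The OutOfMemoryError in 2. occurs inside LineReader/Text.append, i.e. while buffering a single line of the input. read.json expects one JSON document per line, so a file containing a very long line (or one giant record) would inflate the Text buffer until the heap is exhausted. A quick way to sanity-check the longest line length — shown here against a small hypothetical sample file; for the real input you would pipe `hadoop fs -cat` into the same awk:

```shell
# Build a tiny sample file (hypothetical path) with one JSON document per line
printf '{"a":1}\n{"bb":22}\n' > /tmp/expt_sample.json

# Print the longest line length in bytes; a value approaching the file size
# would explain the Text.setCapacity heap blowup
awk '{ if (length($0) > max) max = length($0) } END { print max }' /tmp/expt_sample.json
```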
16/01/01 18:36:56 ERROR util.Utils: Uncaught exception in thread Executor task launch worker-19
java.lang.NullPointerException
        at org.apache.spark.scheduler.Task$$anonfun$run$1.apply$mcV$sp(Task.scala:94)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
        at org.apache.spark.scheduler.Task.run(Task.scala:92)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/01/01 18:36:56 ERROR util.Utils: Uncaught exception in thread Executor task launch worker-21

*Questions*

1. How do I fix each of these errors?
2. https://yarn-jt:50030/cluster/apps/RUNNING does not show the spark-shell job.
3. Where do I see the submitted Spark job?

Regards,
Deepak