Hi, I'm using Spark 1.5 on the latest EMR (4.1).
I have an RDD of Strings:

scala> deviceIds
res25: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[18] at map at <console>:28

When I map over the RDD and attempt to run a SQL query inside the closure, the result is a NullPointerException (full session and stack trace below):

scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()

If I run the same query as a top-level expression, the count is returned. There was additional code within the anonymous function, which I've removed to try to isolate the problem. Thanks for any insights or advice on how to debug this.

-- Nick

scala> deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
deviceIds.map(id => sqlContext.sql("select * from ad_info")).count()
15/10/08 16:12:56 INFO SparkContext: Starting job: count at <console>:40
15/10/08 16:12:56 INFO DAGScheduler: Got job 18 (count at <console>:40) with 200 output partitions
15/10/08 16:12:56 INFO DAGScheduler: Final stage: ResultStage 37(count at <console>:40)
15/10/08 16:12:56 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 36)
15/10/08 16:12:56 INFO DAGScheduler: Missing parents: List()
15/10/08 16:12:56 INFO DAGScheduler: Submitting ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40), which has no missing parents
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(17904) called with curMem=531894, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22 stored as values in memory (estimated size 17.5 KB, free 534.5 MB)
15/10/08 16:12:56 INFO MemoryStore: ensureFreeSpace(7143) called with curMem=549798, maxMem=560993402
15/10/08 16:12:56 INFO MemoryStore: Block broadcast_22_piece0 stored as bytes in memory (estimated size 7.0 KB, free 534.5 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on 10.247.0.117:33555 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO SparkContext: Created broadcast 22 from broadcast at DAGScheduler.scala:861
15/10/08 16:12:56 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 37 (MapPartitionsRDD[37] at map at <console>:40)
15/10/08 16:12:56 INFO YarnScheduler: Adding task set 37.0 with 200 tasks
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.0 in stage 37.0 (TID 650, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:46227 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO BlockManagerInfo: Added broadcast_22_piece0 in memory on ip-10-247-0-117.ec2.internal:32938 (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.0 in stage 37.0 (TID 651, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 0.0 in stage 37.0 (TID 649, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
    at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
    at $line101.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.1 in stage 37.0 (TID 652, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.0 in stage 37.0 (TID 650) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 1]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.1 in stage 37.0 (TID 653, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.0 in stage 37.0 (TID 651) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 2]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.1 in stage 37.0 (TID 654, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.1 in stage 37.0 (TID 652) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 3]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.2 in stage 37.0 (TID 655, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.1 in stage 37.0 (TID 653) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 4]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.2 in stage 37.0 (TID 656, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.1 in stage 37.0 (TID 654) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 5]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.2 in stage 37.0 (TID 657, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.2 in stage 37.0 (TID 655) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 6]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 2.2 in stage 37.0 (TID 657) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 7]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 1.2 in stage 37.0 (TID 656) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 8]
15/10/08 16:12:56 INFO TaskSetManager: Starting task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal, PROCESS_LOCAL, 1914 bytes)
15/10/08 16:12:56 INFO TaskSetManager: Lost task 0.3 in stage 37.0 (TID 658) on executor ip-10-247-0-117.ec2.internal: java.lang.NullPointerException (null) [duplicate 9]
15/10/08 16:12:56 ERROR TaskSetManager: Task 0 in stage 37.0 failed 4 times; aborting job
15/10/08 16:12:56 INFO YarnScheduler: Cancelling stage 37
15/10/08 16:12:56 INFO YarnScheduler: Stage 37 was cancelled
15/10/08 16:12:56 INFO DAGScheduler: ResultStage 37 (count at <console>:40) failed in 0.128 s
15/10/08 16:12:56 INFO DAGScheduler: Job 18 failed: count at <console>:40, took 0.145419 s
15/10/08 16:12:56 WARN TaskSetManager: Lost task 2.3 in stage 37.0 (TID 659, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 WARN TaskSetManager: Lost task 1.3 in stage 37.0 (TID 660, ip-10-247-0-117.ec2.internal): TaskKilled (killed intentionally)
15/10/08 16:12:56 INFO YarnScheduler: Removed TaskSet 37.0, whose tasks have all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 4 times, most recent failure: Lost task 0.3 in stage 37.0 (TID 658, ip-10-247-0-117.ec2.internal): java.lang.NullPointerException
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1280)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1268)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1267)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1493)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1455)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1444)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1910)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1121)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
    at $iwC$$iwC$$iwC.<init>(<console>:77)
    at $iwC$$iwC.<init>(<console>:79)
    at $iwC.<init>(<console>:81)
    at <init>(<console>:83)
    at .<init>(<console>:87)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:40)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1555)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1121)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1839)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

scala> 15/10/08 16:13:45 INFO ContextCleaner: Cleaned accumulator 34
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on 10.247.0.117:33555 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:46227 in memory (size: 7.0 KB, free: 535.0 MB)
15/10/08 16:13:45 INFO BlockManagerInfo: Removed broadcast_22_piece0 on ip-10-247-0-117.ec2.internal:32938 in memory (size: 7.0 KB, free: 535.0 MB)

scala>
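In case it helps frame the question: my working hypothesis (an assumption on my part, not something I've confirmed in the Spark source) is that the SQLContext handle captured by the map closure doesn't survive serialization out to the executors, so it is null by the time a task dereferences it. Here's a driver-free sketch of that mechanism using plain Java serialization and a hypothetical Handle class in place of Spark's actual classes:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a driver-side context object. The field is
// @transient, so it is dropped when the enclosing object is serialized --
// the way a task closure is shipped from driver to executor.
class Handle extends Serializable {
  @transient val ctx: String = "driver-side context"
}

object TransientDemo {
  // Serialize and deserialize an object with plain Java serialization,
  // mimicking what happens to a closure between driver and executor.
  def roundTrip[T](obj: T): T = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val local = new Handle
    println(local.ctx)             // usable where it was constructed
    val shipped = roundTrip(local)
    println(shipped.ctx)           // null after the round trip; any call on it would NPE
  }
}
```

If that's the right mechanism, I'd presumably need to keep the sqlContext.sql call at the top level on the driver (or express the per-id lookup as a join against the table) rather than calling it inside the transformation, but I'd welcome confirmation.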