I'm trying to get my feet wet with Spark. I've done some simple stuff in the shell in standalone mode, and now I'm trying to connect to HDFS resources, but I'm running into a problem.
I synced to git's master branch (c399baa - "SPARK-1456 Remove view bounds on Ordered in favor of a context bound on Ordering." (3 days ago) <Michael Armbrust>) and built like so:

    SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly

This created various jars in various places, including these (I think):

    ./examples/target/scala-2.10/spark-examples-assembly-1.0.0-SNAPSHOT.jar
    ./tools/target/scala-2.10/spark-tools-assembly-1.0.0-SNAPSHOT.jar
    ./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.2.0.jar

In `conf/spark-env.sh`, I added this (actually before I did the assembly):

    export HADOOP_CONF_DIR=/etc/hadoop/conf

Now I fire up the shell (bin/spark-shell) and try to grab data from HDFS, and get the following exception:

    scala> var hdf = sc.hadoopFile("hdfs:///user/kwilliams/dat/part-m-00000")
    hdf: org.apache.spark.rdd.RDD[(Nothing, Nothing)] = HadoopRDD[0] at hadoopFile at <console>:12

    scala> hdf.count()
    java.lang.RuntimeException: java.lang.InstantiationException
            at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
            at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
            at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
            at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:209)
            at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
            at scala.Option.getOrElse(Option.scala:120)
            at org.apache.spark.rdd.RDD.partitions(RDD.scala:207)
            at org.apache.spark.SparkContext.runJob(SparkContext.scala:1064)
            at org.apache.spark.rdd.RDD.count(RDD.scala:806)
            at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
            at $iwC$$iwC$$iwC.<init>(<console>:20)
            at $iwC$$iwC.<init>(<console>:22)
            at $iwC.<init>(<console>:24)
            at <init>(<console>:26)
            at .<init>(<console>:30)
            at .<clinit>(<console>)
            at .<init>(<console>:7)
            at .<clinit>(<console>)
            at $print(<console>)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:777)
            at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1045)
            at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
            at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
            at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
            at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
            at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
            at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
            at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
            at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
            at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
            at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
            at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
            at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
            at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
            at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
            at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
            at org.apache.spark.repl.Main$.main(Main.scala:31)
            at org.apache.spark.repl.Main.main(Main.scala)
    Caused by: java.lang.InstantiationException
            at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
            at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
            ... 41 more

Is this recognizable to anyone as a build problem, or a config problem, or anything?
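One thing I notice is that the RDD's inferred type came out as RDD[(Nothing, Nothing)]. Should I be passing explicit key/value/InputFormat type parameters instead of letting them be inferred? Something like the following is what I had in mind (assuming the file is plain text — it's a part-m-00000 output from a MapReduce job, so I haven't actually confirmed the format):

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapred.TextInputFormat

    // Explicitly typed call, so the key/value/InputFormat classes aren't
    // inferred as Nothing. Assumes the file is line-oriented text.
    val hdf = sc.hadoopFile[LongWritable, Text, TextInputFormat](
      "hdfs:///user/kwilliams/dat/part-m-00000")

    // Or, if it really is just text, the simpler form:
    val lines = sc.textFile("hdfs:///user/kwilliams/dat/part-m-00000")

Is that the right direction, or is the untyped call supposed to work?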
Failing that, is there any way to get more information about where in the process it's failing? Thanks.

--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com