Hi Sourav,

The error seems to be caused by your URL starting with "file://" instead of "file:///": with only two slashes, "home" is parsed as the URI's host/authority rather than as the start of the path.
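For example, the corrected read would look something like this (a sketch only; the three slashes are the important part, the rest is your original call unchanged):

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .load("file:///home/biadmin/DataScience/PlutoMN.csv")   // file:/// = empty authority, then the absolute path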
Also, I believe the current version of the package for Spark 1.4 with Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".

-Jey

On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> Hi Jey,
>
> Thanks for your inputs.
>
> Probably I'm getting this error because I'm trying to read a csv file from the local
> filesystem using the com.databricks.spark.csv package. Perhaps this package has a
> hard-coded dependency on Hadoop, as it is trying to read the input format from HadoopRDD.
>
> Can you please confirm?
>
> Here is what I did -
>
> Ran the spark-shell as
>
> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>
> Then in the shell I ran:
>
> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>
> Regards,
> Sourav
>
> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>     at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>     at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>     at $iwC$$iwC$$iwC.<init>(<console>:32)
>     at $iwC$$iwC.<init>(<console>:34)
>     at $iwC.<init>(<console>:36)
>     at <init>(<console>:38)
>     at .<init>(<console>:42)
>     at .<clinit>(<console>)
>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>     at .<init>(<console>:7)
>     at .<clinit>(<console>)
>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>     at $print(<console>)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>     at java.lang.reflect.Method.invoke(Method.java:611)
>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>     at org.apache.spark.repl.Main.main(Main.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>     at java.lang.reflect.Method.invoke(Method.java:611)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>     at java.lang.reflect.Method.invoke(Method.java:611)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>     ... 83 more
>
> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>
>> Actually, Hadoop InputFormats can still be used to read and write from
>> "file://", "s3n://", and similar schemes. You just won't be able to
>> read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
>>
>> To summarize: Sourav, you can use any of the prebuilt packages (i.e.
>> anything other than "source code").
>>
>> Hope that helps,
>> -Jey
>>
>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> You really do not need a Hadoop installation. You can download a pre-built
>>> version with any Hadoop and unzip it, and you are good to go. Yes, it may
>>> complain while launching the master and workers; safely ignore that. The
>>> only problem is while writing to a directory. Of course, you will not be
>>> able to use any Hadoop InputFormat etc. out of the box.
>>>
>>> ** I am assuming it's a learning question :) For production, I would
>>> suggest building it from source.
>>>
>>> If you are using Python and need some help, please drop me a note offline.
>>>
>>> Best,
>>> Ayan
>>>
>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to run Spark without Hadoop, where the data would be read from
>>>> and written to local disk.
>>>>
>>>> For this I have a few questions -
>>>>
>>>> 1. Which download do I need to use? In the download options I don't see
>>>> any binary download which does not need Hadoop. Is the only way to do this
>>>> to download the source code version and compile it myself?
>>>>
>>>> 2. Which installation/quick-start guideline should I use for this? So far
>>>> I haven't seen any documentation which specifically addresses installing or
>>>> setting up Spark without Hadoop, unless I'm missing one.
>>>>
>>>> Regards,
>>>> Sourav
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
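A minimal spark-shell sketch of the point Jey makes above, that local "file://" paths work without installing Hadoop or setting up HDFS; the paths below are illustrative:

    // Read a purely local file and write a purely local copy; no Hadoop cluster is involved.
    val lines = sc.textFile("file:///home/biadmin/DataScience/PlutoMN.csv")
    lines.saveAsTextFile("file:///tmp/PlutoMN-copy")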