Hi Jey,

Not much luck. If I use the package com.databricks:spark-csv_2.11:1.1.0 or com.databricks.spark.csv_2.11.1.1.0 I get a class-not-found error. With com.databricks.spark.csv I don't get the class-not-found error, but I still get the previous error, even after using file:/// in the URI.

Regards,
Sourav
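For reference, a minimal spark-shell session combining Jey's two suggestions below (a --packages coordinate matched to the Scala version of the build, and a file:/// URI with three slashes) might look like this. This is only a sketch, assuming a Spark build with Scala 2.10 and the file path from the earlier mails:

    // Launch the shell with a spark-csv coordinate matching the Scala version
    // of the Spark build, e.g.:
    //   bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
    // "file://" is the URI scheme; the absolute path "/home/..." supplies the
    // third slash, giving "file:///home/...".
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .load("file:///home/biadmin/DataScience/PlutoMN.csv")
    df.show(5)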
On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:

> Hi Sourav,
>
> The error seems to be caused by the fact that your URL starts with
> "file://" instead of "file:///".
>
> Also, I believe the current version of the package for Spark 1.4 with
> Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>
> -Jey
>
> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder
> <sourav.mazumde...@gmail.com> wrote:
>
>> Hi Jey,
>>
>> Thanks for your inputs.
>>
>> I'm probably getting the error because I'm trying to read a CSV file
>> from the local filesystem using the com.databricks.spark.csv package.
>> That package may have a hard-coded dependency on Hadoop, since it
>> tries to read its input format through HadoopRDD.
>>
>> Can you please confirm?
>>
>> Here is what I did. I ran the spark-shell as:
>>
>> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>>
>> Then in the shell I ran:
>>
>> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>>
>> Regards,
>> Sourav
>>
>> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
>> java.lang.RuntimeException: Error in configuring object
>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>     at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>>     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>     at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>>     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>>     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>>     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>>     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>>     at $iwC$$iwC$$iwC.<init>(<console>:32)
>>     at $iwC$$iwC.<init>(<console>:34)
>>     at $iwC.<init>(<console>:36)
>>     at <init>(<console>:38)
>>     at .<init>(<console>:42)
>>     at .<clinit>(<console>)
>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>     at .<init>(<console>:7)
>>     at .<clinit>(<console>)
>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>     at $print(<console>)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>>     at org.apache.spark.repl.Main.main(Main.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>     ... 83 more
>>
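One way to narrow down a failure like the one above: the trace dies inside HadoopRDD.getInputFormat, which plain sc.textFile also goes through, so reading the same file without spark-csv separates a spark-csv problem from a Hadoop input-format problem. A sketch, reusing the same path:

    // Sanity check: bypass spark-csv and read the local file directly.
    // sc.textFile also uses TextInputFormat via HadoopRDD, so a failure here
    // points at the Hadoop input-format path rather than at spark-csv.
    val lines = sc.textFile("file:///home/biadmin/DataScience/PlutoMN.csv")
    lines.first()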
>> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>
>>> Actually, Hadoop InputFormats can still be used to read and write from
>>> "file://", "s3n://", and similar schemes. You just won't be able to
>>> read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
>>>
>>> To summarize: Sourav, you can use any of the prebuilt packages (i.e.
>>> anything other than "source code").
>>>
>>> Hope that helps,
>>> -Jey
>>>
>>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> You really do not need a Hadoop installation. You can download a version
>>>> prebuilt against any Hadoop release, unzip it, and you are good to go. It
>>>> may complain while launching the master and workers; you can safely ignore
>>>> that. The only problem is when writing to a directory. Of course, you will
>>>> not be able to use any Hadoop InputFormat etc. out of the box.
>>>>
>>>> ** I am assuming this is a learning question :) For production, I would
>>>> suggest building from source.
>>>>
>>>> If you are using Python and need some help, please drop me a note offline.
>>>>
>>>> Best,
>>>> Ayan
>>>>
>>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder
>>>> <sourav.mazumde...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to run Spark without Hadoop, with the data read from and
>>>>> written to local disk.
>>>>>
>>>>> I have a few questions about this:
>>>>>
>>>>> 1. Which download do I need to use? Among the download options I don't
>>>>> see any binary download that does not need Hadoop. Is the only way to do
>>>>> this to download the source code version and compile it myself?
>>>>>
>>>>> 2. Which installation/quick-start guideline should I use? So far I
>>>>> haven't seen any documentation that specifically addresses installing and
>>>>> setting up Spark without Hadoop, unless I'm missing something.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
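Putting Jey's and Ayan's points together, an end-to-end check that a prebuilt Spark works against the local filesystem alone, with no Hadoop installation and no HDFS cluster, could look like the following sketch (the input and output paths here are hypothetical):

    // From an unzipped prebuilt Spark distribution, start a local-mode shell:
    //   bin/spark-shell --master local[4]
    // Read and write with explicit file:/// URIs; no HDFS is involved.
    val data  = sc.textFile("file:///tmp/input.txt")    // hypothetical input path
    val upper = data.map(_.toUpperCase)
    upper.saveAsTextFile("file:///tmp/output")          // writes part-* files under /tmp/output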