The format string is still "com.databricks.spark.csv"; only the --packages argument passed to spark-shell uses the full Maven coordinate, "com.databricks:spark-csv_2.11:1.1.0".
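Putting the two together, something like the following should work (a sketch only, assuming a Scala 2.11 build of Spark as discussed below, and reusing the path from your earlier mail -- note the three slashes in "file:///"):

    bin/spark-shell --packages com.databricks:spark-csv_2.11:1.1.0

and then inside the shell:

    // the data source is named by its package; local paths need "file:///" + absolute path
    val df = sqlContext.read.format("com.databricks.spark.csv").load("file:///home/biadmin/DataScience/PlutoMN.csv")
    df.show(5)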
On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> Hi Jey,
>
> Not much luck.
>
> If I use the class com.databricks:spark-csv_2.11:1.1.0 or com.databricks.spark.csv_2.11.1.1.0 I get a class-not-found error. With com.databricks.spark.csv I don't get the class-not-found error, but I still get the previous error even after using file:/// in the URI.
>
> Regards,
> Sourav
>
> On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>
>> Hi Sourav,
>>
>> The error seems to be caused by the fact that your URL starts with "file://" instead of "file:///".
>>
>> Also, I believe the current version of the package for Spark 1.4 with Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>>
>> -Jey
>>
>> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>
>>> Hi Jey,
>>>
>>> Thanks for your inputs.
>>>
>>> I'm probably getting the error because I'm trying to read a CSV file from the local filesystem using the com.databricks.spark.csv package. This package probably has a hard-coded dependency on Hadoop, as it is trying to read the input format from HadoopRDD.
>>>
>>> Can you please confirm?
>>>
>>> Here is what I did.
>>>
>>> Ran the spark-shell as:
>>>
>>> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>>>
>>> Then in the shell I ran:
>>>
>>> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>>>
>>> Regards,
>>> Sourav
>>>
>>> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
>>> java.lang.RuntimeException: Error in configuring object
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>     at scala.Option.getOrElse(Option.scala:120)
>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>     at scala.Option.getOrElse(Option.scala:120)
>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>     at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>>>     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>     at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>>>     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>>>     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>>>     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>>>     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>>>     at $iwC$$iwC$$iwC.<init>(<console>:32)
>>>     at $iwC$$iwC.<init>(<console>:34)
>>>     at $iwC.<init>(<console>:36)
>>>     at <init>(<console>:38)
>>>     at .<init>(<console>:42)
>>>     at .<clinit>(<console>)
>>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>     at .<init>(<console>:7)
>>>     at .<clinit>(<console>)
>>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>     at $print(<console>)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>     at org.apache.spark.repl.Main.main(Main.scala)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>     ... 83 more
>>>
>>> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>
>>>> Actually, Hadoop InputFormats can still be used to read and write from "file://", "s3n://", and similar schemes. You just won't be able to read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
>>>>
>>>> To summarize: Sourav, you can use any of the prebuilt packages (i.e. anything other than "source code").
>>>>
>>>> Hope that helps,
>>>> -Jey
>>>>
>>>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> You really do not need a Hadoop installation. You can download a pre-built version for any Hadoop, unzip it, and you are good to go. Yes, it may complain while launching the master and workers; safely ignore that. The only problem is while writing to a directory. Of course, you will not be able to use any Hadoop InputFormat etc. out of the box.
>>>>>
>>>>> ** I am assuming it's a learning question :) For production, I would suggest building it from source.
>>>>>
>>>>> If you are using Python and need some help, please drop me a note offline.
>>>>>
>>>>> Best,
>>>>> Ayan
>>>>>
>>>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to run Spark without Hadoop, with the data read from and written to local disk.
>>>>>>
>>>>>> For this I have a few questions:
>>>>>>
>>>>>> 1. Which download do I need to use? Among the download options I don't see any binary download that does not need Hadoop. Is the only way to do this to download the source code version and compile it?
>>>>>>
>>>>>> 2. Which installation/quick-start guideline should I use for this? So far I haven't seen any documentation that specifically addresses installing/setting up Spark without Hadoop, unless I'm missing one.
>>>>>>
>>>>>> Regards,
>>>>>> Sourav
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
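And to Jey's earlier point about the prebuilt packages: a quick way to confirm that a prebuilt Spark can read and write the local filesystem with no Hadoop/HDFS setup is a check like the following in spark-shell (a sketch only; the /tmp paths are just placeholders):

    // no HDFS required: the bundled Hadoop input/output formats handle the local "file://" scheme
    val lines = sc.textFile("file:///tmp/some-input.txt")
    println(lines.count())
    lines.saveAsTextFile("file:///tmp/some-output-dir")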