All InputFormats go through HadoopRDD or NewHadoopRDD, so seeing HadoopRDD in the stack trace is expected even when no Hadoop cluster is involved. Are you using "file:///" instead of "file://"?
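
For reference, putting the pieces together should look roughly like this (the package version and the path are the ones quoted below in this thread, so adjust them to your setup):

  bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3

  // in the shell -- note the three slashes in the URI
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .load("file:///home/biadmin/DataScience/PlutoMN.csv")
  df.first()

If the URI scheme is what is tripping up the InputFormat, the three-slash form should get you past that error.
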
On Mon, Jun 29, 2015 at 8:40 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> Hi Jey,
>
> This solves the class not found problem. Thanks.
>
> But the input format issue is still not resolved. It looks like it is
> still trying to create a HadoopRDD, and I don't know why. The error
> message goes like this -
>
> java.lang.RuntimeException: Error in configuring object
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>   at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>   at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>   at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>   at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>   at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:129)
>   at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:127)
>   at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:109)
>   at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:62)
>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:115)
>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:40)
>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:28)
>   at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>   at $iwC$$iwC$$iwC.<init>(<console>:32)
>   at $iwC$$iwC.<init>(<console>:34)
>   at $iwC.<init>(<console>:36)
>   at <init>(<console>:38)
>   at .<init>(<console>:42)
>   at .<clinit>(<console>)
>   at java.lang.J9VMInternals.initializeImpl(Native Method)
>   at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at java.lang.J9VMInternals.initializeImpl(Native Method)
>   at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>   at $print(<console>)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>   at java.lang.reflect.Method.invoke(Method.java:611)
>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>   at org.apache.spark.repl.Main.main(Main.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>   at java.lang.reflect.Method.invoke(Method.java:611)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>   at java.lang.reflect.Method.invoke(Method.java:611)
>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>   ... 83 more
>
> Regards,
> Sourav
>
>
> On Mon, Jun 29, 2015 at 6:53 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>
>> The format is still "com.databricks.spark.csv", but the parameter passed
>> to spark-shell is "--packages com.databricks:spark-csv_2.11:1.1.0".
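>>
>> In other words, the Maven coordinate only goes on the spark-shell command
>> line, and only the short data source name goes into .format(). Roughly
>> (the path here is just a placeholder):
>>
>>   bin/spark-shell --packages com.databricks:spark-csv_2.11:1.1.0
>>
>>   // then, inside the shell:
>>   val df = sqlContext.read
>>     .format("com.databricks.spark.csv")
>>     .load("file:///path/to/some.csv")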
>>
>> On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>
>>> Hi Jey,
>>>
>>> Not much luck.
>>>
>>> If I use the class com.databricks:spark-csv_2.11:1.1.0 or
>>> com.databricks.spark.csv_2.11.1.1.0 I get a class not found error. With
>>> com.databricks.spark.csv I don't get the class not found error, but I
>>> still get the previous error even after using file:/// in the URI.
>>>
>>> Regards,
>>> Sourav
>>>
>>> On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>
>>>> Hi Sourav,
>>>>
>>>> The error seems to be caused by the fact that your URL starts with
>>>> "file://" instead of "file:///".
>>>>
>>>> Also, I believe the current version of the package for Spark 1.4 with
>>>> Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>>>>
>>>> -Jey
>>>>
>>>> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>
>>>>> Hi Jey,
>>>>>
>>>>> Thanks for your inputs.
>>>>>
>>>>> I'm probably getting the error because I'm trying to read a csv file
>>>>> from the local filesystem using the com.databricks.spark.csv package.
>>>>> Perhaps this package has a hard-coded dependency on Hadoop, as it is
>>>>> trying to read the input format from HadoopRDD.
>>>>>
>>>>> Can you please confirm?
>>>>>
>>>>> Here is what I did -
>>>>>
>>>>> Ran the spark-shell as
>>>>>
>>>>> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>>>>>
>>>>> Then in the shell I ran:
>>>>>
>>>>> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
>>>>> java.lang.RuntimeException: Error in configuring object
>>>>>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>>>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>>>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>>>   at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>>>>>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>   at scala.Option.getOrElse(Option.scala:120)
>>>>>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>   at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>   at scala.Option.getOrElse(Option.scala:120)
>>>>>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>>>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>>>   at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>>>>>   at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>>>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>>>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>>>   at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>>>>>   at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>>>>>   at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>>>>>   at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>>>>>   at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>>>>>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>>>>>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>>>>>   at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>>>>>   at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>>>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>>>>>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>>>>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>>>>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>>>>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>>>>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>>>>>   at $iwC$$iwC$$iwC.<init>(<console>:32)
>>>>>   at $iwC$$iwC.<init>(<console>:34)
>>>>>   at $iwC.<init>(<console>:36)
>>>>>   at <init>(<console>:38)
>>>>>   at .<init>(<console>:42)
>>>>>   at .<clinit>(<console>)
>>>>>   at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>>>   at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>>>   at .<init>(<console>:7)
>>>>>   at .<clinit>(<console>)
>>>>>   at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>>>   at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>>>   at $print(<console>)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>   at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>>>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>>>>>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>>>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>>>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>>>>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>>>>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>>>>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>>>>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>>>>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>>>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>>>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>>>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>>>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>>>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>>>>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>>>>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>>>   at org.apache.spark.repl.Main.main(Main.scala)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>   at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>>>>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>>>>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>>>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>>>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>>>   at java.lang.reflect.Method.invoke(Method.java:611)
>>>>>   at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>>>   ... 83 more
>>>>>
>>>>> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>>>
>>>>>> Actually, Hadoop InputFormats can still be used to read and write
>>>>>> from "file://", "s3n://", and similar schemes. You just won't be able
>>>>>> to read/write to HDFS without installing Hadoop and setting up an HDFS
>>>>>> cluster.
>>>>>>
>>>>>> To summarize: Sourav, you can use any of the prebuilt packages (i.e.
>>>>>> anything other than "source code").
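>>>>>>
>>>>>> For example, with any of the prebuilt downloads something like this
>>>>>> should work against the local filesystem, with no HDFS cluster involved
>>>>>> (the path is only an illustration):
>>>>>>
>>>>>>   val lines = sc.textFile("file:///tmp/some-local-file.csv")
>>>>>>   lines.count()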
>>>>>>
>>>>>> Hope that helps,
>>>>>> -Jey
>>>>>>
>>>>>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> You really do not need a Hadoop installation. You can download a
>>>>>>> pre-built version with any Hadoop version, unzip it, and you are good
>>>>>>> to go. Yes, it may complain while launching master and workers; safely
>>>>>>> ignore that. The only problem is while writing to a directory. Of
>>>>>>> course you will not be able to use any Hadoop InputFormat etc. out of
>>>>>>> the box.
>>>>>>>
>>>>>>> ** I am assuming it's a learning question :) For production, I would
>>>>>>> suggest building it from source.
>>>>>>>
>>>>>>> If you are using Python and need some help, please drop me a note
>>>>>>> offline.
>>>>>>>
>>>>>>> Best
>>>>>>> Ayan
>>>>>>>
>>>>>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm trying to run Spark without Hadoop, where the data would be read
>>>>>>>> from and written to local disk.
>>>>>>>>
>>>>>>>> For this I have a few questions -
>>>>>>>>
>>>>>>>> 1. Which download do I need to use? In the download options I don't
>>>>>>>> see any binary download which does not need Hadoop. Is the only way
>>>>>>>> to do this to download the source code version and compile it myself?
>>>>>>>>
>>>>>>>> 2. Which installation/quick start guideline should I use for this?
>>>>>>>> So far I didn't see any documentation which specifically addresses a
>>>>>>>> Spark-without-Hadoop installation/setup, unless I'm missing something.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Sourav
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Ayan Guha
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>