The format string is still "com.databricks.spark.csv"; only the --packages argument passed to spark-shell uses the full Maven coordinate, "com.databricks:spark-csv_2.11:1.1.0".
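Putting the two together, something like the following should work (a sketch only, assuming a Scala 2.11 build of Spark as discussed below, and reusing the path from your earlier mail -- note the three slashes in "file:///"):

    bin/spark-shell --packages com.databricks:spark-csv_2.11:1.1.0

and then inside the shell:

    // the data source is named by its package; local paths need "file:///" + absolute path
    val df = sqlContext.read.format("com.databricks.spark.csv").load("file:///home/biadmin/DataScience/PlutoMN.csv")
    df.show(5)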
On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:

> Hi Jey,
>
> Not much luck.
>
> If I use the class com.databricks:spark-csv_2.11:1.1.0 or com.databricks.spark.csv_2.11.1.1.0 I get a class-not-found error. With com.databricks.spark.csv I don't get the class-not-found error, but I still get the previous error even after using file:/// in the URI.
>
> Regards,
> Sourav
>
> On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>
>> Hi Sourav,
>>
>> The error seems to be caused by the fact that your URL starts with "file://" instead of "file:///".
>>
>> Also, I believe the current version of the package for Spark 1.4 with Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>>
>> -Jey
>>
>> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>
>>> Hi Jey,
>>>
>>> Thanks for your inputs.
>>>
>>> I'm probably getting the error because I'm trying to read a CSV file from the local filesystem using the com.databricks.spark.csv package. This package probably has a hard-coded dependency on Hadoop, as it is trying to read the input format from HadoopRDD.
>>>
>>> Can you please confirm?
>>>
>>> Here is what I did.
>>>
>>> Ran the spark-shell as:
>>>
>>> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>>>
>>> Then in the shell I ran:
>>>
>>> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>>>
>>> Regards,
>>> Sourav
>>>
>>> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
>>> java.lang.RuntimeException: Error in configuring object
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>     at scala.Option.getOrElse(Option.scala:120)
>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>     at scala.Option.getOrElse(Option.scala:120)
>>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>     at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>>>     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>     at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>>>     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>>>     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>>>     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>>>     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>>     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>>>     at $iwC$$iwC$$iwC.<init>(<console>:32)
>>>     at $iwC$$iwC.<init>(<console>:34)
>>>     at $iwC.<init>(<console>:36)
>>>     at <init>(<console>:38)
>>>     at .<init>(<console>:42)
>>>     at .<clinit>(<console>)
>>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>     at .<init>(<console>:7)
>>>     at .<clinit>(<console>)
>>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>     at $print(<console>)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>     at org.apache.spark.repl.Main.main(Main.scala)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>     ... 83 more
>>>
>>> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>
>>>> Actually, Hadoop InputFormats can still be used to read and write from "file://", "s3n://", and similar schemes. You just won't be able to read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
>>>>
>>>> To summarize: Sourav, you can use any of the prebuilt packages (i.e. anything other than "source code").
>>>>
>>>> Hope that helps,
>>>> -Jey
>>>>
>>>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> You really do not need a Hadoop installation. You can download a pre-built version for any Hadoop, unzip it, and you are good to go. Yes, it may complain while launching the master and workers; safely ignore that. The only problem is while writing to a directory. Of course, you will not be able to use any Hadoop InputFormat etc. out of the box.
>>>>>
>>>>> ** I am assuming it's a learning question :) For production, I would suggest building it from source.
>>>>>
>>>>> If you are using Python and need some help, please drop me a note offline.
>>>>>
>>>>> Best,
>>>>> Ayan
>>>>>
>>>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to run Spark without Hadoop, with the data read from and written to local disk.
>>>>>>
>>>>>> For this I have a few questions:
>>>>>>
>>>>>> 1. Which download do I need to use? Among the download options I don't see any binary download that does not need Hadoop. Is the only way to do this to download the source code version and compile it?
>>>>>>
>>>>>> 2. Which installation/quick-start guideline should I use for this? So far I haven't seen any documentation that specifically addresses installing/setting up Spark without Hadoop, unless I'm missing one.
>>>>>>
>>>>>> Regards,
>>>>>> Sourav
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Ayan Guha
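And to Jey's earlier point about the prebuilt packages: a quick way to confirm that a prebuilt Spark can read and write the local filesystem with no Hadoop/HDFS setup is a check like the following in spark-shell (a sketch only; the /tmp paths are just placeholders):

    // no HDFS required: the bundled Hadoop input/output formats handle the local "file://" scheme
    val lines = sc.textFile("file:///tmp/some-input.txt")
    println(lines.count())
    lines.saveAsTextFile("file:///tmp/some-output-dir")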