Hi Jey,

Not much luck. If I use the package com.databricks:spark-csv_2.11:1.1.0 or com.databricks.spark.csv_2.11.1.1.0 I get a class-not-found error. With com.databricks.spark.csv I don't get the class-not-found error, but I still get the previous error, even after using file:/// in the URI.

Regards,
Sourav
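For reference, a minimal spark-shell session combining Jey's two suggestions below (a --packages coordinate matched to the Scala version of the build, and a file:/// URI with three slashes) might look like this. This is only a sketch, assuming a Spark build with Scala 2.10 and the file path from the earlier mails:

    // Launch the shell with a spark-csv coordinate matching the Scala version
    // of the Spark build, e.g.:
    //   bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
    // "file://" is the URI scheme; the absolute path "/home/..." supplies the
    // third slash, giving "file:///home/...".
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .load("file:///home/biadmin/DataScience/PlutoMN.csv")
    df.show(5)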
On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:

> Hi Sourav,
>
> The error seems to be caused by the fact that your URL starts with
> "file://" instead of "file:///".
>
> Also, I believe the current version of the package for Spark 1.4 with
> Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>
> -Jey
>
> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder
> <sourav.mazumde...@gmail.com> wrote:
>
>> Hi Jey,
>>
>> Thanks for your inputs.
>>
>> I'm probably getting the error because I'm trying to read a CSV file
>> from the local filesystem using the com.databricks.spark.csv package.
>> That package may have a hard-coded dependency on Hadoop, since it
>> tries to read its input format through HadoopRDD.
>>
>> Can you please confirm?
>>
>> Here is what I did. I ran the spark-shell as:
>>
>> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>>
>> Then in the shell I ran:
>>
>> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>>
>> Regards,
>> Sourav
>>
>> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
>> java.lang.RuntimeException: Error in configuring object
>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>     at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>     at scala.Option.getOrElse(Option.scala:120)
>>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>     at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>     at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>>     at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>     at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>>     at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>>     at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>>     at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>>     at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>>     at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>>     at $iwC$$iwC$$iwC.<init>(<console>:32)
>>     at $iwC$$iwC.<init>(<console>:34)
>>     at $iwC.<init>(<console>:36)
>>     at <init>(<console>:38)
>>     at .<init>(<console>:42)
>>     at .<clinit>(<console>)
>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>     at .<init>(<console>:7)
>>     at .<clinit>(<console>)
>>     at java.lang.J9VMInternals.initializeImpl(Native Method)
>>     at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>     at $print(<console>)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>>     at org.apache.spark.repl.Main.main(Main.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>     at java.lang.reflect.Method.invoke(Method.java:611)
>>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>     ... 83 more
>>
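One way to narrow down a failure like the one above: the trace dies inside HadoopRDD.getInputFormat, which plain sc.textFile also goes through, so reading the same file without spark-csv separates a spark-csv problem from a Hadoop input-format problem. A sketch, reusing the same path:

    // Sanity check: bypass spark-csv and read the local file directly.
    // sc.textFile also uses TextInputFormat via HadoopRDD, so a failure here
    // points at the Hadoop input-format path rather than at spark-csv.
    val lines = sc.textFile("file:///home/biadmin/DataScience/PlutoMN.csv")
    lines.first()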
>> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>
>>> Actually, Hadoop InputFormats can still be used to read and write from
>>> "file://", "s3n://", and similar schemes. You just won't be able to
>>> read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
>>>
>>> To summarize: Sourav, you can use any of the prebuilt packages (i.e.
>>> anything other than "source code").
>>>
>>> Hope that helps,
>>> -Jey
>>>
>>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> You really do not need a Hadoop installation. You can download a version
>>>> prebuilt against any Hadoop release, unzip it, and you are good to go. It
>>>> may complain while launching the master and workers; you can safely ignore
>>>> that. The only problem is when writing to a directory. Of course, you will
>>>> not be able to use any Hadoop InputFormat etc. out of the box.
>>>>
>>>> ** I am assuming this is a learning question :) For production, I would
>>>> suggest building from source.
>>>>
>>>> If you are using Python and need some help, please drop me a note offline.
>>>>
>>>> Best,
>>>> Ayan
>>>>
>>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder
>>>> <sourav.mazumde...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to run Spark without Hadoop, with the data read from and
>>>>> written to local disk.
>>>>>
>>>>> I have a few questions about this:
>>>>>
>>>>> 1. Which download do I need to use? Among the download options I don't
>>>>> see any binary download that does not need Hadoop. Is the only way to do
>>>>> this to download the source code version and compile it myself?
>>>>>
>>>>> 2. Which installation/quick-start guideline should I use? So far I
>>>>> haven't seen any documentation that specifically addresses installing and
>>>>> setting up Spark without Hadoop, unless I'm missing something.
>>>>>
>>>>> Regards,
>>>>> Sourav
>>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
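Putting Jey's and Ayan's points together, an end-to-end check that a prebuilt Spark works against the local filesystem alone, with no Hadoop installation and no HDFS cluster, could look like the following sketch (the input and output paths here are hypothetical):

    // From an unzipped prebuilt Spark distribution, start a local-mode shell:
    //   bin/spark-shell --master local[4]
    // Read and write with explicit file:/// URIs; no HDFS is involved.
    val data  = sc.textFile("file:///tmp/input.txt")    // hypothetical input path
    val upper = data.map(_.toUpperCase)
    upper.saveAsTextFile("file:///tmp/output")          // writes part-* files under /tmp/output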