It's a Windows thing. Please escape the forward slash in the string; basically it is not able to find the file.
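For what it's worth, a minimal sketch of the usual ways to spell a Windows path in Python so the file can actually be found (this assumes the file really lives at C:\cars.csv - the location is just an example, not something from this thread):

from pyspark import SparkConf, SparkContext

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)

# Any of these forms avoids accidental backslash escapes on Windows:
cars = sc.textFile('c:/cars.csv')       # forward slashes
# cars = sc.textFile(r'c:\cars.csv')    # raw string, backslashes left alone
# cars = sc.textFile('c:\\cars.csv')    # explicitly escaped backslashes
print cars.count()
sc.stop()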
On 28 Apr 2015 22:09, "Fabian Böhnlein" <fabian.boehnl...@gmail.com> wrote:

> Can you specify 'running via PyCharm'? How are you executing the script,
> with spark-submit?
>
> In PySpark I guess you used --jars databricks-csv.jar. With spark-submit
> you might need the additional --driver-class-path databricks-csv.jar.
>
> Both parameters cannot be set via the SparkConf object.
>
> Cheers,
> Fabian
>
> On 04/28/2015 10:06 AM, mj wrote:
>
>> Hi,
>>
>> I'm trying to figure out how to use a third-party jar inside a Python
>> program which I'm running via PyCharm in order to debug it. I am normally
>> able to run Spark code in Python such as this:
>>
>> spark_conf = SparkConf().setMaster('local').setAppName('test')
>> sc = SparkContext(conf=spark_conf)
>> cars = sc.textFile('c:/cars.csv')
>> print cars.count()
>> sc.stop()
>>
>> The code I'm trying to run is below - it uses the databricks spark-csv
>> jar. I can get it working fine in the pyspark shell using the --packages
>> argument, but I can't figure out how to get it to work via PyCharm.
>>
>> from pyspark.sql import SQLContext
>> from pyspark import SparkConf, SparkContext
>>
>> spark_conf = SparkConf().setMaster('local').setAppName('test')
>> sc = SparkContext(conf=spark_conf)
>>
>> sqlContext = SQLContext(sc)
>> df = sqlContext.load(source="com.databricks.spark.csv", header="true",
>>                      path="c:/cars.csv", delimiter='\t')
>> df.select("year")
>>
>> The error message I'm getting is:
>>
>> py4j.protocol.Py4JJavaError: An error occurred while calling o20.load.
>> : java.lang.RuntimeException: Failed to load class for data source:
>> com.databricks.spark.csv
>>     at scala.sys.package$.error(package.scala:27)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:194)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:205)
>>     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
>>     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>     at py4j.Gateway.invoke(Gateway.java:259)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I presume I need to set the Spark classpath somehow, but I'm not sure of
>> the right way to do it. Any advice/guidance would be appreciated.
>>
>> Thanks,
>>
>> Mark.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-jars-to-standalone-pyspark-program-tp22685.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
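Following up on Fabian's point above about --jars / --driver-class-path: when the script is launched straight from PyCharm (plain python, not spark-submit), one approach worth trying is setting PYSPARK_SUBMIT_ARGS before the SparkContext is created, so the spark-csv package is pulled onto the classpath. A rough, untested sketch - the exact package coordinates (com.databricks:spark-csv_2.10:1.0.3) and the trailing "pyspark-shell" token are my assumptions, so check them against your Spark version:

import os

# Must be set before the SparkContext (and hence the JVM) is created.
# Some Spark versions also require the trailing "pyspark-shell" token.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.0.3 pyspark-shell')

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)
sqlContext = SQLContext(sc)

df = sqlContext.load(source="com.databricks.spark.csv", header="true",
                     path="c:/cars.csv", delimiter='\t')
print df.select("year").count()

When launching with spark-submit instead, the same thing can go on the command line, e.g. spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 my_script.py, or --jars plus --driver-class-path for a local jar, as Fabian describes.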