It's a Windows thing. Please escape the forward slash in the string; basically it is not able to find the file.
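For what it's worth, a minimal sketch of the usual ways to spell a Windows path in Python so the file can actually be found (this assumes the file really lives at C:\cars.csv - the location is just an example, not something from this thread):

from pyspark import SparkConf, SparkContext

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)

# Any of these forms avoids accidental backslash escapes on Windows:
cars = sc.textFile('c:/cars.csv')       # forward slashes
# cars = sc.textFile(r'c:\cars.csv')    # raw string, backslashes left alone
# cars = sc.textFile('c:\\cars.csv')    # explicitly escaped backslashes
print cars.count()
sc.stop()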
On 28 Apr 2015 22:09, "Fabian Böhnlein" <fabian.boehnl...@gmail.com> wrote:

> Can you specify 'running via PyCharm'? How are you executing the script,
> with spark-submit?
>
> In PySpark I guess you used --jars databricks-csv.jar. With spark-submit
> you might need the additional --driver-class-path databricks-csv.jar.
>
> Both parameters cannot be set via the SparkConf object.
>
> Cheers,
> Fabian
>
> On 04/28/2015 10:06 AM, mj wrote:
>
>> Hi,
>>
>> I'm trying to figure out how to use a third-party jar inside a Python
>> program which I'm running via PyCharm in order to debug it. I am normally
>> able to run Spark code in Python such as this:
>>
>> spark_conf = SparkConf().setMaster('local').setAppName('test')
>> sc = SparkContext(conf=spark_conf)
>> cars = sc.textFile('c:/cars.csv')
>> print cars.count()
>> sc.stop()
>>
>> The code I'm trying to run is below - it uses the databricks spark-csv
>> jar. I can get it working fine in the pyspark shell using the --packages
>> argument, but I can't figure out how to get it to work via PyCharm.
>>
>> from pyspark.sql import SQLContext
>> from pyspark import SparkConf, SparkContext
>>
>> spark_conf = SparkConf().setMaster('local').setAppName('test')
>> sc = SparkContext(conf=spark_conf)
>>
>> sqlContext = SQLContext(sc)
>> df = sqlContext.load(source="com.databricks.spark.csv", header="true",
>>                      path="c:/cars.csv", delimiter='\t')
>> df.select("year")
>>
>> The error message I'm getting is:
>>
>> py4j.protocol.Py4JJavaError: An error occurred while calling o20.load.
>> : java.lang.RuntimeException: Failed to load class for data source:
>> com.databricks.spark.csv
>>     at scala.sys.package$.error(package.scala:27)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:194)
>>     at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:205)
>>     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
>>     at org.apache.spark.sql.SQLContext.load(SQLContext.scala:685)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>     at py4j.Gateway.invoke(Gateway.java:259)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I presume I need to set the Spark classpath somehow, but I'm not sure of
>> the right way to do it. Any advice/guidance would be appreciated.
>>
>> Thanks,
>>
>> Mark.
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-jars-to-standalone-pyspark-program-tp22685.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
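Following up on Fabian's point above about --jars / --driver-class-path: when the script is launched straight from PyCharm (plain python, not spark-submit), one approach worth trying is setting PYSPARK_SUBMIT_ARGS before the SparkContext is created, so the spark-csv package is pulled onto the classpath. A rough, untested sketch - the exact package coordinates (com.databricks:spark-csv_2.10:1.0.3) and the trailing "pyspark-shell" token are my assumptions, so check them against your Spark version:

import os

# Must be set before the SparkContext (and hence the JVM) is created.
# Some Spark versions also require the trailing "pyspark-shell" token.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.0.3 pyspark-shell')

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

spark_conf = SparkConf().setMaster('local').setAppName('test')
sc = SparkContext(conf=spark_conf)
sqlContext = SQLContext(sc)

df = sqlContext.load(source="com.databricks.spark.csv", header="true",
                     path="c:/cars.csv", delimiter='\t')
print df.select("year").count()

When launching with spark-submit instead, the same thing can go on the command line, e.g. spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 my_script.py, or --jars plus --driver-class-path for a local jar, as Fabian describes.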