Re: Creating Spark DataFrame from large pandas DataFrame

2015-08-21 Thread ayan guha
The easiest option I found is to put the jars on SPARK_CLASSPATH.
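A minimal sketch of the classpath approach, assuming the spark-csv jar and its commons-csv dependency have already been downloaded locally (the paths below are placeholders, not real locations):

```shell
# Hypothetical paths: point SPARK_CLASSPATH at the spark-csv jar and its
# commons-csv dependency before launching the PySpark shell (Spark 1.x style).
export SPARK_CLASSPATH="/path/to/spark-csv_2.11-1.2.0.jar:/path/to/commons-csv-1.1.jar"
pyspark
```

Note that later Spark releases deprecate SPARK_CLASSPATH in favour of `--jars` / `--packages`, so the `--packages` approach below is generally the safer one.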

Re: Creating Spark DataFrame from large pandas DataFrame

2015-08-20 Thread Burak Yavuz
If you would like to try using spark-csv, please use `pyspark --packages com.databricks:spark-csv_2.11:1.2.0`. You're missing a dependency.

Best,
Burak
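The suggested command, for reference; `--packages` resolves the spark-csv artifact and its transitive dependencies (e.g. commons-csv, the likely missing one) from Maven Central at launch:

```shell
# Fetch com.databricks:spark-csv_2.11:1.2.0 plus its transitive
# dependencies and add them to the driver and executor classpaths.
pyspark --packages com.databricks:spark-csv_2.11:1.2.0
```

Inside the resulting shell one would then load a file with something like `sqlContext.read.format('com.databricks.spark.csv').options(header='true').load('data.csv')` (spark-csv 1.x API; the file name is a placeholder).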

Creating Spark DataFrame from large pandas DataFrame

2015-08-20 Thread Charlie Hack
Hi,

I'm new to Spark and am trying to create a Spark df from a pandas df with ~5 million rows, using Spark 1.4.1. When I type:

df = sqlContext.createDataFrame(pandas_df.where(pd.notnull(pandas_df), None))

(the df.where is a hack I found on the Spark JIRA to avoid a problem with NaN values making mix…