Thanks Felix, I will try it tomorrow.

~~~ Sent from my cell phone, sorry for any typos
On December 30, 2016 at 10:08 PM, "Felix Cheung" <felixcheun...@hotmail.com> wrote:

> Have you tried the spark-csv package?
>
> https://spark-packages.org/package/databricks/spark-csv
>
> ------------------------------
> *From:* Raymond Xie <xie3208...@gmail.com>
> *Sent:* Friday, December 30, 2016 6:46:11 PM
> *To:* user@spark.apache.org
> *Subject:* How to load a big csv to dataframe in Spark 1.6
>
> Hello,
>
> I see that this is the usual way to load a csv into a dataframe:
>
>     sqlContext = SQLContext(sc)
>
>     Employee_rdd = sc.textFile("\..\Employee.csv") \
>         .map(lambda line: line.split(","))
>
>     Employee_df = Employee_rdd.toDF(['Employee_ID', 'Employee_name'])
>
>     Employee_df.show()
>
> However, in my case the csv has 100+ fields, which means the toDF() call
> will be very lengthy.
>
> Can anyone tell me a practical method to load the data?
>
> Thank you very much.
>
> *Raymond*
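One way to avoid typing 100+ column names by hand is to take them from the CSV's own header row. The sketch below shows the idea in plain Python (the sample data and column names are hypothetical); the commented lines at the end show how the same idea would look in Spark 1.6, and the spark-csv reader Felix suggested, as untested sketches.

```python
# Minimal sketch: derive column names from the CSV header row instead of
# listing them by hand. Sample data below is hypothetical.
import csv
import io

sample = "Employee_ID,Employee_name,Dept\n1,Alice,Eng\n2,Bob,Sales\n"

rows = list(csv.reader(io.StringIO(sample)))
columns = rows[0]   # header row supplies the column names
data = rows[1:]     # remaining rows are the records

print(columns)  # ['Employee_ID', 'Employee_name', 'Dept']

# The same idea in Spark 1.6 (untested sketch):
#   header = Employee_rdd.first()
#   body = Employee_rdd.filter(lambda r: r != header)
#   Employee_df = body.toDF(header)
#
# Or, with the spark-csv package, let the reader pick up the header
# and infer column types itself (untested sketch):
#   df = (sqlContext.read.format("com.databricks.spark.csv")
#             .option("header", "true")
#             .option("inferSchema", "true")
#             .load("Employee.csv"))
```

With spark-csv, `header` keeps the first row out of the data and names the columns, and `inferSchema` makes an extra pass to guess column types, which saves declaring a 100-field schema manually.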